Mistral AI

Mixtral 8x7B Instruct

The OG public MoE — 8 experts, 2 active per token, 47 B total / 13 B active. Apache-2.0.

46.7B parameters · mixtral · apache-2.0 · 32K context · 28–34 GB VRAM

About This Model

Mixtral 8x7B Instruct from Mistral AI was the first open Mixture-of-Experts model with credible chat capability. It has 47B total parameters, but only ~13B are active per token, so it punches at the level of a much bigger model while running at roughly the speed of a 13B dense model. It needs ~28 GB of VRAM at Q4, which lands it in dual-RTX-3090 or single-A6000 territory.
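
Conceptually, each MoE layer scores all 8 experts for every token and runs only the top 2, mixing their outputs by the router's weights. Here's a minimal sketch of that routing step; the NumPy framing, dimensions, and expert functions are illustrative assumptions, not Mixtral's actual code:

```python
# Minimal sketch of Mixtral-style top-2 expert routing (illustrative only;
# expert count and active count follow the "8 experts, 2 active" spec above).
import numpy as np

NUM_EXPERTS = 8   # total experts per MoE layer
TOP_K = 2         # experts activated per token

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(token_hidden, router_w, experts):
    """Route one token through its top-2 experts and mix the outputs.

    token_hidden: (d,) hidden state for a single token
    router_w:     (d, NUM_EXPERTS) router projection
    experts:      list of NUM_EXPERTS callables, each (d,) -> (d,)
    """
    logits = token_hidden @ router_w         # score every expert
    top = np.argsort(logits)[-TOP_K:]        # keep the 2 best experts
    weights = softmax(logits[top])           # renormalize over the winners
    # Only the chosen experts actually run, which is why only ~13B of the
    # 47B parameters are active for any given token.
    return sum(w * experts[i](token_hidden) for w, i in zip(weights, top))

# Tiny demo with random weights, just to show the call shape.
d = 16
rng = np.random.default_rng(0)
experts = [lambda h, W=rng.standard_normal((d, d)): h @ W for _ in range(NUM_EXPERTS)]
out = moe_layer(rng.standard_normal(d), rng.standard_normal((d, NUM_EXPERTS)), experts)
print(out.shape)  # (16,)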

Check Your Hardware

See which quantizations of Mixtral 8x7B Instruct your hardware can run.

Quantization Options

Quantization   Bits   File Size   VRAM Needed   RAM Needed   Quality
Q4_K_M         4.5    26.4 GB     28 GB         32 GB        85%
Q5_K_M         5.5    32.2 GB     34 GB         38 GB        92%
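
As a rough rule of thumb, a quant fits if its VRAM figure above sits at or below your GPU's memory, with the KV cache (next section) layered on top. A back-of-the-envelope checker, hardcoding this page's estimates as assumptions:

```python
# Back-of-the-envelope fit check using the VRAM figures from the table above.
# These are this page's estimates, not guarantees; real usage also depends on
# context length (see the KV-cache section below).
QUANTS = {
    "Q4_K_M": {"file_gb": 26.4, "vram_gb": 28, "ram_gb": 32},
    "Q5_K_M": {"file_gb": 32.2, "vram_gb": 34, "ram_gb": 38},
}

def best_fit(vram_gb: float) -> str | None:
    """Return the highest-quality quant whose VRAM estimate fits, if any."""
    fitting = [q for q, spec in QUANTS.items() if spec["vram_gb"] <= vram_gb]
    return max(fitting, key=lambda q: QUANTS[q]["vram_gb"], default=None)

print(best_fit(48))  # e.g. dual RTX 3090 -> Q5_K_M
print(best_fit(24))  # single 24 GB card -> None (Q4_K_M needs 28 GB)
```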

Context window & KV cache

At the default settings, the KV cache adds roughly 2.50 GB to VRAM.

Long chats and RAG inputs cost real memory: growing the allocated context toward the model's maximum adds VRAM on top of the weights and can change which hardware grade you land in.

Model native max: 32K tokens. The KV-cache estimate is approximate (±30%); real usage depends on the attention layout.
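
If you want to estimate the KV cache yourself, the standard formula is 2 (keys and values) × layers × KV heads × head dim × bytes per value × tokens. A small sketch; the Mixtral config values below (32 layers, 8 KV heads under GQA, head dim 128) are assumptions taken from the public HF config, and the ±30% caveat above still applies:

```python
# Rough KV-cache size estimate. Results shift with cache dtype and how much
# context you actually allocate, which is how figures like the 2.50 GB above
# can differ from the fp16 worst case.
def kv_cache_gb(tokens, layers=32, kv_heads=8, head_dim=128, bytes_per_val=2):
    # 2x for keys and values, stored per layer, per KV head, per token.
    return 2 * layers * kv_heads * head_dim * bytes_per_val * tokens / 1024**3

print(f"{kv_cache_gb(32_768):.1f} GB at the 32K native max (fp16 cache)")
print(f"{kv_cache_gb(32_768, bytes_per_val=1):.1f} GB with an 8-bit cache")
```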

How to run Mixtral 8x7B Instruct

Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.

GUI. Browse → download → chat. MLX on Apple Silicon.

LM Studio home →
  1. Open LM Studio

     Go to the 🔍 Search tab.

  2. Search for

     TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF

  3. Download

     Pick the Q4_K_M quant, the best balance of size vs. quality.

  4. Chat

     Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234 (see the example below).
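
Once the local server is on, any OpenAI-style client can talk to it. A minimal stdlib-only sketch; the model string is an assumption (LM Studio serves whatever model you have loaded, so the exact value usually doesn't matter):

```python
# Minimal chat request against LM Studio's OpenAI-compatible server on :1234
# (enable 'Local Server' in LM Studio first).
import json
import urllib.request

payload = {
    "model": "mixtral-8x7b-instruct",  # assumption: server uses the loaded model
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])
```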

Community benchmarks

Real tokens/sec reports from people running Mixtral 8x7B Instruct on actual hardware.

No community runs yet for this model. Be the first to submit your numbers.
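
One simple way to measure tokens/sec for a submission is to time a non-streaming request and divide the reported completion tokens by wall-clock time. A sketch, assuming the server returns an OpenAI-style usage block (LM Studio and most local runtimes do); note the elapsed time includes prompt processing, so longer generations give a fairer number:

```python
# Quick-and-dirty tokens/sec measurement against a local OpenAI-compatible
# endpoint. Assumes the response includes usage.completion_tokens.
import json
import time
import urllib.request

payload = {
    "model": "mixtral-8x7b-instruct",
    "messages": [{"role": "user", "content": "Write 200 words about MoE models."}],
    "max_tokens": 256,
}
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
start = time.perf_counter()
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
elapsed = time.perf_counter() - start
toks = body["usage"]["completion_tokens"]
print(f"{toks} tokens in {elapsed:.1f}s -> {toks / elapsed:.1f} tok/s")
```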

See It In Action

Real model outputs generated via RunThisModel.com — watch responses stream in real time.


Outputs generated by real AI models via RunThisModel.com. Generation speed shown is from cloud inference. Local speeds vary by hardware — check your device.

Frequently Asked Questions

How much VRAM do I need to run Mixtral 8x7B Instruct?

Mixtral 8x7B Instruct requires 28 GB of VRAM minimum with Q4_K_M quantization. The higher-quality Q5_K_M quant needs 34 GB.

What is the best quantization for Mixtral 8x7B Instruct?

Q4_K_M offers the best balance of quality and VRAM usage. Q8_0 is near-lossless if you have enough VRAM.