Mistral AI
Mixtral 8x22B Instruct
141 B total / 39 B active MoE. Larger Mixtral; needs serious hardware.
About This Model
Mixtral 8x22B is the bigger sibling of 8x7B — same architecture, just much bigger experts. 88 GB VRAM at Q4 puts it in single-H100 / multi-A100 territory. Active parameter count at 39 B means it still runs faster than a dense 70 B.
Check Your Hardware
See which quantizations of Mixtral 8x22B Instruct your hardware can run.
Quantization Options
| Quantization | Bits | File Size | VRAM Needed | RAM Needed | Quality |
|---|---|---|---|---|---|
| Q4_K_M | 4.5 | 85 GB | 88 GB | 96 GB | 85% |
Context window & KV cache
Adds 3.00 GB to VRAMLong chats and RAG inputs cost real memory. Drag to see how 32K vs 128K context shifts your grade.
Model native max: 64K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.
How to run Mixtral 8x22B Instruct
Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.
GUI. Browse → download → chat. MLX on Apple Silicon.
LM Studio home →- 1
Open LM Studio
Go to the 🔍 Search tab.
- 2
Search for
MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-GGUF - 3
Download
Pick the Q4_K_M quant — best balance of size vs. quality.
- 4
Chat
Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.
Community benchmarks
Real tokens/sec reports from people running Mixtral 8x22B Instruct on actual hardware.
No community runs yet for this model. Be the first to submit your numbers.
See It In Action
Real model outputs generated via RunThisModel.com — watch responses stream in real time.
Outputs generated by real AI models via RunThisModel.com. Generation speed shown is from cloud inference. Local speeds vary by hardware — check your device.
Frequently Asked Questions
How much VRAM do I need to run Mixtral 8x22B Instruct?
Mixtral 8x22B Instruct requires 88GB VRAM minimum with Q4_K_M quantization. For full precision, you need 88GB VRAM.
What is the best quantization for Mixtral 8x22B Instruct?
Q4_K_M offers the best balance of quality and VRAM usage. Q8_0 is near-lossless if you have enough VRAM.