Mistral Small 22B is a large language model developed by Mistral AI, designed for text generation tasks. With 22 billion parameters, it offers a balance between performance and resource requirements, making it a versatile choice for generating coherent and contextually relevant text. The model excels in tasks such as content creation, summarization, and conversational AI, thanks to its impressive context length of 32,768 tokens, which allows it to maintain and understand long sequences of text. This makes it particularly useful for applications where deep context is crucial, such as writing detailed articles or generating complex narratives.
Compared to other models in its size class, Mistral Small 22B offers competitive performance with relatively efficient resource usage. At Q4_K_M quantization it requires around 12.9 GB of VRAM, which fits on consumer GPUs with 16 GB or more, making it a practical choice for users who want a large language model without data-center hardware. It may not outperform the largest models in every scenario, but its efficiency makes it a solid choice for a wide range of text generation tasks. Ideal users include developers, content creators, and researchers who need a capable LLM for local deployment.
| Quantization | Bits | File Size | VRAM Needed | RAM Needed | Quality |
|---|---|---|---|---|---|
| Q4_K_M | 4.5 | 12.425 GB | 12.93 GB | 13.43 GB | 85% |
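The file size in the table follows closely from the parameter count and the bits per weight. A minimal sketch of that arithmetic, assuming exactly 22 billion parameters and decimal gigabytes; real quantized files mix tensor types and carry metadata, so they land slightly off this figure. The function name is illustrative only.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bits = n_params * bits_per_weight
    return total_bits / 8 / 1e9


# Assumption: a round 22e9 parameters; the true count differs slightly.
q4_k_m = weight_size_gb(22e9, 4.5)   # bits per weight taken from the table
fp16 = weight_size_gb(22e9, 16)      # unquantized half precision

print(f"Q4_K_M weights: {q4_k_m:.3f} GB")
print(f"FP16 weights:   {fp16:.3f} GB")
```

The Q4_K_M estimate comes out within about 0.05 GB of the 12.425 GB listed above, and the same formula shows why unquantized weights need several times more memory.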
Context window & KV cache
Adds 1.50 GB to VRAM. Long chats and RAG inputs cost real memory.
Model native max: 32K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.
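The reason the estimate depends on attention layout is that the KV cache stores one key and one value vector per layer, per KV head, per token. A rough sketch of the standard size formula follows. The layer count, head count, and head dimension in the example are hypothetical placeholders, not Mistral Small 22B's actual configuration, and the sketch ignores runtime-specific overhead and cache quantization.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for n_tokens of context."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


# Hypothetical layout for illustration only (not this model's real config).
example = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=8192)
print(f"{example / 2**30:.2f} GiB")
```

Because the cost is linear in `n_tokens`, doubling the context doubles the cache, which is why long chats and RAG inputs show up directly in VRAM.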
How to run Mistral Small 22B
Pick a runtime, then copy and paste. Commands are pre-filled with this model’s repo.
Ollama: easiest option. Single command. OpenAI-compatible API on :11434.
1. Pull the model
    ollama pull mistral-small
2. Chat
    ollama run mistral-small
3. Use as API
    curl http://localhost:11434/api/chat -d '{"model":"mistral-small","messages":[{"role":"user","content":"Hi"}]}'
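The same API call can be made from a script. A minimal sketch using only the Python standard library, assuming an Ollama server is already running locally on the default port with the model pulled; the helper names `build_chat_request` and `chat` are illustrative, not part of any Ollama client library.

```python
import json
import urllib.request

# Same endpoint as the curl example above.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str, model: str = "mistral-small") -> urllib.request.Request:
    """Build a POST request carrying a single-turn chat payload."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With the server running, `print(chat("Hi"))` should print the model's reply. Setting `"stream": False` keeps the parsing simple; leave it out if you want tokens as they are generated.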
Community benchmarks
Real tokens/sec reports from people running Mistral Small 22B on actual hardware.
No community runs have been reported for this model yet.
Self-host serving plan
Want to host Mistral Small 22B for many users, or run it on a card that’s technically too small? The estimate below covers one user on a 24 GB card.
| Metric | Estimate | Notes |
|---|---|---|
| VRAM needed | 14.6 GB | 12.9 GB weights + 1.2 GB KV |
| Aggregate tok/s | 11 | across 1 user |
| Per-user tok/s | 11 | 22 B dense |
✅ Fits in 24 GB VRAM with 9.4 GB headroom. Pure-GPU inference at full speed.
Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.
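The scaling rule above can be turned into a small calculator. This is one possible reading of the rule, not the page's actual formula: it assumes each doubling of users adds 70 % of single-user throughput (interpolated with a base-2 logarithm between doublings) and that the aggregate stays flat beyond 8 users. The function name and defaults are illustrative.

```python
import math


def aggregate_tps(single_user_tps: float, users: int,
                  gain_per_doubling: float = 0.7,
                  plateau_users: int = 8) -> float:
    """Estimated total tokens/sec across all concurrent users."""
    if users < 1:
        raise ValueError("users must be at least 1")
    # Past the plateau, extra users no longer raise aggregate throughput.
    effective_users = min(users, plateau_users)
    return single_user_tps * (1 + gain_per_doubling * math.log2(effective_users))


for n in (1, 2, 4, 8, 16):
    total = aggregate_tps(11, n)
    print(f"{n:>2} users: {total:5.1f} tok/s total, {total / n:4.1f} per user")
```

Under these assumptions the aggregate rises with concurrency while each user's share falls, which is the trade-off to weigh when sizing a shared deployment.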
How much VRAM do I need to run Mistral Small 22B?
Mistral Small 22B requires a minimum of 12.93 GB of VRAM with Q4_K_M quantization. Full precision (FP16) needs far more: roughly 44 GB for the weights alone, at 2 bytes for each of the 22 billion parameters.
Which quant should I pick?
Q4_K_M is the best quality/VRAM balance: the quantization table rates it at 85% quality, at roughly 28% of the FP16 footprint (4.5 of 16 bits per weight). Q8_0 is near-lossless if you have the headroom.
What GPU do I need to run Mistral Small 22B?
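To run Mistral Small 22B at Q4_K_M, you need a GPU with at least 12.9 GB of VRAM, plus room for the KV cache. In practice that means a 16 GB or 24 GB card, such as an NVIDIA RTX 4080 (16 GB) or RTX 3090 (24 GB); the RTX 3080, at 10 GB or 12 GB, falls short.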
Is Mistral Small 22B good for coding?
Mistral Small 22B is well-suited for coding tasks due to its strong reasoning capabilities and multilingual support, making it effective for code generation and documentation.
Mistral Small 22B vs Llama 3.1 8B?
Mistral Small 22B has more parameters (22B vs 8B), offering better performance in complex reasoning and multilingual tasks, but requires more VRAM and RAM.
Can I run Mistral Small 22B on a Mac?
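Yes. On Apple Silicon Macs the GPU shares unified memory with the rest of the system, so what matters is total memory rather than separate VRAM. The Q4_K_M build needs about 13 GB for the model, plus room for the KV cache and the operating system, so 16 GB is the practical minimum and more is more comfortable.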
How much VRAM does Mistral Small 22B need?
Mistral Small 22B requires at least 12.9 GB of VRAM at Q4_K_M; higher-precision quantization levels need more.
Is Mistral Small 22B censored?
Mistral Small 22B is not inherently censored, but it may include content filters to prevent harmful or inappropriate output.
Is Mistral Small 22B commercial-use allowed?
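Check the license on the model repository before any commercial deployment. Mistral AI released the 22B weights (Mistral-Small-Instruct-2409) under the Mistral Research License rather than Apache-2.0; that license covers research and non-commercial use, and commercial use requires a separate license from Mistral AI.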
Mistral Small 22B context length?
Mistral Small 22B supports a context length of up to 32,768 tokens (32K).
Does Mistral Small 22B support function calling?
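Mistral AI's model card for the 22B instruct release lists function calling as supported. Whether you can use it locally depends on your runtime passing tool definitions through the model's chat template; otherwise you can prompt for structured output and handle the calls yourself.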
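However the call reaches your application, the application still has to parse it and dispatch it to real code. A minimal sketch, assuming the model has been prompted to reply with JSON of the form `{"name": ..., "arguments": {...}}`. That reply convention, the `dispatch_call` helper, and the `get_weather` tool are all hypothetical and chosen for illustration.

```python
import json


def dispatch_call(model_reply: str, registry: dict):
    """Parse a JSON function call from the model and run the matching tool.

    Returns the tool's result, or None if the reply is not a usable call.
    """
    try:
        call = json.loads(model_reply)
    except json.JSONDecodeError:
        return None  # plain-text answer, not a function call
    if not isinstance(call, dict):
        return None
    func = registry.get(call.get("name"))
    args = call.get("arguments", {})
    if func is None or not isinstance(args, dict):
        return None
    return func(**args)


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"No live data for {city}"


registry = {"get_weather": get_weather}
reply = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch_call(reply, registry)
print(result)
```

Keeping an explicit registry means the model can only trigger functions you have listed, which is a sensible default when model output drives code execution.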
Mistral Small 22B quantization options?
Mistral Small 22B supports various quantization options, including 4-bit and 8-bit, to reduce VRAM usage and improve inference speed.
Can Mistral Small 22B run on CPU?
Mistral Small 22B can run on a CPU if system RAM holds the model (13.43 GB at Q4_K_M, per the quantization table), but generation is much slower than on a GPU. A GPU is strongly recommended.
Mistral Small 22B fine-tuning?
Mistral Small 22B can be fine-tuned for specific tasks, but this requires significant computational resources and expertise.
Mistral Small 22B system requirements?
Mistral Small 22B requires a system with at least 16 GB of RAM, 12.9 GB of VRAM (at Q4_K_M), and a compatible GPU, plus about 12.4 GB of storage for the model file.
Mistral Small 22B performance benchmark?
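This page has no measured benchmarks for Mistral Small 22B yet. Its serving-plan estimate is about 11 tokens per second for a single user with the model fully in GPU memory on a 24 GB card; actual throughput depends on the GPU, the quantization level, and the context length.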
Mistral Small 22B for RAG?
Mistral Small 22B can be used for Retrieval-Augmented Generation (RAG) tasks, leveraging its strong reasoning and multilingual capabilities to enhance the quality of generated text.
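In a RAG pipeline, the retrieved passages, the prompt, and the generated answer all have to fit inside the 32,768-token context window. A minimal sketch of packing ranked chunks into a token budget follows. The whitespace word count is only a rough stand-in for the model's tokenizer (real token counts are usually higher), and the reserve sizes are arbitrary illustrative values.

```python
def pack_chunks(chunks, budget_tokens, count_tokens=lambda s: len(s.split())):
    """Take chunks in ranked order until the next one would exceed the budget."""
    packed, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            break  # stop at the first chunk that does not fit
        packed.append(chunk)
        used += cost
    return packed, used


CONTEXT_WINDOW = 32_768       # model's native maximum, from this page
RESERVED_FOR_ANSWER = 1_024   # assumption: room left for the generated reply
RESERVED_FOR_PROMPT = 256     # assumption: system prompt plus user question
budget = CONTEXT_WINDOW - RESERVED_FOR_ANSWER - RESERVED_FOR_PROMPT

ranked = ["first retrieved passage", "second retrieved passage", "third one"]
selected, used = pack_chunks(ranked, budget)
print(f"{len(selected)} chunks, about {used} of {budget} budget tokens used")
```

Swapping `count_tokens` for a call to the actual tokenizer gives an accurate budget; the packing logic stays the same.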
Mistral Small 22B for agents?
Mistral Small 22B is suitable for creating conversational agents due to its robust language understanding and generation capabilities, especially in multilingual environments.
Mistral Small 22B for coding vs general?
Mistral Small 22B performs well in both coding and general tasks, but its strength in reasoning and multilingual support makes it particularly effective for coding and technical documentation.
Mistral Small 22B vs ChatGPT?
Mistral Small 22B covers many of the same text tasks as ChatGPT, but the main difference is deployment: it is an open-weight model that you run on your own hardware, with a 32,768-token context window, while ChatGPT is a hosted service.
Mistral Small 22B download size?
The download size for Mistral Small 22B depends on the quantization level. The Q4_K_M file is 12.425 GB; higher-precision quantizations such as Q8_0 are larger.
Best quant for Mistral Small 22B?
The best quantization for Mistral Small 22B depends on your hardware and use case. Q4_K_M (about 4.5 bits per weight) keeps VRAM usage near 12.9 GB with good output quality. 8-bit quantization such as Q8_0 is closer to full-precision accuracy but needs nearly twice the memory.