Codestral 22B (abliterated) by failspy is a large-scale, 22 billion parameter AI model designed for code generation. Built on the Mistral architecture, this model excels in generating high-quality, contextually relevant code snippets and can handle complex programming tasks with a context length of up to 32,768 tokens. This makes it particularly useful for developers and teams working on large codebases who need assistance with writing, debugging, and optimizing code. The model supports quantization options like BF16 and Q4_K_M, which help in reducing the computational load without significantly compromising performance.
While Codestral 22B is a heavyweight in terms of parameters, it holds its own in its size class, offering good efficiency and performance. It punches above its weight by delivering robust code generation capabilities that rival those of similarly sized models. However, the VRAM requirements range from 12.9 to 44.5 GB, making it more suitable for users with powerful GPUs. Developers and organizations with access to high-end hardware will benefit the most from this model, as it can significantly enhance productivity and code quality. For those with more modest hardware, smaller models might be a better fit, but for those who can handle the VRAM demands, Codestral 22B is a strong choice for advanced code generation tasks.
| Quantization | Bits | File Size | VRAM Needed | RAM Needed | Quality |
|---|---|---|---|---|---|
| BF16 | 16 | 44 GB | 44.5 GB | 45 GB | 100% |
| Q4_K_M | 4.5 | 12.425 GB | 12.93 GB | 13.43 GB | 85% |
Context window & KV cache
Adds 1.50 GB to VRAMLong chats and RAG inputs cost real memory. Drag to see how 32K vs 128K context shifts your grade.
Model native max: 32K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.
How to run Codestral 22B (abliterated)
Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.
GUI. Browse → download → chat. MLX on Apple Silicon.
LM Studio home →- 1
Open LM Studio
Go to the 🔍 Search tab.
- 2
Search for
bartowski/Codestral-22B-v0.1-abliterated-v3-GGUF - 3
Download
Pick the Q4_K_M quant — best balance of size vs. quality.
- 4
Chat
Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.
Community benchmarks
Real tokens/sec reports from people running Codestral 22B (abliterated) on actual hardware.
No community runs yet for this model. Be the first to submit your numbers.
Self-host serving plan
Want to host Codestral 22B (abliterated)for many users? Or run it on a card that’s technically too small? Slide the knobs.
VRAM needed
14.6 GB
12.9 GB weights + 1.2 GB KV
Aggregate tok/s
11
across 1 user
Per-user tok/s
11
22 B dense
✅ Fits in 24 GB VRAM with 9.4 GB headroom. Pure-GPU inference — full speed.
Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.
See It In Action
Real model outputs generated via RunThisModel.com — watch responses stream in real time.
Outputs generated by real AI models via RunThisModel.com. Generation speed shown is from cloud inference. Local speeds vary by hardware — check your device.
how much VRAM do I need to run Codestral 22B (abliterated)?
Codestral 22B (abliterated) requires 12.93 GB VRAM minimum with BF16 quantization. For full precision you need 44.5 GB.
which quant should I pick?
Q4_K_M is the best quality/VRAM balance — ~92% of FP16 quality at ~25% the footprint. Q8_0 is near-lossless if you have the headroom.
What GPU do I need to run Codestral 22B (abliterated)?
To run Codestral 22B (abliterated), you will need a GPU with at least 12.9 GB of VRAM for the lowest quantization level, up to 44.5 GB for the highest precision.
Is Codestral 22B (abliterated) good for coding?
Yes, Codestral 22B (abliterated) is specialized for coding tasks and can provide high-quality code generation and assistance without the 'I can't help with that' filter.
Codestral 22B (abliterated) vs Llama 3.1 8B?
Codestral 22B (abliterated) has 22 billion parameters, making it significantly larger than Llama 3.1 8B, which may result in better performance for complex tasks but requires more VRAM.
Can I run Codestral 22B (abliterated) on a Mac?
Yes, you can run Codestral 22B (abliterated) on a Mac, provided your Mac has a compatible GPU with sufficient VRAM to handle the model's requirements.
How much VRAM does Codestral 22B (abliterated) need?
Codestral 22B (abliterated) requires between 12.9 GB and 44.5 GB of VRAM, depending on the quantization level used.
Is Codestral 22B (abliterated) censored?
No, Codestral 22B (abliterated) has had its refusal direction ablated, meaning it does not include the 'I can't help with that' filter and is less likely to refuse requests.
Is Codestral 22B (abliterated) commercial-use allowed?
No, Codestral 22B (abliterated) operates under a non-commercial license, which means it cannot be used for commercial purposes.
Codestral 22B (abliterated) context length?
Codestral 22B (abliterated) supports a context length of 32,768 tokens, allowing for handling very long sequences of text.
Does Codestral 22B (abliterated) support function calling?
Codestral 22B (abliterated) does not natively support function calling; however, you can implement custom solutions to achieve similar functionality.
Codestral 22B (abliterated) quantization options?
Codestral 22B (abliterated) offers multiple quantization options, including 4-bit, 8-bit, and 16-bit, each affecting the required VRAM and performance differently.
Can Codestral 22B (abliterated) run on CPU?
While Codestral 22B (abliterated) can technically run on a CPU, it is highly inefficient and not recommended due to the large number of parameters and the computational demands.
Codestral 22B (abliterated) fine-tuning?
Codestral 22B (abliterated) can be fine-tuned on your own data to improve performance on specific tasks, but this requires significant computational resources and expertise.
Codestral 22B (abliterated) system requirements?
To run Codestral 22B (abliterated), you need a system with a powerful GPU (12.9 GB to 44.5 GB VRAM), at least 64 GB of RAM, and a multi-core CPU.
Codestral 22B (abliterated) performance benchmark?
Performance benchmarks for Codestral 22B (abliterated) vary based on hardware, but typical throughput ranges from 10 to 50 tokens per second on high-end GPUs.
Codestral 22B (abliterated) for RAG?
Codestral 22B (abliterated) can be used for Retrieval-Augmented Generation (RAG) tasks, but you will need to set up an external retrieval system to fetch relevant documents.
Codestral 22B (abliterated) for agents?
Codestral 22B (abliterated) can be integrated into agent systems to provide advanced natural language processing capabilities, especially for coding-related tasks.
Codestral 22B (abliterated) for coding vs general?
Codestral 22B (abliterated) is specifically optimized for coding tasks, which may make it less suitable for general-purpose NLP tasks compared to more versatile models.
Codestral 22B (abliterated) vs ChatGPT?
Codestral 22B (abliterated) is designed for coding and has a non-commercial license, while ChatGPT is a more general-purpose model with different licensing terms and potentially different performance characteristics.
Codestral 22B (abliterated) download size?
The download size of Codestral 22B (abliterated) varies depending on the quantization level, ranging from approximately 10 GB for 4-bit quantization to 44.5 GB for full precision.
Best quant for Codestral 22B (abliterated)?
The best quantization level for Codestral 22B (abliterated) depends on your hardware and performance needs. 8-bit quantization offers a good balance between VRAM usage and performance for most users.