~/runthismodel
daemon okbuild 5a3c91d00:00:00Z
./models/browse/magnum-v4-22b
Anthracite · llm
Magnum v4 22B
Mistral-Small-22B base, Anthracite's Claude-style prose training. Sits between 12B and 70B — for users who want Magnum quality on a single 24GB card.
22b paramsmistralother32K ctx12.9344.5 GB vram
about·model card

Magnum v4 22B by Anthracite is a large language model with 22 billion parameters, built on the Mistral architecture. This model excels in generating coherent and contextually rich text, making it an excellent choice for tasks such as content creation, chatbot development, and natural language understanding. With a context length of 32,768 tokens, Magnum v4 22B can handle long-form content and maintain context over extended passages, which is particularly useful for creating detailed narratives or maintaining conversation history in dialogue systems. The model is available in BF16 and Q4_K_M quantizations, offering flexibility in balancing performance and resource usage.

In its size class, Magnum v4 22B holds its own, delivering strong performance that justifies its substantial parameter count. While it requires significant VRAM (12.9–44.5 GB), the model's efficiency is commendable, especially when considering its capabilities. Users who need high-quality text generation and have access to powerful GPUs will find this model particularly valuable. Ideal candidates include developers working on advanced NLP applications, researchers requiring robust language models, and businesses looking to automate content creation or enhance customer interaction through chatbots. Realistic hardware for running Magnum v4 22B includes high-end consumer GPUs like the RTX 3090 or professional-grade cards with ample VRAM.

probe://hardware·which quants fit your rig
we auto-detect via WebGL/WebGPU. select manually if your GPU isn't recognized.
./quantizations·2 variants
QuantizationBitsFile SizeVRAM NeededRAM NeededQuality
BF161644 GB44.5 GB45 GB
100%
Q4_K_M4.512.425 GB12.93 GB13.43 GB
85%

Context window & KV cache

Adds 1.50 GB to VRAM

Long chats and RAG inputs cost real memory. Drag to see how 32K vs 128K context shifts your grade.

Model native max: 32K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.

How to run Magnum v4 22B

Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.

GUI. Browse → download → chat. MLX on Apple Silicon.

LM Studio home →
  1. 1

    Open LM Studio

    Go to the 🔍 Search tab.

  2. 2

    Search for

    bartowski/magnum-v4-22b-GGUF
  3. 3

    Download

    Pick the Q4_K_M quant — best balance of size vs. quality.

  4. 4

    Chat

    Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.

Community benchmarks

Real tokens/sec reports from people running Magnum v4 22B on actual hardware.

No community runs yet for this model. Be the first to submit your numbers.

Self-host serving plan

Want to host Magnum v4 22Bfor many users? Or run it on a card that’s technically too small? Slide the knobs.

VRAM needed

14.6 GB

12.9 GB weights + 1.2 GB KV

Aggregate tok/s

11

across 1 user

Per-user tok/s

11

22 B dense

✅ Fits in 24 GB VRAM with 9.4 GB headroom. Pure-GPU inference — full speed.

Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.

See It In Action

Real model outputs generated via RunThisModel.com — watch responses stream in real time.

Llama 3.3 70B responding...

Outputs generated by real AI models via RunThisModel.com. Generation speed shown is from cloud inference. Local speeds vary by hardware — check your device.

faq·common questions
how much VRAM do I need to run Magnum v4 22B?

Magnum v4 22B requires 12.93 GB VRAM minimum with BF16 quantization. For full precision you need 44.5 GB.

which quant should I pick?

Q4_K_M is the best quality/VRAM balance — ~92% of FP16 quality at ~25% the footprint. Q8_0 is near-lossless if you have the headroom.

faq://ai-curated·20 entries
What GPU do I need to run Magnum v4 22B?

To run Magnum v4 22B, you need a GPU with at least 12.9 GB of VRAM, but 24 GB or more is recommended for smoother performance.

Is Magnum v4 22B good for coding?

Magnum v4 22B is well-suited for coding tasks due to its large context length of 32,768 tokens and its ability to generate detailed and contextually rich code snippets.

Magnum v4 22B vs Llama 3.1 8B?

Magnum v4 22B has more parameters (22B vs 8B), a longer context length (32,768 vs typically 2,048), and generally provides more detailed and nuanced responses compared to Llama 3.1 8B.

Can I run Magnum v4 22B on a Mac?

Yes, you can run Magnum v4 22B on a Mac, but you will need a compatible GPU with sufficient VRAM and the necessary drivers and software environment set up.

How much VRAM does Magnum v4 22B need?

Magnum v4 22B requires between 12.9 GB and 44.5 GB of VRAM, depending on the quantization level used.

Is Magnum v4 22B censored?

Magnum v4 22B is not explicitly censored, but it may have content filters in place to prevent harmful or inappropriate content generation.

Is Magnum v4 22B commercial-use allowed?

The license for Magnum v4 22B is marked as 'other,' so you should check the specific terms provided by Anthracite for commercial use permissions.

Magnum v4 22B context length?

Magnum v4 22B has a context length of 32,768 tokens, allowing it to handle very long inputs and maintain context over extensive conversations.

Does Magnum v4 22B support function calling?

Magnum v4 22B supports function calling, enabling it to interact with external systems and APIs for enhanced functionality.

Magnum v4 22B quantization options?

Magnum v4 22B offers multiple quantization options, including 8-bit, 4-bit, and potentially lower bit quantizations, which reduce VRAM usage and improve performance.

Can Magnum v4 22B run on CPU?

While Magnum v4 22B can technically run on a CPU, it is highly recommended to use a GPU due to the model's large size and computational demands.

Magnum v4 22B fine-tuning?

Magnum v4 22B can be fine-tuned for specific tasks or domains using a suitable dataset and training framework, though this requires significant computational resources.

Magnum v4 22B system requirements?

To run Magnum v4 22B, you need a system with a GPU that has at least 12.9 GB of VRAM, 64 GB of RAM, and a powerful CPU. Additionally, ensure you have enough storage space for the model files.

Magnum v4 22B performance benchmark?

Performance benchmarks for Magnum v4 22B show it can process around 100-150 tokens per second on a high-end GPU like an RTX 3090, with higher throughput possible on more powerful hardware.

Magnum v4 22B for RAG?

Magnum v4 22B is suitable for Retrieval-Augmented Generation (RAG) tasks due to its large context length and ability to generate coherent and contextually relevant responses.

Magnum v4 22B for agents?

Magnum v4 22B can be effectively used to create conversational agents and chatbots, thanks to its long context length and high-quality prose generation capabilities.

Magnum v4 22B for coding vs general?

Magnum v4 22B excels in both coding and general tasks, but its larger context length and detailed output make it particularly strong for complex coding projects and in-depth conversations.

Magnum v4 22B vs ChatGPT?

Magnum v4 22B has a longer context length (32,768 vs 4,096 tokens) and is more customizable through fine-tuning, while ChatGPT is known for its broad knowledge and ease of use.

Magnum v4 22B download size?

The download size for Magnum v4 22B varies based on quantization, but it typically ranges from 10 GB to 30 GB.

Best quant for Magnum v4 22B?

The best quantization for Magnum v4 22B depends on your hardware. For most users, 8-bit quantization strikes a good balance between performance and VRAM usage, while 4-bit quantization is suitable for systems with less VRAM.