~/runthismodel
daemon okbuild 5a3c91d00:00:00Z
./models/browse/magnum-v4-72b
Anthracite · llm
Magnum v4 72B
Qwen2.5-72B fine-tuned on Claude-Opus-style literary data. Highest-quality long-form prose at the 72B class. Apache 2.0.
72b paramsqwen2apache-2.0128K ctx44.66144.5 GB vram
about·model card

Magnum v4 72B, authored by Anthracite, is a massive 72 billion parameter text generation model built on the qwen2 architecture. It excels in generating high-quality, coherent, and contextually rich text, making it an excellent choice for tasks that require deep understanding and nuanced responses. With a context length of 131072, Magnum v4 72B can handle extremely long sequences, which is particularly useful for applications like writing long-form content, summarizing extensive documents, or generating detailed narratives. The model is licensed under the Apache-2.0 license, ensuring it is freely available for both research and commercial use.

Despite its size, Magnum v4 72B offers good efficiency for its class, thanks to available quantizations like BF16 and Q4_K_M, which can significantly reduce the VRAM requirements. However, it still demands substantial hardware resources, with a VRAM range of 44.7–144.5 GB, making it more suitable for users with high-end GPUs or multi-GPU setups. This model is ideal for professionals, researchers, and organizations that need top-tier text generation capabilities and have the necessary hardware to support it. While it may not be the most practical choice for casual users or those with limited computational resources, it stands out for those who prioritize performance and can afford the hardware investment.

probe://hardware·which quants fit your rig
we auto-detect via WebGL/WebGPU. select manually if your GPU isn't recognized.
./quantizations·2 variants
QuantizationBitsFile SizeVRAM NeededRAM NeededQuality
BF1616144 GB144.5 GB145 GB
100%
Q4_K_M4.544.159 GB44.66 GB45.16 GB
85%

Context window & KV cache

Adds 2.50 GB to VRAM

Long chats and RAG inputs cost real memory. Drag to see how 32K vs 128K context shifts your grade.

Model native max: 128K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.

How to run Magnum v4 72B

Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.

GUI. Browse → download → chat. MLX on Apple Silicon.

LM Studio home →
  1. 1

    Open LM Studio

    Go to the 🔍 Search tab.

  2. 2

    Search for

    bartowski/magnum-v4-72b-GGUF
  3. 3

    Download

    Pick the Q4_K_M quant — best balance of size vs. quality.

  4. 4

    Chat

    Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.

Community benchmarks

Real tokens/sec reports from people running Magnum v4 72B on actual hardware.

No community runs yet for this model. Be the first to submit your numbers.

Self-host serving plan

Want to host Magnum v4 72Bfor many users? Or run it on a card that’s technically too small? Slide the knobs.

VRAM needed

47.3 GB

44.7 GB weights + 2.1 GB KV

Aggregate tok/s

1

across 1 user

Per-user tok/s

1

72 B dense

⚠ Will spill 23.3 GB of weights to system RAM (~5× slower per offloaded layer). Use llama.cpp’s --cpu-offload-gb or vLLM’s --swap-space.

Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.

See It In Action

Real model outputs generated via RunThisModel.com — watch responses stream in real time.

Llama 3.3 70B responding...

Outputs generated by real AI models via RunThisModel.com. Generation speed shown is from cloud inference. Local speeds vary by hardware — check your device.

faq·common questions
how much VRAM do I need to run Magnum v4 72B?

Magnum v4 72B requires 44.66 GB VRAM minimum with BF16 quantization. For full precision you need 144.5 GB.

which quant should I pick?

Q4_K_M is the best quality/VRAM balance — ~92% of FP16 quality at ~25% the footprint. Q8_0 is near-lossless if you have the headroom.

faq://ai-curated·20 entries
What GPU do I need to run Magnum v4 72B?

To run Magnum v4 72B, you need a GPU with at least 44.7 GB of VRAM, depending on the quantization level. For optimal performance, a GPU with 144.5 GB of VRAM is recommended.

Is Magnum v4 72B good for coding?

Magnum v4 72B is primarily designed for generating high-quality long-form prose and may not be optimized for coding tasks. However, it can still provide useful assistance in natural language understanding and generation.

Magnum v4 72B vs Llama 3.1 8B?

Magnum v4 72B has 72 billion parameters, making it significantly larger and potentially more powerful than Llama 3.1 8B, which has 8 billion parameters. Magnum v4 72B is better suited for complex and detailed tasks.

Can I run Magnum v4 72B on a Mac?

Yes, you can run Magnum v4 72B on a Mac, but you will need a Mac with a compatible GPU that meets the VRAM requirements. Ensure your Mac has at least 44.7 GB of VRAM for the minimum configuration.

How much VRAM does Magnum v4 72B need?

Magnum v4 72B requires between 44.7 GB and 144.5 GB of VRAM, depending on the quantization level used. Higher quantization levels reduce the VRAM requirement but may impact performance.

Is Magnum v4 72B censored?

Magnum v4 72B is not inherently censored, but its behavior can be influenced by the data it was trained on and any post-training modifications. It is designed to generate high-quality, uncensored content.

Is Magnum v4 72B commercial-use allowed?

Yes, Magnum v4 72B is licensed under the Apache 2.0 license, which allows for commercial use as long as you comply with the terms of the license.

Magnum v4 72B context length?

Magnum v4 72B has a context length of 131,072 tokens, allowing it to handle very long sequences of text effectively.

Does Magnum v4 72B support function calling?

Magnum v4 72B does not natively support function calling, but you can integrate it with external tools or frameworks to achieve this functionality.

Magnum v4 72B quantization options?

Magnum v4 72B supports various quantization options, including 4-bit, 8-bit, and 16-bit quantization, which can reduce the VRAM requirements and improve inference speed.

Can Magnum v4 72B run on CPU?

While Magnum v4 72B can technically run on a CPU, it is highly resource-intensive and will be extremely slow. A GPU is strongly recommended for practical use.

Magnum v4 72B fine-tuning?

Magnum v4 72B can be fine-tuned on custom datasets to improve its performance on specific tasks. Fine-tuning requires significant computational resources and expertise.

Magnum v4 72B system requirements?

To run Magnum v4 72B, you need a system with at least 44.7 GB of VRAM, a powerful CPU, and sufficient RAM. A high-end GPU with 144.5 GB of VRAM is recommended for optimal performance.

Magnum v4 72B performance benchmark?

Performance benchmarks for Magnum v4 72B vary based on hardware, but it generally processes around 100-200 tokens per second on a high-end GPU. Lower-end GPUs will have slower performance.

Magnum v4 72B for RAG?

Magnum v4 72B can be used for Retrieval-Augmented Generation (RAG) tasks, where it retrieves relevant information from a database and generates text based on that information. This can enhance its contextual understanding and output quality.

Magnum v4 72B for agents?

Magnum v4 72B can be integrated into agent systems to provide advanced natural language processing capabilities. Its large context length and high-quality prose generation make it suitable for complex conversational agents.

Magnum v4 72B for coding vs general?

Magnum v4 72B is more suited for general natural language tasks and generating high-quality prose. While it can assist with coding-related tasks, specialized models like Codex are better optimized for coding-specific tasks.

Magnum v4 72B vs ChatGPT?

Magnum v4 72B is a larger model with 72 billion parameters, offering more detailed and nuanced responses compared to ChatGPT, which has fewer parameters. Magnum v4 72B is better suited for complex and long-form text generation.

Magnum v4 72B download size?

The download size of Magnum v4 72B varies depending on the quantization level. The full model without quantization is approximately 144 GB, while quantized versions can be significantly smaller.

Best quant for Magnum v4 72B?

The best quantization for Magnum v4 72B depends on your specific needs. 8-bit quantization offers a good balance between performance and VRAM usage, while 4-bit quantization further reduces VRAM requirements but may impact accuracy.