Can RTX 4070 SUPER run Dolphin 3.0 R1 Mistral 24B?

Yes — runs locally

~0 tok/sec · Cannot run — model too large for this GPU

Your VRAM

12 GB

Model size

24B

Best quant

Q4_K_M

VRAM needed

13.8 GB

The verdict

The RTX 4070 SUPER (12 GB VRAM) handles Dolphin 3.0 R1 Mistral 24B comfortably using the Q4_K_M quantization, which fits in 13.8 GB. Expected throughput is around 0 tokens/second, which feels Cannot run — model too large for this GPU in interactive use. Only widely-available uncensored R1-style reasoning model. Mistral-Small-24B base with chain-of-thought training and refusals removed. 128K context.

How to run it

1. Install Ollama or LM Studio.
2. Pull the Q4_K_M GGUF — best balance of quality and speed on 12 GB.
3. Start chatting. Expect ~0 tok/sec on first-token, faster after warmup.

See full Dolphin 3.0 R1 Mistral 24B setup →

Other models that run great on RTX 4070 SUPER

FAQ (20)

What GPU do I need to run Dolphin 3.0 R1 Mistral 24B?

To run Dolphin 3.0 R1 Mistral 24B, you need a GPU with at least 13.8 GB of VRAM, but 48.5 GB is recommended for optimal performance, especially with higher quantization levels.

Is Dolphin 3.0 R1 Mistral 24B good for coding?

Dolphin 3.0 R1 Mistral 24B is well-suited for coding tasks due to its large context length of 131,072 tokens and robust chain-of-thought training, making it effective for understanding complex codebases and generating high-quality code.

Dolphin 3.0 R1 Mistral 24B vs Llama 3.1 8B?

Dolphin 3.0 R1 Mistral 24B has more parameters (24B vs 8B) and a longer context length (131,072 vs typically shorter), which generally results in better performance for complex tasks, though it requires more VRAM and computational resources.

Can I run Dolphin 3.0 R1 Mistral 24B on a Mac?

Yes, you can run Dolphin 3.0 R1 Mistral 24B on a Mac, provided your Mac has a compatible GPU with sufficient VRAM (at least 13.8 GB). Ensure you have the necessary drivers and software installed.

How much VRAM does Dolphin 3.0 R1 Mistral 24B need?

Dolphin 3.0 R1 Mistral 24B requires between 13.8 GB and 48.5 GB of VRAM, depending on the quantization level used. Higher quantization levels reduce VRAM usage but may impact performance.

Is Dolphin 3.0 R1 Mistral 24B censored?

No, Dolphin 3.0 R1 Mistral 24B is an uncensored model, designed to provide open and unrestricted responses without content filters or refusals.

Is Dolphin 3.0 R1 Mistral 24B commercial-use allowed?

Yes, Dolphin 3.0 R1 Mistral 24B is licensed under the Apache-2.0 license, which allows for both commercial and non-commercial use, provided you comply with the terms of the license.

Dolphin 3.0 R1 Mistral 24B context length?

Dolphin 3.0 R1 Mistral 24B has a context length of 131,072 tokens, which is significantly larger than many other models, allowing it to handle very long inputs and maintain context over extensive conversations or documents.

Want personalized recommendations for your exact setup? Detect my hardware →