Dolphin 3.0 R1 Mistral 24B is a large language model (LLM) developed by Cognitive Computations, boasting 24 billion parameters and a context length of 131,072 tokens. This model excels in text generation tasks, including but not limited to, creative writing, summarization, and conversational AI. Its extensive context length allows it to maintain coherence over longer passages, making it particularly useful for generating detailed narratives or technical documents. The model is licensed under Apache-2.0, which makes it accessible for both commercial and non-commercial projects.
In its size class, Dolphin 3.0 R1 Mistral 24B holds its own, offering a balance between performance and efficiency. While it may not outperform the top-tier models in every benchmark, its ability to handle long contexts and generate high-quality text makes it a strong contender. The available quantizations (BF16 and Q4_K_M) enhance its efficiency, allowing it to run on a variety of hardware setups with VRAM ranging from 13.8 GB to 48.5 GB. This flexibility means it can be deployed on mid-range GPUs as well as more powerful systems, making it a practical choice for developers and researchers who need robust text generation capabilities without requiring the most cutting-edge hardware. Ideal users include those working on content creation, chatbots, and any application where nuanced, coherent text output is crucial.
| Quantization | Bits | File Size | VRAM Needed | RAM Needed | Quality |
|---|---|---|---|---|---|
| BF16 | 16 | 48 GB | 48.5 GB | 49 GB | 100% |
| Q4_K_M | 4.5 | 13.35 GB | 13.85 GB | 14.35 GB | 85% |
Context window & KV cache
Adds 1.50 GB to VRAMLong chats and RAG inputs cost real memory. Drag to see how 32K vs 128K context shifts your grade.
Model native max: 128K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.
How to run Dolphin 3.0 R1 Mistral 24B
Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.
GUI. Browse → download → chat. MLX on Apple Silicon.
LM Studio home →- 1
Open LM Studio
Go to the 🔍 Search tab.
- 2
Search for
mradermacher/Dolphin3.0-R1-Mistral-24B-GGUF - 3
Download
Pick the Q4_K_M quant — best balance of size vs. quality.
- 4
Chat
Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.
Community benchmarks
Real tokens/sec reports from people running Dolphin 3.0 R1 Mistral 24B on actual hardware.
No community runs yet for this model. Be the first to submit your numbers.
Self-host serving plan
Want to host Dolphin 3.0 R1 Mistral 24Bfor many users? Or run it on a card that’s technically too small? Slide the knobs.
VRAM needed
15.6 GB
13.8 GB weights + 1.2 GB KV
Aggregate tok/s
10
across 1 user
Per-user tok/s
10
24 B dense
✅ Fits in 24 GB VRAM with 8.4 GB headroom. Pure-GPU inference — full speed.
Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.
See It In Action
Real model outputs generated via RunThisModel.com — watch responses stream in real time.
Outputs generated by real AI models via RunThisModel.com. Generation speed shown is from cloud inference. Local speeds vary by hardware — check your device.
how much VRAM do I need to run Dolphin 3.0 R1 Mistral 24B?
Dolphin 3.0 R1 Mistral 24B requires 13.85 GB VRAM minimum with BF16 quantization. For full precision you need 48.5 GB.
which quant should I pick?
Q4_K_M is the best quality/VRAM balance — ~92% of FP16 quality at ~25% the footprint. Q8_0 is near-lossless if you have the headroom.
What GPU do I need to run Dolphin 3.0 R1 Mistral 24B?
To run Dolphin 3.0 R1 Mistral 24B, you need a GPU with at least 13.8 GB of VRAM, but 48.5 GB is recommended for optimal performance, especially with higher quantization levels.
Is Dolphin 3.0 R1 Mistral 24B good for coding?
Dolphin 3.0 R1 Mistral 24B is well-suited for coding tasks due to its large context length of 131,072 tokens and robust chain-of-thought training, making it effective for understanding complex codebases and generating high-quality code.
Dolphin 3.0 R1 Mistral 24B vs Llama 3.1 8B?
Dolphin 3.0 R1 Mistral 24B has more parameters (24B vs 8B) and a longer context length (131,072 vs typically shorter), which generally results in better performance for complex tasks, though it requires more VRAM and computational resources.
Can I run Dolphin 3.0 R1 Mistral 24B on a Mac?
Yes, you can run Dolphin 3.0 R1 Mistral 24B on a Mac, provided your Mac has a compatible GPU with sufficient VRAM (at least 13.8 GB). Ensure you have the necessary drivers and software installed.
How much VRAM does Dolphin 3.0 R1 Mistral 24B need?
Dolphin 3.0 R1 Mistral 24B requires between 13.8 GB and 48.5 GB of VRAM, depending on the quantization level used. Higher quantization levels reduce VRAM usage but may impact performance.
Is Dolphin 3.0 R1 Mistral 24B censored?
No, Dolphin 3.0 R1 Mistral 24B is an uncensored model, designed to provide open and unrestricted responses without content filters or refusals.
Is Dolphin 3.0 R1 Mistral 24B commercial-use allowed?
Yes, Dolphin 3.0 R1 Mistral 24B is licensed under the Apache-2.0 license, which allows for both commercial and non-commercial use, provided you comply with the terms of the license.
Dolphin 3.0 R1 Mistral 24B context length?
Dolphin 3.0 R1 Mistral 24B has a context length of 131,072 tokens, which is significantly larger than many other models, allowing it to handle very long inputs and maintain context over extensive conversations or documents.
Does Dolphin 3.0 R1 Mistral 24B support function calling?
Dolphin 3.0 R1 Mistral 24B supports function calling, enabling it to interact with external systems and APIs, enhancing its capabilities for complex applications and integrations.
Dolphin 3.0 R1 Mistral 24B quantization options?
Dolphin 3.0 R1 Mistral 24B supports various quantization options, including 4-bit, 8-bit, and 16-bit, which can reduce VRAM usage and improve inference speed, though with potential trade-offs in accuracy.
Can Dolphin 3.0 R1 Mistral 24B run on CPU?
While Dolphin 3.0 R1 Mistral 24B can technically run on a CPU, it is highly resource-intensive and will be significantly slower compared to running on a GPU. For practical use, a GPU is strongly recommended.
Dolphin 3.0 R1 Mistral 24B fine-tuning?
Dolphin 3.0 R1 Mistral 24B can be fine-tuned on your own data to improve its performance on specific tasks or domains. Fine-tuning requires a powerful GPU and sufficient memory to handle the large model size.
Dolphin 3.0 R1 Mistral 24B system requirements?
To run Dolphin 3.0 R1 Mistral 24B, you need a system with a GPU that has at least 13.8 GB of VRAM, 64 GB of RAM, and a multi-core CPU. Additionally, ensure you have a stable internet connection and sufficient storage space.
Dolphin 3.0 R1 Mistral 24B performance benchmark?
Performance benchmarks for Dolphin 3.0 R1 Mistral 24B show it can process around 100-200 tokens per second on a high-end GPU like the RTX 3090, with lower throughput on less powerful hardware.
Dolphin 3.0 R1 Mistral 24B for RAG?
Dolphin 3.0 R1 Mistral 24B is suitable for Retrieval-Augmented Generation (RAG) tasks due to its large context length and ability to integrate external information effectively.
Dolphin 3.0 R1 Mistral 24B for agents?
Dolphin 3.0 R1 Mistral 24B can be used to create intelligent agents due to its robust reasoning capabilities and support for function calling, making it ideal for applications requiring dynamic interaction and decision-making.
Dolphin 3.0 R1 Mistral 24B for coding vs general?
Dolphin 3.0 R1 Mistral 24B performs well in both coding and general tasks, but its extensive context length and chain-of-thought training make it particularly strong for coding, where understanding and generating complex code is crucial.
Dolphin 3.0 R1 Mistral 24B vs ChatGPT?
Dolphin 3.0 R1 Mistral 24B offers a larger context length (131,072 tokens vs ChatGPT's 4,096 tokens) and is uncensored, making it more suitable for tasks requiring extensive context and unrestricted responses. However, ChatGPT may have better fine-tuned performance for specific use cases.
Dolphin 3.0 R1 Mistral 24B download size?
The download size for Dolphin 3.0 R1 Mistral 24B varies depending on the quantization level, ranging from approximately 12 GB (4-bit) to 48 GB (16-bit).
Best quant for Dolphin 3.0 R1 Mistral 24B?
The best quantization level for Dolphin 3.0 R1 Mistral 24B depends on your specific needs. 8-bit quantization offers a good balance between performance and VRAM usage, while 4-bit is more efficient but may slightly reduce accuracy.