~/runthismodel
daemon okbuild 5a3c91d00:00:00Z
./models/browse/rocket-3b
Pansophic · llm
Rocket 3B
Fast 3B model tuned for helpful responses.
3b paramsstablelmother4K ctx2.093.27 GB vram
about·model card

Rocket 3B by Pansophic is a 3 billion parameter language model designed for efficient local deployment, particularly excelling in text generation tasks. It leverages the stablelm architecture to produce coherent and contextually relevant outputs, making it suitable for applications such as content creation, chatbots, and summarization. With a context length of 4096 tokens, Rocket 3B can handle longer inputs and generate more detailed responses compared to smaller models, which is beneficial for complex or nuanced tasks.

In its size class, Rocket 3B stands out for its balance between performance and resource efficiency. While it may not match the cutting-edge capabilities of larger models like those with 10 billion parameters or more, it offers a compelling alternative for users who need robust text generation without the high computational demands. The model’s quantization options, including Q4_K_M and Q8_0, further enhance its efficiency, allowing it to run smoothly on hardware with as little as 2.1 GB of VRAM. This makes it an excellent choice for developers and enthusiasts working with mid-range GPUs or systems with limited resources. Ideal users include those looking to deploy a capable language model for personal projects, small-scale applications, or educational purposes, where the trade-off between performance and efficiency is crucial.

probe://hardware·which quants fit your rig
we auto-detect via WebGL/WebGPU. select manually if your GPU isn't recognized.
./quantizations·2 variants
QuantizationBitsFile SizeVRAM NeededRAM NeededQuality
Q4_K_M4.51.591 GB2.09 GB2.59 GB
85%
Q8_082.769 GB3.27 GB3.77 GB
98%

Context window & KV cache

Adds 0.33 GB to VRAM

Long chats and RAG inputs cost real memory. Drag to see how 32K vs 128K context shifts your grade.

Model native max: 4K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.

How to run Rocket 3B

Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.

GUI. Browse → download → chat. MLX on Apple Silicon.

LM Studio home →
  1. 1

    Open LM Studio

    Go to the 🔍 Search tab.

  2. 2

    Search for

    TheBloke/rocket-3B-GGUF
  3. 3

    Download

    Pick the Q4_K_M quant — best balance of size vs. quality.

  4. 4

    Chat

    Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.

Community benchmarks

Real tokens/sec reports from people running Rocket 3B on actual hardware.

No community runs yet for this model. Be the first to submit your numbers.

Self-host serving plan

Want to host Rocket 3Bfor many users? Or run it on a card that’s technically too small? Slide the knobs.

VRAM needed

3.0 GB

2.1 GB weights + 0.4 GB KV

Aggregate tok/s

83

across 1 user

Per-user tok/s

83

3 B dense

✅ Fits in 24 GB VRAM with 21.0 GB headroom. Pure-GPU inference — full speed.

Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.

See It In Action

Real model outputs generated via RunThisModel.com — watch responses stream in real time.

Llama 3.3 70B responding...

Outputs generated by real AI models via RunThisModel.com. Generation speed shown is from cloud inference. Local speeds vary by hardware — check your device.

faq·common questions
how much VRAM do I need to run Rocket 3B?

Rocket 3B requires 2.09 GB VRAM minimum with Q4_K_M quantization. For full precision you need 3.27 GB.

which quant should I pick?

Q4_K_M is the best quality/VRAM balance — ~92% of FP16 quality at ~25% the footprint. Q8_0 is near-lossless if you have the headroom.

faq://ai-curated·20 entries
What GPU do I need to run Rocket 3B?

To run Rocket 3B, you need a GPU with at least 2.1 GB of VRAM for the lowest quantization level, but 3.3 GB is recommended for better performance.

Is Rocket 3B good for coding?

Rocket 3B is well-suited for coding tasks due to its fast response times and context length of 4096 tokens, making it effective for code completion and documentation.

Rocket 3B vs Llama 3.1 8B?

Rocket 3B has fewer parameters (3B vs 8B) but is optimized for speed and efficiency, making it a better choice for resource-constrained environments. Llama 3.1 8B may offer more detailed responses but requires more VRAM.

Can I run Rocket 3B on a Mac?

Yes, Rocket 3B can run on a Mac with an M1 or M2 chip, provided you have the necessary VRAM and system resources.

How much VRAM does Rocket 3B need?

Rocket 3B requires between 2.1 GB and 3.3 GB of VRAM, depending on the quantization level used.

Is Rocket 3B censored?

Rocket 3B is not inherently censored, but its responses are designed to be helpful and appropriate. The model adheres to ethical guidelines to avoid harmful content.

Is Rocket 3B commercial-use allowed?

Rocket 3B is licensed under a non-standard license, so you should review the specific terms to ensure it meets your commercial use requirements.

Rocket 3B context length?

Rocket 3B supports a context length of 4096 tokens, which is sufficient for most conversational and text generation tasks.

Does Rocket 3B support function calling?

Rocket 3B does not natively support function calling, but you can integrate it with external tools and APIs for extended functionality.

Rocket 3B quantization options?

Rocket 3B supports multiple quantization levels, including INT8 and INT4, which reduce the VRAM requirements and improve performance.

Can Rocket 3B run on CPU?

While Rocket 3B can run on a CPU, it will be significantly slower compared to running on a GPU. A powerful multi-core CPU is recommended for acceptable performance.

Rocket 3B fine-tuning?

Rocket 3B can be fine-tuned using frameworks like Hugging Face Transformers. Fine-tuning allows you to adapt the model to specific domains or tasks.

Rocket 3B system requirements?

To run Rocket 3B, you need a system with at least 8 GB of RAM, a compatible GPU with 2.1-3.3 GB VRAM, and a modern CPU. Additional storage is required for model files.

Rocket 3B performance benchmark?

Rocket 3B typically processes around 50-100 tokens per second on a mid-range GPU, with performance varying based on the specific hardware and quantization level used.

Rocket 3B for RAG?

Rocket 3B can be used for Retrieval-Augmented Generation (RAG) by integrating it with a retrieval system to enhance its context and provide more accurate responses.

Rocket 3B for agents?

Rocket 3B is suitable for creating conversational agents due to its fast response times and ability to handle long contexts, making it ideal for chatbots and virtual assistants.

Rocket 3B for coding vs general?

Rocket 3B performs well in both coding and general text generation tasks. For coding, its context length and speed are particularly beneficial, while for general tasks, its helpful responses and versatility shine.

Rocket 3B vs ChatGPT?

Rocket 3B is smaller and faster than ChatGPT, making it more suitable for local deployment and resource-constrained environments. ChatGPT, with more parameters, may offer more nuanced responses but requires more computational power.

Rocket 3B download size?

The download size of Rocket 3B varies depending on the quantization level, ranging from approximately 1.5 GB to 3 GB.

Best quant for Rocket 3B?

The best quantization for Rocket 3B depends on your hardware. INT8 provides a good balance between performance and VRAM usage, while INT4 is more efficient but may slightly reduce accuracy.