IBM
Granite 3.0 3B-A800M
IBM's enterprise-grade small MoE: 3.4B total parameters, 800M active. Long context, function calling.
About This Model
Granite 3.0 3B-A800M is the larger of the two Granite 3.0 MoE models. It is still small enough for laptop or SBC inference, but its 800M active parameters give it noticeably better instruction-following than its 1B-A400M sibling. IBM positions it for enterprise use cases: function calling, RAG, and structured output.
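Since function calling is one of the headline use cases, here is a minimal sketch of what an OpenAI-style tool-calling request body looks like when sent to any OpenAI-compatible server hosting this model. The model id and the `get_weather` tool are illustrative assumptions, not part of IBM's API:

```python
import json

# Sketch of an OpenAI-style function-calling request body.
# The model id and tool name below are illustrative placeholders.
payload = {
    "model": "granite-3.0-3b-a800m-instruct",
    "messages": [
        {"role": "user", "content": "What's the weather in Zurich?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```

If the model decides to call the tool, the response carries a `tool_calls` entry with the arguments as a JSON string, which your code executes before sending the result back in a follow-up message.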
Check Your Hardware
See which quantizations of Granite 3.0 3B-A800M your hardware can run.
Quantization Options
| Quantization | Bits | File Size | VRAM Needed | RAM Needed | Quality |
|---|---|---|---|---|---|
| Q4_K_M | 4.5 | 2.1 GB | 3 GB | 6 GB | 85% |
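The file size in the table can be sanity-checked from the bits-per-weight figure: file size is roughly total parameters times effective bits per weight, divided by 8. A quick back-of-envelope check using the table's own numbers:

```python
# Back-of-envelope check of the Q4_K_M file size. The small gap vs. the
# listed 2.1 GB comes from tensors kept at higher precision (embeddings,
# norms) plus GGUF metadata.
total_params = 3.4e9       # total parameters (MoE: all experts stored on disk)
bits_per_weight = 4.5      # Q4_K_M effective bits per weight

file_size_gb = total_params * bits_per_weight / 8 / 1e9
print(f"~{file_size_gb:.2f} GB from weights alone")  # ~1.91 GB
```

Note that for an MoE model the full file must fit on disk (and usually in memory) even though only 800M parameters are active per token, which is why the RAM column is sized to the whole model, not the active subset.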
Context window & KV cache
Long chats and RAG inputs cost real memory; at the current slider setting the KV cache adds 0.33 GB to VRAM. Drag to see how 32K vs 128K context shifts your grade.
Model native max: 4K tokens. The KV-cache estimate is approximate (±30%); real usage depends on attention layout.
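The KV cache stores one key and one value vector per layer per token, so its size scales linearly with context length. A sizing sketch follows; the architecture numbers are illustrative placeholders, not Granite's published config, so swap in real values before trusting the estimate:

```python
# KV-cache sizing sketch: 2 (K and V) * layers * KV heads * head dim
# * context length * bytes per element. The architecture values below
# are assumed placeholders, NOT Granite's actual config.
n_layers = 24        # assumed transformer layer count
n_kv_heads = 8       # assumed KV heads (GQA shrinks this vs. query heads)
head_dim = 64        # assumed per-head dimension
bytes_per_elem = 2   # FP16 cache
ctx_len = 4096       # this model's native max context

kv_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem
print(f"KV cache at {ctx_len} tokens: {kv_bytes / 1e9:.2f} GB")
```

The linear scaling is the point: quadrupling the context quadruples the cache, which is why long-context runs can dwarf the weight footprint of a small model.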
How to run Granite 3.0 3B-A800M
Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.
GUI. Browse → download → chat. MLX on Apple Silicon.
LM Studio home →
1. Open LM Studio
Go to the 🔍 Search tab.
2. Search for
bartowski/granite-3.0-3b-a800m-instruct-GGUF
3. Download
Pick the Q4_K_M quant, the best balance of size vs. quality.
4. Chat
Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.
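Once the local server is on, any OpenAI-compatible client can talk to it. A stdlib-only sketch of a chat request to the default port (the model id is whatever name LM Studio lists for the loaded model, assumed here):

```python
import json
import urllib.request

# Assumes LM Studio's local server is running on localhost:1234
# (toggle 'Local Server' in LM Studio first).
URL = "http://localhost:1234/v1/chat/completions"

payload = {
    "model": "granite-3.0-3b-a800m-instruct",  # name as LM Studio lists it; may differ
    "messages": [
        {"role": "user", "content": "Summarize what a MoE model is in one sentence."}
    ],
    "temperature": 0.7,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(req.get_full_url())

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the endpoint mirrors the OpenAI Chat Completions shape, existing OpenAI SDK code can be pointed at it by changing only the base URL.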
Community benchmarks
Real tokens/sec reports from people running Granite 3.0 3B-A800M on actual hardware.
No community runs yet for this model. Be the first to submit your numbers.
See It In Action
Real model outputs generated via RunThisModel.com; watch responses stream in real time.
Generation speed shown is from cloud inference. Local speeds vary by hardware; check your device.
Frequently Asked Questions
How much VRAM do I need to run Granite 3.0 3B-A800M?
Granite 3.0 3B-A800M requires 3 GB of VRAM minimum with Q4_K_M quantization. At full precision (16-bit), the 3.4B parameters alone take roughly 7 GB, before the KV cache.
What is the best quantization for Granite 3.0 3B-A800M?
Q4_K_M offers the best balance of quality and VRAM usage. Q8_0 is near-lossless if you have enough VRAM.