Best models for 8 GB VRAM

RTX 3060 / 3070 / M-series Mac with 16 GB unified memory

Curated picks that comfortably fit on an 8 GB GPU. Each ships in a Q4_K_M quant that leaves headroom for a 4–8K context window. Sorted by quality-per-byte.

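A minimal sketch of what running one of these picks looks like locally, assuming the Q4_K_M GGUF is already downloaded and llama-cpp-python is the runtime (the file path below is a placeholder, not an official download location; any GGUF-capable runtime works the same way):

```python
# Minimal sketch: load a Q4_K_M GGUF fully on the GPU with a 4K context.
# Assumes llama-cpp-python is installed; the model path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen2.5-7b-instruct-q4_k_m.gguf",  # placeholder local path
    n_ctx=4096,       # 4K context fits comfortably alongside the ~5.3 GB weights
    n_gpu_layers=-1,  # offload every layer; the whole model fits in 8 GB VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a haiku about local inference."}]
)
print(out["choices"][0]["message"]["content"])
```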
  1. Qwen 2.5 7B Instruct (Alibaba)

    Efficient 7B model with strong coding and reasoning abilities.

    7.6B parameters · 5.3 GB
  2. Llama 3.1 8B Instruct (Meta)

    Meta's 8B parameter instruction-tuned model. Great balance of performance and efficiency for local deployment.

    8B parameters · 5.08 GB
  3. Gemma 2 9B Instruct (Google)

    Google's efficient 9B model. Great performance-to-size ratio.

    9.2B parameters · 5.87 GB
  4. Mistral 7B Instruct v0.3 (Mistral AI)

    Efficient 7B model from Mistral AI with strong performance for its size.

    7.3B parameters · 4.57 GB
  5. Phi-3.5 Mini 3.8B (Microsoft)

    Tiny but capable 3.8B model. Runs on almost any hardware, including phones.

    3.8B parameters · 2.73 GB
  6. DeepSeek R1 Distill 8B (DeepSeek)

    Compact model distilled from DeepSeek R1. Strong reasoning in a small package.

    8B parameters · 5.08 GB
  7. Qwen 2.5 Coder 7B (Alibaba)

    Strong 7B code model rivaling larger coding models. Excellent for local development.

    7.6B parameters · 4.86 GB
  8. Moondream 2 (Moondream)

    Ultra-compact vision model that answers questions about images, in a download of only ~1.5 GB.

    1.8B parameters · 1.5 GB

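To see why a Q4_K_M 7–8B model still leaves room for a 4–8K context, here is a rough VRAM budget using Llama 3.1 8B as the example; the architecture figures (32 layers, 8 KV heads, 128-dim heads) are approximate and assumed for illustration:

```python
# Back-of-the-envelope VRAM budget for Llama 3.1 8B at Q4_K_M with an 8K context.
# Architecture figures below are assumed/approximate, not taken from this page.
GiB = 1024 ** 3

weights_gib = 5.08                                    # Q4_K_M file size from the list above
n_layers, n_kv_heads, head_dim = 32, 8, 128           # approximate Llama 3.1 8B shape
bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * 2   # K + V cache, fp16
kv_cache_gib = bytes_per_token * 8192 / GiB                  # 8K-token context

total = weights_gib + kv_cache_gib
print(f"KV cache: {kv_cache_gib:.2f} GiB, total ≈ {total:.2f} GiB of 8 GiB")
# -> KV cache: 1.00 GiB, total ≈ 6.08 GiB, leaving ~2 GiB for compute buffers
```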
Not sure which fits your machine? Auto-detect your hardware →