Best models for 8 GB VRAM

RTX 3060 / 3070 / M-series Mac with 16 GB unified memory

Curated picks that comfortably fit on an 8 GB GPU. Each ships in a Q4_K_M quant that leaves headroom for a 4–8K context window. Sorted by quality-per-byte.

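A minimal sketch of what running one of these picks looks like locally, assuming the Q4_K_M GGUF is already downloaded and llama-cpp-python is the runtime (the file path below is a placeholder, not an official download location; any GGUF-capable runtime works the same way):

```python
# Minimal sketch: load a Q4_K_M GGUF fully on the GPU with a 4K context.
# Assumes llama-cpp-python is installed; the model path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen2.5-7b-instruct-q4_k_m.gguf",  # placeholder local path
    n_ctx=4096,       # 4K context fits comfortably alongside the ~5.3 GB weights
    n_gpu_layers=-1,  # offload every layer; the whole model fits in 8 GB VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a haiku about local inference."}]
)
print(out["choices"][0]["message"]["content"])
```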
  1. Qwen 2.5 7B Instruct (Alibaba)

    Efficient 7B model with strong coding and reasoning abilities.

    7.6B parameters · 5.3 GB
  2. Llama 3.1 8B Instruct (Meta)

    Meta's 8B parameter instruction-tuned model. Great balance of performance and efficiency for local deployment.

    8B parameters · 5.08 GB
  3. Gemma 2 9B Instruct (Google)

    Google's efficient 9B model. Great performance-to-size ratio.

    9.2B parameters · 5.87 GB
  4. Mistral 7B Instruct v0.3 (Mistral AI)

    Efficient 7B model from Mistral AI with strong performance for its size.

    7.3B parameters · 4.57 GB
  5. Phi-3.5 Mini 3.8B (Microsoft)

    Tiny but capable 3.8B model. Runs on almost any hardware, including phones.

    3.8B parameters · 2.73 GB
  6. DeepSeek R1 Distill 8B (DeepSeek)

    Compact model distilled from DeepSeek R1. Strong reasoning in a small package.

    8B parameters · 5.08 GB
  7. Qwen 2.5 Coder 7B (Alibaba)

    Strong 7B code model rivaling larger coding models. Excellent for local development.

    7.6B parameters · 4.86 GB
  8. Moondream 2 (Moondream)

    Ultra-compact vision model that answers questions about images, in a download of only ~1.5 GB.

    1.8B parameters · 1.5 GB

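To see why a Q4_K_M 7–8B model still leaves room for a 4–8K context, here is a rough VRAM budget using Llama 3.1 8B as the example; the architecture figures (32 layers, 8 KV heads, 128-dim heads) are approximate and assumed for illustration:

```python
# Back-of-the-envelope VRAM budget for Llama 3.1 8B at Q4_K_M with an 8K context.
# Architecture figures below are assumed/approximate, not taken from this page.
GiB = 1024 ** 3

weights_gib = 5.08                                    # Q4_K_M file size from the list above
n_layers, n_kv_heads, head_dim = 32, 8, 128           # approximate Llama 3.1 8B shape
bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * 2   # K + V cache, fp16
kv_cache_gib = bytes_per_token * 8192 / GiB                  # 8K-token context

total = weights_gib + kv_cache_gib
print(f"KV cache: {kv_cache_gib:.2f} GiB, total ≈ {total:.2f} GiB of 8 GiB")
# -> KV cache: 1.00 GiB, total ≈ 6.08 GiB, leaving ~2 GiB for compute buffers
```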
Not sure which fits your machine? Auto-detect your hardware →