Curated collections

Hand-picked shortlists across 117+ models. Pick the list that matches your hardware or your job — skip the comparison spreadsheet.

8 GB tier · 8 picks

Best models for 8 GB VRAM

RTX 3060 / 3070 / M-series Mac with 16 GB unified memory

Curated picks that comfortably fit on an 8 GB GPU. Each ships in a Q4_K_M quant that leaves headroom for a 4–8K context window. Sorted by quality-per-byte.
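If you want to sanity-check the claim yourself, a quick back-of-envelope estimate works: Q4_K_M averages roughly 4.85 bits per weight, and an FP16 KV cache costs about 2 (K and V) × layers × KV heads × head dim × 2 bytes per token. A minimal sketch, where every constant (including the ~15% runtime overhead) is a rule-of-thumb assumption rather than a measured value:

```python
def q4km_fit_estimate(params_b: float, n_layers: int, n_kv_heads: int,
                      head_dim: int, ctx_tokens: int, vram_gb: float = 8.0) -> dict:
    """Rough fit check for a Q4_K_M quant plus an FP16 KV cache.

    Assumptions (rules of thumb, not measurements):
    - Q4_K_M averages ~4.85 bits per weight
    - KV cache: 2 (K and V) * layers * kv_heads * head_dim * 2 bytes per token
    - ~15% extra for compute buffers and runtime overhead
    """
    weights_gb = params_b * 1e9 * 4.85 / 8 / 1e9
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * 2 * ctx_tokens / 1e9
    total_gb = (weights_gb + kv_gb) * 1.15
    return {"weights_gb": round(weights_gb, 2),
            "kv_cache_gb": round(kv_gb, 2),
            "total_gb": round(total_gb, 2),
            "fits": total_gb <= vram_gb}

# Example: an 8B model with GQA (32 layers, 8 KV heads, head_dim 128) at 8K context
print(q4km_fit_estimate(8.0, n_layers=32, n_kv_heads=8, head_dim=128, ctx_tokens=8192))
# -> roughly 4.85 GB of weights + ~1.07 GB of KV cache, comfortably under 8 GB
```

By this estimate an 8B model at Q4_K_M with an 8K context lands around 6.8 GB, which is why the 8 GB tier tops out at that size.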

12 GB tier · 8 picks

Best models for 12 GB VRAM

RTX 3060 12 GB / RTX 4070 / M-series Mac with 24 GB

12 GB unlocks bigger 12–14B models with a comfortable context window. Strong sweet spot for code, vision, and reasoning.

24 GB tier · 8 picks

Best models for 24 GB VRAM

RTX 3090 / 4090 / 5090 / M-series Mac with 36 GB+

24 GB is the LLM-enthusiast sweet spot: 30B-class models at Q4 with 32K+ context, plus full FLUX.1 image generation.

4 GB tier · 8 picks

Best models for iPhone & iPad

Recent A17/A18/M-series with 8 GB+ unified memory

Sub-3B-parameter models that ship with our iOS app and run entirely on-device. No internet, no API keys, just inference.

Use case · 8 picks

Best coding models

From 1B autocomplete to 14B agentic refactoring

Code-specialised models ranked by HumanEval-class performance. Pair with Cursor / Continue / Aider for an offline copilot.

Use case · 6 picks

Best reasoning models

Chain-of-thought / o1-style local thinkers

Models trained to show their work. Ideal for math, code, and multi-step logic puzzles. All run with `<think>` traces enabled.

Use case · 6 picks

Best vision models

Local GPT-4V replacements

Image-in / text-out — describe screenshots, parse documents, count objects, read receipts. All work fully offline.

Use case · 7 picks

Best image-generation models

From SD 1.5 to FLUX.1

The local Stable Diffusion and Flux families. Every entry in our blur-to-sharp Compare lab uses one of these.

Use case · 7 picks

Best voice models (STT + TTS)

Whisper + Piper + Kokoro

Speech-in / speech-out building blocks for offline voice assistants. Pair Whisper for STT with Piper or Kokoro for TTS.