Curated collections
Hand-picked shortlists across 117+ models. Pick the list that matches your hardware or your job — skip the comparison spreadsheet.
Best models for 8 GB VRAM
RTX 3060 Ti / 3070 / M-series Mac with 16 GB unified memory
Curated picks that comfortably fit on an 8 GB GPU. Each ships in a Q4_K_M quant that leaves headroom for a 4–8K context window. Sorted by quality-per-byte.
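Why Q4_K_M fits: a rough back-of-envelope sketch (the bits-per-weight figure and KV-cache size below are illustrative assumptions, not measured numbers — Q4_K_M averages roughly 4.8–4.9 bits/weight, and a Llama-3-8B-style KV cache runs about 128 KB per token in fp16):

```python
# Rule-of-thumb VRAM estimate for a Q4_K_M model: quantized weights
# + fp16 KV cache + ~1 GB runtime overhead. Illustrative numbers only.

def q4_k_m_vram_gb(params_b: float, context: int = 8192,
                   bits_per_weight: float = 4.85,
                   kv_bytes_per_token: int = 131072) -> float:
    """Return an estimated GB footprint for `params_b` billion parameters."""
    weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    kv_gb = context * kv_bytes_per_token / 1e9
    return weights_gb + kv_gb + 1.0  # ~1 GB for activations/runtime

# An 8B model at 8K context lands just under 7 GB — inside an 8 GB card.
print(round(q4_k_m_vram_gb(8.0), 1))
```

Swap in your model's real layer/head counts for a tighter KV-cache figure; the point is simply that an 8B Q4_K_M plus an 8K context leaves headroom on 8 GB, while a 14B does not.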
Best models for 12 GB VRAM
RTX 3060 12 GB / RTX 4070 / M-series Mac with 24 GB
12 GB unlocks bigger 12–14B models with a comfortable context window. Strong sweet spot for code, vision, and reasoning.
Best models for 24 GB VRAM
RTX 3090 / 4090 / 5090 / M-series Mac with 36 GB+
24 GB is the LLM enthusiast sweet spot — 30B-class models at Q4 with 32K+ context, plus full FLUX.1 image generation.
Best models for iPhone & iPad
Recent A17/A18/M-series with 8 GB+ unified memory
Sub-3B parameter models that ship with our iOS app and run entirely on-device. No internet, no API keys — just inference.
Best coding models
From 1B autocomplete to 14B agentic refactoring
Code-specialised models ranked by performance on HumanEval-style coding benchmarks. Pair with Cursor / Continue / Aider for an offline copilot.
Best reasoning models
Chain-of-thought / o1-style local thinkers
Models trained to show their work. Ideal for math, code, and multi-step logic puzzles. All run with `<think>` traces enabled.
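If you're consuming these traces programmatically, you usually want the reasoning and the final answer separated. A minimal sketch, assuming the common `<think>…</think>` tag convention (used by DeepSeek-R1-style distills; adjust the tag name if your model differs):

```python
import re

# Split an o1-style response into its reasoning trace and final answer.
# Assumes the <think>...</think> convention; tag name is model-dependent.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_think(response: str) -> tuple[str, str]:
    m = THINK_RE.search(response)
    if not m:
        return "", response.strip()  # model emitted no trace
    trace = m.group(1).strip()
    answer = THINK_RE.sub("", response, count=1).strip()
    return trace, answer

trace, answer = split_think("<think>2 + 2 = 4</think>The answer is 4.")
print(answer)  # The answer is 4.
```

Showing the trace in a collapsible panel and the answer inline is the usual UI pattern; hiding the trace entirely also shortens what you feed back into the context window on the next turn.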
Best vision models
Local GPT-4V replacements
Image-in / text-out — describe screenshots, parse documents, count objects, read receipts. All work fully offline.
Best image-generation models
From SD 1.5 to FLUX.1
Local Stable Diffusion / FLUX family. Every entry in our blur-to-sharp Compare lab uses one of these.
Best voice models (STT + TTS)
Whisper + Piper + Kokoro
Speech-in / speech-out building blocks for offline voice assistants. Pair Whisper for STT with Piper or Kokoro for TTS.