Curated collections
Hand-picked shortlists across 145+ models. Pick the list that matches your hardware or your job — skip the comparison spreadsheet.
Best models for 8 GB VRAM
RTX 3060 8 GB / RTX 3070 / M-series Mac with 16 GB unified memory
Curated picks that comfortably fit on an 8 GB GPU. Each ships in a Q4_K_M quant that leaves headroom for a 4–8K context window. Sorted by quality-per-byte.
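As a rough check on whether a given model fits, the memory budget can be estimated as quantized weights plus KV cache plus runtime overhead. The sketch below is illustrative only: the 4.8 bits per weight for Q4_K_M is an approximation, the 1 GiB overhead is a guess, and the layer and head counts are hypothetical values for an 8 B-class model, not figures from any specific model card.

```python
GIB = 2**30  # bytes per GiB


def weights_gib(params_billion: float, bits_per_weight: float = 4.8) -> float:
    """Approximate size of the quantized weights in GiB.

    4.8 bits/weight is an assumed average for Q4_K_M, not an exact figure.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size in GiB (factor 2 = keys and values, fp16)."""
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / GIB


def fits_in_vram(vram_gib: float, params_billion: float, n_layers: int,
                 n_kv_heads: int, head_dim: int, context_tokens: int,
                 overhead_gib: float = 1.0):
    """Return (fits, estimated_total_gib). overhead_gib is an assumption."""
    total = (weights_gib(params_billion)
             + kv_cache_gib(n_layers, n_kv_heads, head_dim, context_tokens)
             + overhead_gib)
    return total <= vram_gib, total


if __name__ == "__main__":
    # Hypothetical 8 B-class architecture at an 8K context window.
    ok, total = fits_in_vram(8, 8, n_layers=32, n_kv_heads=8,
                             head_dim=128, context_tokens=8192)
    print(f"fits in 8 GiB: {ok}, estimated total: {total:.2f} GiB")
```

With these assumed numbers the 8 B example lands at roughly 6.5 GiB, which is why a Q4_K_M quant of that size can still leave room for an 8K context on an 8 GB card. Real usage varies with the runtime, the quant mix, and the model architecture.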
Best models for 12 GB VRAM
RTX 3060 12 GB / RTX 4070 / M-series Mac with 24 GB
12 GB unlocks bigger 12–14 B models with a comfortable context window. Strong sweet spot for code, vision, and reasoning.
Best models for 24 GB VRAM
RTX 3090 / 4090 / 5090 / M-series Mac with 36 GB+
24 GB is the LLM enthusiast sweet spot — 30 B class models at Q4 with 32K+ context, plus full FLUX.1 image generation.
Best models for iPhone & iPad
Recent A17/A18/M-series with 8 GB+ unified memory
Sub-3 B parameter models that ship with our iOS app and run entirely on-device. No internet, no API keys — just inference.
Best coding models
From 1 B autocomplete to 14 B agentic refactoring
Code-specialised models ranked by HumanEval-class performance. Pair with Cursor / Continue / Aider for an offline copilot.
Best reasoning models
Chain-of-thought / o1-style local thinkers
Models trained to show their work. Ideal for math, code, and multi-step logic puzzles. All run with `<think>` traces enabled.
Best vision models
Local GPT-4V replacements
Image-in / text-out — describe screenshots, parse documents, count objects, read receipts. All work fully offline.
Best image-generation models
From SD 1.5 to FLUX.1
Local models from the Stable Diffusion and FLUX families. Each line in our blur-to-sharp Compare lab uses one of these.
Best voice models (STT + TTS)
Whisper + Piper + Kokoro
Speech-in / speech-out building blocks for offline voice assistants. Pair Whisper for STT with Piper or Kokoro for TTS.
Uncensored chat & assistant models
Refusal-removed general LLMs — abliterated, Dolphin, and natural base models
Local LLMs without the 'I can't help with that' reflex. Includes mlabonne abliterations (refusal-direction ablation, no retraining), Cognitive Computations Dolphin fine-tunes, and official base models that were never RLHF-aligned. Read each model's license — some inherit Llama Community terms.
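For readers curious what "refusal-direction ablation, no retraining" means mechanically, the core operation can be pictured as an orthogonal projection: the component of a vector along one direction is subtracted out. The sketch below is a simplified illustration in plain Python, assuming the direction has already been estimated elsewhere. It is not the code used by any of the listed model authors, and the function names are hypothetical.

```python
import math


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(vec, direction):
    """Remove the component of vec that lies along direction."""
    r = normalize(direction)
    dot = sum(a * b for a, b in zip(vec, r))
    return [a - dot * b for a, b in zip(vec, r)]


def ablate_weight_matrix(weight, direction):
    """Project the direction out of every column of a weight matrix.

    weight is a list of rows with shape (d_out, d_in). After this step the
    output weight @ x has no component along direction, for any input x.
    """
    columns = [list(col) for col in zip(*weight)]
    cleaned = [ablate_direction(col, direction) for col in columns]
    return [list(row) for row in zip(*cleaned)]


if __name__ == "__main__":
    # Toy example: remove the first axis from a 3-dimensional vector.
    print(ablate_direction([3.0, 4.0, 5.0], [1.0, 0.0, 0.0]))
```

Because the edit is applied directly to existing weights, no gradient updates or training data are involved, which matches the "no retraining" description above.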
Uncensored creative writing & roleplay
TheDrummer, Sao10K, Anthracite — long-form prose and character chat
Models fine-tuned for narrative writing and character roleplay without alignment filters. Cydonia and Rocinante (TheDrummer), Euryale and Stheno (Sao10K), and Magnum (Anthracite) are the active reference families. Some carry non-commercial licenses — check before commercial use.
Uncensored coding models
Code generation without filters
Code-specialised models with the refusal direction ablated. Useful for security research, dual-use tooling, and code that mainstream-aligned assistants decline to write. Codestral derivatives inherit Mistral's non-commercial research license.
Naturally uncensored base models
Official foundation models with no instruct or RLHF alignment
Pretrained-only models from Mistral, Qwen, and Meta — no abliteration needed because alignment was never applied. Closer to the raw distribution; less assistant-shaped, more open-ended. Best for researchers and for fine-tuning your own assistant.