rankings · best AI models by category & VRAM

./rankings · top-of-list · 137 models across 8 categories · sorted by min VRAM asc · params desc as tie-break

top of list

best model in each category, ranked

rank #1 = smallest VRAM that still ships quality. ties on min VRAM are broken by parameter count, largest first. click a row to drill into the model card.
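a minimal sketch of the sort rule stated above (min VRAM ascending, parameter count descending as tie-break), assuming each row reduces to a name, a category, a parameter count, and a min-VRAM figure. `Model` and `rank` are hypothetical names used for illustration, not the site's actual code; the sample rows are copied from the coding and speech-to-text tables on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    category: str
    params_b: float     # parameter count, in billions
    min_vram_gb: float  # minimum VRAM, in GB


def rank(models):
    """Group models by category, then sort each group by
    min VRAM ascending, with parameter count descending as tie-break."""
    by_category = {}
    for m in models:
        by_category.setdefault(m.category, []).append(m)
    for rows in by_category.values():
        # Negating params makes larger models win ties on min VRAM.
        rows.sort(key=lambda m: (m.min_vram_gb, -m.params_b))
    return by_category


models = [
    Model("CodeGemma 7B", "coding", 8.5, 5.5),
    Model("DeepSeek Coder 6.7B", "coding", 6.7, 4.3),
    Model("Yi Coder 9B", "coding", 9.0, 5.5),
    Model("Code Llama 7B", "coding", 7.0, 4.3),
    Model("Whisper Small", "speech-to-text", 0.24, 0.9),
]

ranked = rank(models)
coding_order = [m.name for m in ranked["coding"]]
for position, name in enumerate(coding_order, start=1):
    print(position, name)
```

a single tuple key keeps both criteria in one pass; the first entry of each list in `ranked` is that category's top-of-list pick.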

chat & general 74 · coding 17 · image gen 9 · speech-to-text 9 · text-to-speech 14 · audio gen 3 · multimodal / vision 6 · embedding 5

chat & general · 74 models · llm

general-purpose language models for conversation, writing, and reasoning

rank	model	author	params	min vram	action
▲1	SmolLM2 135M Tiny 135M model. Default LLM - guaranteed to run on any iPhone. Only 145MB download. Per…	HuggingFace	0.135B	0.6GB	open →
△2	SmolLM2 360M Compact 360M model. Good for basic tasks on very constrained devices.	HuggingFace	0.36B	0.8GB	open →
▴3	Danube 3 500M Ultra-tiny 500M model. Smaller than SmolLM2 1.7B. Runs anywhere.	H2O.ai	0.5B	0.8GB	open →
4	Qwen 2.5 0.5B Ultra-small 0.5B model from Alibaba. Minimal resource requirements.	Alibaba	0.5B	1.0GB	open →
5	TinyLlama 1.1B Lightweight 1.1B chat model based on the Llama architecture. Great for phones.	TinyLlama	1.1B	1.1GB	open →
6	Llama 3.2 1B Instruct Ultra-compact 1B model. Runs on virtually any device including smartphones.	Meta	1.24B	1.3GB	open →
7	Gemma 3 1B Google's latest tiny 1B model. Excellent quality for its size.	Google	1B	1.3GB	open →
8	Granite 3.0 1B-A400M Tiny IBM MoE for edge and CPU inference. 1.3 B total, only 400 M active.	IBM	1.3B	1.3GB	open →
9	SmolLM2 1.7B Capable 1.7B model from HuggingFace. Good balance for mobile devices.	HuggingFace	1.7B	1.5GB	open →
10	Falcon 3 1B Ultra-compact 1B model from the Technology Innovation Institute.	TII	1B	1.5GB	open →
11	Qwen 2.5 1.5B Compact 1.5B model with strong multilingual and coding abilities.	Alibaba	1.5B	1.5GB	open →
12	DeepSeek R1 Distill 1.5B Compact reasoning model distilled from DeepSeek R1. Strong chain-of-thought in a tiny pa…	DeepSeek	1.5B	1.5GB	open →
13	Granite 3.3 2B IBM's compact 2B model. Good at following instructions.	IBM	2B	1.9GB	open →
14	EXAONE 3.5 2.4B Compact model from LG. Optimized for Korean and English.	LG AI	2.4B	2.0GB	open →
15	StableLM Zephyr 3B Compact 3B model from Stability AI. Good chat quality for its size.	Stability AI	3B	2.1GB	open →
16	Rocket 3B Fast 3B model tuned for helpful responses.	Pansophic	3B	2.1GB	open →
17	Gemma 2 2B Google's compact 2.6B model. Efficient and capable for mobile use.	Google	2.6B	2.1GB	open →
18	Falcon 3 3B Compact 3B Falcon model with good performance.	TII	3B	2.4GB	open →
19	Llama 3.2 3B Instruct Meta's compact 3B model designed for edge and mobile deployment.	Meta	3.2B	2.4GB	open →
20	Granite 3.0 3B-A800M IBM enterprise-grade small MoE. 3.4 B total, 800 M active. Long context, function-callin…	IBM	3.4B	2.4GB	open →
21	Qwen 2.5 3B Versatile 3B model with strong reasoning and multilingual capabilities.	Alibaba	3B	2.5GB	open →
22	Danube 3 4B Capable 4B model from H2O.ai. Good for phones.	H2O.ai	4B	2.7GB	open →
23	Phi-3.5 Mini 3.8B Tiny but capable 3.8B model. Runs on almost any hardware including phones.	Microsoft	3.8B	2.7GB	open →
24	Gemma 3 4B Balanced 4B model with strong reasoning. Great for iPhones.	Google	4B	2.8GB	open →
25	Phi-4 Mini 3.8B Latest Phi mini with strong reasoning. Drop-in upgrade from Phi-3.5 Mini.	Microsoft	3.8B	2.8GB	open →
26	Nemotron Mini 4B NVIDIA's compact 4B model optimized for edge deployment.	NVIDIA	4B	3.0GB	open →
27	Yi 1.5 6B Chat Efficient 6B bilingual (English/Chinese) model.	01.AI	6B	3.9GB	open →
28	OLMoE 1B-7B Fully open MoE: 7B total, only 1.3B active per token. Tiny footprint, surprisingly ca…	Allen AI	6.9B	4.4GB	open →
29	Mistral 7B Instruct v0.3 Efficient 7B model from Mistral AI with strong performance for its size.	Mistral AI	7.3B	4.6GB	open →
30	OpenChat 3.5 7B Fine-tuned Mistral 7B for chat. Strong instruction following.	OpenChat	7B	4.6GB	open →
31	OLMo 2 7B Fully open research model. Transparent training.	Allen AI	7B	4.7GB	open →
32	InternLM 2.5 7B Strong 7B model from China. Good at tool use and math.	Shanghai AI Lab	7.7B	4.9GB	open →
33	EXAONE 3.5 7.8B 7.8B model from LG. Strong bilingual Korean/English.	LG AI	7.8B	4.9GB	open →
34	Falcon 3 7B Full-size Falcon 3 with strong performance across benchmarks.	TII	7B	5.0GB	open →
35	DeepSeek R1 Distill 8B Compact reasoning model. Good reasoning capabilities in a small package.	DeepSeek	8B	5.1GB	open →
36	Llama 3.1 8B Instruct Meta's 8B parameter instruction-tuned model. Great balance of performance and efficiency…	Meta	8B	5.1GB	open →
37	Dolphin 3.0 Llama 3.1 8B Eric Hartford's flagship uncensored fine-tune of Llama 3.1 8B. Steerable assistant with …	Cognitive Computations	8B	5.1GB	open →
38	NeuralDaredevil 8B (abliterated) Llama-3 8B with refusal direction ablated, then DPO-recovered to restore capability. Bes…	mlabonne	8B	5.1GB	open →
39	Llama 3.1 8B Instruct (abliterated) Pure refusal-direction ablation of Llama-3.1-8B-Instruct. No retraining — keeps the offi…	mlabonne	8B	5.1GB	open →
40	Stheno L3 8B v3.2 Long-running 8B roleplay reference. Trained for character voice consistency and long-for…	Sao10K	8B	5.1GB	open →
41	Granite 3.3 8B IBM's 8B instruction model. Enterprise quality.	IBM	8B	5.1GB	open →
42	Qwen3 8B Base Official Qwen3 8B foundation model — pretrained only, no RLHF or refusal training. The '…	Alibaba	8B	5.3GB	open →
43	Qwen 2.5 7B Instruct Efficient 7B model with strong coding and reasoning abilities.	Alibaba	7.6B	5.3GB	open →
44	Yi 1.5 9B Chat 9B bilingual model with strong reasoning.	01.AI	9B	5.5GB	open →
45	Gemma 2 9B Instruct Google's efficient 9B model. Great performance-to-size ratio.	Google	9.2B	5.9GB	open →
46	Falcon 3 10B 10B Falcon model. Good iPad model.	TII	10B	6.4GB	open →
47	Solar 10.7B Depth-upscaled 10.7B model. Strong reasoning.	Upstage	10.7B	6.5GB	open →
48	Gemma 3 MoE 9B Gemma 3 MoE variant. 9 B total, 2.5 B active. Strong fit for 12 GB cards.	Google	9B	7.0GB	open →
49	Gemma 3 12B High quality 12B model. Excellent for iPad Pro and Mac.	Google	12B	7.3GB	open →
50	Mistral Nemo 12B Mistral's 12B model with excellent instruction following.	Mistral AI	12B	7.5GB	open →
51	Magnum v4 12B Mistral-Nemo-12B fine-tuned on curated Claude-style prose data. Built for long-form crea…	Anthracite	12B	7.5GB	open →
52	Rocinante 12B v1.1 Mistral-Nemo-12B roleplay fine-tune optimized for character chat. Stable workhorse for t…	TheDrummer	12B	7.5GB	open →
53	Mistral Nemo Base 12B Official Mistral-Nemo 12B foundation model (NVIDIA collab) — pretrained only, no instruc…	Mistral AI	12B	7.7GB	open →
54	Qwen 2.5 14B Strong 14B model with excellent coding and reasoning. iPad Pro recommended.	Alibaba	14B	8.9GB	open →
55	Phi-4 Microsoft's 14B parameter model. Punches well above its weight on reasoning.	Microsoft	14B	8.9GB	open →
56	Rocinante XL 16B v1 Newest Rocinante release — 16B upscaled Mistral-Nemo for richer prose at the 12-16GB tie…	TheDrummer	16B	9.6GB	open →
57	DeepSeek MoE 16B DeepSeek's first MoE: 16.4B total, 2.8B active. The original consumer-runnable open MoE…	DeepSeek	16.4B	11.0GB	open →
58	Mistral Small 22B 22B parameter model. Strong reasoning and multilingual. Needs 16GB+ RAM.	Mistral AI	22B	12.9GB	open →
59	Magnum v4 22B Mistral-Small-22B base, Anthracite's Claude-style prose training. Sits between 12B and 7…	Anthracite	22B	12.9GB	open →
60	Dolphin 3.0 R1 Mistral 24B Only widely-available uncensored R1-style reasoning model. Mistral-Small-24B base with c…	Cognitive Computations	24B	13.8GB	open →
61	Cydonia 24B v4.3 Top-of-line 24B roleplay model, Mistral-Small-3.2-24B base. Active development cycle — T…	TheDrummer	24B	13.8GB	open →
62	Dolphin Mistral 24B (Venice Edition) Headline 24B uncensored pick — top community engagement among uncensored models on HF. S…	Cognitive Computations	24B	14.9GB	open →
63	Gemma 3 27B Google's flagship open model. Near GPT-4 quality. Needs 20GB+ RAM.	Google	27B	15.9GB	open →
64	Skyfall 31B v4.2 31B creative-writing model — sweet spot between 24B and 70B. Built on Mistral-Small-3.1 …	TheDrummer	31B	18.2GB	open →
65	Qwen 2.5 32B Premium 32B model. Top-tier reasoning. Mac with 32GB+ RAM.	Alibaba	32B	19.0GB	open →
66	Qwen3 30B-A3B Mixture-of-Experts model with 30 B total parameters but only 3 B active per token. Runs …	Alibaba	30.5B	20.0GB	open →
67	Phi-3.5 MoE Microsoft MoE — 16 experts of 3.8 B, 6.6 B active per token. Strong reasoning at modest …	Microsoft	41.9B	24.1GB	open →
68	Mixtral 8x7B Instruct The OG public MoE — 8 experts, 2 active per token, 47 B total / 13 B active. Apache-2.0.	Mistral AI	46.7B	25.1GB	open →
69	Llama 3.1 70B Instruct Meta's flagship 70B parameter model. Excellent performance rivaling GPT-4 on many benchm…	Meta	70B	40.1GB	open →
70	Euryale L3.3 70B v2.3 Canonical 70B creative-writing and roleplay model. Llama-3.3-70B base with extended trai…	Sao10K	70B	40.1GB	open →
71	Llama 3.1 70B (lorablated) Llama-3.1-70B-Instruct with abliteration applied via LoRA merge. Cleanest 70B refusal-re…	mlabonne	70B	40.1GB	open →
72	Magnum v4 72B Qwen2.5-72B fine-tuned on Claude-Opus-style literary data. Highest-quality long-form pro…	Anthracite	72B	44.7GB	open →
73	Mixtral 8x22B Instruct 141 B total / 39 B active MoE. Larger Mixtral; needs serious hardware.	Mistral AI	141B	88.0GB	open →
74	Qwen3 235B-A22B Flagship MoE — 235 B total parameters, 22 B active. Frontier quality but needs 80 GB+ VR…	Alibaba	235B	144.0GB	open →

coding · 17 models · code

specialized models for code generation, completion, and debugging

rank	model	author	params	min vram	action
▲1	Qwen 2.5 Coder 0.5B Smallest code model. Default code assistant - runs on any iPhone. Great for code complet…	Alibaba	0.5B	1.1GB	open →
△2	DeepSeek Coder 1.3B Compact code model with strong coding capabilities. Great for mobile coding assistants.	DeepSeek	1.3B	1.3GB	open →
▴3	Yi Coder 1.5B Tiny code model. Great for phones. Fast completions.	01.AI	1.5B	1.4GB	open →
4	Qwen 2.5 Coder 1.5B Compact code model with solid code generation and understanding abilities.	Alibaba	1.5B	1.5GB	open →
5	CodeGemma 2B Lightweight code completion model from Google. Fast on-device code suggestions.	Google	2B	2.0GB	open →
6	Stable Code 3B Compact code model with good completion quality.	Stability AI	3B	2.1GB	open →
7	StarCoder2 3B Code completion model trained on The Stack v2, a dataset spanning 600+ programming languages.	BigCode	3B	2.3GB	open →
8	Qwen 2.5 Coder 3B Capable 3B code model. Good balance of coding ability and resource usage.	Alibaba	3B	2.5GB	open →
9	Code Llama 7B Meta's code-specialized Llama model. Good at code completion.	Meta	7B	4.3GB	open →
10	DeepSeek Coder 6.7B Powerful 6.7B code model with excellent code generation across many languages.	DeepSeek	6.7B	4.3GB	open →
11	StarCoder2 7B Larger code model with better completions.	BigCode	7B	4.7GB	open →
12	Qwen 2.5 Coder 7B Strong 7B code model rivaling larger coding models. Excellent for local development.	Alibaba	7.6B	4.9GB	open →
13	Yi Coder 9B Strong 9B code model with good reasoning.	01.AI	9B	5.5GB	open →
14	CodeGemma 7B Google's instruction-tuned code model. Strong code generation and understanding.	Google	8.5B	5.5GB	open →
15	Code Llama 13B Instruct 13B code model for complex tasks. iPad Pro recommended.	Meta	13B	7.8GB	open →
16	Qwen 2.5 Coder 14B Powerful 14B code model. Excellent for complex programming tasks.	Alibaba	14B	8.9GB	open →
17	Codestral 22B (abliterated) Mistral Codestral with refusal direction ablated. Code-specialized model without the 'I …	failspy	22B	12.9GB	open →

image gen · 9 models · image

text-to-image models for art, photos, and design

rank	model	author	params	min vram	action
▲1	Stable Diffusion 2.1 Base (CoreML) Smallest CoreML image generation model. Palettized for minimal size (1.14GB). Runs on an…	Stability AI / Apple	0.86B	1.6GB	open →
△2	Stable Diffusion 1.5 (GGUF) SD 1.5 in single-file GGUF format. Alternative to CoreML. Uses stable-diffusion.cpp with…	Runway / GPUStack	0.86B	2.1GB	open →
▴3	Stable Diffusion 1.5 (CoreML) Classic image generation model. Pre-converted to CoreML for iOS/Mac. Downloads as zip, a…	Runway	0.86B	2.5GB	open →
4	Stable Diffusion 2.1 (GGUF) SD 2.1 in GGUF format. Better quality than 1.5.	Stability AI	0.86B	2.7GB	open →
5	Stable Diffusion XL (CoreML) Higher quality image generation. CoreML optimized for iOS. Requires 6GB+ usable memory (…	Stability AI	3.5B	3.3GB	open →
6	SDXL Turbo (GGUF) Single-step SDXL. Near-instant image generation.	Stability AI	3.5B	5.0GB	open →
7	Stable Diffusion 3 Medium (GGUF) SD 3 with MMDiT architecture. Superior text rendering.	Stability AI	2.5B	9.2GB	open →
8	FLUX.1 Schnell (GGUF) Fast 1-4 step generation. State-of-the-art quality. Needs 16GB+ RAM.	Black Forest Labs	12B	14.0GB	open →
9	FLUX.1 Dev (GGUF) Highest quality FLUX model. 20-50 steps. Mac with 24GB+ RAM.	Black Forest Labs	12B	14.0GB	open →

speech-to-text · 9 models · stt

transcription and speech recognition models

rank	model	author	params	min vram	action
▲1	Whisper Tiny English (Quantized) Smallest possible speech recognition model. Only 32MB. English only. Default speech mode…	OpenAI	0.039B	0.1GB	open →
△2	Whisper Tiny Tiny multilingual speech recognition. Only 75MB. Supports 99 languages. Runs on any devi…	OpenAI	0.039B	0.2GB	open →
▴3	Whisper Base Base Whisper model. Good balance of speed and accuracy. 142MB.	OpenAI	0.074B	0.3GB	open →
4	Whisper Base English English-only base model. Faster and more accurate for English.	OpenAI	0.074B	0.3GB	open →
5	Whisper Small Compact Whisper model. Good accuracy for everyday transcription tasks.	OpenAI	0.24B	0.9GB	open →
6	Distil-Whisper Large v3 Distilled Whisper. About 6x faster than large-v3, within about 1% WER of it.	HuggingFace	0.76B	1.9GB	open →
7	Whisper Medium Mid-size Whisper model. Strong multilingual speech recognition.	OpenAI	0.77B	1.9GB	open →
8	Whisper Large v3 Turbo Optimized large Whisper model. Near-best accuracy with faster inference.	OpenAI	0.81B	2.0GB	open →
9	Whisper Large v3 Largest Whisper model. Best accuracy across all languages and accents.	OpenAI	1.55B	3.4GB	open →

text-to-speech · 14 models · tts

voice synthesis and text-to-speech models

rank	model	author	params	min vram	action
▲1	Piper TTS - Amy (English) Lightweight TTS voice. High quality English speech synthesis. Default TTS model - runs o…	Rhasspy	0.02B	0.1GB	open →
△2	Piper TTS - Lessac (English) High quality English male voice. 63MB download. Runs on any device.	Rhasspy	0.02B	0.1GB	open →
▴3	Piper TTS - Spanish (MLS) Spanish female voice. Natural prosody.	Rhasspy	0.02B	0.1GB	open →
4	Piper TTS - German (Thorsten) German male voice.	Rhasspy	0.02B	0.1GB	open →
5	Piper TTS - Chinese (Huayan) Chinese Mandarin voice.	Rhasspy	0.02B	0.1GB	open →
6	Piper TTS - Japanese (Kokoro) Japanese voice.	Rhasspy	0.02B	0.1GB	open →
7	Piper TTS - Korean Korean voice.	Rhasspy	0.02B	0.1GB	open →
8	Piper TTS - Russian (Irina) Russian female voice.	Rhasspy	0.02B	0.1GB	open →
9	Piper TTS - Portuguese (Faber) Portuguese voice.	Rhasspy	0.02B	0.1GB	open →
10	Piper TTS - Arabic (Kareem) Arabic voice.	Rhasspy	0.02B	0.1GB	open →
11	Piper TTS - French (Siwis) French female voice.	Rhasspy	0.02B	0.5GB	open →
12	Piper TTS - Italian (Riccardo) Italian male voice.	Rhasspy	0.02B	0.5GB	open →
13	Piper TTS - LibriTTS-R (English) Medium quality English voice with natural prosody. 63MB download.	Rhasspy	0.02B	0.6GB	open →
14	Kokoro 82M TTS High quality 82M parameter TTS model. Excellent speech synthesis with multiple voice opt…	Kokoro	0.082B	0.6GB	open →

audio gen · 3 models · audio

AI music and audio creation

rank	model	author	params	min vram	action
▲1	MusicGen Small Music generation from text prompts. Requires multiple ONNX files (~435MB total). Experim…	Meta	0.3B	0.8GB	open →
△2	Stable Audio Open Variable-length audio generation, up to 47 seconds. Sound effects and short loops.	Stability AI	1B	6.0GB	open →
▴3	ACE-Step 1.5XL Music generation rivaling Suno. Generates structured songs with vocals from a text promp…	ACE Studio	1.5B	8.0GB	open →

multimodal / vision · 6 models · vlm

models that understand both images and text

rank	model	author	params	min vram	action
▲1	Qwen2-VL 2B Compact vision-language model. Default multimodal model. Can understand images and answe…	Alibaba	2.2B	1.4GB	open →
△2	Moondream 2 Ultra-compact vision model. Only 1GB. Answers questions about images.	Moondream	1.8B	1.5GB	open →
▴3	MiniCPM-V 2.6 Efficient multimodal model with strong image understanding. Optimized for edge devices.	OpenBMB	2B	2.1GB	open →
4	PaliGemma 3B Google's vision model. Strong at visual QA, captioning, and OCR.	Google	3B	2.5GB	open →
5	Phi-3.5 Vision Vision-language model from Microsoft. Can understand images and documents.	Microsoft	4.2B	3.2GB	open →
6	LLaVA 1.6 7B Multimodal vision-language model. Understands images and answers questions about them.	LLaVA	7B	5.0GB	open →

embedding · 5 models · embed

text embedding models for search and retrieval

rank	model	author	params	min vram	action
▲1	BGE Small EN v1.5 Compact English embedding model. Good for basic semantic search.	BAAI	0.033B	0.1GB	open →
△2	Snowflake Arctic Embed S Compact embedding model from Snowflake. Good multilingual support.	Snowflake	0.033B	0.1GB	open →
▴3	all-MiniLM-L6-v2 Tiny embedding model. Only 23MB. Perfect for on-device search.	Sentence Transformers	0.023B	0.1GB	open →
4	Nomic Embed Text v1.5 High quality text embedding model. 137M params. Good for RAG and search.	Nomic AI	0.137B	0.3GB	open →
5	BGE Large EN v1.5 High quality English embedding model. Best accuracy for English search.	BAAI	0.335B	0.8GB	open →

cloud://gpu · escape hatch

can't run the model you want?

cloud GPUs give you instant access to any model, any size.

runpod · from $0.25/hr
vast.ai · from $0.15/hr