Small AI Models (Under 3B)

Small models under 3 billion parameters are designed for edge deployment, mobile devices, and hardware-constrained environments. Despite their compact size, modern small models handle many tasks well, from text embedding and speech transcription to basic chat and code completion. They start up quickly, need little VRAM (often under 2GB), and can run on devices as modest as a Raspberry Pi or an older smartphone. That makes them a good fit for real-time applications, embedded systems, and privacy-focused local deployment.

58 models available
0.1GB minimum VRAM needed
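As a rough way to reason about the VRAM figures in this list, the memory taken by model weights alone can be approximated from the parameter count and the number of bits stored per weight. The sketch below is a rule of thumb, not the method this catalog uses; the function name and the bit widths in the usage note are illustrative assumptions.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights only, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width. Runtime
    overhead (activations, caches, framework buffers) is NOT included.
    """
    if params_billions < 0 or bits_per_weight <= 0:
        raise ValueError("params_billions must be >= 0 and bits_per_weight > 0")
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical bit widths for a 1.5B-parameter model.
    for bits in (4, 8, 16):
        gb = estimate_weight_memory_gb(1.5, bits)
        print(f"1.5B params at {bits}-bit: about {gb:.2f} GB for weights")
```

The listed VRAM figures are typically higher than this weights-only estimate, presumably because they also cover runtime overhead, so treat the result as a lower bound.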

Sentence Transformers

all-MiniLM-L6-v2

Tiny embedding model. Only 23MB. Perfect for on-device search.

Embed · 0.023B · 0.1GB VRAM
195.5M downloads · 1 quant
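On-device search with an embedding model usually comes down to comparing a query vector against stored document vectors, most often by cosine similarity. The following is a minimal sketch of that ranking step only. The 3-dimensional vectors and document IDs are made-up stand-ins (all-MiniLM-L6-v2 itself produces 384-dimensional vectors), and `search` is a hypothetical helper, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the top_k (doc_id, score) pairs, best match first.

    `index` maps a document ID to its precomputed embedding.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy embeddings; in practice these would come from the embedding model.
    toy_index = {
        "doc_a": [1.0, 0.0, 0.0],
        "doc_b": [0.0, 1.0, 0.0],
        "doc_c": [0.9, 0.1, 0.0],
    }
    for doc_id, score in search([1.0, 0.0, 0.0], toy_index):
        print(doc_id, round(score, 3))
```

A brute-force scan like this is usually adequate for the small collections typical of on-device use; larger collections would call for an approximate nearest-neighbor index.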

BAAI

BGE Small EN v1.5

Compact English embedding model. Good for basic semantic search.

Embed · 0.033B · 0.1GB VRAM
17.0M downloads · 1 quant

Nomic AI

Nomic Embed Text v1.5

High quality text embedding model. 137M params. Good for RAG and search.

Embed · 0.137B · 0.3-0.76GB VRAM
10.9M downloads · 2 quants

Alibaba

Qwen 2.5 1.5B

Compact 1.5B model with strong multilingual and coding abilities.

Chat · 1.5B · 1.54-2.26GB VRAM
10.0M downloads · 2 quants

BAAI

BGE Large EN v1.5

High quality English embedding model. Best accuracy for English search.

Embed · 0.335B · 0.83-1.12GB VRAM
7.9M downloads · 2 quants

OpenAI

Whisper Large v3 Turbo

Optimized large Whisper model. Near-best accuracy with faster inference.

Speech · 0.81B · 2.01GB VRAM
6.4M downloads · 1 quant

BAAI

BGE Reranker v2 M3

Multilingual reranker. 100+ languages. 1.1GB.

0.568B · 1.58GB VRAM
6.2M downloads · 1 quant

Alibaba

Qwen 2.5 0.5B

Ultra-small 0.5B model from Alibaba. Minimal resource requirements.

Chat · 0.5B · 0.96-1.13GB VRAM
5.4M downloads · 2 quants

OpenAI

Whisper Large v3

Largest Whisper model. Best accuracy across all languages and accents.

Speech · 1.55B · 3.38GB VRAM
4.8M downloads · 1 quant

Meta

Llama 3.2 1B Instruct

Ultra-compact 1B model. Runs on virtually any device including smartphones.

Chat · 1.24B · 1.25-2.81GB VRAM
4.2M downloads · 3 quants

TinyLlama

TinyLlama 1.1B

Lightweight 1.1B chat model based on Llama architecture. Great for phones.

Chat · 1.1B · 1.12-1.59GB VRAM
3.0M downloads · 2 quants

Moondream

Moondream 2

Ultra-compact vision model. Only 1GB. Answers questions about images.

Vision · 1.8B · 1.5GB VRAM
2.8M downloads · 1 quant

Alibaba

Qwen2-VL 2B

Compact vision-language model. Default multimodal model. Can understand images and answer questions about them.

Vision · 2.2B · 1.42-2.03GB VRAM
2.3M downloads · 2 quants

OpenAI

Whisper Small

Compact Whisper model. Good accuracy for everyday transcription tasks.

Speech · 0.24B · 0.95GB VRAM
1.9M downloads · 1 quant

Runway

Stable Diffusion 1.5 (CoreML)

Classic image generation model. Pre-converted to CoreML for iOS/Mac. Downloads as zip, auto-extracts.

Image Gen · 0.86B · 2.5GB VRAM
1.7M downloads · 1 quant

OpenAI

Whisper Base

Base whisper model. Good balance of speed and accuracy. 142MB.

Speech · 0.074B · 0.3GB VRAM
1.5M downloads · 1 quant

HuggingFace

Distil-Whisper Large v3

Distilled Whisper. About 6x faster than large-v3, with word error rate within about 1% of it.

Speech · 0.76B · 1.92GB VRAM
1.2M downloads · 1 quant

HuggingFace

SmolLM2 135M

Tiny 135M model. Default LLM - guaranteed to run on any iPhone. Only 145MB download. Perfect for quick experiments.

Chat · 0.135B · 0.64-0.75GB VRAM
899.5K downloads · 2 quants

Google

Gemma 3 1B

Google's latest tiny 1B model. Excellent quality for its size.

Chat · 1B · 1.25-1.5GB VRAM
863.0K downloads · 2 quants

OpenAI

Whisper Tiny

Tiny multilingual speech recognition. Only 75MB. Supports 99 languages. Runs on any device.

Speech · 0.039B · 0.2GB VRAM
761.0K downloads · 1 quant

DeepSeek

DeepSeek R1 Distill 1.5B

Compact reasoning model distilled from DeepSeek R1. Strong chain-of-thought in a tiny package.

Chat · 1.5B · 1.54-2.26GB VRAM
731.7K downloads · 2 quants

Alibaba

Qwen 2.5 Coder 1.5B

Compact code model with solid code generation and understanding abilities.

Code · 1.5B · 1.54-2.26GB VRAM
545.1K downloads · 2 quants

OpenAI

Whisper Medium

Mid-size Whisper model. Strong multilingual speech recognition.

Speech · 0.77B · 1.93GB VRAM
529.2K downloads · 1 quant

HuggingFace

SmolLM2 360M

Compact 360M model. Good for basic tasks on very constrained devices.

Chat · 0.36B · 0.75-0.86GB VRAM
443.9K downloads · 2 quants

Alibaba

Qwen 2.5 Coder 0.5B

Smallest code model. Default code assistant - runs on any iPhone. Great for code completion and simple programming tasks.

Code · 0.5B · 1.13GB VRAM
386.7K downloads · 1 quant

Google

Gemma 2 2B

Google's compact 2.6B model. Efficient and capable for mobile use.

Chat · 2.6B · 2.09-3.09GB VRAM
355.4K downloads · 2 quants

OpenBMB

MiniCPM-V 2.6

Efficient multimodal model with strong image understanding. Optimized for edge devices.

Vision · 2B · 2.1-3GB VRAM
143.4K downloads · 2 quants

HuggingFace

SmolLM2 1.7B

Capable 1.7B model from HuggingFace. Good balance for mobile devices.

Chat · 1.7B · 1.48-2.2GB VRAM
136.6K downloads · 2 quants

Meta

MusicGen Small

Music generation from text prompts. Requires multiple ONNX files (~435MB total). Experimental iOS support.

Audio · 0.3B · 0.78GB VRAM
110.1K downloads · 1 quant

OpenAI

Whisper Tiny English (Quantized)

Smallest possible speech recognition model. Only 32MB. English only. Default speech model - guaranteed to run on any iPhone.

Speech · 0.039B · 0.1GB VRAM
97.4K downloads · 1 quant

Kokoro

Kokoro 82M TTS

High quality 82M parameter TTS model. Excellent speech synthesis with multiple voice options. 86MB download.

TTS · 0.082B · 0.58GB VRAM
85.5K downloads · 1 quant

OpenAI

Whisper Base English

English-only base model. Faster and more accurate for English.

Speech · 0.074B · 0.3GB VRAM
80.3K downloads · 1 quant

DeepSeek

DeepSeek Coder 1.3B

Compact code model with strong coding capabilities. Great for mobile coding assistants.

Code · 1.3B · 1.31-1.83GB VRAM
72.4K downloads · 2 quants

LG AI

EXAONE 3.5 2.4B

Compact model from LG. Optimized for Korean and English.

Chat · 2.4B · 2.03-3.14GB VRAM
67.7K downloads · 2 quants

Snowflake

Snowflake Arctic Embed S

Compact embedding model from Snowflake. Tuned for English retrieval.

Embed · 0.033B · 0.1GB VRAM
43.2K downloads · 1 quant

H2O.ai

Danube 3 500M

Ultra-tiny 500M model. Even smaller than SmolLM. Runs anywhere.

Chat · 0.5B · 0.8-1.01GB VRAM
33.9K downloads · 2 quants

IBM

Granite 3.3 2B

IBM's compact 2B model. Good at following instructions.

Chat · 2B · 1.94-3.01GB VRAM
29.3K downloads · 2 quants

Google

CodeGemma 2B

Lightweight code completion model from Google. Fast on-device code suggestions.

Code · 2B · 2.02-2.99GB VRAM
12.2K downloads · 2 quants

TII

Falcon 3 1B

Ultra-compact 1B model from Technology Innovation Institute.

Chat · 1B · 1.48-2.16GB VRAM
11.3K downloads · 2 quants

Stability AI

Stable Diffusion 3 Medium (GGUF)

SD 3 with MMDiT architecture. Superior text rendering.

Image Gen · 2.5B · 9.15GB VRAM
4.0K downloads · 1 quant

Jina AI

Jina Reranker Tiny EN

Tiny English reranker. Only 67MB. Use with embedding models for better search.

0.033B · 0.15GB VRAM
3.7K downloads · 1 quant
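Pairing a reranker with an embedding model usually means a two-stage search: the embedding model cheaply shortlists candidates, then the reranker rescores only that shortlist against the query. The sketch below shows the control flow under stated assumptions: `retrieve_then_rerank` is a hypothetical helper, the first-stage scores are made-up numbers, and `toy_overlap_score` is a word-overlap placeholder standing in for a real reranker model.

```python
def retrieve_then_rerank(query, candidates, first_stage_scores, rerank_score,
                         shortlist=3, top_k=2):
    """Two-stage search: shortlist by embedding score, then rerank.

    first_stage_scores: dict mapping each candidate to its embedding
    similarity with the query (assumed precomputed).
    rerank_score: callable (query, doc) -> float, the reranker stand-in.
    """
    shortlisted = sorted(candidates,
                         key=lambda doc: first_stage_scores[doc],
                         reverse=True)[:shortlist]
    reranked = sorted(shortlisted,
                      key=lambda doc: rerank_score(query, doc),
                      reverse=True)
    return reranked[:top_k]


def toy_overlap_score(query, doc):
    """Placeholder scorer: fraction of query words that appear in the doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


if __name__ == "__main__":
    docs = [
        "reset your password",
        "password policy overview",
        "billing and invoices",
        "how to reset a router",
    ]
    # Made-up embedding similarities for the query "reset password".
    scores = {docs[0]: 0.7, docs[1]: 0.9, docs[2]: 0.1, docs[3]: 0.8}
    print(retrieve_then_rerank("reset password", docs, scores, toy_overlap_score))
```

The shortlist size trades speed for recall: the reranker only sees what the first stage lets through, so a document dropped there cannot be recovered.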

Runway / GPUStack

Stable Diffusion 1.5 (GGUF)

SD 1.5 in single-file GGUF format. Alternative to CoreML. Uses stable-diffusion.cpp with Metal acceleration.

Image Gen · 0.86B · 2.13-2.25GB VRAM
1.2K downloads · 2 quants

01.AI

Yi Coder 1.5B

Tiny code model. Great for phones. Fast completions.

Code · 1.5B · 1.4-1.96GB VRAM
175 downloads · 2 quants

Stability AI / Apple

Stable Diffusion 2.1 Base (CoreML)

Smallest CoreML image generation model. Palettized for minimal size (1.14GB). Runs on any iPhone with 6GB RAM. Default image generation model.

Image Gen · 0.86B · 1.56GB VRAM
18 downloads · 1 quant

Rhasspy

Piper TTS - Amy (English)

Lightweight TTS voice. High quality English speech synthesis. Default TTS model - runs on any iPhone. Only 63MB.

TTS · 0.02B · 0.15GB VRAM
1 quant

Rhasspy

Piper TTS - Lessac (English)

High quality English male voice. 63MB download. Runs on any device.

TTS · 0.02B · 0.15GB VRAM
1 quant

Rhasspy

Piper TTS - LibriTTS-R (English)

Medium quality English voice with natural prosody. 63MB download.

TTS · 0.02B · 0.57GB VRAM
1 quant

Stability AI

Stable Diffusion 2.1 (GGUF)

SD 2.1 in GGUF format. Better quality than 1.5.

Image Gen · 0.86B · 2.66GB VRAM
1 quant

Rhasspy

Piper TTS - Spanish (MLS)

Spanish female voice. Natural prosody.

TTS · 0.02B · 0.15GB VRAM
1 quant

Rhasspy

Piper TTS - French (Siwis)

French female voice.

TTS · 0.02B · 0.53GB VRAM
1 quant

Rhasspy

Piper TTS - German (Thorsten)

German male voice.

TTS · 0.02B · 0.15GB VRAM
1 quant

Rhasspy

Piper TTS - Chinese (Huayan)

Chinese Mandarin voice.

TTS · 0.02B · 0.15GB VRAM
1 quant

Rhasspy

Piper TTS - Japanese (Kokoro)

Japanese voice.

TTS · 0.02B · 0.15GB VRAM
1 quant

Rhasspy

Piper TTS - Korean

Korean voice.

TTS · 0.02B · 0.15GB VRAM
1 quant

Rhasspy

Piper TTS - Russian (Irina)

Russian female voice.

TTS · 0.02B · 0.15GB VRAM
1 quant

Rhasspy

Piper TTS - Portuguese (Faber)

Portuguese voice.

TTS · 0.02B · 0.15GB VRAM
1 quant

Rhasspy

Piper TTS - Italian (Riccardo)

Italian male voice.

TTS · 0.02B · 0.53GB VRAM
1 quant

Rhasspy

Piper TTS - Arabic (Kareem)

Arabic voice.

TTS · 0.02B · 0.15GB VRAM
1 quant
