Small AI Models (Under 3B)

Small models under 3 billion parameters are designed for edge deployment, mobile devices, and hardware-constrained environments. Despite their compact size, modern small models perform well on many tasks. They start up quickly, require minimal VRAM (often under 2GB), and can run on devices as modest as a Raspberry Pi or an older smartphone. These models are ideal for real-time applications, embedded systems, and privacy-focused local deployment.
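As a rough sanity check on VRAM figures like these, the memory taken by a model's weights can be approximated as parameter count times bytes per parameter. The sketch below is only an estimate under stated assumptions, not how this catalog computes its numbers: the function name `estimate_weight_gb` and the chosen bit widths are illustrative, and runtime overhead (activations, KV cache, framework buffers) is ignored, so real usage will be higher.

```python
def estimate_weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    if params_billion <= 0 or bits_per_param <= 0:
        raise ValueError("params_billion and bits_per_param must be positive")
    # parameters * bits per parameter / 8 = bytes
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Compare a 1.5B-parameter model at common precisions (assumed bit widths).
    for bits in (16, 8, 4):
        gb = estimate_weight_gb(1.5, bits)
        print(f"1.5B params at {bits}-bit: ~{gb:.2f} GB of weights")
```

Because overhead is left out, a listed VRAM requirement will generally sit above this weights-only estimate.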

63 models available

0.1GB minimum VRAM needed

Sentence Transformers

all-MiniLM-L6-v2

Tiny embedding model. Only 23MB. Perfect for on-device search.

Embed | 0.023B | 0.1GB VRAM

262.8M downloads | 1 quant

BAAI

BGE Small EN v1.5

Compact English embedding model. Good for basic semantic search.

Embed | 0.033B | 0.1GB VRAM

51.4M downloads | 1 quant

Nomic AI

Nomic Embed Text v1.5

High quality text embedding model. 137M params. Good for RAG and search.

Embed | 0.137B | 0.3-0.76GB VRAM

17.3M downloads | 2 quants

BAAI

BGE Large EN v1.5

High quality English embedding model. Best accuracy for English search.

Embed | 0.335B | 0.83-1.12GB VRAM

15.1M downloads | 2 quants

Alibaba

Qwen 2.5 1.5B

Compact 1.5B model with strong multilingual and coding abilities.

Chat | 1.5B | 1.54-2.26GB VRAM

14.8M downloads | 2 quants

BAAI

BGE Reranker v2 M3

Multilingual reranker. 100+ languages. 1.1GB.

0.568B | 1.58GB VRAM

13.1M downloads | 1 quant

Llama 3.2 1B Instruct

Ultra-compact 1B model. Runs on virtually any device including smartphones.

Chat | 1.24B | 1.25-2.81GB VRAM

8.3M downloads | 3 quants

OpenAI

Whisper Large v3 Turbo

Optimized large Whisper model. Near-best accuracy with faster inference.

Speech | 0.81B | 2.01GB VRAM

7.9M downloads | 1 quant

OpenAI

Whisper Large v3

Largest Whisper model. Best accuracy across all languages and accents.

Speech | 1.55B | 3.38GB VRAM

5.3M downloads | 1 quant

Alibaba

Qwen 2.5 0.5B

Ultra-small 0.5B model from Alibaba. Minimal resource requirements.

Chat | 0.5B | 0.96-1.13GB VRAM

4.7M downloads | 2 quants

Alibaba

Qwen2-VL 2B

Compact vision-language model. Default multimodal model. Can understand images and answer questions about them.

Vision | 2.2B | 1.42-2.03GB VRAM

3.7M downloads | 2 quants

Moondream

Moondream 2

Ultra-compact vision model. Only 1GB. Answers questions about images.

Vision | 1.8B | 1.5GB VRAM

2.8M downloads | 1 quant

OpenAI

Whisper Small

Compact Whisper model. Good accuracy for everyday transcription tasks.

Speech | 0.24B | 0.95GB VRAM

2.6M downloads | 1 quant

OpenAI

Whisper Base

Base Whisper model. Good balance of speed and accuracy. 142MB.

Speech | 0.074B | 0.3GB VRAM

2.4M downloads | 1 quant

TinyLlama

TinyLlama 1.1B

Lightweight 1.1B chat model based on Llama architecture. Great for phones.

Chat | 1.1B | 1.12-1.59GB VRAM

2.3M downloads | 2 quants

Microsoft

TRELLIS Image Large

Image-to-3D model that produces textured meshes. Runs in ~12 GB VRAM and outputs glTF.

1.2B | 12GB VRAM

1.9M downloads | 1 quant

Runway

Stable Diffusion 1.5 (CoreML)

Classic image generation model. Pre-converted to CoreML for iOS/Mac. Downloads as zip, auto-extracts.

Image Gen | 0.86B | 2.5GB VRAM

1.8M downloads | 1 quant

HuggingFace

SmolLM2 135M

Tiny 135M model. Default LLM - guaranteed to run on any iPhone. Only 145MB download. Perfect for quick experiments.

Chat | 0.135B | 0.64-0.75GB VRAM

1.3M downloads | 2 quants

Google

Gemma 3 1B

Google's latest tiny 1B model. Excellent quality for its size.

Chat | 1B | 1.25-1.5GB VRAM

1.3M downloads | 2 quants

HuggingFace

Distil-Whisper Large v3

Distilled Whisper. About 6x faster than large-v3, with accuracy within roughly 1% WER.

Speech | 0.76B | 1.92GB VRAM

986.2K downloads | 1 quant

OpenAI

Whisper Tiny

Tiny multilingual speech recognition. Only 75MB. Supports 99 languages. Runs on any device.

Speech | 0.039B | 0.2GB VRAM

832.4K downloads | 1 quant

DeepSeek

DeepSeek R1 Distill 1.5B

Compact reasoning model distilled from DeepSeek R1. Strong chain-of-thought in a tiny package.

Chat | 1.5B | 1.54-2.26GB VRAM

749.9K downloads | 2 quants

OpenAI

Whisper Medium

Mid-size Whisper model. Strong multilingual speech recognition.

Speech | 0.77B | 1.93GB VRAM

711.8K downloads | 1 quant

Alibaba

Qwen 2.5 Coder 1.5B

Compact code model with solid code generation and understanding abilities.

Code | 1.5B | 1.54-2.26GB VRAM

632.9K downloads | 2 quants

Kokoro

Kokoro 82M TTS

High quality 82M parameter TTS model. Excellent speech synthesis with multiple voice options. 86MB download.

TTS | 0.082B | 0.58GB VRAM

565.5K downloads | 1 quant

Google

Gemma 2 2B

Google's compact 2.6B model. Efficient and capable for mobile use.

Chat | 2.6B | 2.09-3.09GB VRAM

385.1K downloads | 2 quants

HuggingFace

SmolLM2 360M

Compact 360M model. Good for basic tasks on very constrained devices.

Chat | 0.36B | 0.75-0.86GB VRAM

286.8K downloads | 2 quants

HuggingFace

SmolLM2 1.7B

Capable 1.7B model from HuggingFace. Good balance for mobile devices.

Chat | 1.7B | 1.48-2.2GB VRAM

194.9K downloads | 2 quants

MusicGen Small

Music generation from text prompts. Requires multiple ONNX files (~435MB total). Experimental iOS support.

Audio | 0.3B | 0.78GB VRAM

133.8K downloads | 1 quant

OpenBMB

MiniCPM-V 2.6

Efficient multimodal model with strong image understanding. Optimized for edge devices.

Vision | 2B | 2.1-3GB VRAM

126.7K downloads | 2 quants

Alibaba

Qwen 2.5 Coder 0.5B

Smallest code model. Default code assistant - runs on any iPhone. Great for code completion and simple programming tasks.

Code | 0.5B | 1.13GB VRAM

106.8K downloads | 1 quant

OpenAI

Whisper Tiny English (Quantized)

Smallest possible speech recognition model. Only 32MB. English only. Default speech model - guaranteed to run on any iPhone.

Speech | 0.039B | 0.1GB VRAM

91.7K downloads | 1 quant

Tencent

Hunyuan3D 2

Two-stage image-to-3D: shape generation, then PBR texture synthesis. Strong topology.

2.5B | 16GB VRAM

88.4K downloads | 1 quant

DeepSeek

DeepSeek Coder 1.3B

Compact code model with strong coding capabilities. Great for mobile coding assistants.

Code | 1.3B | 1.31-1.83GB VRAM

42.4K downloads | 2 quants

Jina AI

Jina Reranker Tiny EN

Tiny English reranker. Only 67MB. Use with embedding models for better search.

0.033B | 0.15GB VRAM

41.5K downloads | 1 quant

Snowflake

Snowflake Arctic Embed S

Compact embedding model from Snowflake. Good multilingual support.

Embed | 0.033B | 0.1GB VRAM

39.8K downloads | 1 quant

Stability AI

Stable Audio Open

Generates variable-length audio up to 47 seconds long. Suited to sound effects and short loops.

Audio | 1B | 6GB VRAM

33.4K downloads | 1 quant

Google

CodeGemma 2B

Lightweight code completion model from Google. Fast on-device code suggestions.

Code | 2B | 2.02-2.99GB VRAM

33.2K downloads | 2 quants

LG AI

EXAONE 3.5 2.4B

Compact model from LG. Optimized for Korean and English.

Chat | 2.4B | 2.03-3.14GB VRAM

32.0K downloads | 2 quants

H2O.ai

Danube 3 500M

Ultra-tiny 500M model, smaller than SmolLM2 1.7B. Runs anywhere.

Chat | 0.5B | 0.8-1.01GB VRAM

29.2K downloads | 2 quants

IBM

Granite 3.3 2B

IBM's compact 2B model. Good at following instructions.

Chat | 2B | 1.94-3.01GB VRAM

28.2K downloads | 2 quants

OpenAI

Whisper Base English

English-only base model. Faster and more accurate for English.

Speech | 0.074B | 0.3GB VRAM

23.9K downloads | 1 quant

TII

Falcon 3 1B

Ultra-compact 1B model from Technology Innovation Institute.

Chat | 1B | 1.48-2.16GB VRAM

15.7K downloads | 2 quants

Stability AI

Stable Diffusion 3 Medium (GGUF)

SD 3 with MMDiT architecture. Superior text rendering.

Image Gen | 2.5B | 9.15GB VRAM

5.5K downloads | 1 quant

Runway / GPUStack

Stable Diffusion 1.5 (GGUF)

SD 1.5 in single-file GGUF format. Alternative to CoreML. Uses stable-diffusion.cpp with Metal acceleration.

Image Gen | 0.86B | 2.13-2.25GB VRAM

1.4K downloads | 2 quants

IBM

Granite 3.0 1B-A400M

Tiny IBM MoE for edge and CPU inference. 1.3B total parameters, only 400M active.

Chat | 1.3B | 1.27GB VRAM

1.2K downloads | 1 quant

01.AI

Yi Coder 1.5B

Tiny code model. Great for phones. Fast completions.

Code | 1.5B | 1.4-1.96GB VRAM

449 downloads | 2 quants

Stability AI / Apple

Stable Diffusion 2.1 Base (CoreML)

Smallest CoreML image generation model. Palettized for minimal size (1.14GB). Runs on any iPhone with 6GB RAM. Default image generation model.

Image Gen | 0.86B | 1.56GB VRAM

26 downloads | 1 quant

Rhasspy

Piper TTS - Amy (English)

Lightweight TTS voice. High quality English speech synthesis. Default TTS model - runs on any iPhone. Only 63MB.

TTS | 0.02B | 0.15GB VRAM

1 quant

Rhasspy

Piper TTS - Lessac (English)

High quality English male voice. 63MB download. Runs on any device.

TTS | 0.02B | 0.15GB VRAM

1 quant

Rhasspy

Piper TTS - LibriTTS-R (English)

Medium quality English voice with natural prosody. 63MB download.

TTS | 0.02B | 0.57GB VRAM

1 quant

Stability AI

Stable Diffusion 2.1 (GGUF)

SD 2.1 in GGUF format. Better quality than 1.5.

Image Gen | 0.86B | 2.66GB VRAM

1 quant

Rhasspy

Piper TTS - Spanish (MLS)

Spanish female voice. Natural prosody.

TTS | 0.02B | 0.15GB VRAM

1 quant

Rhasspy

Piper TTS - French (Siwis)

French female voice.

TTS | 0.02B | 0.53GB VRAM

1 quant

Rhasspy

Piper TTS - German (Thorsten)

German male voice.

TTS | 0.02B | 0.15GB VRAM

1 quant

Rhasspy

Piper TTS - Chinese (Huayan)

Chinese Mandarin voice.

TTS | 0.02B | 0.15GB VRAM

1 quant

Rhasspy

Piper TTS - Japanese (Kokoro)

Japanese voice.

TTS | 0.02B | 0.15GB VRAM

1 quant

Rhasspy

Piper TTS - Korean

Korean voice.

TTS | 0.02B | 0.15GB VRAM

1 quant

Rhasspy

Piper TTS - Russian (Irina)

Russian female voice.

TTS | 0.02B | 0.15GB VRAM

1 quant

Rhasspy

Piper TTS - Portuguese (Faber)

Portuguese voice.

TTS | 0.02B | 0.15GB VRAM

1 quant

Rhasspy

Piper TTS - Italian (Riccardo)

Italian male voice.

TTS | 0.02B | 0.53GB VRAM

1 quant

Rhasspy

Piper TTS - Arabic (Kareem)

Arabic voice.

TTS | 0.02B | 0.15GB VRAM

1 quant

ACE Studio

ACE-Step 1.5XL

Music generation rivaling Suno. Generates structured songs with vocals from a text prompt.

Audio | 1.5B | 8GB VRAM

1 quant
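
To shortlist models for a particular device, the VRAM figures in the listing can be filtered against a memory budget. The following is a minimal sketch under stated assumptions: `Model`, `CATALOG`, and `models_that_fit` are hypothetical names introduced for illustration, and the entries are a hand-copied subset of the listing that uses the lower bound of each VRAM range.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    task: str
    min_vram_gb: float  # lower bound of the VRAM range shown in the listing


# A handful of entries copied from the listing; not the full catalog.
CATALOG = [
    Model("all-MiniLM-L6-v2", "Embed", 0.1),
    Model("SmolLM2 135M", "Chat", 0.64),
    Model("Qwen 2.5 0.5B", "Chat", 0.96),
    Model("Llama 3.2 1B Instruct", "Chat", 1.25),
    Model("Gemma 2 2B", "Chat", 2.09),
    Model("Whisper Tiny", "Speech", 0.2),
    Model("Stable Diffusion 3 Medium (GGUF)", "Image Gen", 9.15),
]


def models_that_fit(catalog: List[Model], budget_gb: float,
                    task: Optional[str] = None) -> List[Model]:
    """Return models whose minimum VRAM fits the budget, largest first."""
    fits = [
        m for m in catalog
        if m.min_vram_gb <= budget_gb and (task is None or m.task == task)
    ]
    return sorted(fits, key=lambda m: m.min_vram_gb, reverse=True)


if __name__ == "__main__":
    # Example: chat models for a device with roughly 1 GB to spare.
    for m in models_that_fit(CATALOG, budget_gb=1.0, task="Chat"):
        print(f"{m.name}: {m.min_vram_gb} GB")
```

Sorting largest-first is a design choice: within a fixed budget, the biggest model that fits is often the most capable candidate to try first.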
