AI Models

Browse 109+ AI models. Filter by your hardware to see what you can run.

Featured

OpenAI

Whisper Tiny English (Quantized)

Smallest possible speech recognition model. Only 32MB. English only. Default speech model - guaranteed to run on any iPhone.

🎤 Speech · 0.039B · 0.1GB+ VRAM
Featured

Kokoro

Kokoro 82M TTS

High-quality 82M-parameter TTS model. Excellent speech synthesis with multiple voice options. 86MB download.

🔊 TTS · 0.082B · 0.2GB+ VRAM
Featured

Nomic AI

Nomic Embed Text v1.5

High quality text embedding model. 137M params. Good for RAG and search.

🔗 Embed · 0.137B · 0.3GB+ VRAM
Featured

HuggingFace

Distil-Whisper Large v3

Distilled Whisper. About 6x faster than large-v3, with word error rate within about 1% of it.

🎤 Speech · 0.76B · 1GB+ VRAM
Featured

OpenAI

Whisper Medium

Mid-size Whisper model. Strong multilingual speech recognition.

🎤 Speech · 0.77B · 1.8GB+ VRAM
Featured

OpenAI

Whisper Large v3 Turbo

Optimized large Whisper model. Near-best accuracy with faster inference.

🎤 Speech · 0.81B · 1.9GB+ VRAM
Featured

Stability AI / Apple

Stable Diffusion 2.1 Base (CoreML)

Smallest CoreML image generation model. Palettized for minimal size (1.14GB). Runs on any iPhone with 6GB RAM. Default image generation model.

🎨 Image Gen · 0.86B · 2GB+ VRAM
Featured

Runway / GPUStack

Stable Diffusion 1.5 (GGUF)

SD 1.5 in single-file GGUF format. Alternative to CoreML. Uses stable-diffusion.cpp with Metal acceleration.

🎨 Image Gen · 0.86B · 2.5GB+ VRAM
Featured

Google

Gemma 3 1B

Google's latest tiny 1B model. Excellent quality for its size.

💬 Chat · 1B · 1.2GB+ VRAM
Featured

Meta

Llama 3.2 1B Instruct

Ultra-compact 1B model. Runs on virtually any device including smartphones.

💬 Chat · 1.24B · 1.3GB+ VRAM
Featured

OpenAI

Whisper Large v3

Largest Whisper model. Best accuracy across all languages and accents.

🎤 Speech · 1.55B · 3.5GB+ VRAM
Featured

Moondream

Moondream 2

Ultra-compact vision model. Only 1GB. Answers questions about images.

👁️ Vision · 1.8B · 1.5GB+ VRAM
Featured

Stability AI

Stable Diffusion 3 Medium (GGUF)

SD 3 with MMDiT architecture. Superior text rendering.

🎨 Image Gen · 2.5B · 5.5GB+ VRAM
Featured

Meta

Llama 3.2 3B Instruct

Meta's compact 3B model designed for edge and mobile deployment.

💬 Chat · 3.2B · 2.6GB+ VRAM
Featured

Stability AI

Stable Diffusion XL (CoreML)

Higher quality image generation. CoreML optimized for iOS. Requires 6GB+ usable memory (iPad/Mac recommended).

🎨 Image Gen · 3.5B · 5GB+ VRAM
Featured

Stability AI

SDXL Turbo (GGUF)

Single-step SDXL. Near-instant image generation.

🎨 Image Gen · 3.5B · 5GB+ VRAM
Featured

Microsoft

Phi-3.5 Mini 3.8B

Tiny but capable 3.8B model. Runs on almost any hardware including phones.

💬 Chat · 3.8B · 3GB+ VRAM
Featured

Microsoft

Phi-4 Mini 3.8B

Latest Phi mini with strong reasoning. Drop-in upgrade from Phi-3.5 Mini.

💬 Chat · 3.8B · 3GB+ VRAM
Featured

Google

Gemma 3 4B

Balanced 4B model with strong reasoning. Great for iPhones.

💬 Chat · 4B · 3.2GB+ VRAM
Featured

Microsoft

Phi-3.5 Vision

Vision-language model from Microsoft. Can understand images and documents.

👁️ Vision · 4.2B · 3.2GB+ VRAM
Featured

LLaVA

LLaVA 1.6 7B

Multimodal vision-language model. Understands images and answers questions about them.

👁️ Vision · 7B · 5GB+ VRAM
Featured

Mistral AI

Mistral 7B Instruct v0.3

Efficient 7B model from Mistral AI with strong performance for its size.

💬 Chat · 7.3B · 5GB+ VRAM
Featured

Alibaba

Qwen 2.5 7B Instruct

Efficient 7B model with strong coding and reasoning abilities.

💬 Chat · 7.6B · 5.3GB+ VRAM
Featured

Alibaba

Qwen 2.5 Coder 7B

Strong 7B code model rivaling larger coding models. Excellent for local development.

💻 Code · 7.6B · 5.3GB+ VRAM
Featured

DeepSeek

DeepSeek R1 Distill 8B

Compact reasoning model. Good reasoning capabilities in a small package.

💬 Chat · 8B · 5.5GB+ VRAM
Featured

Meta

Llama 3.1 8B Instruct

Meta's 8B parameter instruction-tuned model. Great balance of performance and efficiency for local deployment.

💬 Chat · 8B · 5.5GB+ VRAM
Featured

Google

Gemma 2 9B Instruct

Google's efficient 9B model. Great performance-to-size ratio.

💬 Chat · 9.2B · 6.2GB+ VRAM
Featured

Google

Gemma 3 12B

High quality 12B model. Excellent for iPad Pro and Mac.

💬 Chat · 12B · 8GB+ VRAM
Featured

Mistral AI

Mistral Nemo 12B

Mistral's 12B model with excellent instruction following.

💬 Chat · 12B · 8GB+ VRAM
Featured

Black Forest Labs

FLUX.1 Schnell (GGUF)

Fast 1-4 step generation. State-of-the-art quality. Needs 16GB+ RAM.

🎨 Image Gen · 12B · 14GB+ VRAM
Featured

Microsoft

Phi-4

Microsoft's 14B parameter model. Punches well above its weight on reasoning.

💬 Chat · 14B · 9.5GB+ VRAM
Featured

Alibaba

Qwen 2.5 14B

Strong 14B model with excellent coding and reasoning. iPad Pro recommended.

💬 Chat · 14B · 10GB+ VRAM
Featured

Alibaba

Qwen 2.5 Coder 14B

Powerful 14B code model. Excellent for complex programming tasks.

💻 Code · 14B · 10GB+ VRAM
Featured

Google

Gemma 3 27B

Google's flagship open model. Near GPT-4 quality. Needs 20GB+ RAM.

💬 Chat · 27B · 17.5GB+ VRAM
Featured

Meta

Llama 3.1 70B Instruct

Meta's flagship 70B parameter model. Excellent performance rivaling GPT-4 on many benchmarks.

💬 Chat · 70B · 42GB+ VRAM

Rhasspy

Piper TTS - Amy (English)

Lightweight TTS voice. High quality English speech synthesis. Default TTS model - runs on any iPhone. Only 63MB.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - Lessac (English)

High-quality English female voice. 63MB download. Runs on any device.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - LibriTTS-R (English)

Medium quality English voice with natural prosody. 63MB download.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - Spanish (MLS)

Spanish female voice. Natural prosody.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - French (Siwis)

French female voice.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - German (Thorsten)

German male voice.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - Chinese (Huayan)

Chinese Mandarin voice.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - Japanese (Kokoro)

Japanese voice.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - Korean

Korean voice.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - Russian (Irina)

Russian female voice.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - Portuguese (Faber)

Portuguese voice.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - Italian (Riccardo)

Italian male voice.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Rhasspy

Piper TTS - Arabic (Kareem)

Arabic voice.

🔊 TTS · 0.02B · 0.15GB+ VRAM

Sentence Transformers

all-MiniLM-L6-v2

Tiny embedding model. Only 23MB. Perfect for on-device search.

🔗 Embed · 0.023B · 0.1GB+ VRAM

BAAI

BGE Small EN v1.5

Compact English embedding model. Good for basic semantic search.

🔗 Embed · 0.033B · 0.1GB+ VRAM

Snowflake

Snowflake Arctic Embed S

Compact embedding model from Snowflake. Good multilingual support.

🔗 Embed · 0.033B · 0.1GB+ VRAM

Jina AI

Jina Reranker Tiny EN

Tiny English reranker. Only 67MB. Use with embedding models for better search.

📊 Rerank · 0.033B · 0.15GB+ VRAM

OpenAI

Whisper Tiny

Tiny multilingual speech recognition. Only 75MB. Supports 99 languages. Runs on any device.

🎤 Speech · 0.039B · 0.2GB+ VRAM

OpenAI

Whisper Base

Base Whisper model. Good balance of speed and accuracy. 142MB.

🎤 Speech · 0.074B · 0.3GB+ VRAM

OpenAI

Whisper Base English

English-only base model. Faster and more accurate for English.

🎤 Speech · 0.074B · 0.3GB+ VRAM

HuggingFace

SmolLM2 135M

Tiny 135M model. Default LLM - guaranteed to run on any iPhone. Only 145MB download. Perfect for quick experiments.

💬 Chat · 0.135B · 0.3GB+ VRAM

OpenAI

Whisper Small

Compact Whisper model. Good accuracy for everyday transcription tasks.

🎤 Speech · 0.24B · 0.6GB+ VRAM

Meta

MusicGen Small

Music generation from text prompts. Requires multiple ONNX files (~435MB total). Experimental iOS support.

🎵 Audio · 0.3B · 1.5GB+ VRAM

BAAI

BGE Large EN v1.5

High quality English embedding model. Best accuracy for English search.

🔗 Embed · 0.335B · 0.5GB+ VRAM

HuggingFace

SmolLM2 360M

Compact 360M model. Good for basic tasks on very constrained devices.

💬 Chat · 0.36B · 0.5GB+ VRAM

Alibaba

Qwen 2.5 0.5B

Ultra-small 0.5B model from Alibaba. Minimal resource requirements.

💬 Chat · 0.5B · 0.7GB+ VRAM

Alibaba

Qwen 2.5 Coder 0.5B

Smallest code model. Default code assistant - runs on any iPhone. Great for code completion and simple programming tasks.

💻 Code · 0.5B · 0.9GB+ VRAM

H2O.ai

Danube 3 500M

Ultra-tiny 500M model. Runs anywhere.

💬 Chat · 0.5B · 0.6GB+ VRAM

BAAI

BGE Reranker v2 M3

Multilingual reranker covering 100+ languages. 1.1GB.

📊 Rerank · 0.568B · 1.5GB+ VRAM

Runway

Stable Diffusion 1.5 (CoreML)

Classic image generation model. Pre-converted to CoreML for iOS/Mac. Downloads as zip, auto-extracts.

🎨 Image Gen · 0.86B · 2.5GB+ VRAM

Stability AI

Stable Diffusion 2.1 (GGUF)

SD 2.1 in GGUF format. Better quality than 1.5.

🎨 Image Gen · 0.86B · 2.5GB+ VRAM

TII

Falcon 3 1B

Ultra-compact 1B model from Technology Innovation Institute.

💬 Chat · 1B · 1.2GB+ VRAM

TinyLlama

TinyLlama 1.1B

Lightweight 1.1B chat model based on Llama architecture. Great for phones.

💬 Chat · 1.1B · 1.2GB+ VRAM

DeepSeek

DeepSeek Coder 1.3B

Compact code model with strong coding capabilities. Great for mobile coding assistants.

💻 Code · 1.3B · 1.3GB+ VRAM

Alibaba

Qwen 2.5 1.5B

Compact 1.5B model with strong multilingual and coding abilities.

💬 Chat · 1.5B · 1.5GB+ VRAM

DeepSeek

DeepSeek R1 Distill 1.5B

Compact reasoning model distilled from DeepSeek R1. Strong chain-of-thought in a tiny package.

💬 Chat · 1.5B · 1.5GB+ VRAM

Alibaba

Qwen 2.5 Coder 1.5B

Compact code model with solid code generation and understanding abilities.

💻 Code · 1.5B · 1.5GB+ VRAM

01.AI

Yi Coder 1.5B

Tiny code model. Great for phones. Fast completions.

💻 Code · 1.5B · 1.5GB+ VRAM

HuggingFace

SmolLM2 1.7B

Capable 1.7B model from HuggingFace. Good balance for mobile devices.

💬 Chat · 1.7B · 1.6GB+ VRAM

Google

CodeGemma 2B

Lightweight code completion model from Google. Fast on-device code suggestions.

💻 Code · 2B · 1.9GB+ VRAM

OpenBMB

MiniCPM-V 2.6

Efficient multimodal model with strong image understanding. Optimized for edge devices.

👁️ Vision · 2B · 2.1GB+ VRAM

IBM

Granite 3.3 2B

IBM's compact 2B model. Good at following instructions.

💬 Chat · 2B · 1.8GB+ VRAM

Alibaba

Qwen2-VL 2B

Compact vision-language model. Default multimodal model. Can understand images and answer questions about them.

👁️ Vision · 2.2B · 2GB+ VRAM

LG AI

EXAONE 3.5 2.4B

Compact model from LG. Optimized for Korean and English.

💬 Chat · 2.4B · 2GB+ VRAM

Google

Gemma 2 2B

Google's compact 2.6B model. Efficient and capable for mobile use.

💬 Chat · 2.6B · 2.3GB+ VRAM

Alibaba

Qwen 2.5 3B

Versatile 3B model with strong reasoning and multilingual capabilities.

💬 Chat · 3B · 2.5GB+ VRAM

Alibaba

Qwen 2.5 Coder 3B

Capable 3B code model. Good balance of coding ability and resource usage.

💻 Code · 3B · 2.5GB+ VRAM

TII

Falcon 3 3B

Compact 3B Falcon model with good performance.

💬 Chat · 3B · 2.5GB+ VRAM

Stability AI

StableLM Zephyr 3B

Compact 3B model from Stability AI. Good chat quality for its size.

💬 Chat · 3B · 2.3GB+ VRAM

Pansophic

Rocket 3B

Fast 3B model tuned for helpful responses.

💬 Chat · 3B · 2.3GB+ VRAM

BigCode

StarCoder2 3B

Code completion model trained on 17 programming languages from The Stack v2.

💻 Code · 3B · 2.4GB+ VRAM

Stability AI

Stable Code 3B

Compact code model with good completion quality.

💻 Code · 3B · 2.3GB+ VRAM

Google

PaliGemma 3B

Google's vision model. Strong at visual QA, captioning, and OCR.

👁️ Vision · 3B · 2.5GB+ VRAM

NVIDIA

Nemotron Mini 4B

NVIDIA's compact 4B model optimized for edge deployment.

💬 Chat · 4B · 3GB+ VRAM

H2O.ai

Danube 3 4B

Capable 4B model from H2O.ai. Good for phones.

💬 Chat · 4B · 3GB+ VRAM

01.AI

Yi 1.5 6B Chat

Efficient 6B bilingual (English/Chinese) model.

💬 Chat · 6B · 4.3GB+ VRAM

DeepSeek

DeepSeek Coder 6.7B

Powerful 6.7B code model with excellent code generation across many languages.

💻 Code · 6.7B · 4.7GB+ VRAM

TII

Falcon 3 7B

Full-size Falcon 3 with strong performance across benchmarks.

💬 Chat · 7B · 5GB+ VRAM

Allen AI

OLMo 2 7B

Fully open research model. Transparent training.

💬 Chat · 7B · 5GB+ VRAM

OpenChat

OpenChat 3.5 7B

Fine-tuned Mistral 7B for chat. Strong instruction following.

💬 Chat · 7B · 5GB+ VRAM

BigCode

StarCoder2 7B

Larger code model with better completions.

💻 Code · 7B · 5GB+ VRAM

Meta

Code Llama 7B

Meta's code-specialized Llama model. Good at code completion.

💻 Code · 7B · 4.7GB+ VRAM

Shanghai AI Lab

InternLM 2.5 7B

Strong 7B model from China. Good at tool use and math.

💬 Chat · 7.7B · 5.3GB+ VRAM

LG AI

EXAONE 3.5 7.8B

7.8B model from LG. Strong bilingual Korean/English.

💬 Chat · 7.8B · 5.5GB+ VRAM

IBM

Granite 3.3 8B

IBM's 8B instruction model. Enterprise quality.

💬 Chat · 8B · 5.5GB+ VRAM

Google

CodeGemma 7B

Google's instruction-tuned code model. Strong code generation and understanding.

💻 Code · 8.5B · 5.6GB+ VRAM

01.AI

Yi 1.5 9B Chat

9B bilingual model with strong reasoning.

💬 Chat · 9B · 6.2GB+ VRAM

01.AI

Yi Coder 9B

Strong 9B code model with good reasoning.

💻 Code · 9B · 6.2GB+ VRAM

TII

Falcon 3 10B

10B Falcon model. Well suited to iPad.

💬 Chat · 10B · 7GB+ VRAM

Upstage

Solar 10.7B

Depth-upscaled 10.7B model. Strong reasoning.

💬 Chat · 10.7B · 7.2GB+ VRAM

Black Forest Labs

FLUX.1 Dev (GGUF)

Highest-quality FLUX model. Uses 20-50 steps. For Macs with 24GB+ RAM.

🎨 Image Gen · 12B · 14GB+ VRAM

Meta

Code Llama 13B Instruct

13B code model for complex tasks. iPad Pro recommended.

💻 Code · 13B · 8.7GB+ VRAM

Mistral AI

Mistral Small 22B

22B parameter model. Strong reasoning and multilingual. Needs 16GB+ RAM.

💬 Chat · 22B · 14.5GB+ VRAM

Alibaba

Qwen 2.5 32B

Premium 32B model. Top-tier reasoning. For Macs with 32GB+ RAM.

💬 Chat · 32B · 20GB+ VRAM