Whisper Model Sizes: Which One Should You Use?
OpenAI's Whisper is widely treated as the reference point for open-source speech-to-text. It comes in six sizes, and the right one depends on your hardware, speed requirements, and accuracy needs.
Model Lineup
| Model | Parameters | VRAM | Speed vs Large-v3 | Accuracy |
|---|---|---|---|---|
| Tiny | 39M | ~1GB | 10x faster | Basic |
| Base | 74M | ~1GB | 7x faster | Good |
| Small | 244M | ~2GB | 4x faster | Very Good |
| Medium | 769M | ~5GB | 2x faster | Excellent |
| Turbo | 809M | ~6GB | 8x faster | Near-Large |
| Large-v3 | 1.55B | ~10GB | Baseline | Best |
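To make the table easier to apply, here is a minimal Python sketch that picks the most accurate model fitting a given VRAM budget. The numbers are copied from the table; the function name `best_model_for_vram` is made up for illustration, and ranking Turbo above Medium is an assumption based on the "Near-Large" label rather than a measured result.

```python
# Rows copied from the table above: (model name, approx. VRAM in GB, speed vs Large-v3).
# Ordered from least to most accurate. Placing "turbo" above "medium" is an
# assumption based on the table's "Near-Large" label.
MODELS = [
    ("tiny", 1, 10),
    ("base", 1, 7),
    ("small", 2, 4),
    ("medium", 5, 2),
    ("turbo", 6, 8),
    ("large-v3", 10, 1),
]


def best_model_for_vram(vram_gb):
    """Return the most accurate model whose approximate VRAM need fits, or None."""
    fitting = [name for name, need_gb, _speed in MODELS if need_gb <= vram_gb]
    return fitting[-1] if fitting else None


print(best_model_for_vram(8))  # turbo
```

The VRAM figures are approximate, so treat a result that only just fits as a candidate to test rather than a guarantee.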
The Turbo Sweet Spot
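Whisper Turbo is the standout option for most users. It delivers roughly 95% of Large-v3's accuracy at about 8x the speed, using around 6GB of VRAM. Unless you're processing critical transcriptions where every word matters, Turbo is the recommended choice. One caveat: Turbo was not trained for the translation task, so pick Medium or Large-v3 if you need non-English speech translated into English.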
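To put the speed multiplier in concrete terms, the following sketch converts a known Large-v3 processing time into a rough estimate for another model. It only divides by the table's multipliers, which are approximate and hardware-dependent; `estimated_minutes` is a hypothetical helper name.

```python
# Approximate speed multipliers relative to Large-v3, taken from the table.
SPEEDUP = {
    "tiny": 10,
    "base": 7,
    "small": 4,
    "medium": 2,
    "turbo": 8,
    "large-v3": 1,
}


def estimated_minutes(large_v3_minutes, model):
    """Rough time for `model`, given how long Large-v3 takes on the same job."""
    return large_v3_minutes / SPEEDUP[model]


# A batch that takes an hour on Large-v3:
print(estimated_minutes(60, "turbo"))  # 7.5
```

Real throughput also depends on batch size, audio length, and the inference backend, so this is a planning estimate only.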
Hardware Recommendations
Smartphones and low-end devices: Tiny or Base, both of which run efficiently on CPU.
Laptops with integrated graphics: Small, whose ~2GB footprint fits in the shared system memory that integrated GPUs use.
Desktop with dedicated GPU: Turbo, the best balance of speed and quality.
Workstation / Server: Large-v3, for maximum accuracy in production pipelines.
All Whisper models can also run on CPU using whisper.cpp, which is especially useful for the smaller models. Check compatibility for your device on our model checker.
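The hardware recommendations above can be sketched as a small lookup in Python. The device-class keys are hypothetical labels invented for this example, and choosing Tiny over Base for phones is an assumption (the more conservative of the two options). The `transcribe_for_device` helper assumes the `openai-whisper` package and ffmpeg are installed; it uses that package's `load_model` and `transcribe` calls and imports the library lazily so the lookup works without it.

```python
# Hypothetical device-class labels mapped to the recommendations above.
# "tiny" for phones is an assumption; the article allows Tiny or Base.
RECOMMENDED = {
    "phone": "tiny",
    "laptop_integrated": "small",
    "desktop_gpu": "turbo",
    "server": "large-v3",
}


def recommended_model(device_class):
    """Return the suggested Whisper model name for a device class."""
    return RECOMMENDED[device_class]


def transcribe_for_device(audio_path, device_class):
    """Transcribe a file using the model suggested for this device class."""
    import whisper  # from the openai-whisper package; imported only when needed

    model = whisper.load_model(recommended_model(device_class))
    result = model.transcribe(audio_path)
    return result["text"]


print(recommended_model("desktop_gpu"))  # turbo
```

A call such as `transcribe_for_device("meeting.mp3", "desktop_gpu")` would download the model weights on first use; the file name here is only a placeholder.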