Speech-to-Text Models
Speech-to-text models transcribe audio into text, enabling applications like meeting transcription, subtitle generation, voice commands, and podcast indexing. The Whisper family from OpenAI dominates this space, with models ranging from tiny (39M parameters) to large (1.5B parameters) that trade speed against accuracy. Distil-Whisper provides a faster alternative with minimal quality loss. All models can run fully locally for maximum privacy.
OpenAI
Whisper Large v3 Turbo
Optimized large Whisper model. Near-best accuracy with faster inference.
OpenAI
Whisper Large v3
Largest Whisper model. Best accuracy across all languages and accents.
OpenAI
Whisper Small
Compact Whisper model. Good accuracy for everyday transcription tasks.
OpenAI
Whisper Base
Base Whisper model. Good balance of speed and accuracy. 142MB.
Hugging Face
Distil-Whisper Large v3
Distilled Whisper. About 6x faster than Whisper Large v3, with roughly 1% accuracy loss (word error rate).
OpenAI
Whisper Tiny
Tiny multilingual speech recognition model. Only 75MB. Supports 99 languages. Runs on any device.
OpenAI
Whisper Medium
Mid-size Whisper model. Strong multilingual speech recognition.
OpenAI
Whisper Tiny English (Quantized)
Smallest speech recognition model in this list. Only 32MB. English only. Default speech model, guaranteed to run on any iPhone.
OpenAI
Whisper Base English
English-only base model. Faster and more accurate on English audio than the multilingual Whisper Base.
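The size and language tradeoffs in this list can be turned into a simple selection rule. The following is a minimal sketch, not part of any Whisper tooling: the `SpeechModel` class, the `CATALOG` list, and the `pick_model` function are hypothetical names introduced for illustration, and the catalog contains only the three entries whose sizes are stated above. It assumes that, among models that fit, the largest one is the most accurate choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeechModel:
    """Hypothetical record for one catalog entry."""
    name: str
    size_mb: int
    english_only: bool


# Only the entries whose download size is stated in the list above.
CATALOG = [
    SpeechModel("Whisper Tiny English (Quantized)", 32, True),
    SpeechModel("Whisper Tiny", 75, False),
    SpeechModel("Whisper Base", 142, False),
]


def pick_model(budget_mb: int, need_multilingual: bool) -> Optional[SpeechModel]:
    """Return the largest model that fits the size budget, or None.

    Assumption: a larger model in the same family is the more accurate one,
    so the largest model that fits is preferred.
    """
    candidates = [
        m for m in CATALOG
        if m.size_mb <= budget_mb and not (need_multilingual and m.english_only)
    ]
    return max(candidates, key=lambda m: m.size_mb, default=None)


# Example: a 100MB budget with multilingual audio.
choice = pick_model(100, need_multilingual=True)
print(choice.name if choice else "no model fits")
```

With a 100MB budget and multilingual audio, the rule selects Whisper Tiny; with a 50MB budget and multilingual audio, nothing in this reduced catalog fits, which is why the function returns `None` rather than raising.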