Best voice models (STT + TTS)
Whisper + Piper + Kokoro
Speech-in and speech-out building blocks for offline voice assistants. Pair Whisper for speech-to-text (STT) with Piper or Kokoro for text-to-speech (TTS).
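As a rough illustration of that pairing, here is a minimal sketch of one assistant turn: audio in, text through some reply logic, audio out. It assumes the `openai-whisper` Python package for STT and the `piper` command-line tool on your PATH for TTS. The `voice.onnx` path is a placeholder for whatever Piper voice file you downloaded, and `run_turn` and `respond` are hypothetical names introduced only for this example. Kokoro could be swapped in by passing a different `synthesize` callable.

```python
import subprocess
from typing import Callable


def whisper_transcribe(wav_path: str, model_name: str = "small") -> str:
    """STT via the openai-whisper package (assumed installed)."""
    import whisper  # imported lazily so the rest of the sketch runs without it

    model = whisper.load_model(model_name)
    return model.transcribe(wav_path)["text"].strip()


def piper_synthesize(text: str, wav_path: str, voice_model: str = "voice.onnx") -> None:
    """TTS by piping text into the piper CLI (assumed on PATH).

    voice_model is a placeholder path to a downloaded Piper voice.
    """
    subprocess.run(
        ["piper", "--model", voice_model, "--output_file", wav_path],
        input=text.encode("utf-8"),
        check=True,
    )


def run_turn(
    wav_in: str,
    wav_out: str,
    transcribe: Callable[[str], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str, str], None],
) -> str:
    """One assistant turn: speech -> text -> reply text -> speech."""
    heard = transcribe(wav_in)
    reply = respond(heard)
    synthesize(reply, wav_out)
    return reply
```

With both tools installed, a turn might look like `run_turn("question.wav", "answer.wav", whisper_transcribe, lambda t: f"You said: {t}", piper_synthesize)`. Passing the STT and TTS steps as callables keeps the loop independent of which models you pick from the list.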
- 1. Whisper Large v3 Turbo (OpenAI)
  Optimized large Whisper model. Near-best accuracy with faster inference.
  0.81B parameters, ≥ 2.01 GB
- 2. Whisper Large v3 (OpenAI)
  Largest Whisper model. Best accuracy across all languages and accents.
  1.55B parameters, ≥ 3.38 GB
- 3. Whisper Medium (OpenAI)
  Mid-size Whisper model. Strong multilingual speech recognition.
  0.77B parameters, ≥ 1.93 GB
- 4. Whisper Small (OpenAI)
  Compact Whisper model. Good accuracy for everyday transcription tasks.
  0.24B parameters, ≥ 0.95 GB
- 5. Distil-Whisper Large v3 (Hugging Face)
  Distilled Whisper. About 6x faster than Whisper Large v3 with roughly 1% accuracy loss.
  0.76B parameters, ≥ 1.92 GB
- 6. Kokoro 82M TTS (Kokoro)
  High-quality 82M-parameter TTS model. Excellent speech synthesis with multiple voice options. 86 MB download.
  0.082B parameters, ≥ 0.58 GB
- 7. Piper TTS - LibriTTS-R (English) (Rhasspy)
  Medium-quality English voice with natural prosody. 63 MB download.
  0.02B parameters, ≥ 0.57 GB
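To choose from this list for a given machine, one option is to filter by the "≥ N GB" figure and take the highest-ranked model that fits. The sketch below does that with the numbers copied from the list. It reads the "≥" value as a minimum memory requirement and, when picking an STT + TTS pair, assumes both models are loaded at once and their minimums simply add up. Both are assumptions, not something the list states. The names `VoiceModel`, `pick`, and `pick_pair` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VoiceModel:
    rank: int      # position in the list above (1 = top)
    name: str
    kind: str      # "stt" or "tts"
    min_gb: float  # the "≥ N GB" figure, read here as minimum memory (assumption)


MODELS = [
    VoiceModel(1, "Whisper Large v3 Turbo", "stt", 2.01),
    VoiceModel(2, "Whisper Large v3", "stt", 3.38),
    VoiceModel(3, "Whisper Medium", "stt", 1.93),
    VoiceModel(4, "Whisper Small", "stt", 0.95),
    VoiceModel(5, "Distil-Whisper Large v3", "stt", 1.92),
    VoiceModel(6, "Kokoro 82M TTS", "tts", 0.58),
    VoiceModel(7, "Piper TTS - LibriTTS-R (English)", "tts", 0.57),
]


def pick(kind: str, budget_gb: float) -> Optional[VoiceModel]:
    """Highest-ranked model of the given kind whose minimum fits the budget."""
    fitting = [m for m in MODELS if m.kind == kind and m.min_gb <= budget_gb]
    return min(fitting, key=lambda m: m.rank, default=None)


def pick_pair(budget_gb: float) -> Optional[Tuple[VoiceModel, VoiceModel]]:
    """Best-ranked STT model that still leaves room for some TTS model.

    Assumes both models are resident together and their minimums add up.
    """
    stt_by_rank = sorted((m for m in MODELS if m.kind == "stt"), key=lambda m: m.rank)
    for stt in stt_by_rank:
        tts = pick("tts", budget_gb - stt.min_gb)
        if tts is not None:
            return stt, tts
    return None


pair = pick_pair(3.0)
if pair is not None:
    print(" + ".join(m.name for m in pair))
```

With a 3 GB budget this selects Whisper Large v3 Turbo with Kokoro 82M TTS. With 2 GB it falls back to Whisper Small, since Whisper Medium alone would leave too little room for either TTS model.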