Best voice models (STT + TTS)

Whisper + Piper + Kokoro

Speech-in / speech-out building blocks for offline voice assistants. Pair Whisper for speech-to-text (STT) with Piper or Kokoro for text-to-speech (TTS).
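A minimal sketch of that pairing, assuming the `openai-whisper` Python package and the `piper` command-line tool are installed locally. The function names, the `respond` callback, and the default voice file path are hypothetical and only illustrate how the two halves might be wired together.

```python
import subprocess
from typing import Callable


def whisper_stt(audio_path: str, model_name: str = "small") -> str:
    """Transcribe an audio file with Whisper (assumes `pip install openai-whisper`)."""
    import whisper  # imported lazily so the rest of the sketch runs without it

    # Reloading the model on every call is slow; cache it in real use.
    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"].strip()


def piper_tts(text: str, out_path: str,
              voice_path: str = "en_US-libritts_r-medium.onnx") -> str:
    """Synthesize `text` to a WAV file by piping it into the piper CLI.

    `voice_path` is an assumed local path to an already downloaded Piper voice.
    """
    subprocess.run(
        ["piper", "--model", voice_path, "--output_file", out_path],
        input=text.encode("utf-8"),
        check=True,
    )
    return out_path


def voice_turn(audio_path: str,
               respond: Callable[[str], str],
               stt: Callable[[str], str] = whisper_stt,
               tts: Callable[[str, str], str] = piper_tts,
               out_path: str = "reply.wav") -> str:
    """One assistant turn: speech in -> text -> reply text -> speech out."""
    heard = stt(audio_path)
    reply = respond(heard)
    return tts(reply, out_path)
```

Because `stt` and `tts` are plain callables, a Kokoro-based synthesizer could be swapped in for `piper_tts` without touching `voice_turn`.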

1. Whisper Large v3 Turbo (OpenAI)
   Large v3 with a pruned decoder. Near-best accuracy with faster inference.
   0.81B parameters · ≥ 2.01 GB

2. Whisper Large v3 (OpenAI)
   Largest Whisper model. Highest accuracy in the Whisper family across languages and accents.
   1.55B parameters · ≥ 3.38 GB

3. Whisper Medium (OpenAI)
   Mid-size Whisper model. Strong multilingual speech recognition.
   0.77B parameters · ≥ 1.93 GB

4. Whisper Small (OpenAI)
   Compact Whisper model. Good accuracy for everyday transcription tasks.
   0.24B parameters · ≥ 0.95 GB

5. Distil-Whisper Large v3 (Hugging Face)
   Distilled Whisper. About 6x faster than large-v3, with word error rate within roughly 1%.
   0.76B parameters · ≥ 1.92 GB

6. Kokoro 82M TTS (Kokoro)
   High-quality 82M-parameter TTS model with multiple voice options. 86 MB download.
   0.082B parameters · ≥ 0.58 GB

7. Piper TTS - LibriTTS-R (English) (Rhasspy)
   Medium-quality English voice with natural prosody. 63 MB download.
   0.02B parameters · ≥ 0.57 GB
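One way to use the size figures in the list above is to pick the best-ranked model that fits a given memory budget. The sketch below assumes that each "≥ N GB" figure is a minimum memory requirement and that an STT model and a TTS model loaded together need roughly the sum of the two; neither assumption is stated by the listing, and all names here are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VoiceModel:
    rank: int        # position in the list above (1 = best)
    name: str
    kind: str        # "stt" or "tts"
    params_b: float  # parameters, in billions
    min_gb: float    # the "≥ N GB" figure, assumed to be minimum memory


MODELS = [
    VoiceModel(1, "Whisper Large v3 Turbo", "stt", 0.81, 2.01),
    VoiceModel(2, "Whisper Large v3", "stt", 1.55, 3.38),
    VoiceModel(3, "Whisper Medium", "stt", 0.77, 1.93),
    VoiceModel(4, "Whisper Small", "stt", 0.24, 0.95),
    VoiceModel(5, "Distil-Whisper Large v3", "stt", 0.76, 1.92),
    VoiceModel(6, "Kokoro 82M TTS", "tts", 0.082, 0.58),
    VoiceModel(7, "Piper TTS - LibriTTS-R (English)", "tts", 0.02, 0.57),
]


def best_fit(kind: str, budget_gb: float) -> Optional[VoiceModel]:
    """Return the best-ranked model of `kind` whose minimum fits the budget."""
    candidates = [m for m in MODELS if m.kind == kind and m.min_gb <= budget_gb]
    return min(candidates, key=lambda m: m.rank, default=None)


def pick_pair(budget_gb: float) -> Optional[Tuple[VoiceModel, VoiceModel]]:
    """Return the best-ranked (STT, TTS) pair whose combined minimum fits.

    STT rank takes priority over TTS rank; footprints are assumed to add.
    """
    by_rank = sorted(MODELS, key=lambda m: m.rank)
    for stt in (m for m in by_rank if m.kind == "stt"):
        for tts in (m for m in by_rank if m.kind == "tts"):
            if stt.min_gb + tts.min_gb <= budget_gb:
                return stt, tts
    return None
```

For example, `best_fit("stt", 2.0)` skips both large models and returns Whisper Medium, while `pick_pair(1.0)` returns `None` because even the smallest STT and TTS models together exceed 1 GB under the additive assumption.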


© runthismodel · 2026
