Best voice models (STT + TTS)

Whisper + Piper + Kokoro

Speech-in / speech-out building blocks for offline voice assistants. Pair Whisper for STT with Piper or Kokoro for TTS.
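A minimal sketch of that pairing, assuming each engine sits behind a small interface: audio in to text for STT, and text in to audio out for TTS. The `SpeechToText`/`TextToSpeech` protocols, `VoiceAssistant`, and the stub engines are hypothetical names for illustration. They are not part of Whisper, Piper, or Kokoro. In a real assistant, the STT stub would wrap a Whisper model and the TTS stub would wrap a Piper or Kokoro voice.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class SpeechToText(Protocol):
    """Hypothetical interface: raw audio bytes in, transcript out (e.g. a Whisper wrapper)."""

    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    """Hypothetical interface: text in, raw audio bytes out (e.g. a Piper or Kokoro wrapper)."""

    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoiceAssistant:
    stt: SpeechToText
    tts: TextToSpeech
    respond: Callable[[str], str]  # the assistant logic: transcript -> reply text

    def handle(self, audio: bytes) -> bytes:
        # Speech in -> transcript -> reply text -> speech out.
        transcript = self.stt.transcribe(audio)
        reply = self.respond(transcript)
        return self.tts.synthesize(reply)


# Stub engines so the sketch runs without any model installed.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the "audio" is already text


class EchoTTS:
    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")  # pretend upper-cased bytes are audio


assistant = VoiceAssistant(stt=EchoSTT(), tts=EchoTTS(), respond=lambda t: f"you said: {t}")
print(assistant.handle(b"hello"))  # b'YOU SAID: HELLO'
```

Because the STT and TTS engines are separate objects, you can replace Piper with Kokoro, or one Whisper size with another, without changing the assistant logic.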

  1. Whisper Large v3 Turbo (OpenAI)

    Pruned, fine-tuned version of Large v3 with 4 decoder layers instead of 32. Accuracy close to Large v3 with substantially faster inference.

    0.81B params · 2.01 GB
  2. Whisper Large v3 (OpenAI)

    Largest Whisper model, with the highest accuracy in the Whisper family across languages and accents.

    1.55B params · 3.38 GB
  3. Whisper Medium (OpenAI)

    Mid-size Whisper model. Strong multilingual speech recognition.

    0.77B params · 1.93 GB
  4. Whisper Small (OpenAI)

    Compact Whisper model. Good accuracy for everyday transcription tasks.

    0.24B params · 0.95 GB
  5. Distil-Whisper Large v3 (Hugging Face)

    Distilled version of Whisper Large v3. About 6x faster, with word error rate within roughly 1% of Large v3. English only.

    0.76B params · 1.92 GB
  6. Kokoro 82M TTS (Kokoro)

    High-quality 82M-parameter TTS model with multiple voices to choose from. 86 MB download.

    0.082B params · 0.58 GB
  7. Piper TTS: LibriTTS-R (English) (Rhasspy)

    Medium-quality English voice with natural prosody. 63 MB download.

    0.02B params · 0.57 GB
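To choose from the list for a particular machine, one simple approach is to take the highest-ranked STT model that still leaves room for a TTS model within a memory budget. This sketch makes two assumptions that the listing does not state. First, it treats each entry's GB figure as an approximate memory footprint. Second, it assumes the STT and TTS models are loaded at the same time, so their sizes add up. `Model`, `CATALOG`, `pick`, and `pick_pair` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    kind: str        # "stt" or "tts"
    params_b: float  # parameters, in billions
    size_gb: float   # listed size; ASSUMED here to approximate the memory needed


# Entries copied from the ranked list above, in rank order.
CATALOG = [
    Model("Whisper Large v3 Turbo", "stt", 0.81, 2.01),
    Model("Whisper Large v3", "stt", 1.55, 3.38),
    Model("Whisper Medium", "stt", 0.77, 1.93),
    Model("Whisper Small", "stt", 0.24, 0.95),
    Model("Distil-Whisper Large v3", "stt", 0.76, 1.92),
    Model("Kokoro 82M TTS", "tts", 0.082, 0.58),
    Model("Piper LibriTTS-R (English)", "tts", 0.02, 0.57),
]


def pick(kind: str, budget_gb: float) -> Optional[Model]:
    """Return the highest-ranked model of `kind` whose size fits in `budget_gb`."""
    for model in CATALOG:  # rank order, so the first fit wins
        if model.kind == kind and model.size_gb <= budget_gb:
            return model
    return None


def pick_pair(budget_gb: float) -> Optional[tuple[Model, Model]]:
    """Best-ranked STT model that still leaves room for a TTS model in the same budget."""
    for stt in (m for m in CATALOG if m.kind == "stt"):
        if stt.size_gb > budget_gb:
            continue
        tts = pick("tts", budget_gb - stt.size_gb)  # assumes both are loaded together
        if tts is not None:
            return stt, tts
    return None


pair = pick_pair(3.0)
if pair:
    print(pair[0].name, "+", pair[1].name)  # Whisper Large v3 Turbo + Kokoro 82M TTS
```

The ranking follows the list above, which places Turbo ahead of Large v3. If accuracy matters more to you than speed, reorder `CATALOG`.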
