Speech-to-Text Models

Speech-to-text models transcribe audio into text, enabling applications such as meeting transcription, subtitle generation, voice commands, and podcast indexing. OpenAI's Whisper family dominates this space, with models ranging from tiny (39M parameters) to large (1.5B parameters) that trade speed against accuracy. Distil-Whisper is a faster, distilled alternative with minimal quality loss. All of these models can run fully locally, so audio stays on your own machine.
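
As a rough illustration of the size tradeoff, the sketch below picks the largest Whisper checkpoint that fits a parameter budget and then transcribes a local file. It assumes the open-source openai-whisper Python package (with ffmpeg installed) and uses the parameter counts OpenAI published for the original checkpoints. The helper names pick_model and transcribe and the default budget are hypothetical, chosen only for this example.

```python
# Parameter counts for the original Whisper checkpoints, as published by OpenAI.
WHISPER_PARAMS = {
    "tiny": 39_000_000,
    "base": 74_000_000,
    "small": 244_000_000,
    "medium": 769_000_000,
    "large": 1_550_000_000,
}


def pick_model(max_params: int) -> str:
    """Return the largest Whisper size within the parameter budget (hypothetical helper)."""
    candidates = [name for name, n in WHISPER_PARAMS.items() if n <= max_params]
    if not candidates:
        raise ValueError("no Whisper model fits the given parameter budget")
    return max(candidates, key=WHISPER_PARAMS.__getitem__)


def transcribe(audio_path: str, max_params: int = 100_000_000) -> str:
    """Transcribe a local audio file with the chosen checkpoint (hypothetical helper)."""
    # Assumption: installed via `pip install openai-whisper`, with ffmpeg on PATH.
    # Imported lazily so pick_model can be used without the package.
    import whisper

    # load_model downloads the weights on first use, then runs locally.
    model = whisper.load_model(pick_model(max_params))
    result = model.transcribe(audio_path)
    return result["text"]
```

Calling transcribe("meeting.wav") with the default budget would load the base checkpoint. Raising max_params selects a larger, slower, and generally more accurate model.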

9 models available
0.1 GB min VRAM needed
