Whisper Large v3 Turbo: 8x Faster Speech Recognition
OpenAI has released Whisper Large v3 Turbo, a distilled variant of the Whisper Large v3 speech recognition model that achieves dramatic speed improvements with minimal accuracy loss. The model is roughly one-quarter the size of Large v3 and processes audio approximately 8 times faster.
Technical details
Whisper Large v3 Turbo uses a distillation approach where the decoder is compressed from 32 layers to 4 layers while the encoder remains largely intact. The resulting model has approximately 809 million parameters compared to Large v3's 1.55 billion. The distillation preserves the encoder's acoustic modeling capability while streamlining the text generation component.
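The reported sizes are roughly consistent with a back-of-the-envelope count of the removed decoder weights. A minimal sketch, assuming Large v3's published hidden width of 1280 and counting only the attention and feed-forward weight matrices per decoder layer (biases and layer norms ignored):

```python
D_MODEL = 1280  # hidden width of Whisper Large v3 (assumption from the published config)

# Approximate weights per decoder layer:
#   self-attention  Q, K, V, O projections: 4 * d^2
#   cross-attention Q, K, V, O projections: 4 * d^2
#   feed-forward (d -> 4d -> d):            8 * d^2
per_layer = 16 * D_MODEL ** 2            # ~26.2M parameters per decoder layer

removed_layers = 32 - 4                  # decoder compressed from 32 to 4 layers
saved = removed_layers * per_layer       # ~734M parameters removed

estimate = 1_550_000_000 - saved         # ~816M, close to the reported 809M
print(f"estimated Turbo size: {estimate / 1e6:.0f}M parameters")
```

The estimate lands within about 1 percent of the reported 809 million, which supports the picture of an essentially untouched encoder plus a heavily pruned decoder.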
Accuracy comparison
In testing on the Fleurs benchmark, which covers more than 100 languages, Turbo's word error rate stays within roughly 1 percent (relative) of Large v3's on English and within roughly 3 percent on average across all languages. The gap is most noticeable on low-resource languages with limited training data, where the full Large v3 still has an edge. For English, French, Spanish, German, Chinese, and Japanese, the difference is negligible in practice.
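Word error rate, the metric behind these comparisons, is word-level edit distance divided by the reference length. A minimal self-contained implementation for illustration (production evaluations typically use a library such as jiwer, which also normalizes casing and punctuation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.

    Assumes a non-empty reference; whitespace tokenization only.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming table for Levenshtein distance over words
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,            # deletion
                       d[j - 1] + 1,        # insertion
                       prev + (r != h))     # substitution (or match)
            prev = cur
    return d[-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over six words
```

"Retaining 99 percent of performance" then means the relative increase in this number is under 1 percent.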
Local inference performance
The speed improvement is the headline feature. On an RTX 3060 with 12GB of VRAM, Turbo transcribes a one-hour audio file in approximately 2 minutes, compared to roughly 16 minutes for Large v3. On Apple Silicon, an M2 MacBook Air processes the same file in about 4 minutes. Quantized to Q5, the model fits in about 2GB of VRAM, making it accessible on virtually any modern GPU.
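Those times translate directly into real-time factors. A quick sanity check on the article's own numbers:

```python
AUDIO_MIN = 60.0  # the one-hour test file, in minutes

# Transcription times reported above, in minutes
times = {
    "RTX 3060, Turbo": 2.0,
    "RTX 3060, Large v3": 16.0,
    "M2 MacBook Air, Turbo": 4.0,
}

for setup, minutes in times.items():
    print(f"{setup}: {AUDIO_MIN / minutes:.0f}x faster than real time")

# Turbo vs Large v3 on the same GPU reproduces the headline figure
speedup = times["RTX 3060, Large v3"] / times["RTX 3060, Turbo"]
print(f"speedup: {speedup:.0f}x")  # prints "speedup: 8x"
```

A 30x real-time factor on a mid-range GPU is what makes the batch-processing and live-transcription scenarios below practical.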
Use cases
The speed improvement makes Turbo the default choice for most users. Real-time transcription of meetings, lectures, and podcasts is now practical on consumer hardware. The model also enables batch processing of large audio archives in reasonable timeframes. Large v3 remains the better choice only when maximum accuracy on rare languages is critical.