HuggingFace · speech
Distil-Whisper Large v3
Distilled Whisper. 6x faster than large-v3 with 1% accuracy loss.
0.76B params · whisper · MIT · 1.92 GB VRAM
about·model card

Distil-Whisper Large v3, developed by HuggingFace, is a distilled, more compact version of the Whisper architecture for automatic speech recognition (ASR). With 0.76 billion parameters, it balances accuracy against resource requirements, which suits applications ranging from real-time transcription to voice-controlled interfaces. It is intended to transcribe speech accurately, including in noisy conditions.

In its size class, Distil-Whisper Large v3 performs strongly: although much smaller than the full-size Whisper large-v3, it retains most of that model's accuracy. This makes it appealing for users who need capable ASR on limited hardware. The Q8_0 quantization listed here needs only about 1.9 GB of VRAM, which is manageable on mid-range GPUs; CPU-only machines can also run it from system RAM, though more slowly.

This model suits developers and hobbyists who want to integrate ASR into their projects without depending on cloud services. Realistic hardware includes modern laptops with dedicated GPUs, high-end desktops, and edge devices with sufficient RAM and processing power. Its low VRAM requirement makes it practical for settings ranging from personal projects to small-scale commercial applications.

./quantizations·1 variant

Quantization  Bits  File Size  VRAM Needed  RAM Needed  Quality
Q8_0          8     1.415 GB   1.92 GB      2.42 GB     96%
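
As a rough illustration of the fit check this table supports, the sketch below compares free VRAM against the listed requirement. The figures are copied from the table; the `Variant` type, the `fits` helper, and the 10% headroom margin are assumptions made for the example and are not part of the site.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    name: str
    bits: int
    file_gb: float
    vram_gb: float
    ram_gb: float


# Values copied from the quantization table above.
VARIANTS = [Variant("Q8_0", 8, 1.415, 1.92, 2.42)]


def fits(variant: Variant, free_vram_gb: float, headroom: float = 0.10) -> bool:
    """Return True if the variant fits with a safety margin (assumed 10%)."""
    return free_vram_gb >= variant.vram_gb * (1.0 + headroom)


def usable_variants(free_vram_gb: float) -> List[str]:
    """Names of the listed variants that fit in the given free VRAM."""
    return [v.name for v in VARIANTS if fits(v, free_vram_gb)]
```

Calling `usable_variants(4.0)` returns the Q8_0 entry, while a card with only 2 GB free falls just short once the assumed margin is applied.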

How to run Distil-Whisper Large v3

Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.

whisper.cpp: plain C/C++ reimplementation of Whisper. CoreML/Metal/CUDA. 1-line setup.
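  1. Build

    git clone https://github.com/ggerganov/whisper.cpp && cd whisper.cpp && make

  2. Get the model

    bash ./models/download-ggml-model.sh large-v3

  3. Transcribe

    ./main -m models/ggml-large-v3.bin -f input.wav

Note: as written, steps 2 and 3 fetch and load the original large-v3 weights. To run the distilled model, substitute a Distil-Whisper Large v3 file in ggml format for the model path.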
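
The `./main` example in whisper.cpp has historically expected 16-bit PCM WAV input sampled at 16 kHz, so it can help to check `input.wav` before the Transcribe step. The following is a minimal sketch using only Python's standard library; the function name `check_wav` is hypothetical, and newer whisper.cpp builds may accept more formats than this check assumes.

```python
import wave
from typing import List


def check_wav(path: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()      # bytes per sample
        channels = wav.getnchannels()
    if rate != 16000:
        problems.append(f"sample rate is {rate} Hz, expected 16000")
    if width != 2:
        problems.append(f"sample width is {8 * width}-bit, expected 16-bit")
    if channels not in (1, 2):
        problems.append(f"{channels} channels, expected mono or stereo")
    return problems
```

If the check reports a mismatch, a converter such as ffmpeg can resample the audio, for example `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le input.wav`.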


faq·common questions
how much VRAM do I need to run Distil-Whisper Large v3?

Distil-Whisper Large v3 requires about 1.92 GB of VRAM with Q8_0 quantization, the only variant listed on this page. No separate full-precision figure is listed.

which quant should I pick?

Q8_0 is the only quantization listed for this model, so it is the one to pick. The table above rates it at 96% quality with 1.92 GB of VRAM, which is near-lossless if you have the headroom.

faq://ai-curated·20 entries
What GPU do I need to run Distil-Whisper Large v3?

To run Distil-Whisper Large v3, you need a GPU with at least 1.9 GB of VRAM. NVIDIA GPUs such as the GTX 1060 or higher are recommended.

Is Distil-Whisper Large v3 good for coding?

No. Distil-Whisper Large v3 is a speech recognition model: it transcribes audio and does not generate or analyze code. For coding, models like Codex or CodeLlama are more suitable.

Distil-Whisper Large v3 vs Llama 3.1 8B?

Distil-Whisper Large v3 has 0.76B parameters and is optimized for speech recognition, while Llama 3.1 8B is a larger, more versatile model with 8B parameters, better suited for a wider range of NLP tasks.

Can I run Distil-Whisper Large v3 on a Mac?

Yes, you can run Distil-Whisper Large v3 on a Mac. Apple silicon Macs (M1 and later) with Metal support are recommended; they share unified memory between CPU and GPU, so make sure about 1.9 GB is available for the model.

How much VRAM does Distil-Whisper Large v3 need?

With Q8_0 quantization, Distil-Whisper Large v3 requires about 1.9 GB of VRAM. Q8_0 is the only variant listed here; other quantization levels would need different amounts.

Is Distil-Whisper Large v3 censored?

Censorship in the chat-model sense does not apply: Distil-Whisper Large v3 transcribes audio rather than generating free-form responses. It is released under the MIT license, which permits use and modification.

Is Distil-Whisper Large v3 commercial-use allowed?

Yes, Distil-Whisper Large v3 is licensed under the MIT license, which allows commercial use provided the copyright and license notice are retained.

Distil-Whisper Large v3 context length?

This page does not list a context length for Distil-Whisper Large v3. As a Whisper-family speech model, it processes audio in 30-second windows rather than a long text context. For more detail, refer to the model's documentation or source code.

Does Distil-Whisper Large v3 support function calling?

Distil-Whisper Large v3 is designed for speech recognition and does not natively support function calling. For that, pass its transcripts to an LLM with function-calling capabilities.

Distil-Whisper Large v3 quantization options?

Distil-Whisper Large v3 supports quantization to reduce memory usage and improve inference speed. This page lists one quantized variant, Q8_0 (8-bit); FP16 is the usual unquantized baseline.

Can Distil-Whisper Large v3 run on CPU?

Yes, Distil-Whisper Large v3 can run on CPU, but performance will be significantly slower compared to running on a GPU. A powerful multi-core CPU is recommended for better performance.
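
For CPU runs, whisper.cpp's `-t` option sets the number of worker threads. The sketch below builds the transcription command from the how-to steps with a thread count derived from the machine. The `build_command` helper and the policy of leaving one core free are assumptions for illustration, and the binary name follows the `./main` used on this page.

```python
import os
from typing import List, Optional


def build_command(model: str, audio: str, threads: Optional[int] = None) -> List[str]:
    """Build a whisper.cpp argument list.

    When threads is None, use all cores but one (an assumed policy),
    never going below a single thread.
    """
    if threads is None:
        threads = max(1, (os.cpu_count() or 1) - 1)
    return ["./main", "-m", model, "-f", audio, "-t", str(threads)]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the whisper.cpp directory.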

Distil-Whisper Large v3 fine-tuning?

Distil-Whisper Large v3 can be fine-tuned for specific speech recognition tasks. Fine-tuning typically requires a labeled dataset and a training framework like PyTorch or TensorFlow.

Distil-Whisper Large v3 system requirements?

To run Distil-Whisper Large v3 with Q8_0, you need about 1.9 GB of VRAM and about 2.4 GB of free RAM (8 GB of system RAM is a comfortable baseline), plus a multi-core CPU. A dedicated GPU is highly recommended for best performance.

Distil-Whisper Large v3 performance benchmark?

Distil-Whisper Large v3 is 6 times faster than the original large-v3 model with only a 1% accuracy loss. Inference speed can vary based on hardware and quantization level.
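
One common way to report ASR speed is the real-time factor (RTF): processing time divided by audio duration, where lower is faster. The sketch below shows that arithmetic and what a 6x speedup implies. The function names are hypothetical and the timings are made-up inputs chosen for illustration, not measurements.

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Processing time per second of audio; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return wall_seconds / audio_seconds


def speedup(baseline_wall_seconds: float, candidate_wall_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if candidate_wall_seconds <= 0:
        raise ValueError("candidate_wall_seconds must be positive")
    return baseline_wall_seconds / candidate_wall_seconds


# Hypothetical timings for a 60-second clip, chosen only to illustrate a 6x ratio.
baseline_rtf = real_time_factor(60.0, 30.0)  # 0.5
distil_rtf = real_time_factor(60.0, 5.0)
print(speedup(30.0, 5.0))  # 6.0
```

Reporting RTF alongside the hardware and quantization level makes numbers from different machines easier to compare.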

Distil-Whisper Large v3 for RAG?

Distil-Whisper Large v3 is not designed for Retrieval-Augmented Generation (RAG). It is optimized for speech recognition tasks and may not perform well in RAG scenarios.

Distil-Whisper Large v3 for agents?

Distil-Whisper Large v3 can be used in agent-based systems for speech recognition tasks, such as voice commands or transcriptions. However, it is not designed for complex dialog management or natural language understanding.

Distil-Whisper Large v3 for coding vs general?

Distil-Whisper Large v3 is a speech recognition model, so it handles neither coding nor general-purpose NLP tasks. For coding, models like Codex are more appropriate.

Distil-Whisper Large v3 vs ChatGPT?

Distil-Whisper Large v3 is a speech recognition model, while ChatGPT is a conversational AI model. They serve different purposes and are not directly comparable in terms of functionality.

Distil-Whisper Large v3 download size?

The Q8_0 file listed on this page is 1.415 GB, so expect a download of roughly 1.4 GB. Other formats may differ in size.

Best quant for Distil-Whisper Large v3?

Q8_0 is the only quantization listed on this page, which makes it the default choice: it is rated at 96% quality and needs 1.92 GB of VRAM. Full precision (FP16) offers a slight accuracy improvement with a higher memory footprint.