OpenAI · speech
Whisper Medium
Mid-size Whisper model. Strong multilingual speech recognition.
0.77B params · whisper · MIT · 1.93 GB VRAM
about·model card

Whisper Medium is an automatic speech recognition (ASR) model developed by OpenAI, with 0.77 billion parameters. It transcribes spoken audio to text and suits tasks such as creating subtitles or transcribing meetings and lectures. It balances accuracy against computational cost, which matters for local deployment where resources are often limited.

Within the Whisper family, Medium has fewer parameters than Whisper Large while keeping accuracy that is often comparable to the larger model. That makes it a reasonable option for users who need robust ASR without the overhead of the most resource-intensive models. The available Q8_0 quantization reduces the memory footprint and can improve inference speed, so the model fits on devices with as little as 1.93 GB of VRAM.

Whisper Medium suits developers and professionals integrating ASR into applications on mid-range hardware, such as laptops or edge devices. It is a candidate for real-time or near-real-time transcription, where latency and resource consumption are the critical constraints. On more powerful hardware, its small footprint leaves headroom for handling several streams or larger datasets at once.
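Whether a setup counts as real-time or near-real-time can be checked with the real-time factor: processing time divided by audio duration, where a value below 1 means transcription finishes faster than the audio plays. The sketch below is a minimal illustration; the function names `rtf` and `keeps_up` are hypothetical helpers, and the timings are whatever you measure on your own hardware.

```shell
# rtf <processing_seconds> <audio_seconds>
# Prints the real-time factor (processing time / audio duration) to two decimals.
rtf() {
  awk -v p="$1" -v a="$2" 'BEGIN { printf "%.2f\n", p / a }'
}

# keeps_up <processing_seconds> <audio_seconds>
# Succeeds (exit status 0) when processing is faster than the audio plays.
keeps_up() {
  awk -v p="$1" -v a="$2" 'BEGIN { exit !(p < a) }'
}
```

For example, `rtf 30 120` prints 0.25, which corresponds to a two-minute clip transcribed in 30 seconds.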

./quantizations·1 variant
Quantization  Bits  File Size  VRAM Needed  RAM Needed  Quality
Q8_0          8     1.428 GB   1.93 GB      2.43 GB     92%
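The VRAM figure in the table can be compared against what a GPU reports. The sketch below is one way to do that, under two assumptions: that the listed "GB" can be treated as 1024 MiB, and that the available memory is supplied in MiB (the unit `nvidia-smi` reports). The function name `fits_vram` is a hypothetical helper, not part of any tool mentioned on this page.

```shell
# Required VRAM for Whisper Medium Q8_0, taken from the quantization table (GB).
REQUIRED_VRAM_GB=1.93

# fits_vram <available_mib>
# Succeeds (exit status 0) if the available VRAM covers the requirement.
# Assumption: 1 GB is treated as 1024 MiB.
fits_vram() {
  awk -v have_mib="$1" -v need_gb="$REQUIRED_VRAM_GB" \
    'BEGIN { exit !(have_mib / 1024 >= need_gb) }'
}

# Example on an NVIDIA GPU (first GPU only):
#   total_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
#   fits_vram "$total_mib" && echo "Q8_0 fits" || echo "Q8_0 does not fit"
```

Total memory is an optimistic bound, since other processes may already hold part of it; querying `memory.free` instead gives a stricter check.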

How to run Whisper Medium

The commands below use whisper.cpp and are pre-filled for this model.

whisper.cpp is a plain C/C++ implementation of Whisper with CoreML, Metal, and CUDA support.
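  1. Build

    git clone https://github.com/ggerganov/whisper.cpp && cd whisper.cpp && make

  2. Get the model

    bash ./models/download-ggml-model.sh medium

  3. Transcribe

    ./main -m models/ggml-medium.bin -f input.wav

Recent whisper.cpp releases name the binary whisper-cli and build it under build/bin/. If ./main is missing after the build, run ./build/bin/whisper-cli with the same arguments.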
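The transcribe step assumes input.wav is already in a format whisper.cpp accepts. Its example CLI has historically expected 16-bit PCM WAV at 16 kHz, so other audio usually needs converting first. The sketch below wraps the usual ffmpeg conversion; the function name `to_wav16k`, the `DRY_RUN` switch, and the file names are hypothetical and only for illustration.

```shell
# to_wav16k <input> <output.wav>
# Converts any ffmpeg-readable file to 16 kHz, mono, 16-bit PCM WAV.
# Set DRY_RUN=1 to print the ffmpeg command instead of running it.
to_wav16k() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "ffmpeg -i $1 -ar 16000 -ac 1 -c:a pcm_s16le $2"
  else
    ffmpeg -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$2"
  fi
}

# Example (file names are placeholders):
#   to_wav16k talk.mp3 input.wav
#   ./main -m models/ggml-medium.bin -f input.wav
```

The dry-run mode is there so the command can be inspected before touching any files; drop it if you only need the conversion.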


faq·common questions
how much VRAM do I need to run Whisper Medium?

Whisper Medium needs at least 1.93 GB of VRAM with Q8_0 quantization, the only variant listed here.

which quant should I pick?

Q8_0 is the only quantization listed for this model, so it is the one to pick. Q8_0 is generally near-lossless; the quantization table rates it at 92% quality, and it needs 1.93 GB of VRAM.