~/runthismodel

Can M3 Max run Whisper Medium?

Grade: S

Yes — runs locally

~102 tok/sec · Instant — feels like typing. No noticeable delay.

Your VRAM: 128 GB
Model size: 0.77B parameters
Best quant: Q8_0
VRAM needed: 1.9 GB

The verdict

The M3 Max (128 GB unified memory) handles Whisper Medium comfortably using the Q8_0 quantization, which fits in 1.9 GB. Expected throughput is around 102 tokens/second, which feels instant in interactive use, with no noticeable delay. Whisper Medium is a mid-size Whisper model with strong multilingual speech recognition.

Setup tutorial: Whisper Medium on M3 Max

AI-generated, GPU-specific. Verified commands for your exact hardware.

TL;DR

Whisper Medium runs at Grade S on the Apple M3 Max with Q8_0 quantization, achieving ~102 tok/sec.

Prerequisites

Before starting, make sure you have at least 1.4GB of free disk space, macOS 12.3 or later, Homebrew, and the Xcode Command Line Tools. Install the Command Line Tools by running `xcode-select --install` in your terminal.

Expected performance

With the Q8_0 quantization, expect ~102 tok/sec with 1.9GB of VRAM in use, leaving 126.1GB free. The spare memory does not extend context: Whisper processes audio in fixed 30-second windows, so longer recordings are transcribed window by window.
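The headroom figure is plain subtraction. A minimal sketch using the numbers from this page follows; `headroom_gb` is a hypothetical helper name, not part of any tool mentioned here.

```shell
#!/bin/sh
# Memory left after loading the model, in GB, to one decimal place.
# Usage: headroom_gb <total GB> <GB used by the model>
headroom_gb() {
  awk -v total="$1" -v used="$2" 'BEGIN { printf "%.1f\n", total - used }'
}

# 128 GB of unified memory minus 1.9 GB for the Q8_0 model.
headroom_gb 128 1.9
```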

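1. Install runtime

whisper.cpp (Ollama does not run Whisper models)

brew install whisper-cpp

2. Download the model

Download the Whisper Medium model file (about 1.4GB) from the ggerganov/whisper.cpp repository on Hugging Face. Quantized builds, where published, use a suffixed file name (for example `ggml-medium-q5_0.bin`).

curl -L -o ggml-medium.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin

3. Run it

Replace audio.wav with your own 16 kHz WAV file. The binary is named `whisper-cli` in recent Homebrew builds.

whisper-cli -m ggml-medium.bin -f audio.wav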
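Input audio usually has to be converted to 16 kHz mono WAV before whisper.cpp can transcribe it. The sketch below assumes `ffmpeg` and whisper.cpp's `whisper-cli` are installed; `transcribe_cmds`, `input.mp3`, and the model path are hypothetical names used only for illustration. It prints the two commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Print the commands needed to transcribe one file with whisper.cpp.
# Assumptions: ffmpeg and whisper-cli are on PATH; all names are illustrative.
transcribe_cmds() {
  input="$1"              # source audio in any format ffmpeg can read
  model="$2"              # path to a ggml Whisper model file
  wav="${input%.*}.wav"   # same base name with a .wav extension
  # Resample to 16 kHz, mono, 16-bit PCM.
  echo "ffmpeg -i $input -ar 16000 -ac 1 -c:a pcm_s16le $wav"
  # Transcribe the converted file.
  echo "whisper-cli -m $model -f $wav"
}

transcribe_cmds input.mp3 ggml-medium.bin
```

If the input is already a `.wav` file, the derived output name equals the input name, so pick a different output path in that case.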

4. Optimize for M3 Max

For best performance on the Apple M3 Max, use a runtime that runs inference on the GPU through Metal, which whisper.cpp does by default on Apple Silicon. There are no separate "MPS layers" to enable. With 128GB of unified memory, the 1.9GB used by the Q8_0 model leaves ample headroom.

Troubleshooting

Low performance or high CPU usage

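Metal acceleration is enabled by default in whisper.cpp on Apple Silicon, so no backend setting needs to be changed. Check the startup log for Metal initialization messages and make sure you are not passing `--no-gpu`. Closing other GPU-heavy applications can also help.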

Out of memory errors

Out-of-memory errors are unlikely with a 1.9GB model in 128GB of unified memory. If they occur, close other memory-heavy applications, or switch to a smaller quantization or a smaller Whisper model. Whisper processes fixed 30-second windows, so there is no context length to reduce.

Model not found

Verify that the download completed and that the path passed to the runtime points at the model file. Run `ls -lh ggml-medium.bin` in the download directory; the file should be about 1.4GB.
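A quick way to rule out a missing or truncated download is to test that the model file exists and is non-empty. This is a minimal sketch that assumes the model was saved as a local file; `check_model` is a hypothetical helper, and `ggml-medium.bin` is the file name used in the download step, so adjust the path to your setup.

```shell
#!/bin/sh
# Report whether a model file exists and is non-empty.
# Returns 0 when the file is usable, 1 otherwise.
check_model() {
  if [ -s "$1" ]; then
    echo "found: $1"
  else
    echo "missing or empty: $1"
    return 1
  fi
}

check_model ggml-medium.bin || echo "re-download the model before running"
```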

Alternative runtimes

Ollama, LM Studio, and llama.cpp target text-generation models and do not run Whisper. Alternatives to whisper.cpp include the original openai-whisper Python package and MLX (through the mlx-whisper package). whisper.cpp is a lightweight command-line option, while the Python packages are easier to script. Choose based on your specific needs and comfort level with command-line tools.

FAQ

What GPU do I need to run Whisper Medium?

To run Whisper Medium with Q8_0 quantization, you need a GPU with at least 1.9 GB of VRAM, or the equivalent free unified memory on Apple Silicon. Among NVIDIA GPUs, a GTX 1060 or newer is recommended.

Is Whisper Medium good for coding?

No. Whisper Medium is a speech recognition model: it transcribes audio and does not generate or complete code. For coding, models like Codex or CodeLlama are more suitable.

Whisper Medium vs Llama 3.1 8B?

Whisper Medium has 0.77 billion parameters and is specialized for speech recognition, while Llama 3.1 8B has 8 billion parameters and is a general-purpose language model. The two are not interchangeable: Whisper turns audio into text, while Llama 3.1 8B generates text from text prompts and requires more memory.

Can I run Whisper Medium on a Mac?

Yes, you can run Whisper Medium on a Mac. Apple Silicon Macs share unified memory between the CPU and GPU, so about 1.9 GB of free memory is enough, and no separate GPU drivers are needed.

How much VRAM does Whisper Medium need?

Whisper Medium requires about 1.9 GB of VRAM with Q8_0 quantization. Lower-precision quantizations need less, and higher-precision formats need more.

Is Whisper Medium censored?

Whisper Medium is a transcription model and applies no content filtering: it transcribes the audio it is given. It is open source under the MIT license, which permits use and modification.

Is Whisper Medium commercial-use allowed?

Yes. Whisper Medium is released under the MIT license, which allows commercial use provided the copyright and license notice are retained.

Whisper Medium context length?

Whisper processes audio in fixed 30-second windows, and its decoder emits at most 448 tokens per window. Longer audio has to be split into consecutive windows; runtimes such as whisper.cpp do this automatically.
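Because Whisper's encoder works on 30-second audio windows, the number of chunks a recording needs can be estimated with integer arithmetic. `window_count` is a hypothetical helper; real runtimes may shift or overlap windows, so treat the result as an approximation.

```shell
#!/bin/sh
# Number of 30-second windows needed to cover an audio clip.
# Usage: window_count <duration in whole seconds>
window_count() {
  # Integer ceiling of duration / 30.
  echo $(( ($1 + 29) / 30 ))
}

# A 95-second clip needs four windows (three full, one partial).
window_count 95
```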
