OpenAI · speech
Whisper Base English
English-only base model. Typically more accurate on English speech than the multilingual base model.
0.074B params · whisper · MIT · 0.3 GB VRAM
about·model card

Whisper Base English is a compact automatic speech recognition (ASR) model developed by OpenAI for English speech-to-text. At 74 million parameters it is light enough for devices with limited computational resources, while staying accurate enough for applications such as real-time transcription, voice assistants, and content creation tools. It handles a range of accents and speech patterns, though it may not match the precision of larger models on highly specialized vocabulary or in noisy environments.

Within its size class, the model's main strength is its balance of efficiency and accuracy: it provides reliable ASR without high-end hardware. It requires only 0.3 GB of VRAM, so it can run on older laptops, low-end GPUs, and some edge devices. That suits developers and users who need ASR under tight compute or power budgets, for example in small-scale projects and educational applications.

./quantizations·1 variant
Quantization  Bits  File Size  VRAM Needed  RAM Needed  Quality
Q8_0          8     0.142 GB   0.3 GB       0.6 GB      82%
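
As a rough cross-check on the table, a weight file's size can be estimated as parameter count times bits per weight. The Python sketch below is only an estimate under stated assumptions, not how this site computes its numbers: it ignores file metadata and any tensors kept at higher precision, and the 8.5 bits per weight for Q8_0 assumes the ggml block layout (32 int8 values plus one 16-bit scale). The function name is hypothetical.

```python
def estimate_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


PARAMS = 74e6  # 0.074B parameters, as listed on the model card

# Bits per weight: 16 for FP16; 8.5 is an assumed effective figure for Q8_0.
for label, bits in [("FP16", 16.0), ("Q8_0 (assumed 8.5 bits/weight)", 8.5)]:
    print(f"{label}: {estimate_size_gb(PARAMS, bits):.3f} GB")
```

By this estimate, 16-bit weights come to about 0.148 GB and Q8_0 to about 0.079 GB, so the listed 0.142 GB file size may describe a 16-bit download rather than an 8-bit one.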

How to run Whisper Base English

The commands below use whisper.cpp and are pre-filled for this model; copy and paste them in order.

whisper.cpp is a plain C/C++ reimplementation of Whisper with CoreML, Metal, and CUDA support and a one-line build.
  1. Build

    git clone https://github.com/ggerganov/whisper.cpp && cd whisper.cpp && make
  2. Get the model

    bash ./models/download-ggml-model.sh base.en
  3. Transcribe

    ./main -m models/ggml-base.en.bin -f input.wav
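
The whisper.cpp README asks for 16-bit WAV input at 16 kHz, so recordings in other formats usually need converting before step 3. The Python sketch below wraps the conversion and the transcription command. It assumes `ffmpeg` is on the PATH and that the script runs from the whisper.cpp checkout. The helper names are hypothetical, and the `./main` path is taken from step 3 (newer checkouts build `./build/bin/whisper-cli` instead, so adjust `binary` if needed).

```python
import subprocess
from pathlib import Path


def ffmpeg_to_wav16k(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz, mono, 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src,
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]


def whisper_cmd(wav: str,
                model: str = "models/ggml-base.en.bin",
                binary: str = "./main") -> list[str]:
    """Build the transcription command from step 3."""
    return [binary, "-m", model, "-f", wav]


def transcribe(src: str) -> str:
    """Convert src to a 16 kHz WAV next to it, then return the tool's stdout."""
    source = Path(src)
    wav = str(source.with_name(source.stem + "_16k.wav"))
    subprocess.run(ffmpeg_to_wav16k(src, wav), check=True)
    result = subprocess.run(whisper_cmd(wav), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Calling `transcribe("talk.mp3")` would write `talk_16k.wav` and return whatever the binary prints to standard output. Building the commands as lists avoids shell quoting problems with file names that contain spaces.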

faq·common questions
how much VRAM do I need to run Whisper Base English?

Whisper Base English needs about 0.3 GB of VRAM with the listed Q8_0 quantization. The full-precision requirement is listed as 0.3 GB as well.

which quant should I pick?

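Q8_0 is the only variant listed for this model, so there is no choice to make here. As general guidance for models that offer more variants, Q4_K_M is a common quality/VRAM balance (~92% of FP16 quality at ~25% of the footprint), and Q8_0 is near-lossless if you have the headroom.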

faq://ai-curated·20 entries
What GPU do I need to run Whisper Base English?

A GPU with at least 0.3 GB of VRAM is enough, and most modern GPUs meet this. A GPU is not strictly required, since the model can also run on a CPU.

Is Whisper Base English good for coding?

No. Whisper Base English is a speech recognition and transcription model; coding tasks call for text generation or code understanding models.

Whisper Base English vs Llama 3.1 8B?

The two models do different jobs. Whisper Base English has 0.074 billion parameters and is built for speech recognition, while Llama 3.1 8B has 8 billion parameters and is suited to general language tasks. Whisper Base English is far smaller, but neither model can substitute for the other.

Can I run Whisper Base English on a Mac?

Yes. The whisper.cpp runtime supports CoreML and Metal on macOS, and the model needs only about 0.3 GB of GPU memory.

How much VRAM does Whisper Base English need?

Whisper Base English requires 0.3 GB of VRAM to run efficiently. This is a relatively low requirement, making it accessible on most modern GPUs.

Is Whisper Base English censored?

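No. Whisper Base English transcribes speech rather than generating responses, so content filtering of the kind applied to chat models does not really apply. It is an open-source model released under the MIT license, which permits use and modification.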

Is Whisper Base English commercial-use allowed?

Yes. Whisper Base English is released under the MIT license, which permits commercial use provided the copyright and license notice are kept with copies of the software.

Whisper Base English context length?

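Whisper models do not have a context length in the sense used for text models. They process audio in 30-second windows, and the decoder emits at most 448 tokens per window. Longer recordings are handled window by window, which is how continuous speech is supported.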
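
Whisper models take audio in fixed 30-second windows at 16 kHz, so capacity for this model is easier to reason about in seconds of audio than in tokens. The Python sketch below works out the window size in samples and a lower bound on the number of windows for a recording. The function names are hypothetical, and the count is only a lower bound because implementations may advance by less than 30 seconds when a window ends mid-sentence.

```python
import math

WINDOW_SECONDS = 30    # Whisper encodes audio in fixed 30-second windows
SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz audio


def samples_per_window() -> int:
    """Number of audio samples in one 30-second window."""
    return WINDOW_SECONDS * SAMPLE_RATE


def min_windows(duration_seconds: float) -> int:
    """Lower bound on the windows needed to cover a recording."""
    return max(1, math.ceil(duration_seconds / WINDOW_SECONDS))


print(samples_per_window())  # 480000
print(min_windows(95))       # 4
```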

Does Whisper Base English support function calling?

Whisper Base English does not support function calling as it is primarily a speech recognition model. Function calling is more relevant to models designed for text generation and interactive tasks.

Whisper Base English quantization options?

Whisper Base English can be quantized, which reduces its memory footprint and can improve inference speed. The variant listed on this page is Q8_0 (8-bit); 16-bit (FP16) weights are the usual unquantized alternative.

Can Whisper Base English run on CPU?

Yes. Whisper Base English can run on a CPU, though transcription is slower than on a GPU. With 74 million parameters the model is small, so CPU-only use is generally practical; a recent multi-core CPU is recommended.

Whisper Base English fine-tuning?

Whisper Base English can be fine-tuned on specific datasets to improve its performance for particular tasks or domains. Fine-tuning requires a dataset of labeled speech data and appropriate training resources.

Whisper Base English system requirements?

To run Whisper Base English, you need about 0.3 GB of VRAM, about 0.6 GB of RAM (the figures in the quantization table), and a modern CPU. A GPU is recommended for the best performance.

Whisper Base English performance benchmark?

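Whisper Base English is estimated to process speech at roughly 16-24 tokens per second on a mid-range GPU, which would be adequate for real-time transcription. No community measurements have been submitted for this model yet, so treat the figure as an estimate rather than a benchmark result.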
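
For speech recognition, throughput is often reported as a real-time factor (processing time divided by audio duration) rather than tokens per second. The Python sketch below shows one way to measure it on your own hardware. The helper names are hypothetical, and `wav_duration_seconds` assumes an uncompressed PCM WAV file that the standard `wave` module can read.

```python
import time
import wave


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds (frames divided by frame rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Values below 1.0 mean transcription runs faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def time_call(fn, *args) -> float:
    """Wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start
```

For example, a 10-second clip transcribed in 2 seconds gives `real_time_factor(2.0, 10.0) == 0.2`, or five times faster than real time.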

Whisper Base English for RAG?

Whisper Base English is not designed for Retrieval-Augmented Generation (RAG). It is primarily a speech recognition model and does not have the capabilities needed for RAG tasks.

Whisper Base English for agents?

Whisper Base English can be used in conversational agents for speech-to-text conversion, but it does not generate responses or engage in dialogue. It is best used in conjunction with other models for complete agent functionality.
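
The division of labor described above can be sketched as a two-stage pipeline in Python. Everything here is hypothetical scaffolding: `transcribe` stands for any function that turns a WAV path into text (for instance a wrapper around whisper.cpp), and `respond` stands for whichever separate model produces the reply.

```python
from typing import Callable


def handle_utterance(wav_path: str,
                     transcribe: Callable[[str], str],
                     respond: Callable[[str], str]) -> str:
    """Run speech-to-text first, then hand the text to a separate model."""
    text = transcribe(wav_path).strip()
    if not text:
        return ""  # nothing was recognized, so skip the downstream model
    return respond(text)


# Stand-in callables for illustration; real ones would call actual models.
reply = handle_utterance("question.wav",
                         transcribe=lambda path: " what time is it ",
                         respond=lambda text: f"You asked: {text}")
print(reply)  # You asked: what time is it
```

Passing the two stages in as callables keeps the speech model swappable without touching the agent logic.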

Whisper Base English for coding vs general?

Whisper Base English is not ideal for coding tasks, which often require text generation and code understanding. It is better suited for general speech recognition and transcription tasks.

Whisper Base English vs ChatGPT?

Whisper Base English is a speech recognition model, while ChatGPT is a text-based language model designed for conversation and text generation. They serve different purposes and are not directly comparable.

Whisper Base English download size?

The download for Whisper Base English is about 142 MB (0.142 GB in the quantization table), which makes it lightweight to fetch and store.

Best quant for Whisper Base English?

Q8_0 is the only quantization listed for this model, so it is the default choice. Where other variants are available, the best option depends on your needs: lower-bit quantization reduces model size and improves inference speed, while FP16 keeps more accuracy at a larger size.