Alibaba · code
Qwen 2.5 Coder 1.5B
Compact code model with solid code generation and understanding abilities.
1.5b params · qwen2 · apache-2.0 · 32K ctx · 1.54 to 2.26 GB vram
about·model card

Qwen 2.5 Coder 1.5B is a code generation model developed by Alibaba, designed to assist developers with writing and generating code. With 1.5 billion parameters, this model excels in providing context-aware code suggestions, completing code snippets, and even generating entire functions or scripts based on user prompts. Its context length of 32,768 tokens allows it to maintain a broad understanding of the codebase, making it particularly useful for complex coding tasks where context is crucial. The model is licensed under Apache-2.0, making it accessible for both personal and commercial projects.

In its size class, Qwen 2.5 Coder 1.5B holds its own, offering a balance between performance and resource efficiency. While it may not match the capabilities of larger models in terms of depth and breadth of knowledge, it provides a solid alternative for users who have limited computational resources. The model is available in quantized versions (Q4_K_M and Q8_0), which further enhance its efficiency, requiring only 1.5 to 2.3 GB of VRAM. This makes it an excellent choice for developers working on laptops or desktops with modest GPU capabilities. Ideal users include software engineers, hobbyists, and small teams looking for a reliable code generation tool that doesn't strain their hardware.

./quantizations·2 variants
Quantization | Bits | File Size | VRAM Needed | RAM Needed | Quality
Q4_K_M       | 4.5  | 1.041 GB  | 1.54 GB     | 2.04 GB    | 85%
Q8_0         | 8    | 1.764 GB  | 2.26 GB     | 2.76 GB    | 98%
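As a rough illustration of how the quantization figures above can be used, the sketch below picks the highest-quality variant that fits a given VRAM budget. The function name `best_quant_for` and the `kv_cache_gb` parameter are hypothetical, and the sketch assumes the KV cache is added on top of the listed "VRAM Needed" values.

```python
# Figures copied from the quantization table on this page.
QUANTS = [
    {"name": "Q4_K_M", "bits": 4.5, "file_gb": 1.041, "vram_gb": 1.54, "ram_gb": 2.04, "quality_pct": 85},
    {"name": "Q8_0", "bits": 8.0, "file_gb": 1.764, "vram_gb": 2.26, "ram_gb": 2.76, "quality_pct": 98},
]


def best_quant_for(vram_gb, kv_cache_gb=0.0):
    """Return the highest-quality quant whose VRAM need fits, or None.

    Assumption: KV-cache memory is added on top of the table's VRAM figure.
    """
    fitting = [q for q in QUANTS if q["vram_gb"] + kv_cache_gb <= vram_gb]
    return max(fitting, key=lambda q: q["quality_pct"], default=None)
```

For example, `best_quant_for(4.0)` selects Q8_0, while a 2 GB budget only leaves room for Q4_K_M.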

Context window & KV cache

Long chats and RAG inputs cost real memory: the KV cache adds 0.17 GB to VRAM at the context length selected on the page, and it grows with context.

Model native max: 32K tokens. The KV-cache estimate is approximate (±30%); real usage depends on attention layout.
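To make the dependence on attention layout concrete, here is a minimal sketch of the usual KV-cache size formula (keys plus values, per layer, per KV head). The architecture values are assumptions believed to match the published Qwen2.5-1.5B configuration (28 layers, 2 KV heads, head dimension 128); check them against the model's `config.json` before relying on them. The page's own estimate may use a different formula.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Bytes needed to cache keys and values for `context_tokens` tokens."""
    # Leading 2 = one key tensor plus one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value


# Assumed architecture values (not taken from this page).
ASSUMED_QWEN25_1_5B = {"num_layers": 28, "num_kv_heads": 2, "head_dim": 128}


def kv_cache_gib(context_tokens, bytes_per_value=2, arch=ASSUMED_QWEN25_1_5B):
    """KV-cache size in GiB; bytes_per_value=2 corresponds to FP16."""
    total = kv_cache_bytes(
        context_tokens=context_tokens, bytes_per_value=bytes_per_value, **arch
    )
    return total / 2**30
```

Under these assumptions the cache scales linearly with context, so halving the context window halves the KV memory.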

How to run Qwen 2.5 Coder 1.5B

Runtime: Ollama. The easiest option: a single command, with an OpenAI-compatible API on :11434.

  1. Pull the model

    ollama pull qwen2.5-coder:1.5b

  2. Chat

    ollama run qwen2.5-coder:1.5b

  3. Use as API (this example calls Ollama's native /api/chat endpoint)

    curl http://localhost:11434/api/chat \
      -d '{"model":"qwen2.5-coder:1.5b","messages":[{"role":"user","content":"Hi"}]}'
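The same API call can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes an Ollama server is already running locally with the model pulled. Setting `"stream": false` asks Ollama for one JSON object instead of a stream of chunks.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
MODEL = "qwen2.5-coder:1.5b"


def build_chat_request(prompt, model=MODEL):
    """Build a POST request for Ollama's native chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object rather than streamed chunks
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, model=MODEL, timeout=120):
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=timeout) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With the server running, `print(chat("Write a Python function that reverses a string."))` should print the model's reply.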

Self-host serving plan

Want to host Qwen 2.5 Coder 1.5B for many users, or run it on a card that is technically too small? The estimate below is for a single user on a 24 GB card.

VRAM needed: 2.3 GB (1.5 GB weights + 0.3 GB KV)
Aggregate tok/s: 167 (across 1 user)
Per-user tok/s: 167 (1.5 B dense)

Fits in 24 GB VRAM with 21.7 GB headroom. Pure-GPU inference at full speed.

Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.
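One way to read the sub-linear rule above is that each doubling of concurrent users adds about 70% of the single-user rate to the aggregate, up to 8 users, after which the aggregate stays flat. The sketch below encodes that interpretation; the function names and the logarithmic form are assumptions, not the site's actual formula.

```python
import math


def aggregate_tps(single_user_tps, users, gain_per_doubling=0.7, plateau_users=8):
    """Estimated total tokens/sec across all concurrent users."""
    if users < 1:
        raise ValueError("users must be >= 1")
    # Beyond the plateau, extra users no longer raise aggregate throughput.
    effective_users = min(users, plateau_users)
    return single_user_tps * (1 + gain_per_doubling * math.log2(effective_users))


def per_user_tps(single_user_tps, users, **kwargs):
    """Estimated tokens/sec each user sees when sharing the GPU."""
    return aggregate_tps(single_user_tps, users, **kwargs) / users
```

Starting from the page's single-user figure of 167 tok/s, this model keeps raising the aggregate up to 8 users while the per-user rate falls with every added user.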

faq·common questions
how much VRAM do I need to run Qwen 2.5 Coder 1.5B?

Qwen 2.5 Coder 1.5B requires 1.54 GB VRAM minimum with Q4_K_M quantization. With Q8_0 quantization you need 2.26 GB.

which quant should I pick?

Q4_K_M has the smallest footprint: 1.54 GB of VRAM at a listed quality of 85%. Q8_0 is near-lossless (98%) at 2.26 GB if you have the headroom.

faq://ai-curated·20 entries
What GPU do I need to run Qwen 2.5 Coder 1.5B?

To run Qwen 2.5 Coder 1.5B, you need a GPU with about 1.5 GB (Q4_K_M) to 2.3 GB (Q8_0) of VRAM. Recommended GPUs include NVIDIA RTX 2060 or higher.

Is Qwen 2.5 Coder 1.5B good for coding?

Yes, Qwen 2.5 Coder 1.5B is specifically designed for code generation and understanding, making it highly effective for coding tasks.

Qwen 2.5 Coder 1.5B vs Llama 3.1 8B?

Qwen 2.5 Coder 1.5B is smaller (1.5B parameters) and more focused on code, while Llama 3.1 8B is larger and more general-purpose. Qwen 2.5 Coder 1.5B is better suited for coding-specific tasks.

Can I run Qwen 2.5 Coder 1.5B on a Mac?

Yes, you can run Qwen 2.5 Coder 1.5B on a Mac with an Apple Silicon chip such as the M1 or M2. These chips use unified memory rather than dedicated VRAM, so the 1.5 GB to 2.3 GB requirement comes out of system memory.

How much VRAM does Qwen 2.5 Coder 1.5B need?

Qwen 2.5 Coder 1.5B requires between 1.5 GB and 2.3 GB of VRAM, depending on the quantization level used.

Is Qwen 2.5 Coder 1.5B censored?

Qwen 2.5 Coder 1.5B runs locally without an external content filter, but the model itself follows its training guidelines and may decline requests for harmful content.

Is Qwen 2.5 Coder 1.5B commercial-use allowed?

Yes, Qwen 2.5 Coder 1.5B is licensed under Apache-2.0, which allows for commercial use.

Qwen 2.5 Coder 1.5B context length?

Qwen 2.5 Coder 1.5B has a context length of 32,768 tokens, allowing for long and complex code sequences.

Does Qwen 2.5 Coder 1.5B support function calling?

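Qwen 2.5 Coder 1.5B is listed as supporting function calling (tool use), where the model emits a structured call for the runtime to execute. This is separate from its ability to write code that contains function calls, so verify tool-call reliability on your own prompts.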

Qwen 2.5 Coder 1.5B quantization options?

Qwen 2.5 Coder 1.5B is offered here in two quantizations: Q8_0 (8-bit) and Q4_K_M (about 4.5 bits per weight). Both reduce VRAM usage relative to FP16 and typically improve inference speed.

Can Qwen 2.5 Coder 1.5B run on CPU?

Yes, Qwen 2.5 Coder 1.5B can run on CPU, but it will be significantly slower compared to running on a GPU.

Qwen 2.5 Coder 1.5B fine-tuning?

Qwen 2.5 Coder 1.5B can be fine-tuned for specific tasks using datasets and training frameworks like Hugging Face Transformers.

Qwen 2.5 Coder 1.5B system requirements?

Qwen 2.5 Coder 1.5B requires 1.5 GB to 2.3 GB of VRAM depending on quantization, 8 GB of RAM, and a 64-bit operating system. A GPU with CUDA support is recommended for optimal performance.

Qwen 2.5 Coder 1.5B performance benchmark?

Expect roughly 50-100 tokens per second from Qwen 2.5 Coder 1.5B on a mid-range GPU. This is an estimate rather than a measured benchmark, and actual speed varies with the quantization level and hardware.

Qwen 2.5 Coder 1.5B for RAG?

Qwen 2.5 Coder 1.5B can be used for Retrieval-Augmented Generation (RAG) by integrating it with a retrieval system to enhance its context and accuracy.
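As a hedged illustration of the integration step, the sketch below packs retrieved text chunks into a prompt without exceeding the 32K-token window. The function name `build_rag_prompt`, the prompt wording, and the 4-characters-per-token heuristic are all assumptions; a real pipeline would count tokens with the model's tokenizer.

```python
def build_rag_prompt(question, chunks, max_context_tokens=32768,
                     reserve_for_answer=1024, chars_per_token=4):
    """Concatenate retrieved chunks, in ranked order, until the budget is used.

    Token counts are approximated as characters / chars_per_token.
    """
    budget_chars = (max_context_tokens - reserve_for_answer) * chars_per_token
    header = "Use the context below to answer.\n\n"
    footer = f"\n\nQuestion: {question}\nAnswer:"
    used = len(header) + len(footer)
    kept = []
    for chunk in chunks:
        cost = len(chunk) + 2  # chunk text plus the blank-line separator
        if used + cost > budget_chars:
            break  # stop at the first chunk that would overflow the budget
        kept.append(chunk)
        used += cost
    return header + "\n\n".join(kept) + footer
```

The resulting string can be sent as the user message to whichever runtime serves the model.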

Qwen 2.5 Coder 1.5B for agents?

Qwen 2.5 Coder 1.5B can be used to power agents that require code generation and understanding, such as chatbots or automated coding assistants.

Qwen 2.5 Coder 1.5B for coding vs general?

Qwen 2.5 Coder 1.5B is optimized for coding tasks, making it more suitable for generating and understanding code compared to general-purpose models.

Qwen 2.5 Coder 1.5B vs ChatGPT?

Qwen 2.5 Coder 1.5B is a small model specialized for code that you can run locally, while ChatGPT is a hosted general-purpose assistant. Qwen 2.5 Coder 1.5B suits local, code-focused work on modest hardware, whereas ChatGPT covers a wider range of natural language tasks.

Qwen 2.5 Coder 1.5B download size?

The download size of Qwen 2.5 Coder 1.5B is 1.041 GB for Q4_K_M and 1.764 GB for Q8_0. Unquantized FP16 weights are approximately 3 GB.

Best quant for Qwen 2.5 Coder 1.5B?

The best quantization for Qwen 2.5 Coder 1.5B depends on your hardware. 8-bit quantization (Q8_0) gives near-lossless output at 2.26 GB of VRAM, while 4-bit quantization (Q4_K_M) cuts VRAM to 1.54 GB at some cost in output quality.