~/runthismodel
01.AI · code
Yi Coder 9B
Strong 9B code model with good reasoning.
9B params · yi · Apache-2.0 · 4K ctx · 5.46–9.24 GB VRAM
about·model card

Yi Coder 9B, from 01.AI, is a 9-billion-parameter text-generation model tuned for coding tasks: generating code snippets, completing functions, and suggesting fixes and optimizations. Its listed context length is 4096 tokens, roughly a few hundred lines of code plus instructions, so it suits single files and focused tasks rather than a whole codebase at once. The model is licensed under Apache-2.0, which permits both personal and commercial use.

In its size class, Yi Coder 9B offers a good balance between performance and efficiency. It will not outperform the largest models, but its accuracy and context understanding are often sufficient for everyday coding tasks. The two quantizations listed here (Q4_K_M and Q8_0) need 5.5–9.2 GB of VRAM, so the model fits hardware from mid-range GPUs upward. It suits software developers, coding enthusiasts, and small teams who want a reliable local AI coding assistant.

./quantizations·2 variants
Quantization   Bits   File Size   VRAM Needed   RAM Needed   Quality
Q4_K_M         4.5    4.963 GB    5.46 GB       5.96 GB      85%
Q8_0           8      8.739 GB    9.24 GB       9.74 GB      98%
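
To make the table concrete, here is a minimal sketch of picking a quantization for a given VRAM budget. The figures are copied from the table above; the `Quant` class, the `best_fit` helper, and the `reserve_gb` safety margin are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quant:
    name: str
    bits: float
    file_gb: float
    vram_gb: float
    ram_gb: float
    quality_pct: int


# Values copied from the quantization table on this page.
QUANTS = [
    Quant("Q4_K_M", 4.5, 4.963, 5.46, 5.96, 85),
    Quant("Q8_0", 8.0, 8.739, 9.24, 9.74, 98),
]


def best_fit(vram_budget_gb: float, reserve_gb: float = 0.0) -> Optional[Quant]:
    """Return the highest-quality quant that fits the budget, or None.

    reserve_gb is a hypothetical margin for other GPU users
    (desktop compositor, extra context), not a figure from the page.
    """
    fitting = [q for q in QUANTS if q.vram_gb + reserve_gb <= vram_budget_gb]
    return max(fitting, key=lambda q: q.quality_pct, default=None)


for budget in (4.0, 8.0, 12.0):
    pick = best_fit(budget)
    print(f"{budget:>5.1f} GB ->", pick.name if pick else "no listed quant fits")
```

Preferring the highest quality rating among the quants that fit is one reasonable policy; a setup that also needs room for long contexts might pass a larger `reserve_gb` instead.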

Context window & KV cache

Adds 0.50 GB to VRAM

Long chats and RAG inputs cost real memory: the KV cache grows linearly with context length.

Model native max: 4K tokens. KV-cache estimate is approximate (±30%); real usage depends on attention layout.
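
For a rough sense of where a KV-cache figure comes from, the sketch below uses the standard estimate of two tensors (keys and values) per layer. The page does not give the architecture values; the layer count, KV-head count, and head dimension in the example call are illustrative assumptions, not confirmed figures for Yi Coder 9B.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size in bytes.

    The factor 2 covers keys and values; bytes_per_value=2 assumes
    an FP16 cache. Runtimes may allocate more or less than this.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# ASSUMED architecture values, for illustration only.
ASSUMED_LAYERS = 48
ASSUMED_KV_HEADS = 4
ASSUMED_HEAD_DIM = 128

size = kv_cache_bytes(ASSUMED_LAYERS, ASSUMED_KV_HEADS, ASSUMED_HEAD_DIM, 4096)
print(f"{size / 1e9:.2f} GB at 4096 tokens")
```

Under these assumed values the estimate comes to about 0.40 GB at 4096 tokens, which sits inside the ±30% band around the 0.50 GB shown on the page. Because the formula is linear in `context_tokens`, doubling the context doubles the cache.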

How to run Yi Coder 9B

Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.

Ollama. Easiest: a single command. Serves an API on :11434 (native endpoints under /api, OpenAI-compatible endpoints under /v1).

  1. Pull the model

    ollama pull yi-coder:9b

  2. Chat

    ollama run yi-coder:9b

  3. Use as API

    curl http://localhost:11434/api/chat \
      -d '{"model":"yi-coder:9b","messages":[{"role":"user","content":"Hi"}]}'
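
The same request can be sent from a script. This is a minimal sketch using only the Python standard library, assuming an Ollama server is already running locally on the default port with the model pulled. The helper names `build_chat_request` and `chat` are hypothetical. The sketch sets `"stream": false` because the endpoint streams partial responses by default, and a single JSON reply is simpler to parse.

```python
import json
import urllib.request

# Endpoint taken from the curl command on this page.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str, model: str = "yi-coder:9b") -> dict:
    """Build the JSON payload for a single-turn, non-streaming chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON reply
    }


def chat(prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """POST the prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["message"]["content"]


# Example (requires a running Ollama server):
#   print(chat("Write a Python function that reverses a string."))
```

Keeping the payload builder separate from the network call makes it easy to inspect or log exactly what is sent before pointing the script at a live server.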

Community benchmarks

Real tokens/sec reports from people running Yi Coder 9B on actual hardware.

No community runs yet for this model. Be the first to submit your numbers.

Self-host serving plan

Want to host Yi Coder 9B for many users? Or run it on a card that’s technically too small? The estimate below covers one configuration: a single user on a 24 GB card.

VRAM needed: 6.7 GB (5.5 GB weights + 0.8 GB KV)
Aggregate tok/s: 28 (across 1 user)
Per-user tok/s: 28 (9 B dense)

✅ Fits in 24 GB VRAM with 17.3 GB headroom. Pure-GPU inference, full speed.

Throughput is a sub-linear estimate: each doubling of concurrent users adds ~70% of single-user TPS up to ~8 users, after which aggregate throughput plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.
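
One way to read that rule of thumb is sketched below. The exact formula behind the page's estimate is not given, so this is an interpretation: aggregate throughput grows by 70% of the single-user rate per doubling of users, and stops growing past 8 users. The function names and default parameters are hypothetical.

```python
import math


def aggregate_tps(single_user_tps: float, users: int,
                  gain_per_doubling: float = 0.70,
                  plateau_users: int = 8) -> float:
    """Estimated total tokens/sec across all concurrent users."""
    if users < 1:
        raise ValueError("users must be at least 1")
    # Doublings stop counting once the plateau is reached.
    doublings = math.log2(min(users, plateau_users))
    return single_user_tps * (1.0 + gain_per_doubling * doublings)


def per_user_tps(single_user_tps: float, users: int) -> float:
    """Estimated tokens/sec seen by each user under the same model."""
    return aggregate_tps(single_user_tps, users) / users


# 28 tok/s is the single-user figure shown in the serving plan.
for n in (1, 2, 4, 8, 16):
    print(n, round(aggregate_tps(28, n), 1), round(per_user_tps(28, n), 1))
```

Under this reading, per-user speed keeps falling after the plateau because the same aggregate throughput is divided among more users.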

faq·common questions
how much VRAM do I need to run Yi Coder 9B?

Yi Coder 9B requires 5.46 GB VRAM minimum with Q4_K_M quantization. Q8_0, the highest-precision quantization listed here, needs 9.24 GB.

which quant should I pick?

Q4_K_M is the best quality/VRAM balance: it is rated 85% quality in the table above at about 57% of the Q8_0 file size (4.963 GB vs 8.739 GB). Q8_0 is near-lossless (rated 98%) if you have the headroom.

faq://ai-curated·20 entries
What GPU do I need to run Yi Coder 9B?

You need a GPU with at least 5.5 GB of VRAM for Q4_K_M, or 9.2 GB for Q8_0. Leave extra headroom if you extend the context, since the KV cache adds to these figures.

Is Yi Coder 9B good for coding?

Yes, Yi Coder 9B is specifically designed for coding tasks and excels in code generation, debugging, and reasoning, making it a strong choice for developers.

Yi Coder 9B vs Llama 3.1 8B?

Yi Coder 9B has more parameters (9B vs 8B) and is optimized for coding tasks, while Llama 3.1 8B is a general-purpose model. Yi Coder 9B may perform better in specialized coding scenarios.

Can I run Yi Coder 9B on a Mac?

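Yes. Yi Coder 9B runs on Apple Silicon Macs (M1 or later). These machines share unified memory between CPU and GPU rather than having dedicated VRAM, so budget roughly the RAM figures from the quantization table (5.96 GB for Q4_K_M, 9.74 GB for Q8_0) on top of what macOS and your other apps use.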

How much VRAM does Yi Coder 9B need?

Yi Coder 9B requires between 5.5 GB (Q4_K_M) and 9.2 GB (Q8_0) of VRAM, depending on the quantization used. Lower-bit quantization reduces VRAM usage at some cost in output quality.

Is Yi Coder 9B censored?

Yi Coder 9B is a code-focused model, not an "uncensored" fine-tune. This page documents no content filter for it; whether it declines a given request depends on its training, so test it on your own prompts.

Is Yi Coder 9B commercial-use allowed?

Yes, Yi Coder 9B is licensed under the Apache-2.0 license, which allows for commercial use as long as you comply with the terms of the license.

Yi Coder 9B context length?

Yi Coder 9B has a listed context length of 4096 tokens (4K). The prompt and the generated output together must fit within that window.

Does Yi Coder 9B support function calling?

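Yi Coder 9B can be prompted to emit structured function calls, for example JSON matching a schema you describe in the prompt. Native tool-calling support depends on the runtime and chat template, so verify it in your runtime before relying on it.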

Yi Coder 9B quantization options?

This page lists two quantizations for Yi Coder 9B: Q4_K_M (about 4.5 bits per weight, 4.963 GB) and Q8_0 (8 bits, 8.739 GB). Lower-bit variants use less VRAM at some cost in quality.

Can Yi Coder 9B run on CPU?

Yes, Yi Coder 9B can run on a CPU, but performance will be significantly slower compared to running on a GPU. It is recommended to use a GPU for optimal performance.

Yi Coder 9B fine-tuning?

Yi Coder 9B can be fine-tuned on custom datasets to improve its performance on specific coding tasks or domains. Fine-tuning requires a dataset and a training environment.

Yi Coder 9B system requirements?

To run Yi Coder 9B, the quantization table lists 5.46 GB to 9.24 GB of VRAM and 5.96 GB to 9.74 GB of system RAM, plus about 5 GB to 9 GB of storage for the model file. A system with 16 GB of RAM and a modern CPU leaves comfortable headroom.

Yi Coder 9B performance benchmark?

No community benchmarks have been submitted for Yi Coder 9B yet. The serving-plan estimate on this page is 28 tokens per second for a single user on a 24 GB card; real throughput varies with hardware and quantization level.

Yi Coder 9B for RAG?

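Yes, Yi Coder 9B can be used for Retrieval-Augmented Generation (RAG) to integrate external knowledge sources. The retrieved passages, the question, and the answer must all fit within the 4K-token context window, so keep retrieved chunks small.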

Yi Coder 9B for agents?

Yi Coder 9B can be used to power coding agents or chatbots, providing them with advanced code generation and reasoning abilities.

Yi Coder 9B for coding vs general?

Yi Coder 9B is optimized for coding tasks and performs best in this domain. While it can handle general text, its strength lies in generating and understanding code.

Yi Coder 9B vs ChatGPT?

Yi Coder 9B is specifically designed for coding tasks and has a smaller model size (9B vs ChatGPT's larger variants), making it more efficient for local deployment. ChatGPT, however, is a more general-purpose model.

Yi Coder 9B download size?

The download size of Yi Coder 9B varies with the quantization level. The full-precision model is approximately 18 GB; the quantized files listed here are 4.963 GB (Q4_K_M) and 8.739 GB (Q8_0).

Best quant for Yi Coder 9B?

The best quantization level for Yi Coder 9B depends on your hardware. Q4_K_M is the most memory-efficient option listed (5.46 GB VRAM, rated 85% quality) and the usual pick for cards with 6 GB to 8 GB. Q8_0 (9.24 GB VRAM, rated 98%) is the better choice if you have the headroom.