DeepSeek · code
DeepSeek Coder 6.7B
Powerful 6.7B code model with excellent code generation across many languages.
6.7B params · llama · MIT · 16K ctx · 4.3 to 7.17 GB VRAM
about·model card

DeepSeek Coder 6.7B is a code generation model based on the LLaMA architecture. With 6.7 billion parameters, it generates code snippets, completes functions, and suggests whole blocks of code across many programming languages. Its 16,384-token context window lets it take in several source files at once, which helps on larger codebases. The model is listed under the MIT license.

In its size class, DeepSeek Coder 6.7B balances output quality against hardware cost. It will not match much larger models, but it is small enough to be practical on consumer hardware. Two quantizations are available, Q4_K_M and Q8_0, which need 4.3 GB and 7.17 GB of VRAM respectively, so the model fits on mid-range GPUs. That makes it a reasonable choice for individual developers and small teams without high-end hardware.

./quantizations·2 variants
Quantization  Bits  File Size  VRAM Needed  RAM Needed  Quality
Q4_K_M        4.5   3.803 GB   4.3 GB       4.8 GB      85%
Q8_0          8     6.672 GB   7.17 GB      7.67 GB     98%
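
As a rough sanity check on the table, file size can be approximated as parameters × bits per weight ÷ 8. The sketch below assumes decimal gigabytes and copies its figures from the table; the function name `estimated_file_gb` is made up for illustration, and the "overhead" it prints is simply the listed VRAM minus the listed file size, not a documented formula.

```python
# Rough size check: parameters x bits-per-weight / 8 = bytes.
PARAMS = 6.7e9  # parameter count from the model card

# name -> (bits per weight, listed file size in GB, listed VRAM in GB),
# all three values copied from the quantization table.
QUANTS = {
    "Q4_K_M": (4.5, 3.803, 4.3),
    "Q8_0": (8.0, 6.672, 7.17),
}


def estimated_file_gb(params: float, bits: float) -> float:
    """Estimate the weight file size in decimal gigabytes."""
    return params * bits / 8 / 1e9


for name, (bits, file_gb, vram_gb) in QUANTS.items():
    est = estimated_file_gb(PARAMS, bits)
    overhead = vram_gb - file_gb  # listed VRAM minus listed file size
    print(f"{name}: estimated {est:.2f} GB, listed {file_gb} GB, "
          f"VRAM overhead {overhead:.2f} GB")
```

The estimates land within about 1% of the listed file sizes, and both rows list roughly 0.5 GB of VRAM on top of the weights.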

Context window & KV cache

Long chats and RAG inputs cost real memory: the KV cache grows with context length and adds to the VRAM needed for the weights.

Model native max: 16K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.
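
For a sense of how a KV-cache estimate like this is usually computed, here is a minimal sketch of the standard formula: two tensors (keys and values) per layer, each of size KV heads × head dimension per token. The architecture values (32 layers, 32 KV heads, head dimension 128, 16-bit cache) are assumptions typical of a LLaMA-style model of this size and do not come from this page; the page's own estimator may use different inputs.

```python
def kv_cache_bytes(
    tokens: int,
    layers: int = 32,          # assumed, not stated on this page
    kv_heads: int = 32,        # assumed: full multi-head attention, no GQA
    head_dim: int = 128,       # assumed
    bytes_per_value: int = 2,  # assumed 16-bit cache entries
) -> int:
    """Return the KV-cache size in bytes for a given number of cached tokens."""
    # The leading 2 accounts for one key tensor plus one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


for ctx in (2048, 8192, 16384):
    print(f"{ctx:>6} tokens -> {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

Under these assumptions the cache grows linearly with context, so filling the full 16K window costs far more than a short chat. A quantized cache or grouped-query attention would shrink the numbers.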

How to run DeepSeek Coder 6.7B

Commands are pre-filled with this model’s repo.

Ollama: the easiest option. It serves a local API on :11434. The example in step 3 uses Ollama's native /api/chat endpoint; the OpenAI-compatible routes live under /v1 on the same port.

  1. Pull the model

    ollama pull deepseek-coder:6.7b

  2. Chat

    ollama run deepseek-coder:6.7b

  3. Use as API

    curl http://localhost:11434/api/chat \
      -d '{"model":"deepseek-coder:6.7b","messages":[{"role":"user","content":"Hi"}]}'
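
The same API call can be made from a script. The sketch below uses only the Python standard library and assumes an Ollama server is already running on the default port with the model pulled. The helper names `build_chat_request` and `chat` are illustrative, and `"stream": false` is set so the server returns a single JSON object rather than a stream of chunks.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # endpoint from the curl example
MODEL = "deepseek-coder:6.7b"


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single-turn chat."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON response instead of streamed chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With the server running, `chat("Write a Python function that reverses a string.")` returns the model's reply as a string.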

Community benchmarks

Real tokens/sec reports from people running DeepSeek Coder 6.7B on actual hardware.

No community runs yet for this model. Be the first to submit your numbers.

Self-host serving plan

Want to host DeepSeek Coder 6.7B for many users? Or run it on a card that’s technically too small? The estimate below is for a single user.

VRAM needed: 5.4 GB (4.3 GB weights + 0.6 GB KV)
Aggregate tok/s: 37 (across 1 user)
Per-user tok/s: 37 (6.7 B dense)

✅ Fits in 24 GB VRAM with 18.6 GB headroom. Pure-GPU inference at full speed.

Throughput is a sub-linear estimate: each doubling of users adds ~70% of single-user TPS up to about 8 users, after which throughput plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.
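
One way to read that rule as a formula is sketched below: aggregate throughput equals single-user TPS × (1 + 0.7 × log2(users)), capped at 8 users. This is an interpretation of the sentence above, not the page's actual estimator. The function name and the exact plateau behaviour are assumptions.

```python
import math


def aggregate_tps(
    single_user_tps: float,
    users: int,
    gain_per_doubling: float = 0.7,  # "~70 % of single-user TPS" per doubling
    plateau_users: int = 8,          # "until ~8, then plateaus"
) -> float:
    """Estimate total tokens/sec across all concurrent users (assumed model)."""
    if users < 1:
        raise ValueError("users must be at least 1")
    effective = min(users, plateau_users)  # no further gain past the plateau
    return single_user_tps * (1 + gain_per_doubling * math.log2(effective))


for n in (1, 2, 4, 8, 16):
    total = aggregate_tps(37, n)
    print(f"{n:>2} users: {total:6.1f} tok/s total, {total / n:5.1f} per user")
```

Under this reading, aggregate throughput keeps rising up to the plateau while the per-user share falls with every added user.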


faq·common questions
how much VRAM do I need to run DeepSeek Coder 6.7B?

DeepSeek Coder 6.7B requires 4.3 GB VRAM minimum with Q4_K_M quantization. Q8_0 needs 7.17 GB.

which quant should I pick?

Q4_K_M is the best quality/VRAM balance: the table above rates it at 85% quality with a 3.8 GB file, well under a third of the 16-bit footprint. Q8_0 is near-lossless (98%) if you have the headroom.

faq://ai-curated·20 entries
What GPU do I need to run DeepSeek Coder 6.7B?

To run DeepSeek Coder 6.7B, you need a GPU with at least 4.3 GB of VRAM for the lowest quantization level, up to 7.2 GB for higher precision. NVIDIA GPUs like the RTX 3060 or better are recommended.

Is DeepSeek Coder 6.7B good for coding?

Yes, DeepSeek Coder 6.7B is specifically designed for code generation and performs well across multiple programming languages, making it an excellent choice for coding tasks.

DeepSeek Coder 6.7B vs Llama 3.1 8B?

DeepSeek Coder 6.7B is optimized for code generation and has a smaller model size (6.7B vs 8B), which may result in faster inference times and lower VRAM requirements compared to Llama 3.1 8B.

Can I run DeepSeek Coder 6.7B on a Mac?

Yes. It runs on Apple Silicon Macs (M1 or later), where the GPU shares unified memory with the CPU; Ollama supports macOS. Make sure enough memory is free for the quantization you choose.

How much VRAM does DeepSeek Coder 6.7B need?

DeepSeek Coder 6.7B requires between 4.3 GB and 7.2 GB of VRAM, depending on the quantization level used.

Is DeepSeek Coder 6.7B censored?

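Not in the sense of an external content filter: when you run it locally, nothing sits between you and the model. The instruct variant is tuned as a programming assistant, though, and may decline questions unrelated to programming.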

Is DeepSeek Coder 6.7B commercial-use allowed?

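Yes. This page lists the license as MIT, which permits both personal and commercial use. Check the license text shipped with the weights you download before deploying, since upstream terms for model weights can differ from those for the code.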

DeepSeek Coder 6.7B context length?

DeepSeek Coder 6.7B has a context length of 16,384 tokens, allowing it to handle longer sequences of code.

Does DeepSeek Coder 6.7B support function calling?

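Not natively. DeepSeek Coder 6.7B is not trained with a dedicated function-calling format. It can be prompted to emit structured output (for example JSON) that your application parses and acts on, but the model only generates text; it never executes code itself.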

DeepSeek Coder 6.7B quantization options?

This page offers two quantizations: Q4_K_M (about 4.5 bits per weight) and Q8_0 (8 bits per weight). Lower bit widths cut VRAM use at some cost in quality.

Can DeepSeek Coder 6.7B run on CPU?

Yes, DeepSeek Coder 6.7B can run on a CPU, but it will be significantly slower compared to running on a GPU.

DeepSeek Coder 6.7B fine-tuning?

DeepSeek Coder 6.7B can be fine-tuned on your own data to improve performance on specific tasks or domains.

DeepSeek Coder 6.7B system requirements?

You need a GPU with 4.3 GB (Q4_K_M) to 7.17 GB (Q8_0) of VRAM and a modern CPU. The quantization table lists 4.8 GB to 7.67 GB of system RAM for the model itself; 16 GB in total leaves room for the operating system and other applications.

DeepSeek Coder 6.7B performance benchmark?

No community benchmarks have been submitted for this model yet. The serving estimate on this page is 37 tok/s for a single user on a 24 GB card with everything on the GPU; real speed depends on the GPU, the quantization, and the context length.

DeepSeek Coder 6.7B for RAG?

DeepSeek Coder 6.7B can be used for Retrieval-Augmented Generation (RAG) to enhance code generation by incorporating external information.

DeepSeek Coder 6.7B for agents?

DeepSeek Coder 6.7B can be integrated into agent systems to provide code generation capabilities, enhancing the agent's ability to perform coding tasks.

DeepSeek Coder 6.7B for coding vs general?

DeepSeek Coder 6.7B is specialized for coding tasks and performs better in generating code compared to general-purpose models, which may excel in a broader range of natural language tasks.

DeepSeek Coder 6.7B vs ChatGPT?

DeepSeek Coder 6.7B is optimized for code generation, while ChatGPT is a general-purpose language model. DeepSeek Coder 6.7B is more suitable for coding tasks, whereas ChatGPT excels in conversational and general text generation.

DeepSeek Coder 6.7B download size?

The download size depends on the quantization: about 3.8 GB for Q4_K_M and 6.7 GB for Q8_0. The unquantized 16-bit weights are roughly 13.4 GB.

Best quant for DeepSeek Coder 6.7B?

It depends on your hardware. Q4_K_M (4.3 GB VRAM, rated 85% quality) is the pick for limited VRAM; Q8_0 (7.17 GB VRAM, rated 98%) is near-lossless and worth choosing if you have the headroom.
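
The choice between the two variants can be reduced to a small lookup: take the highest-rated quantization whose VRAM requirement, plus whatever you budget for the KV cache, fits your card. The sketch below uses the VRAM and quality figures from the quantization table on this page; the function name `best_quant` and the `kv_cache_gb` parameter are illustrative assumptions.

```python
from typing import Optional

# (name, VRAM needed in GB, quality rating in %), from the quantization table.
QUANTS = [
    ("Q8_0", 7.17, 98),
    ("Q4_K_M", 4.3, 85),
]


def best_quant(vram_gb: float, kv_cache_gb: float = 0.0) -> Optional[str]:
    """Return the highest-quality quant that fits, or None if neither fits."""
    fitting = [q for q in QUANTS if q[1] + kv_cache_gb <= vram_gb]
    if not fitting:
        return None
    # Among the variants that fit, prefer the one with the best quality rating.
    return max(fitting, key=lambda q: q[2])[0]


for card_gb in (4.0, 6.0, 8.0, 12.0):
    print(f"{card_gb:>4} GB card -> {best_quant(card_gb, kv_cache_gb=1.0)}")
```

Budgeting for the KV cache matters near the boundary: an 8 GB card fits Q8_0 with no cache allowance but drops to Q4_K_M once 1 GB is reserved for context.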