DeepSeek · code
DeepSeek Coder 1.3B
Compact code model with strong coding capabilities. Great for mobile coding assistants.
1.3B params · llama · MIT · 16K ctx · 1.31 to 1.83 GB VRAM
about·model card

DeepSeek Coder 1.3B is a code generation model built on the LLaMA architecture, designed to assist developers and enthusiasts in generating high-quality code snippets and documentation. With 1.3 billion parameters, this model offers a robust context length of 16,384 tokens, making it particularly adept at understanding and generating complex code structures and long sequences. The model is licensed under the MIT license, which makes it accessible for both personal and commercial projects. It has gained significant traction, with over 72,000 downloads and 160 likes, indicating its popularity and utility in the developer community.

Despite its modest size, DeepSeek Coder 1.3B offers a good balance between performance and efficiency, which makes it a practical alternative to larger models that need more compute. It is available in Q4_K_M and Q8_0 quantizations, and the Q4_K_M variant runs on hardware with as little as 1.31 GB of VRAM. That makes it a reasonable choice for developers on lower-end machines and for anyone who prefers to run models locally without a powerful GPU, including software developers, data scientists, and hobbyists who need a code generation tool that runs on a wide range of hardware.

./quantizations·2 variants
Quantization  Bits  File Size  VRAM Needed  RAM Needed  Quality
Q4_K_M        4.5   0.814 GB   1.31 GB      1.81 GB     85%
Q8_0          8     1.334 GB   1.83 GB      2.33 GB     98%
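
As a rough illustration of the "which quant fits" question, the sketch below picks the highest-quality variant whose VRAM requirement fits a given budget. The VRAM and quality figures are copied from the quantization table; the function name `pick_quant` and the optional `kv_cache_gb` allowance are hypothetical, and treating the KV cache as an add-on to "VRAM Needed" is an assumption rather than something the page states.

```python
# Figures copied from the quantization table on this page (sizes in GB).
# Tuple layout: (name, vram_needed_gb, ram_needed_gb, quality_percent)
QUANTS = [
    ("Q4_K_M", 1.31, 1.81, 85),
    ("Q8_0", 1.83, 2.33, 98),
]


def pick_quant(free_vram_gb, kv_cache_gb=0.0):
    """Return the highest-quality quant that fits in free_vram_gb, or None.

    kv_cache_gb is a hypothetical extra allowance for the context window;
    whether the table's VRAM figure already includes it is an assumption.
    """
    fitting = [q for q in QUANTS if q[1] + kv_cache_gb <= free_vram_gb]
    if not fitting:
        return None
    # Prefer the variant with the highest quality rating among those that fit.
    return max(fitting, key=lambda q: q[3])[0]
```

For example, `pick_quant(2.0)` selects Q8_0, while `pick_quant(1.5)` falls back to Q4_K_M and `pick_quant(1.0)` returns `None`.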

Context window & KV cache

Adds 0.17 GB to VRAM

Long chats and RAG inputs cost real memory.

Model native max: 16K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.
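
To make the "depends on attention layout" caveat concrete, here is a minimal sketch of the textbook KV-cache size formula for a dense transformer: one key and one value vector per layer, per KV head, per token. It is not the estimator this page uses, and the layer, head, and dimension values in `ASSUMED_SHAPE` are illustrative placeholders rather than figures taken from the page, so the results need not match the number shown above.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for n_tokens.

    The leading factor 2 accounts for storing both a key and a value tensor.
    bytes_per_value=2 assumes a 16-bit cache; a quantized cache would be smaller.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9


# Hypothetical model shape, chosen only for illustration (an assumption).
ASSUMED_SHAPE = dict(n_layers=24, n_kv_heads=16, head_dim=128)

for ctx in (1024, 4096, 16384):
    size_gb = to_gb(kv_cache_bytes(ctx, **ASSUMED_SHAPE))
    print(f"{ctx:>6} tokens: {size_gb:.2f} GB")
```

The cost grows linearly with context length, which is why long chats and RAG inputs are the main driver of extra VRAM beyond the weights. Models that share KV heads across query heads (a smaller `n_kv_heads`) shrink this figure proportionally.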

How to run DeepSeek Coder 1.3B

Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.

Ollama is the easiest option: a single command, with an API served on port 11434 (native endpoints under /api, OpenAI-compatible endpoints under /v1).
  1. Pull the model

    ollama pull deepseek-coder:1.3b
  2. Chat

    ollama run deepseek-coder:1.3b
  3. Use as API

    curl http://localhost:11434/api/chat \
      -d '{"model":"deepseek-coder:1.3b","messages":[{"role":"user","content":"Hi"}]}'
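
The same request can be made from a script. The sketch below uses only the Python standard library and targets the `/api/chat` endpoint from the curl example. It sets `"stream": false` so the server returns a single JSON object instead of a stream of chunks. The helper names `build_chat_request` and `chat` are hypothetical, and the code assumes an Ollama server is already running locally with the model pulled.

```python
import json
import urllib.request

# Endpoint and model tag taken from the curl example above.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "deepseek-coder:1.3b"


def build_chat_request(prompt, model=MODEL):
    """Build (but do not send) a non-streaming chat request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON response
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, model=MODEL, timeout=120):
    """Send the prompt and return the assistant's reply text.

    Requires a local Ollama server; raises urllib.error.URLError otherwise.
    """
    request = build_chat_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["message"]["content"]
```

With the server running, `print(chat("Write a Python function that reverses a string."))` would print the model's reply.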

Community benchmarks

Real tokens/sec reports from people running DeepSeek Coder 1.3B on actual hardware.

No community runs have been reported for this model yet.

Self-host serving plan

Want to host DeepSeek Coder 1.3B for many users, or run it on a card that is technically too small? The estimate below is for a single user.

VRAM needed: 2.1 GB (1.3 GB weights + 0.3 GB KV)

Aggregate tok/s: 192 (across 1 user)

Per-user tok/s: 192 (1.3 B dense)

✅ Fits in 24 GB VRAM with 21.9 GB headroom. Pure-GPU inference — full speed.

Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.
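
The scaling rule above can be read in more than one way. The sketch below encodes one plausible interpretation, and it is an assumption rather than the site's actual formula: each doubling of concurrent users adds 70% of the single-user rate to the aggregate, and the aggregate stops growing once 8 users are reached. The function and parameter names are hypothetical.

```python
import math


def aggregate_tps(single_user_tps, users, gain_per_doubling=0.70, plateau_users=8):
    """Estimated total tokens/sec across all concurrent users.

    Assumed model: aggregate = base * (1 + gain * log2(users)),
    held constant once users exceeds plateau_users.
    """
    if users < 1:
        raise ValueError("users must be at least 1")
    effective_users = min(users, plateau_users)
    return single_user_tps * (1 + gain_per_doubling * math.log2(effective_users))


def per_user_tps(single_user_tps, users, **kwargs):
    """Estimated tokens/sec seen by each individual user."""
    return aggregate_tps(single_user_tps, users, **kwargs) / users


# 192 tok/s is the single-user figure shown on this page.
for n in (1, 2, 4, 8, 16):
    print(n, round(aggregate_tps(192, n), 1), round(per_user_tps(192, n), 1))
```

Under this reading the aggregate rises with each doubling while the per-user rate falls, and past the plateau every additional user only divides the same total further.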


faq·common questions
how much VRAM do I need to run DeepSeek Coder 1.3B?

DeepSeek Coder 1.3B requires a minimum of 1.31 GB of VRAM with Q4_K_M quantization. With Q8_0 you need 1.83 GB.

which quant should I pick?

Q4_K_M is the best quality/VRAM balance: it is rated at 85% quality in the table above and needs 1.31 GB of VRAM. Q8_0 is near-lossless (98%) at 1.83 GB if you have the headroom.

faq://ai-curated·20 entries
What GPU do I need to run DeepSeek Coder 1.3B?

To run DeepSeek Coder 1.3B, you need a GPU with at least 1.31 GB of VRAM for the Q4_K_M quantization, or 1.83 GB for the higher-quality Q8_0 quantization.

Is DeepSeek Coder 1.3B good for coding?

Yes, DeepSeek Coder 1.3B is specifically designed for coding tasks and excels in providing accurate and context-aware code suggestions, making it a great choice for developers.

DeepSeek Coder 1.3B vs Llama 3.1 8B?

DeepSeek Coder 1.3B is smaller and more efficient, requiring less VRAM and computational power compared to Llama 3.1 8B, which has 8 billion parameters and is more versatile but resource-intensive.

Can I run DeepSeek Coder 1.3B on a Mac?

Yes, you can run DeepSeek Coder 1.3B on a Mac as long as your system meets the minimum VRAM requirements and you have the necessary software environment set up.

How much VRAM does DeepSeek Coder 1.3B need?

DeepSeek Coder 1.3B requires between 1.31 GB (Q4_K_M) and 1.83 GB (Q8_0) of VRAM, depending on the quantization used. Quantizations with more bits per weight need more VRAM.

Is DeepSeek Coder 1.3B censored?

DeepSeek Coder 1.3B is tuned for code generation rather than general chat. The instruct variant is trained to decline some harmful and non-programming requests, so it is not fully unrestricted.

Is DeepSeek Coder 1.3B commercial-use allowed?

Yes, DeepSeek Coder 1.3B is listed here under the MIT License, which allows both personal and commercial use provided the copyright and license notice is retained.

DeepSeek Coder 1.3B context length?

DeepSeek Coder 1.3B has a context length of 16,384 tokens, allowing it to handle large and complex code snippets effectively.

Does DeepSeek Coder 1.3B support function calling?

Not natively. DeepSeek Coder 1.3B is not trained on a dedicated tool-calling format. You can prompt it to emit structured output such as JSON and parse that yourself, but reliability at this model size is limited, and the model itself never executes code.

DeepSeek Coder 1.3B quantization options?

Two quantizations of DeepSeek Coder 1.3B are listed on this page: Q4_K_M (about 4.5 bits per weight, 0.814 GB file) and Q8_0 (8 bits per weight, 1.334 GB file).

Can DeepSeek Coder 1.3B run on CPU?

Yes, DeepSeek Coder 1.3B can run on CPU, but it will be significantly slower compared to running on a GPU. For optimal performance, a GPU is recommended.

DeepSeek Coder 1.3B fine-tuning?

DeepSeek Coder 1.3B can be fine-tuned on custom datasets to improve its performance on specific coding tasks or domains, using frameworks like Hugging Face Transformers.

DeepSeek Coder 1.3B system requirements?

To run DeepSeek Coder 1.3B with Q4_K_M, you need about 1.31 GB of VRAM and 1.81 GB of RAM. With Q8_0, you need about 1.83 GB of VRAM and 2.33 GB of RAM.

DeepSeek Coder 1.3B performance benchmark?

No measured benchmark for DeepSeek Coder 1.3B is available on this page yet. The serving estimate above is 192 tok/s for a single user; actual throughput varies with hardware and quantization level.

DeepSeek Coder 1.3B for RAG?

DeepSeek Coder 1.3B can be used for Retrieval-Augmented Generation (RAG) to enhance code suggestions by incorporating external data sources, improving its contextual accuracy and relevance.

DeepSeek Coder 1.3B for agents?

Yes, DeepSeek Coder 1.3B can be integrated into coding agents to provide real-time code assistance, error detection, and automated code generation in development environments.

DeepSeek Coder 1.3B for coding vs general?

DeepSeek Coder 1.3B is optimized for coding tasks and may outperform general-purpose models in generating accurate and context-aware code, but it is less versatile for non-coding tasks.

DeepSeek Coder 1.3B vs ChatGPT?

DeepSeek Coder 1.3B is specialized for coding tasks and is more efficient in terms of resource usage, while ChatGPT is a general-purpose model with broader capabilities but higher resource requirements.

DeepSeek Coder 1.3B download size?

The download size of DeepSeek Coder 1.3B depends on the quantization: 0.814 GB for Q4_K_M and 1.334 GB for Q8_0.

Best quant for DeepSeek Coder 1.3B?

The best quantization for DeepSeek Coder 1.3B depends on your hardware. Q4_K_M is the better fit when VRAM is tight (1.31 GB). If you can spare 1.83 GB, Q8_0 gives higher quality (98% versus 85%) for about 0.5 GB more.