~/runthismodel
Moondream · multimodal
Moondream 2
Ultra-compact vision model. Only 1GB. Answers questions about images.
1.8B params · moondream · apache-2.0 · 2K ctx · 1.5 GB VRAM
about·model card

Moondream 2 is a 1.8-billion-parameter multimodal model that turns images into descriptive text, which makes it a good choice for image captioning, writing image descriptions, and basic visual question answering. Its architecture is optimized for efficiency, so it runs on a wide range of hardware, including systems with as little as 1.5 GB of VRAM (with the Q4_K_M build). That makes it appealing if you want a capable image-to-text model without a high-end GPU.

For its size class, Moondream 2 punches well above its weight: despite having fewer parameters than many competing vision models, it produces accurate, coherent output. Its 2048-token context window has to hold the image embedding as well as the prompt and the answer, so it is modest by current standards and best suited to single-image captions and questions rather than long conversations. Quantized builds such as Q4_K_M make it even more efficient and practical for both desktop and mobile deployments. If you want a reliable, versatile image-to-text model that balances quality against resource use on modest hardware, Moondream 2 is a strong fit.

probe://hardware·which quants fit your rig
./quantizations·1 variant
Quantization  Bits  File size  VRAM needed  RAM needed  Quality
Q4_K_M        4.5   1 GB       1.5 GB       2.5 GB      85%
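The file size in the table follows from simple arithmetic: weights take about parameters × bits per weight ÷ 8 bytes. A minimal sketch, assuming 1.8 B parameters (from this page) and nominal bit widths: 4.5 for Q4_K_M as listed above, plus 8.5 for Q8_0 and 16 for FP16, which are typical llama.cpp/GGUF figures rather than numbers from this page. Real files differ a little because some tensors are kept at higher precision.

```python
# Rough weight-file size: parameters x bits-per-weight / 8 bytes.
# Assumptions: 1.8e9 parameters (from this page); the bit widths are nominal
# values, so real GGUF files will differ somewhat.
PARAMS = 1.8e9

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.5,  # as listed in the quantization table
    "Q8_0": 8.5,    # typical llama.cpp figure (assumption)
    "FP16": 16.0,   # full half precision
}


def weights_gb(params: float, bits: float) -> float:
    """Approximate size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    fp16 = weights_gb(PARAMS, BITS_PER_WEIGHT["FP16"])
    for name, bits in BITS_PER_WEIGHT.items():
        size = weights_gb(PARAMS, bits)
        # Print the size and its share of the FP16 footprint.
        print(f"{name:7s} {size:5.2f} GB  ({size / fp16:.0%} of FP16)")
```

Under these assumptions Q4_K_M lands at about 1.01 GB, in line with the 1 GB file size above, while FP16 weights come to about 3.6 GB.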

Context window & KV cache

Adds 0.04 GB to VRAM

Long chats and RAG inputs cost real memory, because the KV cache grows linearly with the number of tokens in context.

Model native max: 2K tokens. The KV-cache estimate is approximate (±30%); real usage depends on the attention layout (number of layers, KV heads, and head size) and on the cache precision.
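The KV cache stores one key and one value vector per layer, per KV head, for every token in context, which is why it grows linearly with context length. A minimal sketch of that formula; the layer and head counts are assumptions (a Phi-1.5-style text decoder with 24 layers and 32 KV heads of size 64, cached in fp16), not figures from this page, so substitute your model's actual config.

```python
# Hedged KV-cache estimate:
#   2 (K and V) x layers x KV heads x head dim x bytes per element x tokens.
# The default config is an ASSUMPTION (Phi-1.5-style decoder, fp16 cache);
# check your model's config file for the real values.
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    n_layers: int = 24
    n_kv_heads: int = 32
    head_dim: int = 64
    bytes_per_elem: int = 2  # fp16 / bf16 cache


def kv_cache_bytes(n_tokens: int, cfg: DecoderConfig = DecoderConfig()) -> int:
    """Bytes needed to cache keys and values for n_tokens tokens."""
    per_token = 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_elem
    return per_token * n_tokens


if __name__ == "__main__":
    for tokens in (256, 1024, 2048):  # 2048 is the model's native maximum
        print(f"{tokens:5d} tokens -> {kv_cache_bytes(tokens) / 1e9:.2f} GB")
```

With these assumed values a full 2K-token context costs about 0.4 GB, the same order of magnitude as the 0.3 GB KV figure in the serving plan.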

How to run Moondream 2

Pick a runtime, then copy and paste. Commands are pre-filled with this model’s repo.

Ollama: the easiest option, with a single command to get started. It serves its native API, plus an OpenAI-compatible endpoint under /v1, on port 11434.
  1. Pull the model

    ollama pull moondream
  2. Chat

    ollama run moondream
  3. Use as API (to send a picture, add an "images" array of base64-encoded images to the message)

    curl http://localhost:11434/api/chat \
      -d '{"model":"moondream","messages":[{"role":"user","content":"Hi"}]}'
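The API call above sends text only. To ask about an image, Ollama's native /api/chat accepts base64-encoded images in a message's `images` list. A minimal sketch using only the Python standard library, assuming Ollama is running on the default port with `moondream` already pulled; `build_chat_payload` and `ask_about_image` are hypothetical helper names, and `photo.jpg` is a placeholder path.

```python
# Sketch: ask Moondream 2 about an image through Ollama's native /api/chat.
# Assumes Ollama is running on localhost:11434 and `ollama pull moondream`
# has completed. Images travel as base64 strings in the message's "images" list.
import base64
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(image_bytes: bytes, question: str, model: str = "moondream") -> dict:
    """Build a non-streaming /api/chat request body carrying one image."""
    return {
        "model": model,
        "stream": False,  # ask for one JSON object instead of a token stream
        "messages": [
            {
                "role": "user",
                "content": question,
                "images": [base64.b64encode(image_bytes).decode("ascii")],
            }
        ],
    }


def ask_about_image(path: str, question: str) -> str:
    """POST the image and question to Ollama and return the model's reply text."""
    with open(path, "rb") as f:
        payload = build_chat_payload(f.read(), question)
    req = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),  # data= makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["message"]["content"]


# Usage (placeholder path, requires a running Ollama server):
#   print(ask_about_image("photo.jpg", "Describe this image."))
```

Setting `stream` to false keeps the client simple; drop it if you want tokens as they are generated, at the cost of parsing one JSON object per line.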

Community benchmarks

Real tokens/sec reports from people running Moondream 2 on actual hardware.

No community runs have been reported for this model yet.

Self-host serving plan

Planning to host Moondream 2 for many users, or to run it on a card that’s technically too small? The estimates below are for a single user on a 24 GB card.

VRAM needed: 2.3 GB (1.5 GB weights + 0.3 GB KV cache; the remaining ~0.5 GB is presumably runtime overhead)

Aggregate tok/s: 139 (across 1 user)

Per-user tok/s: 139 (1.8B dense model)

✅ Fits in 24 GB of VRAM with 21.7 GB of headroom. Inference runs entirely on the GPU at full speed.
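The VRAM readout can be reproduced as weights plus KV cache plus overhead, with headroom being whatever the card has left. A rough sketch; `vram_plan` is a hypothetical helper, the 0.5 GB overhead is an assumption inferred from the gap between 1.5 + 0.3 GB and the 2.3 GB total, and scaling the KV cache linearly with users assumes every user fills a similar amount of context.

```python
# Sketch of the serving-plan VRAM arithmetic (not this site's actual code).
#   total    = weights + users x KV cache per user + overhead
#   headroom = card VRAM - total
# ASSUMPTION: overhead_gb=0.5 is inferred from 2.3 GB total - 1.5 GB - 0.3 GB.


def vram_plan(weights_gb: float, kv_per_user_gb: float, card_gb: float,
              users: int = 1, overhead_gb: float = 0.5) -> tuple[float, float, bool]:
    """Return (total VRAM needed, headroom left on the card, whether it fits)."""
    total = weights_gb + users * kv_per_user_gb + overhead_gb
    headroom = card_gb - total
    return total, headroom, headroom >= 0


for users in (1, 8, 32):
    total, headroom, fits = vram_plan(1.5, 0.3, card_gb=24, users=users)
    print(f"{users:2d} users: need {total:.1f} GB, headroom {headroom:.1f} GB, fits={fits}")
```

With one user this reproduces the 2.3 GB total and 21.7 GB headroom shown above; as concurrency grows, the per-user KV cache rather than the weights dominates the budget.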

Throughput is a sub-linear estimate: each doubling of concurrent users adds roughly 70% of the single-user tokens/sec, up to about 8 users, after which memory bandwidth caps it. Mixture-of-experts (MoE) models behave differently under concurrency; Moondream 2 is a dense model, so that caveat does not apply here.
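The throughput rule of thumb can be read as: each doubling of concurrent users adds about 70% of single-user tokens/sec, with no further gain past about 8 users. A sketch of that one reading, seeded with the 139 tok/s single-user figure above; the log2 formula and the `aggregate_tps` name are hypothetical, not the site's actual estimator.

```python
import math


def aggregate_tps(users: int, single_user_tps: float,
                  gain_per_doubling: float = 0.7, plateau_users: int = 8) -> float:
    """One reading of the rule of thumb: each doubling of users adds ~70% of
    single-user tokens/sec, and there is no further gain beyond ~8 users."""
    if users < 1:
        raise ValueError("users must be >= 1")
    effective = min(users, plateau_users)  # bandwidth plateau
    return single_user_tps * (1 + gain_per_doubling * math.log2(effective))


for n in (1, 2, 4, 8, 16):
    agg = aggregate_tps(n, 139)
    print(f"{n:2d} users: {agg:6.1f} tok/s total, {agg / n:6.1f} per user")
```

Under this reading 8 users share about 3.1× the single-user throughput, roughly 54 tok/s each; adding users beyond that only lowers per-user speed.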

faq·common questions
how much VRAM do I need to run Moondream 2?

Moondream 2 needs about 1.5 GB of VRAM with Q4_K_M quantization. At full precision (FP16) the weights alone are about 3.6 GB (1.8B parameters × 2 bytes), so expect to need more than that.

which quant should I pick?

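Q4_K_M is the best quality/VRAM balance: roughly 92% of FP16 quality at about 28% of the FP16 footprint (4.5 vs 16 bits per weight), and it is the only build listed for this model. Q8_0 is near-lossless if you have the headroom and can find a build.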

faq://ai-curated·20 entries
What GPU do I need to run Moondream 2?

To run Moondream 2 with the Q4_K_M build, you need a GPU with at least 1.5 GB of VRAM. Because its memory needs are so low, it suits older or budget GPUs.

Is Moondream 2 good for coding?

Moondream 2 is primarily designed for multimodal tasks, such as answering questions about images. It is not optimized for coding tasks, which typically require specialized language models.

Moondream 2 vs Llama 3.1 8B?

Moondream 2 has 1.8 billion parameters and is optimized for multimodal tasks, while Llama 3.1 8B is a larger language model with 8 billion parameters, better suited for text-only tasks. Moondream 2 requires less VRAM and is more compact.

Can I run Moondream 2 on a Mac?

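Yes. On Apple Silicon Macs, runtimes such as Ollama use the GPU through Metal, and GPU memory comes from unified system memory, so any Apple Silicon Mac has more than the ~1.5 GB the Q4_K_M build needs. On Intel Macs, expect it to run on the CPU.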

How much VRAM does Moondream 2 need?

With the Q4_K_M build listed here, Moondream 2 needs about 1.5 GB of VRAM. Higher-precision builds need more; FP16 weights alone are about 3.6 GB. Either way, it suits systems with limited GPU resources.

Is Moondream 2 censored?

Moondream 2 is not inherently censored. Its Apache-2.0 license is a permissive open-source license and does not include responsible-use guidelines or usage restrictions.

Is Moondream 2 commercial-use allowed?

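Yes. Moondream 2 is licensed under Apache-2.0, which permits commercial use. The main conditions apply when you redistribute it: include the license, keep the copyright and NOTICE attributions, and mark any files you changed.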

Moondream 2 context length?

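Moondream 2 has a context length of 2048 tokens. That window holds the image embedding as well as the text prompt and answer, so it is short by current standards and best suited to single-image captions and questions.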

Does Moondream 2 support function calling?

Moondream 2 does not natively support function calling. It is designed primarily for multimodal tasks and answering questions about images.

Moondream 2 quantization options?

Q4_K_M is the only build listed here, and it needs about 1.5 GB of VRAM. Other quantization levels trade memory for quality: lower-bit builds need less VRAM, while Q8_0 or FP16 (about 3.6 GB of weights) need more.

Can Moondream 2 run on CPU?

Moondream 2 can run on a CPU, but it is optimized for GPU use. On a CPU it will be noticeably slower, which may make it impractical for real-time applications.

Moondream 2 fine-tuning?

Moondream 2 can be fine-tuned for specific tasks, but this requires additional data and computational resources. Fine-tuning can improve its performance on specific multimodal tasks.

Moondream 2 system requirements?

For the Q4_K_M build: a GPU with at least 1.5 GB of VRAM, about 2.5 GB of free system RAM, a modern CPU, and an operating system and GPU driver supported by your runtime (for example, Ollama). 8 GB of total system RAM is a comfortable baseline.

Moondream 2 performance benchmark?

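No community benchmarks are available yet, so treat any figure as an estimate. Rough estimates range from about 50 tokens per second on a mid-range GPU to ~139 tok/s for a single user on a 24 GB card (the serving-plan estimate above). Performance varies with the specific hardware and quantization level.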

Moondream 2 for RAG?

Moondream 2 can be used in Retrieval-Augmented Generation (RAG) pipelines when paired with a retrieval system, but its 2048-token context leaves little room for retrieved text, so it works best on short, image-centric inputs.

Moondream 2 for agents?

Moondream 2 can be integrated into agents for tasks that involve processing and understanding images, such as visual question answering and image captioning.

Moondream 2 for coding vs general?

Moondream 2 is better suited for general multimodal tasks rather than coding. For coding, consider using specialized language models designed for code generation and understanding.

Moondream 2 vs ChatGPT?

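Moondream 2 is a small open model focused on image and text interactions that you can run locally. ChatGPT is a hosted assistant built on much larger models, which also accept images. ChatGPT is generally stronger at conversation and complex text tasks, whereas Moondream 2’s strengths are low cost, local or offline use, and fast visual question answering.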

Moondream 2 download size?

The Q4_K_M build of Moondream 2 is about 1 GB to download, making it a lightweight model that is easy to store and transfer.

Best quant for Moondream 2?

Q4_K_M, the only build listed here, is the best default for most applications: it balances quality and memory use well. If a Q8_0 build is available and you have the headroom, it will be closer to full quality.