~/runthismodel
daemon okbuild 5a3c91d00:00:00Z
./models/browse/paligemma-3b
Google · multimodal
PaliGemma 3B
Google's vision model. Strong at visual QA, captioning, and OCR.
3b paramspaligemmagemma0K ctx2.52.5 GB vram
about·model card

PaliGemma 3B is a multimodal image-to-text model developed by Google, designed to generate descriptive text from images. With 3 billion parameters, it strikes a balance between complexity and performance, making it suitable for a wide range of applications such as image captioning, visual question answering, and content generation. The model’s context length of 256 tokens allows it to handle detailed descriptions and complex queries, enhancing its versatility in generating rich, context-aware text.

In its size class, PaliGemma 3B performs efficiently, offering a good balance between computational demands and output quality. It is particularly noteworthy for its ability to produce high-quality captions and descriptions with relatively low VRAM requirements (2.5–2.5 GB), making it accessible for users with mid-range GPUs. While it may not outperform larger models in every scenario, its efficiency and effectiveness make it a strong choice for those who need a robust yet lightweight solution. Ideal users include developers, content creators, and researchers looking for a reliable image-to-text model that can be deployed on a variety of hardware, from laptops to more powerful workstations.

probe://hardware·which quants fit your rig
we auto-detect via WebGL/WebGPU. select manually if your GPU isn't recognized.
./quantizations·1 variants
QuantizationBitsFile SizeVRAM NeededRAM NeededQuality
Q4_K_M4.52 GB2.5 GB4 GB
85%

Context window & KV cache

Adds 0.02 GB to VRAM

Long chats and RAG inputs cost real memory. Drag to see how 32K vs 128K context shifts your grade.

Model native max: 0K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.

How to run PaliGemma 3B

Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.

GUI. Browse → download → chat. MLX on Apple Silicon.

LM Studio home →
  1. 1

    Open LM Studio

    Go to the 🔍 Search tab.

  2. 2

    Search for

    abetlen/paligemma-3b-mix-224-gguf
  3. 3

    Download

    Pick the Q4_K_M quant — best balance of size vs. quality.

  4. 4

    Chat

    Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.

Community benchmarks

Real tokens/sec reports from people running PaliGemma 3B on actual hardware.

No community runs yet for this model. Be the first to submit your numbers.

Self-host serving plan

Want to host PaliGemma 3Bfor many users? Or run it on a card that’s technically too small? Slide the knobs.

VRAM needed

3.4 GB

2.5 GB weights + 0.4 GB KV

Aggregate tok/s

83

across 1 user

Per-user tok/s

83

3 B dense

✅ Fits in 24 GB VRAM with 20.6 GB headroom. Pure-GPU inference — full speed.

Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.

faq·common questions
how much VRAM do I need to run PaliGemma 3B?

PaliGemma 3B requires 2.5 GB VRAM minimum with Q4_K_M quantization. For full precision you need 2.5 GB.

which quant should I pick?

Q4_K_M is the best quality/VRAM balance — ~92% of FP16 quality at ~25% the footprint. Q8_0 is near-lossless if you have the headroom.

faq://ai-curated·20 entries
What GPU do I need to run PaliGemma 3B?

To run PaliGemma 3B, you need a GPU with at least 2.5 GB of VRAM. Higher VRAM will improve performance and allow for more complex tasks.

Is PaliGemma 3B good for coding?

PaliGemma 3B is primarily designed for visual tasks like image recognition and captioning. It may not be as effective for coding tasks compared to text-focused models.

PaliGemma 3B vs Llama 3.1 8B?

PaliGemma 3B has 3 billion parameters and excels in visual tasks, while Llama 3.1 8B has 8 billion parameters and is better suited for text generation and language understanding.

Can I run PaliGemma 3B on a Mac?

Yes, you can run PaliGemma 3B on a Mac, but ensure your Mac has a compatible GPU with at least 2.5 GB of VRAM for optimal performance.

How much VRAM does PaliGemma 3B need?

PaliGemma 3B requires at least 2.5 GB of VRAM, but more VRAM can enhance performance and support larger batch sizes.

Is PaliGemma 3B censored?

PaliGemma 3B is not inherently censored, but its outputs are guided by the training data and can be filtered or moderated based on the application.

Is PaliGemma 3B commercial-use allowed?

PaliGemma 3B is licensed under the Gemma license, which allows for commercial use as long as you comply with the terms of the license.

PaliGemma 3B context length?

The context length for PaliGemma 3B is 256 tokens, which is suitable for most visual and text tasks.

Does PaliGemma 3B support function calling?

PaliGemma 3B does not natively support function calling, but you can integrate it with external functions using custom scripts or APIs.

PaliGemma 3B quantization options?

PaliGemma 3B supports various quantization options, including 8-bit and 4-bit, which can reduce VRAM usage and improve inference speed.

Can PaliGemma 3B run on CPU?

PaliGemma 3B can run on a CPU, but performance will be significantly slower compared to running on a GPU with at least 2.5 GB of VRAM.

PaliGemma 3B fine-tuning?

PaliGemma 3B can be fine-tuned on specific datasets to improve performance on particular tasks, such as visual question answering or image captioning.

PaliGemma 3B system requirements?

To run PaliGemma 3B, you need a system with at least 8 GB of RAM, a GPU with 2.5 GB of VRAM, and a 64-bit operating system.

PaliGemma 3B performance benchmark?

PaliGemma 3B processes approximately 10-20 tokens per second on a mid-range GPU, with higher-end GPUs achieving up to 30-40 tokens per second.

PaliGemma 3B for RAG?

PaliGemma 3B can be used for Retrieval-Augmented Generation (RAG) tasks, particularly for visual and multimodal content retrieval and generation.

PaliGemma 3B for agents?

PaliGemma 3B can be integrated into agent systems to enhance their visual and textual capabilities, making them more versatile in interactive environments.

PaliGemma 3B for coding vs general?

PaliGemma 3B is better suited for general visual and multimodal tasks rather than coding-specific tasks, which require specialized text models.

PaliGemma 3B vs ChatGPT?

PaliGemma 3B is a multimodal model focused on visual tasks, while ChatGPT is a text-based model designed for conversational and language tasks.

PaliGemma 3B download size?

The download size for PaliGemma 3B is approximately 6 GB, depending on the quantization level and additional resources.

Best quant for PaliGemma 3B?

The best quantization for PaliGemma 3B depends on your use case. 8-bit quantization offers a good balance between performance and VRAM efficiency, while 4-bit quantization further reduces VRAM usage at the cost of some accuracy.