~/runthismodel
daemon okbuild 5a3c91d00:00:00Z
./models/browse/gemma-3-27b-it
Google · llm
Gemma 3 27B
Google's flagship open model. Near GPT-4 quality. Needs 20GB+ RAM.
27b paramsgemma3gemma32K ctx15.9115.91 GB vram
about·model card

Gemma 3 27B is a large language model developed by Google, boasting 27 billion parameters and a context length of 32,768 tokens. This model excels in generating high-quality text across a variety of tasks, including but not limited to, writing, summarization, and conversation. Its expansive context window allows it to maintain coherence over longer passages, making it particularly suitable for applications that require deep understanding and long-term memory, such as creating detailed reports, articles, or engaging in complex dialogues.

In its size class, Gemma 3 27B holds its own, offering a balance between performance and efficiency. While it may not outperform the largest models in terms of raw capability, it provides a significant step up from smaller models without requiring excessive computational resources. The model is quantized to Q4_K_M, which helps in reducing the memory footprint and improving inference speed, making it more accessible for local deployment. Users with mid-range GPUs, specifically those with around 16GB of VRAM, can realistically run this model without major bottlenecks. It is ideal for developers, content creators, and researchers who need a powerful yet manageable LLM for local use, ensuring that they can leverage advanced text generation capabilities without the need for cloud services.

probe://hardware·which quants fit your rig
we auto-detect via WebGL/WebGPU. select manually if your GPU isn't recognized.
./quantizations·1 variants
QuantizationBitsFile SizeVRAM NeededRAM NeededQuality
Q4_K_M4.515.41 GB15.91 GB16.41 GB
85%

Context window & KV cache

Adds 1.50 GB to VRAM

Long chats and RAG inputs cost real memory. Drag to see how 32K vs 128K context shifts your grade.

Model native max: 32K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.

How to run Gemma 3 27B

Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.

Easiest. Single command. OpenAI-compatible API on :11434.

Ollama home →
  1. 1

    Pull the model

    ollama pull gemma3:27b
  2. 2

    Chat

    ollama run gemma3:27b
  3. 3

    Use as API

    curl http://localhost:11434/api/chat \
      -d '{"model":"gemma3:27b","messages":[{"role":"user","content":"Hi"}]}'

Community benchmarks

Real tokens/sec reports from people running Gemma 3 27B on actual hardware.

No community runs yet for this model. Be the first to submit your numbers.

Self-host serving plan

Want to host Gemma 3 27Bfor many users? Or run it on a card that’s technically too small? Slide the knobs.

VRAM needed

17.7 GB

15.9 GB weights + 1.3 GB KV

Aggregate tok/s

9

across 1 user

Per-user tok/s

9

27 B dense

✅ Fits in 24 GB VRAM with 6.3 GB headroom. Pure-GPU inference — full speed.

Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.

bench://measured·hf-inference · 4/28/2026real numbers
streaming-inference measurement, not an estimate.
16.0t/s
sustained throughput
3568ms
time to first token
128tok
generated in 8.0s
16t/s
end-to-end

See It In Action

Real model outputs generated via RunThisModel.com — watch responses stream in real time.

Llama 3.3 70B responding...

Outputs generated by real AI models via RunThisModel.com. Generation speed shown is from cloud inference. Local speeds vary by hardware — check your device.

faq·common questions
how much VRAM do I need to run Gemma 3 27B?

Gemma 3 27B requires 15.91 GB VRAM minimum with Q4_K_M quantization. For full precision you need 15.91 GB.

which quant should I pick?

Q4_K_M is the best quality/VRAM balance — ~92% of FP16 quality at ~25% the footprint. Q8_0 is near-lossless if you have the headroom.

faq://ai-curated·20 entries
What GPU do I need to run Gemma 3 27B?

To run Gemma 3 27B, you need a GPU with at least 15.9 GB of VRAM, such as an NVIDIA RTX 3090 or better.

Is Gemma 3 27B good for coding?

Gemma 3 27B is highly capable for coding tasks, offering near GPT-4 quality in code generation and understanding complex programming concepts.

Gemma 3 27B vs Llama 3.1 8B?

Gemma 3 27B has more parameters (27B vs 8B) and generally performs better in complex tasks, but requires significantly more VRAM and computational resources.

Can I run Gemma 3 27B on a Mac?

Yes, you can run Gemma 3 27B on a Mac, but you will need a Mac with an M1 Ultra or higher to meet the VRAM requirements.

How much VRAM does Gemma 3 27B need?

Gemma 3 27B requires at least 15.9 GB of VRAM, which can vary slightly depending on the quantization level used.

Is Gemma 3 27B censored?

Gemma 3 27B is not inherently censored, but its responses can be filtered or moderated based on the implementation and configuration settings.

Is Gemma 3 27B commercial-use allowed?

Gemma 3 27B is licensed under the 'gemma' license, which allows for commercial use, provided you comply with the terms of the license.

Gemma 3 27B context length?

Gemma 3 27B supports a context length of up to 32,768 tokens, allowing for extensive and detailed conversations.

Does Gemma 3 27B support function calling?

Yes, Gemma 3 27B supports function calling, enabling it to interact with external systems and APIs effectively.

Gemma 3 27B quantization options?

Gemma 3 27B can be quantized to various levels, including 4-bit and 8-bit, to reduce VRAM usage while maintaining performance.

Can Gemma 3 27B run on CPU?

While Gemma 3 27B can technically run on a CPU, it is highly inefficient and slow due to the model's large size and computational demands.

Gemma 3 27B fine-tuning?

Gemma 3 27B can be fine-tuned for specific tasks, but this process requires significant computational resources and expertise.

Gemma 3 27B system requirements?

Gemma 3 27B requires at least 15.9 GB of VRAM, 20 GB of RAM, and a powerful CPU to run efficiently.

Gemma 3 27B performance benchmark?

Gemma 3 27B can process around 100 tokens per second on a high-end GPU like the RTX 3090, but this can vary based on the specific hardware and quantization level.

Gemma 3 27B for RAG?

Gemma 3 27B is well-suited for Retrieval-Augmented Generation (RAG) tasks, thanks to its large context window and ability to handle complex queries.

Gemma 3 27B for agents?

Gemma 3 27B can be used to power conversational agents and chatbots, providing high-quality and contextually rich responses.

Gemma 3 27B for coding vs general?

Gemma 3 27B excels in both coding and general tasks, but its performance in coding is particularly strong due to its ability to understand and generate complex code snippets.

Gemma 3 27B vs ChatGPT?

Gemma 3 27B offers near GPT-4 quality and is more customizable, but ChatGPT may have a more polished user interface and broader community support.

Gemma 3 27B download size?

The download size for Gemma 3 27B varies depending on the quantization level, but it typically ranges from 10 GB to 20 GB.

Best quant for Gemma 3 27B?

The best quantization for Gemma 3 27B depends on your hardware, but 8-bit quantization is often a good balance between performance and VRAM efficiency.