Alibaba

Qwen3 30B-A3B

Mixture-of-Experts model with 30 B total parameters but only 3 B active per token. Runs at the speed of a 3 B model, with the knowledge of a 30 B. Sweet spot for 24 GB cards.

30.5B parametersqwen3-moeapache-2.032K context20GB - 36GB VRAM

About This Model

Qwen3 30B-A3B is the model that finally makes MoE practical for consumer hardware. Total memory footprint sits at 20 GB for Q4 — fits on a 24 GB RTX 3090/4090 — but inference speed lands around what you would expect from a 3 B model because only ~3.3 B parameters activate per token. The trade-off: if your VRAM is smaller than 20 GB you cannot run it at all, since all expert weights must be loaded simultaneously.

Check Your Hardware

See which quantizations of Qwen3 30B-A3B your hardware can run.

Quantization Options

QuantizationBitsFile SizeVRAM NeededRAM NeededQuality
Q4_K_M4.518 GB20 GB24 GB
85%
Q8_0832 GB36 GB40 GB
98%

Context window & KV cache

Adds 1.50 GB to VRAM

Long chats and RAG inputs cost real memory. Drag to see how 32K vs 128K context shifts your grade.

Model native max: 32K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.

How to run Qwen3 30B-A3B

Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.

GUI. Browse → download → chat. MLX on Apple Silicon.

LM Studio home →
  1. 1

    Open LM Studio

    Go to the 🔍 Search tab.

  2. 2

    Search for

    bartowski/Qwen3-30B-A3B-Instruct-GGUF
  3. 3

    Download

    Pick the Q4_K_M quant — best balance of size vs. quality.

  4. 4

    Chat

    Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.

Community benchmarks

Real tokens/sec reports from people running Qwen3 30B-A3B on actual hardware.

No community runs yet for this model. Be the first to submit your numbers.

See It In Action

Real model outputs generated via RunThisModel.com — watch responses stream in real time.

Llama 3.3 70B responding...

Outputs generated by real AI models via RunThisModel.com. Generation speed shown is from cloud inference. Local speeds vary by hardware — check your device.

Frequently Asked Questions

How much VRAM do I need to run Qwen3 30B-A3B?

Qwen3 30B-A3B requires 20GB VRAM minimum with Q4_K_M quantization. For full precision, you need 36GB VRAM.

What is the best quantization for Qwen3 30B-A3B?

Q4_K_M offers the best balance of quality and VRAM usage. Q8_0 is near-lossless if you have enough VRAM.