Can M4 Max run Qwen3 235B-A22B?

Yes — runs locally

~12 tok/sec · Usable — noticeable wait (2-5 sec), then steady output.

Your VRAM: 128 GB
Model size: 235B
Best quant: Q4_K_M
VRAM needed: 144.0 GB

The verdict

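The M4 Max (128 GB of unified memory, shared between CPU and GPU) is rated as able to run Qwen3 235B-A22B locally, with Q4_K_M listed as the best quantization. Note that the Q4_K_M build needs about 144.0 GB, which is more than the 128 GB available, so holding the whole model in memory likely requires a smaller quantization. Expected throughput is around 12 tokens/second, which is usable for interactive work: a noticeable wait (2-5 sec), then steady output. Qwen3 235B-A22B is a flagship MoE model with 235B total parameters, of which 22B are active per token. It offers frontier quality but needs 80 GB+ of memory to run.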
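
The fit question in the verdict comes down to simple arithmetic, sketched below. The figure of 4.9 bits per weight for Q4_K_M is an assumed average chosen for illustration, not a number from this page, and the function names are hypothetical. Real GGUF files vary by model, and the KV cache and the operating system need memory on top of the weights.

```python
# Rough fit check: do the quantized weights fit in the available memory?
# Q4_K_M_BITS is an assumed average bits-per-weight, not a measured value.

GB = 1e9  # decimal gigabytes


def weight_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    return total_params * bits_per_weight / 8 / GB


def max_bits_per_weight(total_params: float, available_gb: float) -> float:
    """Largest average bits-per-weight whose weights alone fit in memory."""
    return available_gb * GB * 8 / total_params


def fits(required_gb: float, available_gb: float, overhead_gb: float = 0.0) -> bool:
    """True if the model plus any extra overhead fits in the available memory."""
    return required_gb + overhead_gb <= available_gb


TOTAL_PARAMS = 235e9  # all experts must be resident, not only the 22B active
Q4_K_M_BITS = 4.9     # assumption for illustration

print(f"Estimated Q4_K_M weights: {weight_size_gb(TOTAL_PARAMS, Q4_K_M_BITS):.1f} GB")
print(f"Max bits/weight for 128 GB: {max_bits_per_weight(TOTAL_PARAMS, 128.0):.2f}")
print("144.0 GB fits in 128 GB:", fits(144.0, 128.0))
```

Under these assumptions the estimate lands close to the 144.0 GB listed above. Anything averaging more than roughly 4.36 bits per weight cannot fit in 128 GB, even before overhead.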

How to run it

1. Install Ollama or LM Studio.
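2. Pull a GGUF build of the model. Q4_K_M is the listed best balance of quality and speed, but at 144.0 GB it is larger than 128 GB, so a smaller quantization may be needed.
3. Start chatting. Expect a 2-5 second wait for the first token, then roughly 12 tok/sec of steady output.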
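
To check the throughput figure on your own machine, the steps above can be followed by a small measurement script. This is a minimal sketch, assuming Ollama is running locally on its default port and that the model was pulled under the tag `qwen3:235b`. That tag is an assumption, so substitute whatever name `ollama list` shows. The script uses the `eval_count` and `eval_duration` fields that Ollama's `/api/generate` endpoint returns, with durations in nanoseconds.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen3:235b"  # assumed tag; use the name shown by `ollama list`


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from Ollama's token count and duration (nanoseconds)."""
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)


def generate(prompt: str, model: str = MODEL_TAG, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming request and return the parsed JSON response."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def report(result: dict) -> str:
    """Summarize speed and the approximate wait before the first token."""
    speed = tokens_per_second(result["eval_count"], result["eval_duration"])
    # Model load time plus prompt processing approximates the first-token wait.
    wait_ns = result.get("load_duration", 0) + result.get("prompt_eval_duration", 0)
    return f"{speed:.1f} tok/sec, ~{wait_ns / 1e9:.1f} s before first token"
```

With the server running, `print(report(generate("Explain mixture-of-experts in two sentences.")))` prints one summary line. The first request after startup includes model load time, so run it twice to see the warmed-up figure.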

FAQ

What GPU do I need to run Qwen3 235B-A22B?

To run Qwen3 235B-A22B at Q4_K_M, you need about 144 GB of GPU memory in total. No single NVIDIA A100 or H100 80 GB card has that much, so a multi-GPU setup with at least two such cards is required.

Is Qwen3 235B-A22B good for coding?

Qwen3 235B-A22B is well suited to coding tasks: its 32,768-token context length can hold sizable source files, and its scale gives it strong language understanding.

Qwen3 235B-A22B vs Llama 3.1 8B?

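Qwen3 235B-A22B has far more parameters (235B total, 22B active per token, vs 8B), making it more capable on complex tasks but requiring much more memory. Context length is not the differentiator: Llama 3.1 8B supports up to 128K tokens, compared with the 32,768 tokens listed here for Qwen3 235B-A22B.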

Can I run Qwen3 235B-A22B on a Mac?

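Yes, on an Apple silicon Mac with enough unified memory, which the GPU shares with the CPU. A 128 GB M4 Max is below the 144 GB that Q4_K_M needs, so expect to use a smaller quantization. External GPUs are not an option on Apple silicon Macs. If your Mac has much less memory, consider a cloud-based solution.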

How much VRAM does Qwen3 235B-A22B need?

Qwen3 235B-A22B requires about 144 GB of VRAM at Q4_K_M quantization. On NVIDIA hardware this means spreading the model across multiple high-end GPUs such as the A100 or H100.

Is Qwen3 235B-A22B censored?

Qwen3 235B-A22B is not inherently censored, but its responses can be filtered or moderated based on the implementation and usage policies set by the user or organization.

Is Qwen3 235B-A22B commercial-use allowed?

Yes, Qwen3 235B-A22B is licensed under the Apache-2.0 license, which allows commercial use subject to the license's attribution and notice requirements.

Qwen3 235B-A22B context length?

Qwen3 235B-A22B has a context length of 32,768 tokens, allowing it to handle very long sequences of text effectively.
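
Filling that context window also costs memory beyond the weights, because the KV cache grows with every token. The sketch below estimates its size. The architecture values (94 layers, 4 key/value heads, head dimension 128) and the 16-bit cache precision are assumptions that do not appear on this page, so check them against the model's configuration before relying on the result.

```python
# KV cache estimate for a full context window.
# LAYERS, KV_HEADS and HEAD_DIM are assumed values, not taken from this page.


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `tokens` positions."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


LAYERS = 94      # assumed
KV_HEADS = 4     # assumed (grouped-query attention)
HEAD_DIM = 128   # assumed
CONTEXT = 32_768

per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM)
full = kv_cache_bytes(CONTEXT, LAYERS, KV_HEADS, HEAD_DIM)
print(f"Per token: {per_token / 1024:.0f} KiB")
print(f"Full {CONTEXT}-token context: {full / 1e9:.1f} GB")
```

Under these assumptions a full 32,768-token context adds several gigabytes, which matters when the weights already sit near the memory limit.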
