Apple M2 Max vs Apple M2 Ultra

Head-to-head AI inference comparison across 145 popular models. Each model is graded on both chips using the highest-quality quantization that still fits in VRAM. For each model, the chip with the higher grade and the faster tokens-per-second rate wins.
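The selection rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the site's actual grading code: the quantization ladder, the bits-per-weight figures, the 10% overhead factor, and the helper names `pick_quant` and `winner` are all hypothetical, and treating tokens-per-second as a tie-breaker after grade is one possible reading of the rule.

```python
from typing import Optional, Tuple

# Hypothetical quantization ladder, best quality first.
# Bits-per-weight values are rough illustrative figures, not measured sizes.
QUANTS = [
    ("FP16", 16.0),
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.8),
    ("Q3_K_M", 3.9),
    ("Q2_K", 3.0),
]

OVERHEAD = 1.10  # assumed 10% headroom for KV cache and runtime buffers


def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for a model with params_b billion parameters."""
    return params_b * bits_per_weight / 8.0


def pick_quant(params_b: float, vram_gb: float) -> Optional[str]:
    """Return the highest-quality quantization whose estimated footprint fits in vram_gb."""
    for name, bpw in QUANTS:
        if weights_gb(params_b, bpw) * OVERHEAD <= vram_gb:
            return name
    return None  # nothing on the ladder fits


def winner(a: Tuple[int, float], b: Tuple[int, float]) -> str:
    """Compare (grade, tok/s) pairs: higher grade wins, tok/s breaks ties (assumed)."""
    if a == b:
        return "tie"
    return "a" if a > b else "b"


if __name__ == "__main__":
    # A 141B-parameter model on the two memory sizes listed in the spec table.
    print(pick_quant(141, 96))
    print(pick_quant(141, 192))
```

With these assumed figures, a 141B-parameter model would settle on a lower-precision quantization at 96GB than at 192GB, which is the kind of gap that can separate the two chips on grade even when both are able to run the model.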

Spec            Apple M2 Max    Apple M2 Ultra
VRAM            96GB            192GB
Architecture    m2              m2
Vendor          apple           apple
MSRP            —               —
Models running  145 of 145      145 of 145
Wins (grade)    0 models        3 models

Where Apple M2 Max pulls ahead

No standout wins.

Where Apple M2 Ultra pulls ahead

Language Models (74 models)

Max    tok/s    Model    tok/s    Ultra

Mixtral 8x22B Instruct

141B · Mistral AI

Qwen3 235B-A22B

235B · Alibaba

72B · Anthracite

Llama 3.1 70B Instruct

Euryale L3.3 70B v2.3

Llama 3.1 70B (lorablated)

70B · mlabonne

Mixtral 8x7B Instruct

46.7B · Mistral AI

41.9B · Microsoft

Skyfall 31B v4.2

31B · TheDrummer

30.5B · Alibaba

Dolphin Mistral 24B (Venice Edition)

24B · Cognitive Computations

Dolphin 3.0 R1 Mistral 24B

24B · Cognitive Computations

Cydonia 24B v4.3

24B · TheDrummer

Mistral Small 22B

22B · Mistral AI

22B · Anthracite

DeepSeek MoE 16B

16.4B · DeepSeek

Rocinante XL 16B v1

16B · TheDrummer

14B · Microsoft

Mistral Nemo 12B

12B · Mistral AI

12B · Anthracite

Rocinante 12B v1.1

12B · TheDrummer

Mistral Nemo Base 12B

12B · Mistral AI

10.7B · Upstage

Gemma 2 9B Instruct

DeepSeek R1 Distill 8B

Llama 3.1 8B Instruct

Dolphin 3.0 Llama 3.1 8B

8B · Cognitive Computations

NeuralDaredevil 8B (abliterated)

Llama 3.1 8B Instruct (abliterated)

Stheno L3 8B v3.2

EXAONE 3.5 7.8B

InternLM 2.5 7B

7.7B · Shanghai AI Lab

Qwen 2.5 7B Instruct

7.6B · Alibaba

Mistral 7B Instruct v0.3

7.3B · Mistral AI

OpenChat 3.5 7B

Nemotron Mini 4B

Phi-3.5 Mini 3.8B

3.8B · Microsoft

Phi-4 Mini 3.8B

3.8B · Microsoft

Granite 3.0 3B-A800M

Llama 3.2 3B Instruct

StableLM Zephyr 3B

3B · Stability AI

3B · Pansophic

EXAONE 3.5 2.4B

1.7B · HuggingFace

1.5B · Alibaba

DeepSeek R1 Distill 1.5B

1.5B · DeepSeek

Granite 3.0 1B-A400M

Llama 3.2 1B Instruct

1.1B · TinyLlama

0.5B · Alibaba

0.36B · HuggingFace

0.135B · HuggingFace

Code Models (17 models)

Max    tok/s    Model    tok/s    Ultra

Codestral 22B (abliterated)

Qwen 2.5 Coder 14B

Code Llama 13B Instruct

Qwen 2.5 Coder 7B

7.6B · Alibaba

DeepSeek Coder 6.7B

6.7B · DeepSeek

Qwen 2.5 Coder 3B

3B · Stability AI

Qwen 2.5 Coder 1.5B

1.5B · Alibaba

DeepSeek Coder 1.3B

1.3B · DeepSeek

Qwen 2.5 Coder 0.5B

0.5B · Alibaba

Multimodal & Vision (6 models)

Max    tok/s    Model    tok/s    Ultra

4.2B · Microsoft

2.2B · Alibaba

1.8B · Moondream

Image Generation (9 models)

Max    tok/s    Model    tok/s    Ultra

FLUX.1 Schnell (GGUF)

12B · Black Forest Labs

FLUX.1 Dev (GGUF)

12B · Black Forest Labs

Stable Diffusion XL (CoreML)

3.5B · Stability AI

SDXL Turbo (GGUF)

3.5B · Stability AI

Stable Diffusion 3 Medium (GGUF)

2.5B · Stability AI

Stable Diffusion 2.1 Base (CoreML)

0.86B · Stability AI / Apple

Stable Diffusion 1.5 (CoreML)

0.86B · Runway

Stable Diffusion 1.5 (GGUF)

0.86B · Runway / GPUStack

Stable Diffusion 2.1 (GGUF)

0.86B · Stability AI

Speech (9 models)

Max    tok/s    Model    tok/s    Ultra

Whisper Large v3

1.55B · OpenAI

Whisper Large v3 Turbo

0.81B · OpenAI

0.77B · OpenAI

Distil-Whisper Large v3

0.76B · HuggingFace

0.24B · OpenAI

0.074B · OpenAI

Whisper Base English

0.074B · OpenAI

Whisper Tiny English (Quantized)

0.039B · OpenAI

0.039B · OpenAI

Text-to-Speech (14 models)

Max    tok/s    Model    tok/s    Ultra

0.082B · Kokoro

Piper TTS - Amy (English)

0.02B · Rhasspy

Piper TTS - Lessac (English)

0.02B · Rhasspy

Piper TTS - LibriTTS-R (English)

0.02B · Rhasspy

Piper TTS - Spanish (MLS)

0.02B · Rhasspy

Piper TTS - French (Siwis)

0.02B · Rhasspy

Piper TTS - German (Thorsten)

0.02B · Rhasspy

Piper TTS - Chinese (Huayan)

0.02B · Rhasspy

Piper TTS - Japanese (Kokoro)

0.02B · Rhasspy

Piper TTS - Korean

0.02B · Rhasspy

Piper TTS - Russian (Irina)

0.02B · Rhasspy

Piper TTS - Portuguese (Faber)

0.02B · Rhasspy

Piper TTS - Italian (Riccardo)

0.02B · Rhasspy

Piper TTS - Arabic (Kareem)

0.02B · Rhasspy

Embeddings (5 models)

Max    tok/s    Model    tok/s    Ultra

BGE Large EN v1.5

Nomic Embed Text v1.5

0.137B · Nomic AI

BGE Small EN v1.5

Snowflake Arctic Embed S

0.033B · Snowflake

all-MiniLM-L6-v2

0.023B · Sentence Transformers

Rerankers (2 models)

Max    tok/s    Model    tok/s    Ultra

BGE Reranker v2 M3

Jina Reranker Tiny EN

0.033B · Jina AI


© runthismodel · 2026
