
AI Models You Can Run with 192GB VRAM

109 models compatible across 9 categories

- Compatible models: 109
- Largest model: Llama 3.1 70B Instruct (70B)
- Categories: 9
💬 Chat / LLM (47 models)

| Model | Developer | Params | VRAM | Quant | Match |
|---|---|---|---|---|---|
| Llama 3.1 70B Instruct | Meta | 70B | 142.0 GB | FP16 | 100% |
| Qwen 2.5 32B | Alibaba | 32B | 19.0 GB | Q4_K_M | 85% |
| Gemma 3 27B | Google | 27B | 15.9 GB | Q4_K_M | 85% |
| Mistral Small 22B | Mistral AI | 22B | 12.9 GB | Q4_K_M | 85% |
| Phi-4 | Microsoft | 14B | 15.0 GB | Q8_0 | 98% |
| Qwen 2.5 14B | Alibaba | 14B | 15.1 GB | Q8_0 | 98% |
| Gemma 3 12B | Google | 12B | 12.2 GB | Q8_0 | 98% |
| Mistral Nemo 12B | Mistral AI | 12B | 12.6 GB | Q8_0 | 98% |
| Solar 10.7B | Upstage | 10.7B | 11.1 GB | Q8_0 | 98% |
| Falcon 3 10B | TII | 10B | 10.7 GB | Q8_0 | 98% |
| Gemma 2 9B Instruct | Google | 9.2B | 9.7 GB | Q8_0 | 98% |
| Yi 1.5 9B Chat | 01.AI | 9B | 9.2 GB | Q8_0 | 98% |
| DeepSeek R1 Distill 8B | DeepSeek | 8B | 8.4 GB | Q8_0 | 98% |
| Llama 3.1 8B Instruct | Meta | 8B | 17.0 GB | FP16 | 100% |
| Granite 3.3 8B | IBM | 8B | 8.6 GB | Q8_0 | 98% |
| EXAONE 3.5 7.8B | LG AI | 7.8B | 8.2 GB | Q8_0 | 98% |
| InternLM 2.5 7B | Shanghai AI Lab | 7.7B | 8.2 GB | Q8_0 | 98% |
| Qwen 2.5 7B Instruct | Alibaba | 7.6B | 9.0 GB | Q8_0 | 98% |
| Mistral 7B Instruct v0.3 | Mistral AI | 7.3B | 15.5 GB | FP16 | 100% |
| Falcon 3 7B | TII | 7B | 8.3 GB | Q8_0 | 98% |
| OLMo 2 7B | Allen AI | 7B | 7.7 GB | Q8_0 | 98% |
| OpenChat 3.5 7B | OpenChat | 7B | 7.7 GB | Q8_0 | 98% |
| Yi 1.5 6B Chat | 01.AI | 6B | 6.5 GB | Q8_0 | 98% |
| Gemma 3 4B | Google | 4B | 4.3 GB | Q8_0 | 98% |
| Nemotron Mini 4B | NVIDIA | 4B | 4.7 GB | Q8_0 | 98% |
| Danube 3 4B | H2O.ai | 4B | 4.4 GB | Q8_0 | 98% |
| Phi-3.5 Mini 3.8B | Microsoft | 3.8B | 4.3 GB | Q8_0 | 98% |
| Phi-4 Mini 3.8B | Microsoft | 3.8B | 4.3 GB | Q8_0 | 98% |
| Llama 3.2 3B Instruct | Meta | 3.2B | 3.7 GB | Q8_0 | 98% |
| Qwen 2.5 3B | Alibaba | 3B | 3.9 GB | Q8_0 | 98% |
| Falcon 3 3B | TII | 3B | 3.8 GB | Q8_0 | 98% |
| StableLM Zephyr 3B | Stability AI | 3B | 3.3 GB | Q8_0 | 98% |
| Rocket 3B | Pansophic | 3B | 3.3 GB | Q8_0 | 98% |
| Gemma 2 2B | Google | 2.6B | 3.1 GB | Q8_0 | 98% |
| EXAONE 3.5 2.4B | LG AI | 2.4B | 3.1 GB | Q8_0 | 98% |
| Granite 3.3 2B | IBM | 2B | 3.0 GB | Q8_0 | 98% |
| SmolLM2 1.7B | HuggingFace | 1.7B | 2.2 GB | Q8_0 | 98% |
| Qwen 2.5 1.5B | Alibaba | 1.5B | 2.3 GB | Q8_0 | 98% |
| DeepSeek R1 Distill 1.5B | DeepSeek | 1.5B | 2.3 GB | Q8_0 | 98% |
| Llama 3.2 1B Instruct | Meta | 1.24B | 2.8 GB | FP16 | 100% |
| TinyLlama 1.1B | TinyLlama | 1.1B | 1.6 GB | Q8_0 | 98% |
| Gemma 3 1B | Google | 1B | 1.5 GB | Q8_0 | 98% |
| Falcon 3 1B | TII | 1B | 2.2 GB | Q8_0 | 98% |
| Qwen 2.5 0.5B | Alibaba | 0.5B | 1.1 GB | Q8_0 | 98% |
| Danube 3 500M | H2O.ai | 0.5B | 1.0 GB | Q8_0 | 98% |
| SmolLM2 360M | HuggingFace | 0.36B | 0.9 GB | Q8_0 | 98% |
| SmolLM2 135M | HuggingFace | 0.135B | 0.8 GB | FP16 | 100% |
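The VRAM figures above roughly track parameter count times bytes per weight, plus a little runtime overhead. A minimal sketch of that rule of thumb is below; the bytes-per-weight values are common approximations for these quantization formats, and the 5% overhead factor is our assumption, not a figure from this site.

```python
# Rough VRAM estimate for dense-LLM inference:
#   parameters x bytes per weight x overhead factor.
# Bytes-per-weight values are approximations for llama.cpp-style formats
# (Q8_0 and Q4_K_M store scale factors alongside the quantized weights);
# the 5% overhead for KV cache and buffers is an assumption.
BYTES_PER_PARAM = {
    "FP16": 2.0,
    "Q8_0": 8.5 / 8,     # ~1.06 bytes/weight
    "Q4_K_M": 4.85 / 8,  # ~0.61 bytes/weight
}

def estimate_vram_gb(params_billions: float, quant: str,
                     overhead: float = 1.05) -> float:
    """Approximate VRAM (GB) needed to run inference on a dense model."""
    return params_billions * BYTES_PER_PARAM[quant] * overhead

def fits(params_billions: float, quant: str, vram_gb: float = 192.0) -> bool:
    """Does the model fit in the given VRAM budget?"""
    return estimate_vram_gb(params_billions, quant) <= vram_gb

# Llama 3.1 70B at FP16: estimate ~147 GB, close to the 142.0 GB listed,
# and comfortably within a 192GB budget.
print(round(estimate_vram_gb(70, "FP16"), 1))
print(fits(70, "FP16"))
```

The same check explains why a 192GB budget tops out around 70B at FP16: a 180B dense model at FP16 would need roughly 380 GB, while heavier quantization (Q4_K_M) would bring much larger models back within reach.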
💻 Coding (16 models)

👁 Multimodal (6 models)

🎨 Image Generation (9 models)

🎤 Speech Recognition (9 models)

🔊 Text to Speech (14 models)

🎵 Audio Generation (1 model)

🧩 Embedding (5 models)

🔄 Reranker (2 models)

Compatible GPUs (4 with 192GB+ VRAM)

- Apple M4 Ultra
- Apple M3 Ultra
- Apple M2 Ultra
- AMD Instinct MI300X

Frequently Asked Questions

What is the best AI model I can run with 192GB VRAM?
The largest model you can run with 192GB VRAM is Llama 3.1 70B Instruct (70B parameters) at FP16 precision. In total, 109 compatible models are available.
Can I run Llama with 192GB VRAM?
Yes! You can run 7 Llama models with 192GB VRAM: Llama 3.1 70B Instruct (FP16), Code Llama 13B Instruct (Q4_K_M), Llama 3.1 8B Instruct (FP16), Code Llama 7B (Q8_0), Llama 3.2 3B Instruct (Q8_0), Llama 3.2 1B Instruct (FP16), TinyLlama 1.1B (Q8_0).
What GPU has 192GB VRAM?
GPUs with 192GB or more VRAM include: Apple M4 Ultra, Apple M3 Ultra, Apple M2 Ultra, AMD Instinct MI300X.
