Compatible Models: 109
Largest Model: Llama 3.1 70B Instruct (70B)
Categories: 9
💬 Chat / LLM (47 models)

| Model | Vendor | VRAM | Quantization | Compatibility |
|---|---|---|---|---|
| Llama 3.1 70B Instruct | Meta | 142.0 GB | FP16 | 100% |
| Qwen 2.5 32B | Alibaba | 19.0 GB | Q4_K_M | 85% |
| Gemma 3 27B | Google | 15.9 GB | Q4_K_M | 85% |
| Mistral Small 22B | Mistral AI | 12.9 GB | Q4_K_M | 85% |
| Phi-4 | Microsoft | 15.0 GB | Q8_0 | 98% |
| Qwen 2.5 14B | Alibaba | 15.1 GB | Q8_0 | 98% |
| Gemma 3 12B | Google | 12.2 GB | Q8_0 | 98% |
| Mistral Nemo 12B | Mistral AI | 12.6 GB | Q8_0 | 98% |
| Solar 10.7B | Upstage | 11.1 GB | Q8_0 | 98% |
| Falcon 3 10B | TII | 10.7 GB | Q8_0 | 98% |
| Gemma 2 9B Instruct | Google | 9.7 GB | Q8_0 | 98% |
| Yi 1.5 9B Chat | 01.AI | 9.2 GB | Q8_0 | 98% |
| DeepSeek R1 Distill 8B | DeepSeek | 8.4 GB | Q8_0 | 98% |
| Llama 3.1 8B Instruct | Meta | 17.0 GB | FP16 | 100% |
| Granite 3.3 8B | IBM | 8.6 GB | Q8_0 | 98% |
| EXAONE 3.5 7.8B | LG AI | 8.2 GB | Q8_0 | 98% |
| InternLM 2.5 7B | Shanghai AI Lab | 8.2 GB | Q8_0 | 98% |
| Qwen 2.5 7B Instruct | Alibaba | 9.0 GB | Q8_0 | 98% |
| Mistral 7B Instruct v0.3 | Mistral AI | 15.5 GB | FP16 | 100% |
| Falcon 3 7B | TII | 8.3 GB | Q8_0 | 98% |
| OLMo 2 7B | Allen AI | 7.7 GB | Q8_0 | 98% |
| OpenChat 3.5 7B | OpenChat | 7.7 GB | Q8_0 | 98% |
| Yi 1.5 6B Chat | 01.AI | 6.5 GB | Q8_0 | 98% |
| Gemma 3 4B | Google | 4.3 GB | Q8_0 | 98% |
| Nemotron Mini 4B | NVIDIA | 4.7 GB | Q8_0 | 98% |
| Danube 3 4B | H2O.ai | 4.4 GB | Q8_0 | 98% |
| Phi-3.5 Mini 3.8B | Microsoft | 4.3 GB | Q8_0 | 98% |
| Phi-4 Mini 3.8B | Microsoft | 4.3 GB | Q8_0 | 98% |
| Llama 3.2 3B Instruct | Meta | 3.7 GB | Q8_0 | 98% |
| Qwen 2.5 3B | Alibaba | 3.9 GB | Q8_0 | 98% |
| Falcon 3 3B | TII | 3.8 GB | Q8_0 | 98% |
| StableLM Zephyr 3B | Stability AI | 3.3 GB | Q8_0 | 98% |
| Rocket 3B | Pansophic | 3.3 GB | Q8_0 | 98% |
| Gemma 2 2B | Google | 3.1 GB | Q8_0 | 98% |
| EXAONE 3.5 2.4B | LG AI | 3.1 GB | Q8_0 | 98% |
| Granite 3.3 2B | IBM | 3.0 GB | Q8_0 | 98% |
| SmolLM2 1.7B | HuggingFace | 2.2 GB | Q8_0 | 98% |
| Qwen 2.5 1.5B | Alibaba | 2.3 GB | Q8_0 | 98% |
| DeepSeek R1 Distill 1.5B | DeepSeek | 2.3 GB | Q8_0 | 98% |
| Llama 3.2 1B Instruct | Meta | 2.8 GB | FP16 | 100% |
| TinyLlama 1.1B | TinyLlama | 1.6 GB | Q8_0 | 98% |
| Gemma 3 1B | Google | 1.5 GB | Q8_0 | 98% |
| Falcon 3 1B | TII | 2.2 GB | Q8_0 | 98% |
| Qwen 2.5 0.5B | Alibaba | 1.1 GB | Q8_0 | 98% |
| Danube 3 500M | H2O.ai | 1.0 GB | Q8_0 | 98% |
| SmolLM2 360M | HuggingFace | 0.9 GB | Q8_0 | 98% |
| SmolLM2 135M | HuggingFace | 0.8 GB | FP16 | 100% |
💻 Coding (16 models)

| Model | Vendor | VRAM | Quantization | Compatibility |
|---|---|---|---|---|
| Qwen 2.5 Coder 14B | Alibaba | 15.1 GB | Q8_0 | 98% |
| Code Llama 13B Instruct | Meta | 7.8 GB | Q4_K_M | 85% |
| Yi Coder 9B | 01.AI | 9.2 GB | Q8_0 | 98% |
| CodeGemma 7B | Google | 8.9 GB | Q8_0 | 98% |
| Qwen 2.5 Coder 7B | Alibaba | 8.0 GB | Q8_0 | 98% |
| StarCoder2 7B | BigCode | 7.6 GB | Q8_0 | 98% |
| Code Llama 7B | Meta | 7.2 GB | Q8_0 | 98% |
| DeepSeek Coder 6.7B | DeepSeek | 7.2 GB | Q8_0 | 98% |
| Qwen 2.5 Coder 3B | Alibaba | 3.9 GB | Q8_0 | 98% |
| StarCoder2 3B | BigCode | 3.5 GB | Q8_0 | 98% |
| Stable Code 3B | Stability AI | 3.3 GB | Q8_0 | 98% |
| CodeGemma 2B | Google | 3.0 GB | Q8_0 | 98% |
| Qwen 2.5 Coder 1.5B | Alibaba | 2.3 GB | Q8_0 | 98% |
| Yi Coder 1.5B | 01.AI | 2.0 GB | Q8_0 | 98% |
| DeepSeek Coder 1.3B | DeepSeek | 1.8 GB | Q8_0 | 98% |
| Qwen 2.5 Coder 0.5B | Alibaba | 1.1 GB | Q8_0 | 98% |
👁 Multimodal (6 models)
🎨 Image Generation (9 models)

| Model | Vendor | VRAM | Format | Compatibility |
|---|---|---|---|---|
| FLUX.1 Schnell (GGUF) | Black Forest Labs | 14.0 GB | Q5_0 | 90% |
| FLUX.1 Dev (GGUF) | Black Forest Labs | 14.0 GB | Q5_0 | 100% |
| Stable Diffusion XL (CoreML) | Stability AI | 3.3 GB | CoreML | 100% |
| SDXL Turbo (GGUF) | Stability AI | 5.0 GB | Q5_0 | 85% |
| Stable Diffusion 3 Medium (GGUF) | Stability AI | 9.2 GB | Q8_0 | 95% |
| Stable Diffusion 2.1 Base (CoreML) | Stability AI / Apple | 1.6 GB | CoreML-Palettized | 85% |
| Stable Diffusion 1.5 (CoreML) | Runway | 2.5 GB | CoreML-Palettized | 90% |
| Stable Diffusion 1.5 (GGUF) | Runway / GPUStack | 2.3 GB | Q8_0 | 95% |
| Stable Diffusion 2.1 (GGUF) | Stability AI | 2.7 GB | Q8_0 | 95% |
🎤 Speech Recognition (9 models)

| Model | Vendor | VRAM | Quantization | Compatibility |
|---|---|---|---|---|
| Whisper Large v3 | OpenAI | 3.4 GB | Q8_0 | 98% |
| Whisper Large v3 Turbo | OpenAI | 2.0 GB | Q8_0 | 95% |
| Whisper Medium | OpenAI | 1.9 GB | Q8_0 | 92% |
| Distil-Whisper Large v3 | HuggingFace | 1.9 GB | Q8_0 | 96% |
| Whisper Small | OpenAI | 0.9 GB | Q8_0 | 85% |
| Whisper Base | OpenAI | 0.3 GB | Q8_0 | 80% |
| Whisper Base English | OpenAI | 0.3 GB | Q8_0 | 82% |
| Whisper Tiny English (Quantized) | OpenAI | 0.1 GB | Q5_1 | 65% |
| Whisper Tiny | OpenAI | 0.2 GB | Q8_0 | 70% |
🔊 Text to Speech (14 models)

| Model | Vendor | VRAM | Format | Compatibility |
|---|---|---|---|---|
| Kokoro 82M TTS | Kokoro | 0.6 GB | ONNX-Q8F16 | 95% |
| Piper TTS - Amy (English) | Rhasspy | 0.1 GB | ONNX | 85% |
| Piper TTS - Lessac (English) | Rhasspy | 0.1 GB | ONNX | 85% |
| Piper TTS - LibriTTS-R (English) | Rhasspy | 0.6 GB | ONNX | 80% |
| Piper TTS - Spanish (MLS) | Rhasspy | 0.1 GB | ONNX | 80% |
| Piper TTS - French (Siwis) | Rhasspy | 0.5 GB | ONNX | 80% |
| Piper TTS - German (Thorsten) | Rhasspy | 0.1 GB | ONNX | 80% |
| Piper TTS - Chinese (Huayan) | Rhasspy | 0.1 GB | ONNX | 80% |
| Piper TTS - Japanese (Kokoro) | Rhasspy | 0.1 GB | ONNX | 80% |
| Piper TTS - Korean | Rhasspy | 0.1 GB | ONNX | 80% |
| Piper TTS - Russian (Irina) | Rhasspy | 0.1 GB | ONNX | 80% |
| Piper TTS - Portuguese (Faber) | Rhasspy | 0.1 GB | ONNX | 80% |
| Piper TTS - Italian (Riccardo) | Rhasspy | 0.5 GB | ONNX | 80% |
| Piper TTS - Arabic (Kareem) | Rhasspy | 0.1 GB | ONNX | 80% |
🎵 Audio Generation (1 model)
🧩 Embedding (5 models)
🔄 Reranker (2 models)
Compatible GPUs (4 with 192GB+ VRAM)
Frequently Asked Questions
What is the best AI model I can run with 192GB VRAM?
The largest model that fits in 192GB of VRAM is Llama 3.1 70B Instruct (70B parameters) at full FP16 precision, which requires about 142GB. In total, 109 models are compatible.
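The VRAM figures in the tables above follow the usual rule of thumb: weight memory is roughly parameter count times bytes per weight for the chosen format (2 bytes for FP16, about 8.5 bits per weight for Q8_0, about 4.5 bits for Q4_K_M), plus some headroom for the KV cache and runtime buffers. A minimal sketch of that arithmetic (the bytes-per-weight values and the flat overhead are simplifying assumptions, not exact llama.cpp figures; real GGUF files mix tensor types):

```python
# Approximate bytes per weight for common formats (assumption:
# rounded averages, not exact per-tensor sizes).
BYTES_PER_WEIGHT = {
    "FP16": 2.0,
    "Q8_0": 1.0625,    # ~8.5 bits/weight
    "Q4_K_M": 0.5625,  # ~4.5 bits/weight
}

def estimate_vram_gb(params_billions: float, fmt: str,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: weights plus a flat allowance for the
    KV cache and runtime buffers (overhead_gb is a placeholder)."""
    weights_gb = params_billions * BYTES_PER_WEIGHT[fmt]
    return round(weights_gb + overhead_gb, 1)

print(estimate_vram_gb(70, "FP16"))  # ~141.5 GB, near the listed 142.0 GB
print(estimate_vram_gb(8, "Q8_0"))   # ~10.0 GB
```

Actual requirements vary with context length (the KV cache grows with it) and the runtime used, which is why listed figures differ slightly from this back-of-the-envelope estimate.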
Can I run Llama with 192GB VRAM?
Yes. Seven Llama-family models run within 192GB of VRAM: Llama 3.1 70B Instruct (FP16), Code Llama 13B Instruct (Q4_K_M), Llama 3.1 8B Instruct (FP16), Code Llama 7B (Q8_0), Llama 3.2 3B Instruct (Q8_0), Llama 3.2 1B Instruct (FP16), and TinyLlama 1.1B (Q8_0).
What GPU has 192GB VRAM?
GPUs and accelerators with 192GB or more of usable memory include the Apple M4 Ultra, Apple M3 Ultra, and Apple M2 Ultra (via unified memory) and the AMD Instinct MI300X.