CogVideoX 5B from Tsinghua KEG is a good fit if you want video generation on a 16 GB card. Q8 quants bring memory under control without crushing quality, although the table below lists only the FP16 build.
| Quantization | Bits | File Size | VRAM Needed | RAM Needed | Quality |
|---|---|---|---|---|---|
| FP16 | 16 | 10 GB | 16 GB | 16 GB | 100% |
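The 10 GB file size in the FP16 row is consistent with simple arithmetic on the parameter count. The sketch below assumes that "5B" means roughly 5 billion parameters and that the file is dominated by the weights; `weight_size_gb` is a hypothetical helper, not part of any library.

```python
def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "5B" in the model name means about 5 billion parameters.
PARAMS = 5e9

fp16_gb = weight_size_gb(PARAMS, 16)
print(f"FP16 weights: ~{fp16_gb:.0f} GB")  # prints "FP16 weights: ~10 GB"
```

The gap between the 10 GB of weights and the 16 GB VRAM figure is the room left for activations and other working memory during generation.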
How to run CogVideoX 5B
LM Studio (GUI): browse, download, chat. Uses MLX on Apple Silicon.
1. Open LM Studio and go to the 🔍 Search tab.
2. Search for THUDM/CogVideoX-5b.
3. Download the FP16 build, the only one listed in the table above.
4. Chat. Hit ▶ Load Model and start prompting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.
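For a scripted route alongside the GUI steps, the same repo can be loaded with the Hugging Face `diffusers` library. This is a minimal sketch, not a tuned setup: it assumes `torch`, `diffusers`, and `accelerate` are installed and a CUDA GPU is present. The prompt, step count, frame count, guidance scale, and seed are illustrative values, and `generation_settings` and `generate` are hypothetical helper names.

```python
def generation_settings(prompt: str, seed: int = 42) -> dict:
    """Collect pipeline arguments in one place (values are illustrative)."""
    return {
        "prompt": prompt,
        "num_inference_steps": 50,
        "num_frames": 49,
        "guidance_scale": 6.0,
        "seed": seed,
    }


def generate(prompt: str, out_path: str = "output.mp4") -> str:
    # Imported lazily so the helper above works without the GPU stack installed.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    settings = generation_settings(prompt)
    seed = settings.pop("seed")

    # float16 matches the FP16 row on this page; torch.bfloat16 is the
    # other common 16-bit choice.
    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-5b", torch_dtype=torch.float16
    )
    # Both calls trade speed for lower peak VRAM.
    pipe.enable_model_cpu_offload()
    pipe.vae.enable_tiling()

    generator = torch.Generator(device="cuda").manual_seed(seed)
    frames = pipe(generator=generator, **settings).frames[0]
    export_to_video(frames, out_path, fps=8)
    return out_path


# Example call (downloads the weights on first use):
# generate("A panda playing guitar in a bamboo forest")
```

Keeping the settings in a plain dictionary makes it easy to record exactly what was run when comparing results across machines.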
Community benchmarks
Real seconds-per-video reports from people running CogVideoX 5B on actual hardware.
No community runs yet for this model. Be the first to submit your numbers.
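One way to produce a number worth submitting is to time a full generation with a wall-clock timer. The sketch below is generic: `time_run` is a hypothetical helper, and the function you pass in stands for whatever generation call you use.

```python
import time


def time_run(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; replace the lambda with a real generation call.
_, seconds = time_run(lambda: sum(range(1000)))
print(f"{seconds:.4f} s")
```

Timing the second run rather than the first keeps one-off costs such as model download and loading out of the figure.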
How much VRAM do I need to run CogVideoX 5B?
CogVideoX 5B needs at least 16 GB of VRAM at FP16, which is the full-precision build listed on this page.
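To check a machine against that figure, the total memory reported by the GPU can be compared with the 16 GB requirement. This is a rough sketch assuming PyTorch with CUDA support. The half-gigabyte slack is an assumption added because cards sold as 16 GB typically report slightly less than 16 GiB, and both function names are hypothetical.

```python
REQUIRED_VRAM_GB = 16.0  # FP16 row of the table on this page


def meets_requirement(total_bytes: int,
                      required_gb: float = REQUIRED_VRAM_GB,
                      slack_gb: float = 0.5) -> bool:
    """True if the card's total memory is within slack_gb of the requirement."""
    return total_bytes / 2**30 >= required_gb - slack_gb


def detected_vram_bytes(device_index: int = 0) -> int:
    # Imported lazily so meets_requirement can be used without PyTorch.
    import torch
    return torch.cuda.get_device_properties(device_index).total_memory


# Example on a machine with a CUDA GPU:
# print(meets_requirement(detected_vram_bytes()))
```

Total memory is an upper bound: the desktop and other processes also use VRAM, so the amount actually free can be lower.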
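Which quant should I pick?
The table above lists only FP16, so that is the one build this page documents for CogVideoX 5B. Where quantized builds are available, Q4_K_M is the usual quality/VRAM balance (about 92% of FP16 quality at about 25% of the footprint), and Q8_0 is near-lossless if you have the headroom.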