Can I run Cydonia 24B v4.3 on my device?

Cydonia 24B v4.3 requires a minimum of 14.9GB VRAM. Use RunThisModel to check your specific hardware compatibility and find the best quantization for your device.

How much VRAM does Cydonia 24B v4.3 need?

Cydonia 24B v4.3 needs 14.9GB VRAM at minimum (BF16 quantization). Higher quality quantizations need more: BF16: 48.5GB, Q4_K_M: 14.9GB.

How do I download Cydonia 24B v4.3?

You can download Cydonia 24B v4.3 in GGUF format from HuggingFace (14.4GB minimum). Use the RunThisModel iOS app to download and run it directly on your device, or download manually from HuggingFace.

Can Cydonia 24B v4.3 run on iPhone?

Cydonia 24B v4.3 at 24B parameters is too large for most iPhones. Consider using an iPad with M-series chip or Mac with Apple Silicon.

TheDrummer

Cydonia 24B v4.3

Name: Cydonia 24B v4.3
Author: TheDrummer

Top-of-line 24B roleplay model, Mistral-Small-3.2-24B base. Active development cycle — TheDrummer ships frequent revisions tracking the latest base models.

24B parametersmistralother32K context14.9GB - 48.5GB VRAM

Check Your Hardware

See which quantizations of Cydonia 24B v4.3 your hardware can run.

Quantization Options

Quantization	Bits	File Size	VRAM Needed	RAM Needed	Quality
BF16	16	48 GB	48.5 GB	49 GB	100%
Q4_K_M	4.5	14.4 GB	14.9 GB	15.4 GB	85%

Context window & KV cache

Adds 1.50 GB to VRAM

Long chats and RAG inputs cost real memory. Drag to see how 32K vs 128K context shifts your grade.

Model native max: 32K tokens. KV-cache estimate is approximate (±30 %); real usage depends on attention layout.

How to run Cydonia 24B v4.3

Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.

GUI. Browse → download → chat. MLX on Apple Silicon.

LM Studio home →

1
Open LM Studio
Go to the 🔍 Search tab.
2
Search for
```
mradermacher/Cydonia-24B-v4.3-GGUF
```
3
Download
Pick the Q4_K_M quant — best balance of size vs. quality.
4
Chat
Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.

Community benchmarks

Real tokens/sec reports from people running Cydonia 24B v4.3 on actual hardware.

No community runs yet for this model. Be the first to submit your numbers.

Self-host serving plan

Want to host Cydonia 24B v4.3for many users? Or run it on a card that’s technically too small? Slide the knobs.

Concurrent users: 1VRAM budget: 24 GBSystem RAM: 64 GBAllow NVMe spilloverPermit weights to spill to NVMe SSD (last resort, ~50× slower)

VRAM needed

16.6 GB

14.9 GB weights + 1.2 GB KV

Aggregate tok/s

across 1 user

Per-user tok/s

24 B dense

✅ Fits in 24 GB VRAM with 7.4 GB headroom. Pure-GPU inference — full speed.

Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.

Download & Run

HuggingFace

View model & download weights

Ollama

One-command install & run

See It In Action

Real model outputs generated via RunThisModel.com — watch responses stream in real time.

Llama 3.3 70B responding...

Outputs generated by real AI models via RunThisModel.com. Generation speed shown is from cloud inference. Local speeds vary by hardware — check your device.

Frequently Asked Questions

How much VRAM do I need to run Cydonia 24B v4.3?

Cydonia 24B v4.3 requires 14.9GB VRAM minimum with BF16 quantization. For full precision, you need 48.5GB VRAM.

What is the best quantization for Cydonia 24B v4.3?

Q4_K_M offers the best balance of quality and VRAM usage. Q8_0 is near-lossless if you have enough VRAM.