mlabonne
NeuralDaredevil 8B (abliterated)
Llama-3 8B with the refusal direction ablated, then DPO-finetuned to recover capability. The best quality-retention among 8B abliterations — minimal regression vs the official Instruct model.
Check Your Hardware
See which quantizations of NeuralDaredevil 8B (abliterated) your hardware can run.
Quantization Options
| Quantization | Bits | File Size | VRAM Needed | RAM Needed | Quality |
|---|---|---|---|---|---|
| BF16 | 16 | 16 GB | 16.5 GB | 17 GB | 100% |
| Q8_0 | 8 | 8.48 GB | 8.98 GB | 9.48 GB | 98% |
| Q4_K_M | 4.5 | 4.8 GB | 5.3 GB | 5.8 GB | 85% |
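The table's numbers follow a simple rule of thumb: weight file size ≈ parameters × bits-per-weight ÷ 8, plus roughly 0.5 GB of runtime overhead for VRAM. A minimal sketch (the ~8.03B parameter count and the 0.5 GB figures are assumptions; real GGUF files run a few percent larger because some tensors stay at higher precision):

```python
def weight_file_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight file size: parameters x bits-per-weight / 8 bits-per-byte."""
    return n_params * bits_per_weight / 8 / 1e9

def vram_needed_gb(n_params: float, bits_per_weight: float,
                   overhead_gb: float = 0.5) -> float:
    """Weights plus a small runtime overhead (assumed ~0.5 GB)."""
    return weight_file_gb(n_params, bits_per_weight) + overhead_gb

# Llama-3 8B has ~8.03B parameters
print(round(weight_file_gb(8.03e9, 4.5), 1))  # Q4_K_M weights: ~4.5 GB
print(round(weight_file_gb(8.03e9, 16), 1))   # BF16 weights: ~16.1 GB
```

This is why halving the bits roughly halves the file: quality loss, not memory, is the real trade-off axis.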
Context window & KV cache
Adds 1.00 GB to VRAM at the native context length. Long chats and RAG inputs cost real memory. Drag to see how 32K vs 128K context shifts your grade.
Model native max: 8K tokens; contexts beyond that require RoPE scaling and grow the KV cache proportionally. The KV-cache estimate is approximate (±30%); real usage depends on attention layout.
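The 1.00 GB figure is consistent with the standard KV-cache formula under Llama-3 8B's architecture (32 layers, 8 KV heads via grouped-query attention, head dim 128, fp16 cache) — a sketch, assuming those values:

```python
def kv_cache_gb(ctx_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """KV-cache size: 2 (K and V) x layers x kv_heads x head_dim x context x element size."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 2**30

print(kv_cache_gb(8192))   # native 8K context -> 1.0 GB
print(kv_cache_gb(32768))  # a 32K context costs 4x as much
```

Cache size scales linearly with context length, which is why long-context sliders move your hardware grade so sharply.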
How to run NeuralDaredevil 8B (abliterated)
Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.
LM Studio — GUI. Browse → download → chat. MLX on Apple Silicon. LM Studio home →
1. Open LM Studio
   Go to the 🔍 Search tab.
2. Search for
   bartowski/NeuralDaredevil-8B-abliterated-GGUF
3. Download
   Pick the Q4_K_M quant — best balance of size vs. quality.
4. Chat
   Hit ▶ Load Model and start chatting. Toggle 'Local Server' to expose an OpenAI-compatible API on :1234.
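Once 'Local Server' is on, any OpenAI-compatible client can talk to it. A minimal standard-library sketch that builds the request without sending it (the model identifier is an assumption — LM Studio lists the real one at /v1/models):

```python
import json
import urllib.request

def chat_request(prompt: str, base_url: str = "http://localhost:1234/v1"):
    """Build an OpenAI-style chat-completion request for LM Studio's local server."""
    payload = {
        # Assumed identifier; query GET {base_url}/models for the actual name.
        "model": "neuraldaredevil-8b-abliterated",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("Explain abliteration in one sentence.")
# Send with urllib.request.urlopen(req) once the server is running on :1234.
```

Because the endpoint mimics the OpenAI API, the official `openai` client also works by pointing `base_url` at `http://localhost:1234/v1`.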
Community benchmarks
Real tokens/sec reports from people running NeuralDaredevil 8B (abliterated) on actual hardware.
No community runs yet for this model. Be the first to submit your numbers.
Self-host serving plan
Want to host NeuralDaredevil 8B (abliterated) for many users? Or run it on a card that’s technically too small? Slide the knobs.
VRAM needed
6.5 GB
5.3 GB weights + 0.7 GB KV
Aggregate tok/s
31
across 1 user
Per-user tok/s
31
8 B dense
✅ Fits in 24 GB VRAM with 17.5 GB headroom. Pure-GPU inference — full speed.
Throughput is a sub-linear estimate: doubling users adds ~70 % of single-user TPS until ~8, then plateaus on memory bandwidth. MoE models scale concurrency much better because each user activates a different subset of experts.
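That scaling rule can be written down directly. A toy model of the claim above (the log2 form and the hard 8-user cutoff are my simplification of "doubling users adds ~70% of single-user TPS", not a measured curve):

```python
import math

def aggregate_tps(single_tps: float, n_users: int,
                  gain: float = 0.7, saturation: int = 8) -> float:
    """Each doubling of concurrent users adds ~70% of single-user TPS
    until ~8 users, after which memory bandwidth caps aggregate throughput."""
    n = min(n_users, saturation)
    return single_tps * (1 + gain * math.log2(n))

print(aggregate_tps(31, 1))             # 31.0 tok/s for one user
print(round(aggregate_tps(31, 2), 1))   # ~52.7 tok/s aggregate for two
```

Per-user throughput is just the aggregate divided by `n_users`, which is why it falls steadily even while aggregate climbs.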
See It In Action
Real model outputs generated via RunThisModel.com — watch responses stream in real time. Generation speed shown is from cloud inference; local speeds vary by hardware — check your device.
Frequently Asked Questions
How much VRAM do I need to run NeuralDaredevil 8B (abliterated)?
NeuralDaredevil 8B (abliterated) requires 5.3 GB of VRAM minimum with Q4_K_M quantization. For full BF16 precision, you need 16.5 GB of VRAM.
What is the best quantization for NeuralDaredevil 8B (abliterated)?
Q4_K_M offers the best balance of quality and VRAM usage. Q8_0 is near-lossless if you have enough VRAM.