Can M4 Max run Llama 3.1 70B (lorablated)?

Yes — runs locally

~17 tok/sec · Good — slight pause, then text streams smoothly.

Your VRAM: 128 GB (unified memory)
Model size: 70B parameters
Best quant: Q4_K_M
VRAM needed: 40.1 GB

The verdict

The M4 Max with 128 GB of unified memory handles Llama 3.1 70B (lorablated) comfortably using the Q4_K_M quantization, which fits in 40.1 GB. Expected throughput is around 17 tokens/second, which feels good in interactive use: a slight pause, then text streams smoothly. The model is Llama-3.1-70B-Instruct with abliteration applied via a LoRA merge, which makes it the cleanest refusal-removed 70B pick because it keeps the official Instruct model's quality.
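Single-stream token generation on a dense model is usually limited by memory bandwidth, because every weight is read once for each generated token. The sketch below turns that into a back-of-envelope ceiling. The 546 GB/s figure is an assumption taken from Apple's published spec for the M4 Max configuration that offers 128 GB, and the function name is hypothetical.

```python
def decode_ceiling_tok_per_s(weights_gb: float, bandwidth_gb_per_s: float) -> float:
    """Upper bound on single-stream decode speed for a dense model.

    Each generated token reads every weight once, so memory bandwidth
    divided by weight size bounds tokens per second. Real throughput is
    lower because of KV-cache reads, kernel overhead, and imperfect
    bandwidth utilization.
    """
    return bandwidth_gb_per_s / weights_gb


# Assumptions: 40.1 GB of Q4_K_M weights (this page's figure) and 546 GB/s,
# Apple's published memory bandwidth for the 128 GB-capable M4 Max.
ceiling = decode_ceiling_tok_per_s(40.1, 546.0)
print(f"theoretical decode ceiling: {ceiling:.1f} tok/s")
```

Under these assumptions the ceiling lands near 13.6 tok/s, so the ~17 tok/sec estimate looks optimistic for standard decoding; timing a few replies on your own machine is the reliable check.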

How to run it

1. Install Ollama or LM Studio.
2. Pull the Q4_K_M GGUF, which gives the best balance of quality and speed on 128 GB.
3. Start chatting. Expect roughly 17 tok/sec while text is generating; the first reply takes longer because the model has to load into memory.
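To check the speed on your own machine with Ollama, you can read the timing fields from its local HTTP API. This is a minimal sketch assuming Ollama is serving on its default port; the model tag below is a hypothetical placeholder for whichever lorablated GGUF you pulled. Per Ollama's API docs, `eval_count` is the number of generated tokens and `eval_duration` and `load_duration` are in nanoseconds.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-prompt generation.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical placeholder: replace with the tag of the model you pulled.
MODEL = "llama3.1-70b-lorablated:q4_K_M"


def tokens_per_second(resp: dict) -> float:
    """Generation speed from an Ollama /api/generate response (durations in ns)."""
    return resp["eval_count"] * 1e9 / resp["eval_duration"]


def generate(prompt: str) -> dict:
    """Send one non-streaming request and return the parsed JSON response."""
    body = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)


def main() -> None:
    # Two runs: the first usually includes model load time, the second does not.
    for i in range(2):
        resp = generate("Explain what a GGUF file is in two sentences.")
        load_s = resp.get("load_duration", 0) / 1e9
        print(f"run {i + 1}: load {load_s:.1f} s, {tokens_per_second(resp):.1f} tok/s")
```

With Ollama running and the model pulled, call `main()`; the load time on the first run is what makes the first reply feel slower.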



FAQ

What GPU do I need to run Llama 3.1 70B (lorablated)?

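You need at least 40.1 GB of GPU memory for the Q4_K_M quantization, and up to 140.5 GB for the highest-precision versions. A single 80 GB NVIDIA A100 can hold the Q4_K_M weights; a V100 tops out at 32 GB, so it would take two or more V100s with the model split across them. Apple Silicon Macs with enough unified memory, such as the 128 GB M4 Max, also work.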
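A quick way to see which GPU setups qualify is to compare the requirement against the combined memory of the cards. This is a rough sketch: the `fits` helper and its 0.9 `headroom` fraction are hypothetical, chosen to leave room for the KV cache and runtime overhead, and it assumes the runtime can split layers across cards.

```python
def fits(required_gb: float, gpu_mem_gb: list[float], headroom: float = 0.9) -> bool:
    """Rough check of whether a model fits in the combined memory of some GPUs.

    headroom is a hypothetical fraction of each card treated as usable;
    the rest is left for the KV cache, activations, and runtime overhead.
    Assumes layers can be split across cards.
    """
    return required_gb <= sum(m * headroom for m in gpu_mem_gb)


setups = [
    ("1x A100 80 GB", [80]),
    ("1x A100 40 GB", [40]),
    ("1x V100 32 GB", [32]),
    ("2x V100 32 GB", [32, 32]),
]
for label, cards in setups:
    print(f"{label}: Q4_K_M (40.1 GB) fits = {fits(40.1, cards)}")
```

The headroom factor is deliberately conservative; long contexts need more slack than short chats.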

Is Llama 3.1 70B (lorablated) good for coding?

Yes. It inherits the coding ability of Llama-3.1-70B-Instruct, so it handles code generation, explanation, and debugging well, and its 131,072-token context window can hold large source files or several related files in a single prompt.

Llama 3.1 70B (lorablated) vs Llama 3.1 8B?

Llama 3.1 70B (lorablated) gives noticeably stronger reasoning and more detailed answers than Llama 3.1 8B, but it has roughly nine times as many parameters and needs correspondingly more memory and compute: about 40 GB at Q4_K_M versus about 5 GB for the 8B model.

Can I run Llama 3.1 70B (lorablated) on a Mac?

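Yes. On Apple Silicon the GPU shares unified memory with the CPU, so total RAM is what matters: the Q4_K_M build needs about 40.1 GB, which makes a 64 GB or larger Mac the practical minimum, and the 128 GB M4 Max covered on this page runs it comfortably. Apple Silicon Macs do not support external GPUs, so the model runs on the built-in GPU through Metal, with speed limited mainly by memory bandwidth.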

How much VRAM does Llama 3.1 70B (lorablated) need?

Llama 3.1 70B (lorablated) requires between 40.1 GB and 140.5 GB of VRAM, depending on the quantization level used.
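The range follows from parameter count times bits per weight. The sketch below estimates weight size only, assuming about 70.6 billion parameters and average bits-per-weight values that are approximations: 8.5 for Q8_0 (8-bit values plus a 16-bit scale per 32-weight block), 16 for F16, and roughly 4.85 for Q4_K_M, whose exact average varies because it mixes block types.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone in decimal GB (1 GB = 1e9 bytes)."""
    # billions of parameters x bits each / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


PARAMS_B = 70.6  # Llama 3.1 70B has roughly 70.6 billion parameters

# Approximate average bits per weight for common GGUF formats (assumptions).
for name, bpw in [("Q4_K_M", 4.85), ("Q8_0", 8.5), ("F16", 16.0)]:
    print(f"{name}: ~{weight_size_gb(PARAMS_B, bpw):.1f} GB of weights")
```

These come out near 43, 75, and 141 GB. Published figures such as 40.1 GB can differ because of rounding, GB versus GiB units, or how runtime overhead is counted, and the KV cache for the context window comes on top.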

Is Llama 3.1 70B (lorablated) censored?

Largely no. The refusal removal (abliteration) strips out most of the official Instruct model's refusal behavior, so it rarely declines a request. It should not be relied on to enforce content policies; if your application needs filtering, add it separately.

Is Llama 3.1 70B (lorablated) commercial-use allowed?

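Yes. It is distributed under the Llama 3.1 Community License, which permits commercial use provided you comply with its terms; for example, products that had more than 700 million monthly active users at the Llama 3.1 release date must request a separate license from Meta.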

Llama 3.1 70B (lorablated) context length?

Llama 3.1 70B (lorablated) has a context length of 131,072 tokens, allowing it to process very long sequences of text.
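Using that full window costs memory on top of the weights, because the runtime keeps a key and a value vector per layer and per key-value head for every token. The sketch below assumes Llama 3.1 70B's published config (80 layers, 8 key-value heads with grouped-query attention, head dimension 128) and a 16-bit cache; quantized KV caches would shrink the result.

```python
def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """KV-cache size in decimal GB for one sequence.

    Per token, each layer stores one key and one value vector for each
    KV head, hence the factor of 2. Defaults assume Llama 3.1 70B's
    config and a 16-bit (2-byte) cache.
    """
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return tokens * per_token_bytes / 1e9


for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens: ~{kv_cache_gb(ctx):.1f} GB of KV cache")
```

At the full 131,072 tokens this is about 43 GB, comparable to the Q4_K_M weights themselves, which is why shorter context settings are common on local setups.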
