# Cloud GPU pricing

Live snapshot across 11 providers, grouped by VRAM tier and sorted by hourly price, cheapest first. Refreshed periodically.
## 16 GB VRAM

Tiny inference, T4 / RTX A4000

*No listings in the current snapshot.*
## 24 GB VRAM

RTX 3090 / 4090 — sweet spot for 8–13 B models
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| Salad | RTX 3090 | $0.10 | Distributed · Best $/hr | Rent → |
| Vast.ai | RTX 3090 | $0.15 | Spot pricing | Rent → |
| Salad | RTX 4090 | $0.20 | Distributed compute | Rent → |
| RunPod | RTX 3090 | $0.22 | Community cloud | Rent → |
| Vast.ai | RTX 4090 | $0.25 | Variable availability | Rent → |
| Cudo | RTX 4090 | $0.30 | Renewable-only DCs | Rent → |
| TensorDock | RTX 4090 | $0.34 | Marketplace · Per-minute billing | Rent → |
| RunPod | RTX 4090 | $0.44 | On-demand · Serverless available | Rent → |
| Modal | L4 | $0.80 | Serverless · Per-second billing | Rent → |
## 40 GB VRAM

A100 40 GB — fits 30 B at Q4
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| Lambda | A100 40GB | $1.10 | 1-Click Cluster · Pre-installed CUDA | Rent → |
## 48 GB VRAM

A6000 / L40S — the 70 B Q4 sweet spot

*No listings in the current snapshot.*
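The Q4 sizing claims in these tier descriptions follow a common rule of thumb: weight memory is roughly parameters × bits ÷ 8 bytes, plus overhead for the KV cache and runtime buffers. A minimal sketch (the 1.2× overhead factor is an assumption; real usage varies with context length and inference engine):

```python
def vram_gb(params_billions: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate for serving an LLM.

    Weights take params * bits / 8 GB (for params in billions); the
    overhead factor (assumed 1.2 here) loosely covers KV cache,
    activations, and runtime buffers.
    """
    return params_billions * bits / 8 * overhead

# 70 B at Q4 lands around 42 GB, hence the 48 GB "sweet spot"
print(f"70B Q4: {vram_gb(70, 4):.0f} GB")
# 30 B at Q4 lands around 18 GB, fitting a 40 GB A100 comfortably
print(f"30B Q4: {vram_gb(30, 4):.0f} GB")
```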
## 80 GB VRAM

A100 80 GB / H100 — production / fine-tuning
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| Vast.ai | A100 80GB | $0.90 | Cheapest A100 | Rent → |
| TensorDock | A100 80GB | $1.20 | Marketplace pricing | Rent → |
| Hyperstack | A100 80GB | $1.35 | EU bare-metal | Rent → |
| RunPod | A100 80GB | $1.64 | Top performance | Rent → |
| Lambda | A100 80GB | $1.79 | No queue (reserved) | Rent → |
| Massed Compute | H100 | $1.95 | Reserved discounts | Rent → |
| Vast.ai | H100 SXM | $2.10 | Cheapest H100 | Rent → |
| Hyperstack | H100 | $2.40 | EU H100 | Rent → |
| Cudo | H100 PCIe | $2.45 | Renewable energy | Rent → |
| Crusoe | H100 | $2.65 | Reserved capacity | Rent → |
| Lambda | H100 SXM | $2.99 | NVLink | Rent → |
| Paperspace | A100 80GB | $3.18 | Persistent storage | Rent → |
| RunPod | H100 SXM | $3.29 | HBM3 · Fastest inference | Rent → |
| Modal | A100 80GB | $4.10 | Per-second · Auto-scaling | Rent → |
| Modal | H100 | $7.50 | Per-second | Rent → |
## 141 GB VRAM

H200 — largest single-card VRAM listed here
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| RunPod | H200 | $4.49 | 141 GB HBM3e | Rent → |
## Pay-per-token alternative

Skip the rental entirely — these providers run open-source models for you and bill by the token.
| Provider | Highlighted model | In / 1M tok | Out / 1M tok | Notes |
|---|---|---|---|---|
| Together AI | Llama 3.1 70B | $0.18 | $0.18 | Open-weights API · Llama, Qwen, Mixtral, Flux |
| Fireworks | Llama 3.1 70B | $0.20 | $0.20 | Fastest open-model serving · Function calling |
| Replicate | Flux.1 Schnell | $0.65 | $2.75 | Run any model with one HTTP call |
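Whether renting beats pay-per-token comes down to sustained throughput. A back-of-envelope sketch (the function name and the full-utilization assumption are mine, not any provider's):

```python
def breakeven_tokens_per_hour(gpu_hourly_usd: float, api_usd_per_1m: float) -> float:
    """Tokens per hour a rented GPU must sustain before its hourly rate
    undercuts a pay-per-token API. Assumes full utilization and ignores
    storage, egress, and idle time."""
    return gpu_hourly_usd / api_usd_per_1m * 1_000_000

# e.g. an A100 80GB at $0.90/hr vs an API charging $0.18 per 1M tokens
rate = breakeven_tokens_per_hour(0.90, 0.18)
print(f"break-even: {rate:,.0f} tokens/hr (~{rate / 3600:,.0f} tok/s)")
```

Below that throughput, or for bursty workloads, per-token billing is usually cheaper; above it, a rented card wins.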
Don’t need cloud? Check if your local hardware can run it →