Cloud GPU pricing
Reference snapshot across 11 providers, sorted by VRAM tier and hourly rate. Confirm current availability and pricing with the provider before renting.
Some outbound provider links are affiliate links. They may fund RunThisModel at no extra cost to you, and they do not change local compatibility grades or reference rows.
16 GB VRAM
Tiny inference: T4 / RTX A4000
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| Modal | T4 | $0.59 | Serverless · Scale to zero | Reference |
| Paperspace | RTX A4000 | $0.76 | Gradient notebooks | Reference |
24 GB VRAM
RTX 3090 / 4090 — sweet spot for 8–13 B models
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| Salad | RTX 3090 | $0.10 | Distributed · Best $/hr | Reference |
| Vast.ai | RTX 3090 | $0.15 | Spot pricing | Rent → |
| Salad | RTX 4090 | $0.20 | Distributed compute | Reference |
| RunPod | RTX 3090 | $0.22 | Community cloud | Rent → |
| Vast.ai | RTX 4090 | $0.25 | Variable availability | Rent → |
| Cudo | RTX 4090 | $0.30 | Renewable-only DCs | Reference |
| TensorDock | RTX 4090 | $0.34 | Marketplace · Per-minute billing | Reference |
| RunPod | RTX 4090 | $0.44 | On-demand · Serverless available | Rent → |
| Modal | L4 | $0.80 | Serverless · Per-second billing | Reference |
40 GB VRAM
A100 40 GB — fits 30 B at Q4
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| Lambda | A100 40GB | $1.10 | 1-Click Cluster · Pre-installed CUDA | Reference |
48 GB VRAM
A6000 / L40S — the 70 B Q4 sweet spot
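The tier descriptions above ("30 B at Q4", "70 B Q4") can be sanity-checked with a rough memory estimate. The sketch below is illustrative only: the 4.5 bits per weight figure for Q4-style quantization and the 20% overhead for KV cache and runtime buffers are assumptions, not measured values, and the function names are hypothetical.

```python
def estimate_vram_gb(params_billions: float,
                     bits_per_weight: float = 4.5,  # assumption: typical Q4-style quantization
                     overhead: float = 1.2) -> float:  # assumption: ~20% for KV cache and buffers
    """Rough VRAM need in GB: weight bytes scaled by a flat overhead factor."""
    weights_gb = params_billions * bits_per_weight / 8  # billions of params * bytes per param
    return weights_gb * overhead


def fits(params_billions: float, vram_gb: float, **kwargs) -> bool:
    """True if the estimated footprint fits within the given VRAM tier."""
    return estimate_vram_gb(params_billions, **kwargs) <= vram_gb


if __name__ == "__main__":
    for params, tier in [(30, 40), (70, 48), (70, 40)]:
        need = estimate_vram_gb(params)
        print(f"{params} B at Q4 needs about {need:.1f} GB; fits {tier} GB: {fits(params, tier)}")
```

Long context windows or large batch sizes can push real usage well above this estimate, so treat the result as a lower bound when picking a tier.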
80 GB VRAM
A100 80 GB / H100 — production / fine-tuning
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| Vast.ai | A100 80GB | $0.90 | Cheapest A100 | Rent → |
| TensorDock | A100 80GB | $1.20 | Marketplace pricing | Reference |
| Hyperstack | A100 80GB | $1.60 | EU bare-metal | Reference |
| RunPod | A100 80GB | $1.64 | 80GB · Top performance | Rent → |
| Lambda | A100 80GB | $1.79 | No queue (reserved) | Reference |
| Massed Compute | H100 | $1.95 | Reserved discounts | Reference |
| Vast.ai | H100 SXM | $2.10 | Cheapest H100 SXM | Rent → |
| Hyperstack | H100 | $2.40 | EU H100 | Reference |
| Cudo | H100 PCIe | $2.45 | Renewable energy | Reference |
| Crusoe | H100 | $2.65 | Reserved capacity | Reference |
| Lambda | H100 SXM | $2.99 | NVLink | Reference |
| Paperspace | A100 80GB | $3.18 | Persistent storage | Reference |
| RunPod | H100 SXM | $3.29 | HBM3 · Fastest inference | Rent → |
| Modal | A100 80GB | $4.10 | Per-second · Auto-scaling | Reference |
| Modal | H100 | $7.50 | Per-second | Reference |
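Hourly rates alone can mislead for short jobs, because billing granularity (per-second, per-minute, or whole hours) changes what you actually pay. A minimal sketch of that arithmetic follows; `billed_cost` is a hypothetical helper, and the whole-hour granularity in the second example is an assumption for illustration rather than any listed provider's documented policy.

```python
import math


def billed_cost(rate_per_hour: float,
                runtime_seconds: float,
                granularity_seconds: int = 1) -> float:
    """Cost of a job when runtime is rounded up to the billing granularity."""
    billed_units = math.ceil(runtime_seconds / granularity_seconds)
    billed_seconds = billed_units * granularity_seconds
    return billed_seconds * rate_per_hour / 3600


if __name__ == "__main__":
    ten_minutes = 600
    # $7.50/hr billed per second (rate taken from the Modal H100 row above)
    print(f"per-second at $7.50/hr: ${billed_cost(7.50, ten_minutes, 1):.2f}")
    # $2.99/hr billed in whole hours (granularity is an assumption)
    print(f"whole-hour at $2.99/hr: ${billed_cost(2.99, ten_minutes, 3600):.2f}")
```

Under these assumptions, a ten-minute job costs less on the pricier per-second rate than on the cheaper rate billed by the hour, which is why the billing notes matter as much as the $/hr column for bursty workloads.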
141 GB VRAM
H200 — biggest single-card
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| RunPod | H200 | $4.49 | 141 GB HBM3e | Rent → |
Pay-per-token alternative
Skip the rental entirely — these providers run open-source models for you and bill by token.
| Provider | Description | In / 1M tokens | Out / 1M tokens | Highlighted model |
|---|---|---|---|---|
| Together AI | Open-weights API. Llama, Qwen, Mixtral, Flux. | $0.18 | $0.18 | Llama 3.1 70B |
| Fireworks | Fastest open-model serving. Function calling. | $0.20 | $0.20 | Llama 3.1 70B |
| Replicate | Run any model with one HTTP call. | $0.65 | $2.75 | Flux.1 Schnell |
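To decide between renting a GPU and paying per token, it helps to estimate the sustained throughput at which a rental breaks even. The sketch below assumes a single blended per-token price (input and output priced the same, as in the Together AI and Fireworks figures above) and a fully utilized GPU; `breakeven_tokens_per_second` is a hypothetical helper, not a provider API.

```python
def breakeven_tokens_per_second(rate_per_hour: float,
                                price_per_million_tokens: float) -> float:
    """Sustained tokens/s a rented GPU must serve to match a per-token API price."""
    tokens_per_hour = rate_per_hour / price_per_million_tokens * 1_000_000
    return tokens_per_hour / 3600


if __name__ == "__main__":
    # Rates from the tables above: $3.29/hr H100 SXM rental vs $0.18 per 1M tokens
    tps = breakeven_tokens_per_second(3.29, 0.18)
    print(f"Break-even throughput: about {tps:,.0f} tokens/s sustained")
```

If your workload cannot keep the card busy at roughly that rate, the per-token option is likely cheaper; idle time, multi-GPU requirements for larger models, and engineering overhead are not modeled here.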