# Cloud GPU pricing

Live snapshot across 11 providers, grouped by VRAM tier and sorted by hourly price, cheapest first. Refreshed periodically.
## 16 GB VRAM

Tiny inference, T4 / RTX A4000

*No listings in the current snapshot.*
## 24 GB VRAM

RTX 3090 / 4090 — sweet spot for 8–13 B models
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| Salad | RTX 3090 | $0.10 | Distributed · Best $/hr | Rent → |
| Vast.ai | RTX 3090 | $0.15 | Spot pricing | Rent → |
| Salad | RTX 4090 | $0.20 | Distributed compute | Rent → |
| RunPod | RTX 3090 | $0.22 | Community cloud | Rent → |
| Vast.ai | RTX 4090 | $0.25 | Variable availability | Rent → |
| Cudo | RTX 4090 | $0.30 | Renewable-only DCs | Rent → |
| TensorDock | RTX 4090 | $0.34 | Marketplace · Per-minute billing | Rent → |
| RunPod | RTX 4090 | $0.44 | On-demand · Serverless available | Rent → |
| Modal | L4 | $0.80 | Serverless · Per-second billing | Rent → |
## 40 GB VRAM

A100 40 GB — fits 30 B at Q4
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| Lambda | A100 40GB | $1.10 | 1-Click Cluster · Pre-installed CUDA | Rent → |
## 48 GB VRAM

A6000 / L40S — the 70 B Q4 sweet spot

*No listings in the current snapshot.*
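The Q4 sizing claims in these tier descriptions follow a common rule of thumb: weight memory is roughly parameters × bits ÷ 8 bytes, plus overhead for the KV cache and runtime buffers. A minimal sketch (the 1.2× overhead factor is an assumption; real usage varies with context length and inference engine):

```python
def vram_gb(params_billions: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate for serving an LLM.

    Weights take params * bits / 8 GB (for params in billions); the
    overhead factor (assumed 1.2 here) loosely covers KV cache,
    activations, and runtime buffers.
    """
    return params_billions * bits / 8 * overhead

# 70 B at Q4 lands around 42 GB, hence the 48 GB "sweet spot"
print(f"70B Q4: {vram_gb(70, 4):.0f} GB")
# 30 B at Q4 lands around 18 GB, fitting a 40 GB A100 comfortably
print(f"30B Q4: {vram_gb(30, 4):.0f} GB")
```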
## 80 GB VRAM

A100 80 GB / H100 — production / fine-tuning
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| Vast.ai | A100 80GB | $0.90 | Cheapest A100 | Rent → |
| TensorDock | A100 80GB | $1.20 | Marketplace pricing | Rent → |
| Hyperstack | A100 80GB | $1.35 | EU bare-metal | Rent → |
| RunPod | A100 80GB | $1.64 | Top performance | Rent → |
| Lambda | A100 80GB | $1.79 | No queue (reserved) | Rent → |
| Massed Compute | H100 | $1.95 | Reserved discounts | Rent → |
| Vast.ai | H100 SXM | $2.10 | Cheapest H100 | Rent → |
| Hyperstack | H100 | $2.40 | EU H100 | Rent → |
| Cudo | H100 PCIe | $2.45 | Renewable energy | Rent → |
| Crusoe | H100 | $2.65 | Reserved capacity | Rent → |
| Lambda | H100 SXM | $2.99 | NVLink | Rent → |
| Paperspace | A100 80GB | $3.18 | Persistent storage | Rent → |
| RunPod | H100 SXM | $3.29 | HBM3 · Fastest inference | Rent → |
| Modal | A100 80GB | $4.10 | Per-second · Auto-scaling | Rent → |
| Modal | H100 | $7.50 | Per-second | Rent → |
## 141 GB VRAM

H200 — largest single-card VRAM listed here
| Provider | GPU | $/hr | Notes | Open |
|---|---|---|---|---|
| RunPod | H200 | $4.49 | 141 GB HBM3e | Rent → |
## Pay-per-token alternative

Skip the rental entirely — these providers run open-source models for you and bill by the token.
| Provider | Highlighted model | In / 1M tok | Out / 1M tok | Notes |
|---|---|---|---|---|
| Together AI | Llama 3.1 70B | $0.18 | $0.18 | Open-weights API · Llama, Qwen, Mixtral, Flux |
| Fireworks | Llama 3.1 70B | $0.20 | $0.20 | Fastest open-model serving · Function calling |
| Replicate | Flux.1 Schnell | $0.65 | $2.75 | Run any model with one HTTP call |
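Whether renting beats pay-per-token comes down to sustained throughput. A back-of-envelope sketch (the function name and the full-utilization assumption are mine, not any provider's):

```python
def breakeven_tokens_per_hour(gpu_hourly_usd: float, api_usd_per_1m: float) -> float:
    """Tokens per hour a rented GPU must sustain before its hourly rate
    undercuts a pay-per-token API. Assumes full utilization and ignores
    storage, egress, and idle time."""
    return gpu_hourly_usd / api_usd_per_1m * 1_000_000

# e.g. an A100 80GB at $0.90/hr vs an API charging $0.18 per 1M tokens
rate = breakeven_tokens_per_hour(0.90, 0.18)
print(f"break-even: {rate:,.0f} tokens/hr (~{rate / 3600:,.0f} tok/s)")
```

Below that throughput, or for bursty workloads, per-token billing is usually cheaper; above it, a rented card wins.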
Don’t need cloud? Check if your local hardware can run it →