
Cloud GPU pricing

Reference snapshot across 11 providers, sorted by VRAM tier and hourly rate. Confirm current availability and pricing with the provider before renting.

Some outbound provider links are affiliate links. They may fund RunThisModel at no extra cost to you, and they do not change local compatibility grades or reference rows.

16 GB VRAM

Tiny inference: T4 / RTX A4000

Reference low: $0.59/hr
| Provider | GPU | $/hr | Notes | Open |
| --- | --- | --- | --- | --- |
| Modal | T4 | $0.59 | Serverless · Scale to zero | Reference |
| Paperspace | RTX A4000 | $0.76 | Gradient notebooks | Reference |

24 GB VRAM

RTX 3090 / 4090 — sweet spot for 8–13 B models

Reference low: $0.10/hr
| Provider | GPU | $/hr | Notes | Open |
| --- | --- | --- | --- | --- |
| Salad | RTX 3090 | $0.10 | Distributed · Best $/hr | Reference |
| Vast.ai | RTX 3090 | $0.15 | Cheapest 24 GB · Spot pricing | Rent → |
| Salad | RTX 4090 | $0.20 | Distributed compute | Reference |
| RunPod | RTX 3090 | $0.22 | Community cloud · Cheapest 24 GB | Rent → |
| Vast.ai | RTX 4090 | $0.25 | Variable availability | Rent → |
| Cudo | RTX 4090 | $0.30 | Renewable-only DCs | Reference |
| TensorDock | RTX 4090 | $0.34 | Marketplace · Per-minute billing | Reference |
| RunPod | RTX 4090 | $0.44 | On-demand · Serverless available | Rent → |
| Modal | L4 | $0.80 | Serverless · Per-second billing | Reference |
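Each tier's "Reference low" is simply the smallest hourly rate among that tier's rows. A minimal sketch of that lookup, using a handful of rows copied from the 16 GB and 24 GB tiers; the tuple layout and the `reference_lows` helper are illustrative assumptions, not how this page is actually generated:

```python
# (vram_gb, provider, gpu, usd_per_hour): a subset of the rows listed above.
ROWS = [
    (16, "Modal", "T4", 0.59),
    (16, "Paperspace", "RTX A4000", 0.76),
    (24, "Salad", "RTX 3090", 0.10),
    (24, "Vast.ai", "RTX 3090", 0.15),
    (24, "RunPod", "RTX 4090", 0.44),
    (24, "Modal", "L4", 0.80),
]


def reference_lows(rows):
    """Return {vram_gb: cheapest row} for every VRAM tier present in rows."""
    lows = {}
    for row in rows:
        tier, rate = row[0], row[3]
        # Keep the row with the lowest hourly rate seen so far for this tier.
        if tier not in lows or rate < lows[tier][3]:
            lows[tier] = row
    return lows


lows = reference_lows(ROWS)
for tier in sorted(lows):
    _, provider, gpu, rate = lows[tier]
    print(f"{tier} GB: ${rate:.2f}/hr ({provider} {gpu})")
```

The same function works on a fresh price list, which matters because the figures here are a snapshot and spot or marketplace rates move.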

40 GB VRAM

A100 40 GB — fits 30 B at Q4

Reference low: $1.10/hr
| Provider | GPU | $/hr | Notes | Open |
| --- | --- | --- | --- | --- |
| Lambda | A100 40GB | $1.10 | 1-Click Cluster · Pre-installed CUDA | Reference |

48 GB VRAM

A6000 / L40S — the 70 B Q4 sweet spot

Reference low: $0.31/hr
| Provider | GPU | $/hr | Notes | Open |
| --- | --- | --- | --- | --- |
| Massed Compute | RTX A6000 | $0.31 | LLM templates · Cheap A6000 | Reference |
| Hyperstack | RTX A6000 | $0.50 | EU regions · Reserved discounts | Reference |
| RunPod | RTX A6000 | $0.76 | 48GB VRAM · Great for 70B | Rent → |
| RunPod | L40S | $0.99 | Ada Lovelace · Inference-tuned | Rent → |
| Crusoe | L40S | $1.45 | Flared-gas powered | Reference |
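The tier labels pair model sizes with VRAM ("30 B at Q4" on 40 GB, "70 B Q4" on 48 GB). A rough way to sanity-check such pairings is weight size times an overhead factor. The sketch below assumes Q4 means exactly 4 bits per weight and uses a flat 1.2x allowance for KV cache and activations; both numbers are assumptions, and real usage varies with quantization format, context length, and runtime:

```python
def estimate_vram_gb(params_billion, bits_per_weight, overhead=1.2):
    """Rough VRAM need in GB: weight bytes times a flat overhead factor.

    overhead=1.2 is an assumed allowance for KV cache and activations,
    not a measured value.
    """
    # 1e9 parameters * (bits / 8) bytes each, expressed in GB (1e9 bytes).
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * overhead


def fits(params_billion, bits_per_weight, vram_gb):
    """True if the rough estimate is within the card's VRAM."""
    return estimate_vram_gb(params_billion, bits_per_weight) <= vram_gb


print(fits(70, 4, 48))  # 70 B at 4-bit on a 48 GB card
print(fits(70, 4, 24))  # the same model on a 24 GB card
print(fits(13, 8, 24))  # 13 B at 8-bit on a 24 GB card
```

Under these assumptions a 70 B model at 4-bit needs roughly 42 GB, which is why it lands in the 48 GB tier rather than the 24 GB or 40 GB tiers.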

80 GB VRAM

A100 80 GB / H100 — production / fine-tuning

Reference low: $0.90/hr
| Provider | GPU | $/hr | Notes | Open |
| --- | --- | --- | --- | --- |
| Vast.ai | A100 80GB | $0.90 | Cheapest A100 | Rent → |
| TensorDock | A100 80GB | $1.20 | Marketplace pricing | Reference |
| Hyperstack | A100 80GB | $1.60 | EU bare-metal | Reference |
| RunPod | A100 80GB | $1.64 | 80GB · Top performance | Rent → |
| Lambda | A100 80GB | $1.79 | No queue (reserved) | Reference |
| Massed Compute | H100 | $1.95 | Reserved discounts | Reference |
| Vast.ai | H100 SXM | $2.10 | Cheapest H100 | Rent → |
| Hyperstack | H100 | $2.40 | EU H100 | Reference |
| Cudo | H100 PCIe | $2.45 | Renewable energy | Reference |
| Crusoe | H100 | $2.65 | Reserved capacity | Reference |
| Lambda | H100 SXM | $2.99 | NVLink | Reference |
| Paperspace | A100 80GB | $3.18 | Persistent storage | Reference |
| RunPod | H100 SXM | $3.29 | HBM3 · Fastest inference | Rent → |
| Modal | A100 80GB | $4.10 | Per-second · Auto-scaling | Reference |
| Modal | H100 | $7.50 | Per-second | Reference |
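The per-second serverless rows carry the highest hourly rates in this tier, but they only bill while work is running. A minimal sketch of the break-even point against an always-on rental, using the Lambda H100 SXM ($2.99/hr) and Modal H100 ($7.50/hr) rows above. It assumes the serverless GPU bills only for busy seconds (ignoring cold starts and idle timeouts) and the rented instance bills for the full hour; both are simplifying assumptions:

```python
def break_even_utilization(always_on_rate, serverless_rate):
    """Busy fraction of each hour above which the always-on rental is cheaper."""
    return always_on_rate / serverless_rate


def serverless_hourly_cost(serverless_rate, busy_fraction):
    """Effective cost per wall-clock hour when billed only for busy time."""
    return serverless_rate * busy_fraction


LAMBDA_H100_SXM = 2.99  # $/hr, always-on rental row
MODAL_H100 = 7.50       # $/hr, per-second serverless row

u = break_even_utilization(LAMBDA_H100_SXM, MODAL_H100)
print(f"Break-even at {u:.1%} busy time")
print(f"Serverless at 25% busy: ${serverless_hourly_cost(MODAL_H100, 0.25):.2f}/hr")
```

Below the break-even fraction (about 40% busy for this pair), the serverless row costs less per wall-clock hour despite its higher sticker rate.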

141 GB VRAM

H200 — biggest single-card

Reference low: $4.49/hr
| Provider | GPU | $/hr | Notes | Open |
| --- | --- | --- | --- | --- |
| RunPod | H200 | $4.49 | 141 GB HBM3e | Rent → |

Pay-per-token alternative

Skip the rental entirely — these providers run open-source models for you and bill by token.

| Provider | Description | Input, $ per 1M tokens | Output, $ per 1M tokens | Highlighted model |
| --- | --- | --- | --- | --- |
| Together AI | Open-weights API. Llama, Qwen, Mixtral, Flux. | $0.18 | $0.18 | Llama 3.1 70B |
| Fireworks | Fastest open-model serving. Function calling. | $0.20 | $0.20 | Llama 3.1 70B |
| Replicate | Run any model with one HTTP call. | $0.65 | $2.75 | Flux.1 Schnell |
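To compare an hourly rental with per-token billing, convert the hourly rate into an effective price per 1M tokens at a given throughput. The sketch below uses the RunPod RTX A6000 rate ($0.76/hr) and the Together AI price ($0.18 per 1M tokens) from this page. The 20 tokens/s throughput is a made-up placeholder, not a benchmark, so measure your own model and batch size before drawing conclusions:

```python
def rental_cost_per_million_tokens(usd_per_hour, tokens_per_second):
    """Effective $ per 1M generated tokens on a rented GPU kept fully busy."""
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


def break_even_tokens_per_second(usd_per_hour, api_usd_per_million):
    """Sustained throughput at which the rental matches the per-token price."""
    return usd_per_hour / api_usd_per_million * 1_000_000 / 3600


A6000_USD_PER_HOUR = 0.76         # RunPod RTX A6000 row
API_USD_PER_MILLION = 0.18        # Together AI per-token price
ASSUMED_TOKENS_PER_SECOND = 20.0  # hypothetical throughput, not measured

cost = rental_cost_per_million_tokens(A6000_USD_PER_HOUR, ASSUMED_TOKENS_PER_SECOND)
needed = break_even_tokens_per_second(A6000_USD_PER_HOUR, API_USD_PER_MILLION)
print(f"Rental: ${cost:.2f} per 1M tokens at {ASSUMED_TOKENS_PER_SECOND:.0f} tok/s")
print(f"Break-even throughput: {needed:.0f} tok/s sustained")
```

At single-stream speeds the per-token price is hard to beat; a rental tends to pay off only with heavy batching, sustained load, or needs the hosted APIs do not cover (custom weights, fine-tuning, data locality).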
