Cloud API · Pay-per-token
Hosted AI API price comparison
Don’t want to babysit a GPU? Pay-per-token APIs across 11 providers, ranked by output-token price. Output prices span roughly 938× between the cheapest listed model ($0.08/M) and the most expensive ($75.00/M), so pick wisely.
11 providers
OpenAI
Closed only · GPT-4o, GPT-4o mini, o1, o3-mini. Closed-weight only.
4 models · cheapest output: $0.60/M
Anthropic
Closed only · Claude Sonnet 4.6 / Haiku 4 / Opus 4. Closed-weight only.
3 models · cheapest output: $5.00/M
Google Gemini
Closed only · Gemini 2.5 Pro / Flash / Nano. Closed-weight + Gemma open variants.
3 models · cheapest output: $0.15/M
DeepSeek
Open weights · DeepSeek V3 + R1. Open-weight, dramatically cheap.
3 models · cheapest output: $1.10/M
Groq
Open weights · LPU-accelerated open-weight serving. Fastest tok/s on the market.
4 models · cheapest output: $0.08/M
Cerebras
Open weights · Wafer-scale chips. ~2,000 tok/s on Llama 70B; the actual fastest.
3 models · cheapest output: $0.10/M
Together AI
Open weights · Broadest open-weight catalog: 100+ models, Flux/SD-3 image gen.
5 models · cheapest output: $0.18/M
Fireworks AI
Open weights · Open-weight serving with function-calling. Strong fine-tune workflow.
3 models · cheapest output: $0.20/M
Replicate
Open weights · Run any HF model with one HTTP call. Pay per second.
3 models · cheapest output: $2.75/M
OpenRouter
Open weights · Aggregator: single API key, 200+ models, automatic failover.
3 models · cheapest output: $0.80/M
Mistral
Open weights · First-party Mistral and Codestral. EU-hosted.
3 models · cheapest output: $0.60/M
All models — sorted cheapest output first
“Example call” = 500 input tokens + 200 output tokens (a typical chat exchange). For scale: at $0.0001 per call, 1¢ buys 100 calls of this size.
| Provider | Model | $/1M in | $/1M out | Example call | Type |
|---|---|---|---|---|---|
| Groq | Llama 3.1 8B | $0.05 | $0.08 | $0.00004 | open |
| Cerebras | Llama 3.1 8B | $0.10 | $0.10 | $0.00007 | open |
| Google Gemini | Gemini 2.5 Nano | $0.04 | $0.15 | $0.00005 | closed |
| Together AI | Llama 3.1 8B | $0.18 | $0.18 | $0.00013 | open |
| Fireworks AI | Llama 3.1 8B | $0.20 | $0.20 | $0.00014 | open |
| Groq | Mixtral 8x7B | $0.24 | $0.24 | $0.00017 | open |
| Google Gemini | Gemini 2.5 Flash | $0.07 | $0.30 | $0.00010 | closed |
| OpenAI | GPT-4o mini | $0.15 | $0.60 | $0.00019 | closed |
| Together AI | Mixtral 8x7B | $0.60 | $0.60 | $0.00042 | open |
| Mistral | Mistral Small 24B | $0.20 | $0.60 | $0.00022 | open |
| Groq | Llama 3.3 70B | $0.59 | $0.79 | $0.00045 | open |
| Groq | Qwen 2.5 32B | $0.79 | $0.79 | $0.00055 | open |
| Cerebras | Qwen 3 32B | $0.40 | $0.80 | $0.00036 | open |
| Together AI | Qwen 3 32B | $0.40 | $0.80 | $0.00036 | open |
| OpenRouter | Llama 3.3 70B via OR | $0.60 | $0.80 | $0.00046 | open |
| Together AI | Llama 3.3 70B | $0.88 | $0.88 | $0.00062 | open |
| Fireworks AI | Llama 3.3 70B | $0.90 | $0.90 | $0.00063 | open |
| Fireworks AI | DeepSeek V3 | $0.90 | $0.90 | $0.00063 | open |
| Mistral | Codestral 25.01 | $0.30 | $0.90 | $0.00033 | open |
| DeepSeek | DeepSeek V3 | $0.27 | $1.10 | $0.00036 | open |
| DeepSeek | DeepSeek V3 (cache) | $0.07 | $1.10 | $0.00026 | open |
| OpenRouter | DeepSeek V3 via OR | $0.27 | $1.10 | $0.00036 | open |
| Cerebras | Llama 3.3 70B | $0.85 | $1.20 | $0.00067 | open |
| OpenRouter | Best LLM (auto-route) | $0.30 | $1.20 | $0.00039 | open |
| DeepSeek | DeepSeek R1 | $0.55 | $2.19 | $0.00071 | open |
| Replicate | Llama 3 70B | $0.65 | $2.75 | $0.00088 | open |
| Replicate | FLUX.1 Schnell (img) | — | $3.00 | $0.00060 | open |
| OpenAI | o3-mini | $1.10 | $4.40 | $0.00143 | closed |
| Anthropic | Claude Haiku 4 | $1.00 | $5.00 | $0.00150 | closed |
| Google Gemini | Gemini 2.5 Pro | $1.25 | $5.00 | $0.00163 | closed |
| Mistral | Mistral Large | $2.00 | $6.00 | $0.00220 | open |
| OpenAI | GPT-4o | $2.50 | $10.00 | $0.00325 | closed |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | $0.00450 | closed |
| Together AI | FLUX.1 Schnell (img) | — | $27.00 | $0.00540 | open |
| Replicate | FLUX.1 Pro (img) | — | $55.00 | $0.01100 | open |
| OpenAI | o1 | $15.00 | $60.00 | $0.01950 | closed |
| Anthropic | Claude Opus 4 | $15.00 | $75.00 | $0.02250 | closed |
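The Example call column follows directly from the two price columns. Here is a minimal sketch of that arithmetic, assuming the 500-input / 200-output split stated above the table. `call_cost` and `calls_per_cent` are illustrative helper names, not part of any provider's SDK.

```python
def call_cost(in_price_per_m, out_price_per_m, input_tokens=500, output_tokens=200):
    """Return the USD cost of one call from per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


def calls_per_cent(cost_per_call):
    """Return how many calls of this size one US cent ($0.01) buys."""
    return 0.01 / cost_per_call


# Prices copied from the table: ($/1M in, $/1M out).
groq_8b = call_cost(0.05, 0.08)    # Groq, Llama 3.1 8B row
opus_4 = call_cost(15.00, 75.00)   # Anthropic, Claude Opus 4 row

print(f"Groq Llama 3.1 8B: ${groq_8b:.5f} per call")
print(f"Claude Opus 4:     ${opus_4:.5f} per call")
print(f"Calls per cent on Groq: {calls_per_cent(groq_8b):.0f}")
```

Passing your own `input_tokens` and `output_tokens` shows how the ranking might shift for your traffic. Output-heavy workloads are hit hardest by models whose output price is a multiple of their input price.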
Last updated 2026-05-02. Prices in USD per million tokens. 'In' = input/prompt; 'Out' = output/completion. Where a single price applies to both, that figure goes in both columns.
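Several providers host the same open-weight model, so one useful check is how far prices diverge for a single model. The sketch below uses the Llama 3.3 70B rows copied from the table above. The `rows` list and `example_call` helper are hypothetical names for illustration, and the 500/200 token split is the same assumption as the Example call column.

```python
# (provider, $/1M in, $/1M out) for Llama 3.3 70B, copied from the table.
rows = [
    ("Groq", 0.59, 0.79),
    ("OpenRouter", 0.60, 0.80),
    ("Together AI", 0.88, 0.88),
    ("Fireworks AI", 0.90, 0.90),
    ("Cerebras", 0.85, 1.20),
]


def example_call(in_price, out_price, input_tokens=500, output_tokens=200):
    """USD cost of one call at the given per-million-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Cost of the same example call on each provider.
costs = {provider: example_call(p_in, p_out) for provider, p_in, p_out in rows}
cheapest = min(costs, key=costs.get)
priciest = max(costs, key=costs.get)
same_model_spread = costs[priciest] / costs[cheapest]

# Ratio between the highest and lowest output prices in the whole table.
overall_output_spread = 75.00 / 0.08

print(f"Llama 3.3 70B: {cheapest} is cheapest, {priciest} is priciest "
      f"({same_model_spread:.2f}x apart)")
print(f"Whole table, output price: {overall_output_spread:.1f}x apart")
```

On these figures the same model varies by well under 2× across hosts. The much larger whole-table spread (about 938×) comes from comparing a small open-weight model with a top-tier closed one.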