Claude Fable 5 $22.000/M · Claude Opus 4.8 $11.000/M · Claude Opus 4.7 $11.000/M · Claude Opus 4.6 $11.000/M · Claude Opus 4.5 $33.000/M · Claude Sonnet 3.7 $6.600/M · Claude Opus 3 $33.000/M · Claude 2.1 $12.800/M · Claude 2 $12.800/M · GPT-5.5 $12.500/M · GPT-5.2 $5.425/M · GPT-5.2-Codex $5.425/M · GPT-5 $3.875/M · GPT-4.5 $97.500/M · GPT-4 Turbo Preview $16.000/M · GPT-4 $39.000/M · GPT-4-32k $78.000/M · o3 $19.000/M · o3-mini $2.090/M · o4-mini $2.090/M · o1 $28.500/M · o1-mini $5.700/M · o1-preview $28.500/M · Gemini 3.5 Pro $5.000/M · Gemini 3.1 Pro $5.000/M · Gemini 3 Pro $5.000/M · Gemini 2.5 Pro $3.875/M · Gemini 1.5 Pro $2.375/M · Gemini 1.0 Ultra $12.000/M · Gemini 1.0 Pro $0.800/M


AI Cost Calculator

565 models priced

Paste a prompt to see the exact token count and per-request cost for any model. OpenAI models use the exact OpenAI BPE encoder; Claude, Gemini and open-weights families use cl100k_base as a documented approximation (typically within ~5–10% for English). Browse the AI model pricing index for the full rate sheet, or use the side-by-side model comparison to evaluate models on benchmarks and capabilities.


How AI cost estimation works

AI models bill per token, and a token is roughly four characters of English text. Every request is priced in two parts: the input tokens you send (system prompt, context, and message) and the output tokens the model generates. Providers publish these as per-million-token rates. For example, a model might charge a few dollars per million input tokens and more per million output tokens, since generation is the costlier side. To estimate a single request, multiply your input token count by the input rate and your output token count by the output rate, divide each product by one million, then add the two. A blended price weights those two rates into one number (typically 70% input, 30% output) for quick model-to-model ranking.

To use this calculator, pick a model, paste a representative prompt or enter your expected input and output token volumes, and add your daily request count to project monthly spend. Estimating before you build matters because output length and request volume compound quickly: a workload that looks cheap per request can dominate an infrastructure budget at production scale.
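The per-request arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own code: the function names, the sample token counts, and the rates ($3.00 input and $15.00 output per million tokens) are assumptions chosen for the example, and the four-characters-per-token rule is only a rough guide for English text.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (English text heuristic)."""
    return round(len(text) / chars_per_token)


def request_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost in dollars of one request, given per-million-token rates."""
    input_cost = input_tokens * input_rate_per_m / 1_000_000
    output_cost = output_tokens * output_rate_per_m / 1_000_000
    return input_cost + output_cost


# Hypothetical rates, in dollars per million tokens (not any real model's prices).
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

# Hypothetical request: 2,000 input tokens and 500 output tokens.
cost = request_cost(2_000, 500, INPUT_RATE, OUTPUT_RATE)

# Cost scales linearly with request count at a fixed input and output size.
for n in (1, 1_000, 100_000, 1_000_000):
    print(f"{n:>9,} requests: ${cost * n:,.4f}")
```

Because the scaling is linear, a difference of a fraction of a cent per request becomes thousands of dollars at a million requests, which is why the output side deserves attention early.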

Frequently Asked Questions

How do I estimate AI API costs?

Pick a model, then either paste a representative prompt to count its tokens or enter your expected input and output token volumes per request. The calculator multiplies those token counts by the model's per-million-token input and output rates to give a per-request cost, then scales that across your daily request volume to project monthly spend.
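As a rough sketch of that projection, the snippet below scales a per-request cost by daily volume. The function name, the 30-day month, and all the numbers (rates, token counts, request volume) are assumptions made for illustration; the calculator may define a month differently.

```python
def monthly_spend(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float,
                  requests_per_day: int, days_per_month: int = 30) -> float:
    """Project monthly spend in dollars from per-request token volumes.

    days_per_month defaults to 30, which is an assumption of this sketch.
    """
    per_request = (input_tokens * input_rate_per_m
                   + output_tokens * output_rate_per_m) / 1_000_000
    return per_request * requests_per_day * days_per_month


# Hypothetical workload: 1,500 input and 400 output tokens per request,
# 5,000 requests per day, at assumed rates of $2.50 and $10.00 per million.
projected = monthly_spend(1_500, 400, 2.50, 10.00, requests_per_day=5_000)
print(f"Projected monthly spend: ${projected:,.2f}")
```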

What is the difference between input and output token pricing?

Input tokens are everything you send to the model — your system prompt, context, and user message — while output tokens are everything the model generates in its completion. Providers almost always charge more per output token than per input token, so a model with cheap input but expensive output can still be costly for long, generative workloads.
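To make the trade-off concrete, here is a small comparison under invented numbers. Both rate cards and both workloads are hypothetical, and the names `Rates` and `request_cost` exist only for this sketch. It shows how the cheaper model can change depending on whether a workload is input-heavy or output-heavy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices in dollars (hypothetical values below)."""
    input_per_m: float
    output_per_m: float


def request_cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one request under the given rate card."""
    return (input_tokens * rates.input_per_m
            + output_tokens * rates.output_per_m) / 1_000_000


# Two made-up rate cards: one with cheap input but expensive output.
cheap_input = Rates(input_per_m=1.00, output_per_m=20.00)
balanced = Rates(input_per_m=3.00, output_per_m=10.00)

# Two made-up workloads as (input_tokens, output_tokens).
workloads = {
    "long generation": (500, 3_000),
    "large context, short answer": (20_000, 200),
}

for name, (tokens_in, tokens_out) in workloads.items():
    a = request_cost(cheap_input, tokens_in, tokens_out)
    b = request_cost(balanced, tokens_in, tokens_out)
    cheaper = "cheap-input model" if a < b else "balanced model"
    print(f"{name}: ${a:.4f} vs ${b:.4f} -> {cheaper} is cheaper")
```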

What is a blended token price?

A blended price collapses a model's separate input and output rates into a single figure using an assumed ratio of input to output tokens (commonly 70% input, 30% output). It is useful for quick apples-to-apples ranking, but for an accurate estimate you should price your actual input and output token volumes separately.
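The sketch below computes a blended rate with the 70/30 weighting described above and compares it against pricing the two sides separately. The rates ($3.00 input, $15.00 output per million tokens) and the 90/10 workload are assumptions for illustration, as are the function names.

```python
def blended_rate(input_rate_per_m: float, output_rate_per_m: float,
                 input_share: float = 0.70) -> float:
    """Weighted average of input and output rates, in dollars per million tokens."""
    return input_share * input_rate_per_m + (1.0 - input_share) * output_rate_per_m


def exact_cost(input_tokens: int, output_tokens: int,
               input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost in dollars with input and output priced separately."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


# Hypothetical rates in dollars per million tokens.
INPUT_RATE, OUTPUT_RATE = 3.00, 15.00

blended = blended_rate(INPUT_RATE, OUTPUT_RATE)

# Hypothetical request that is 90% input and 10% output, not 70/30.
tokens_in, tokens_out = 9_000, 1_000
via_blended = (tokens_in + tokens_out) * blended / 1_000_000
via_exact = exact_cost(tokens_in, tokens_out, INPUT_RATE, OUTPUT_RATE)

print(f"Blended rate: ${blended:.3f}/M")
print(f"Blended estimate: ${via_blended:.4f}  Exact: ${via_exact:.4f}")
```

When the real input-to-output mix differs from the assumed ratio, the blended estimate drifts from the exact figure, which is the reason to price the two sides separately for budgeting.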