Paste a prompt to see its token count and per-request cost for any model. OpenAI models are counted with the exact OpenAI BPE encoder, so their counts are exact; Claude, Gemini, and open-weights families use cl100k_base as a documented approximation (typically within ~5–10% for English). Browse the AI model pricing index for the full rate sheet, or use the side-by-side model comparison to evaluate models on benchmarks and capabilities.
AI models bill per token, and a token is roughly four characters of English text. Every request is priced in two parts: the input tokens you send (system prompt, context, and message) and the output tokens the model generates. Providers publish these as per-million-token rates. For example, a model might charge a few dollars per million input tokens and more per million output tokens, since generation is the costlier side. To estimate a single request, multiply your input token count by the input rate and your output token count by the output rate, divide each product by one million (the rates are per million tokens), then add the two.
A blended price weights those two rates into one number (typically 70% input, 30% output) for quick model-to-model ranking. To use this calculator, pick a model, paste a representative prompt or enter your expected input and output token volumes, and add your daily request count to project monthly spend.
Estimating before you build matters because output length and request volume compound quickly: a workload that looks cheap per request can dominate an infrastructure budget at production scale.
How the blended rate is calculated →
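The per-request arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name `request_cost` is hypothetical, and the rates ($3 and $15 per million tokens) are made-up example values rather than any provider's published prices.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Estimate the dollar cost of one request from per-million-token rates."""
    input_cost = input_tokens * input_rate_per_m / 1_000_000
    output_cost = output_tokens * output_rate_per_m / 1_000_000
    return input_cost + output_cost


# Hypothetical rates: $3.00 per million input tokens, $15.00 per million output tokens.
cost = request_cost(input_tokens=2_000, output_tokens=500,
                    input_rate_per_m=3.00, output_rate_per_m=15.00)
print(f"${cost:.4f}")  # $0.0135
```

In this example the 500 output tokens cost more than the 2,000 input tokens ($0.0075 versus $0.0060), which is why output length deserves as much attention as prompt size.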
Pick a model, then either paste a representative prompt to count its tokens or enter your expected input and output token volumes per request. The calculator multiplies those token counts by the model's per-million-token input and output rates to give a per-request cost, then scales that across your daily request volume to project monthly spend.
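As a rough sketch of that projection, the snippet below scales a per-request cost across a daily request volume. The name `monthly_spend`, the 30-day month, and the example rates are all assumptions made for illustration; the calculator may use a different month length.

```python
DAYS_PER_MONTH = 30  # assumption: a flat 30-day month


def monthly_spend(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float,
                  requests_per_day: int, days: int = DAYS_PER_MONTH) -> float:
    """Project monthly dollar spend from per-request token volumes."""
    per_request = (input_tokens * input_rate_per_m
                   + output_tokens * output_rate_per_m) / 1_000_000
    return per_request * requests_per_day * days


# Hypothetical workload: 1,500 input and 400 output tokens per request,
# 10,000 requests per day, at $3.00 / $15.00 per million tokens.
total = monthly_spend(1_500, 400, 3.00, 15.00, requests_per_day=10_000)
print(f"${total:,.2f}")  # $3,150.00
```

A request that costs about one cent ($0.0105 here) becomes a four-figure monthly bill at this volume, which is the compounding effect the projection is meant to surface.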
Input tokens are everything you send to the model — your system prompt, context, and user message — while output tokens are everything the model generates in its completion. Providers almost always charge more per output token than per input token, so a model with cheap input but expensive output can still be costly for long, generative workloads.
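To make that concrete, here is a small comparison of two hypothetical models on an output-heavy request. The model labels and all four rates are invented for the example; they are not real prices.

```python
def request_cost(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Dollar cost of one request given per-million-token rates."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


# Hypothetical rates in dollars per million tokens.
cheap_input_model = {"input": 0.50, "output": 20.00}
balanced_model = {"input": 3.00, "output": 10.00}

# A long, generative request: short prompt, long completion.
input_tokens, output_tokens = 500, 4_000

cost_cheap_input = request_cost(input_tokens, output_tokens,
                                cheap_input_model["input"], cheap_input_model["output"])
cost_balanced = request_cost(input_tokens, output_tokens,
                             balanced_model["input"], balanced_model["output"])

print(f"cheap-input model: ${cost_cheap_input:.5f}")
print(f"balanced model:    ${cost_balanced:.5f}")
```

With these assumed numbers the model with the lower input rate ends up nearly twice as expensive per request, because the 4,000 output tokens dominate the bill.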
A blended price collapses a model's separate input and output rates into a single figure using an assumed ratio of input to output tokens (commonly 70% input, 30% output). It is useful for quick apples-to-apples ranking, but for an accurate estimate you should price your actual input and output token volumes separately.
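One plausible reading of the blended price is a weighted average of the two rates. The sketch below assumes that interpretation; the function name `blended_rate` and the example rates are illustrative, and the site's own formula may differ in detail.

```python
def blended_rate(input_rate_per_m: float, output_rate_per_m: float,
                 input_share: float = 0.7) -> float:
    """Weighted average of input and output rates (dollars per million tokens).

    input_share is the assumed fraction of tokens that are input; the
    remainder is treated as output.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_rate_per_m + (1.0 - input_share) * output_rate_per_m


# Hypothetical rates: $3.00 input, $15.00 output per million tokens.
print(f"${blended_rate(3.00, 15.00):.2f}")                   # $6.60 at 70/30
print(f"${blended_rate(3.00, 15.00, input_share=0.2):.2f}")  # $12.60 at 20/80
```

The second call shows why the blended figure is only a ranking aid: a workload whose tokens are mostly output pays an effective rate far above the 70/30 blend, so pricing the actual input and output volumes separately gives the more reliable estimate.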