Reasoning Token Cost Calculator

Calculate the real cost of reasoning models, including the hidden reasoning tokens that most cost calculators ignore.

Cached tokens are billed separately from regular input; set 0 if you don't use prompt caching.

At medium effort, the model spends roughly 5× its visible output tokens on reasoning.

Compare all reasoning models

The comparison table shows cost per request, per day, and per month for the same prompt across all providers, using your current input, output, and effort settings and sorted cheapest first. The effort multiplier follows each model's profile (for example, DeepSeek R1 reasons more aggressively than GPT-5).

Direct-API pricing as of 2026-05-13. Reasoning tokens billed at each model's output rate per provider documentation. Effort multipliers are calibrated estimates — actual reasoning usage varies by prompt complexity.

© 2026 Rohit Burani · MIT · Built at gekro.com · View source ↗

Guide

What It Does

Extended-thinking models (OpenAI o3, GPT-5 thinking, Claude 4 extended-thinking, DeepSeek R1, Grok 4 thinking, Gemini 2.5/3 Pro thinking) charge you for two kinds of output tokens:

  1. Visible output — what the model actually shows the user
  2. Reasoning tokens — hidden chain-of-thought the model “thought” before answering

Every major provider bills reasoning tokens at the model’s normal output rate, yet most cost calculators (including the official ones) ignore them. As a result, real bills can run 2-30× higher than the estimate.

This tool fixes that.
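
As a rough illustration of that gap, the sketch below compares a naive estimate (input plus visible output only) with a bill that also counts reasoning tokens at the output rate. The function name, the per-million-token rates, and the token counts are hypothetical placeholders, not any provider's real pricing, and the calculator's own implementation may differ.

```python
def request_cost(input_tokens, output_tokens, reasoning_multiplier,
                 input_rate_per_m, output_rate_per_m):
    """Return (naive_cost, real_cost) in dollars for one request.

    Assumption: reasoning tokens = reasoning_multiplier * visible output
    tokens, and they are billed at the output rate.
    """
    input_cost = input_tokens / 1_000_000 * input_rate_per_m
    visible_cost = output_tokens / 1_000_000 * output_rate_per_m
    reasoning_cost = visible_cost * reasoning_multiplier
    naive = input_cost + visible_cost
    return naive, naive + reasoning_cost


# Hypothetical rates: $2 per 1M input tokens, $8 per 1M output tokens.
naive, real = request_cost(1_000, 500, 5, 2.0, 8.0)
print(f"naive ${naive:.4f}  real ${real:.4f}  ratio {real / naive:.1f}x")
```

With these placeholder numbers the real cost is a little over four times the naive estimate, even though the multiplier is only 5×, because input tokens dilute the effect.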

How To Use It

  1. Pick your reasoning model
  2. Enter typical input tokens + visible-output tokens per request
  3. Pick reasoning effort (or supply a custom multiplier)
  4. See the breakdown: input + output + hidden reasoning + cached
  5. Scale to your real request volume (10/day to 100k/day)
  6. Compare across all reasoning-capable models in one table
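
The steps above can be sketched in a few lines: split one request into its four cost parts, then scale the total to a request volume. Everything named here (`Rates`, `breakdown`, `scale`, the prices, the 30-day month) is an assumption made for illustration, not the tool's actual code.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical per-1M-token prices in dollars (not real provider rates)."""
    input: float
    cached_input: float
    output: float


def breakdown(rates, input_tokens, cached_tokens, output_tokens, multiplier):
    """Per-request cost split into input, cached, output, and reasoning."""
    per_token = 1 / 1_000_000
    return {
        "input": input_tokens * rates.input * per_token,
        # Cached tokens are billed separately from regular input.
        "cached": cached_tokens * rates.cached_input * per_token,
        "output": output_tokens * rates.output * per_token,
        # Reasoning tokens are estimated as multiplier x visible output
        # and billed at the output rate.
        "reasoning": output_tokens * multiplier * rates.output * per_token,
    }


def scale(per_request, requests_per_day):
    """Scale a per-request cost, assuming a 30-day month and 365-day year."""
    daily = per_request * requests_per_day
    return {"day": daily, "month": daily * 30, "year": daily * 365}


rates = Rates(input=2.0, cached_input=0.5, output=8.0)
parts = breakdown(rates, input_tokens=1_000, cached_tokens=4_000,
                  output_tokens=500, multiplier=5)
total = sum(parts.values())
usage = scale(total, requests_per_day=100)
share = parts["reasoning"] / total
print(parts)
print(usage)
print(f"reasoning share of the bill: {share:.0%}")
```

Keeping the four parts separate makes it easy to report what fraction of the bill comes from hidden reasoning alone.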

Why Reasoning Effort Multipliers Vary By Model

Each model reasons differently. The effort multipliers in the dropdown are calibrated estimates of the typical ratio of reasoning tokens to visible output tokens:

Model                     | Low | Medium | High
o3                        |     |        | 20×
GPT-5 thinking            |     |        |
Claude Opus 4 thinking    |     |        | 12×
DeepSeek R1               |     | 15×    | 30×
Grok 4 thinking           |     |        | 12×
Gemini 2.5/3 Pro thinking |     |        |

DeepSeek R1 is famously reasoning-heavy — its hidden traces can be 15-30× the visible output even at default effort. GPT-5 is the opposite extreme; its thinking is short and efficient.
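
One consequence is that a model with cheaper tokens can still cost more per request if it reasons much more heavily. The sketch below shows this with two made-up profiles; the model names, prices, and most multipliers are hypothetical placeholders chosen only to illustrate the ranking logic.

```python
# name: (input $/1M, output $/1M, reasoning multiplier per effort level)
# All values are illustrative assumptions, not real provider data.
MODELS = {
    "light-reasoner": (3.0, 12.0, {"low": 1, "medium": 2, "high": 5}),
    "heavy-reasoner": (1.0, 4.0, {"low": 5, "medium": 15, "high": 30}),
}


def cost_per_request(name, input_tokens, output_tokens, effort):
    """Dollar cost of one request, with reasoning billed at the output rate."""
    input_rate, output_rate, multipliers = MODELS[name]
    billed_output = output_tokens * (1 + multipliers[effort])
    return (input_tokens * input_rate + billed_output * output_rate) / 1_000_000


def compare(input_tokens, output_tokens, effort):
    """Return (name, cost) pairs sorted cheapest first."""
    costs = [(name, cost_per_request(name, input_tokens, output_tokens, effort))
             for name in MODELS]
    return sorted(costs, key=lambda pair: pair[1])


for name, cost in compare(1_000, 500, "medium"):
    print(f"{name:15s} ${cost:.4f}")
```

Here the heavy reasoner has token prices three times lower, yet it comes out more expensive per request at medium effort because it bills far more hidden output.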

What’s In Scope

  • ✅ Direct-API pricing for OpenAI, Anthropic, Google, DeepSeek, xAI
  • ✅ Reasoning-token cost calculation per provider
  • ✅ Cached input tokens (Anthropic prompt caching, OpenAI cached input, Google context caching)
  • ✅ Cost scaling: per request → daily → monthly → annual
  • ✅ Side-by-side comparison across all reasoning models

What’s Not In Scope

Where the Numbers Come From

Direct-API pricing pages from each provider, as of 2026-05. The data lives in apps/web/src/content/data/reasoning-models.json, which is version-controlled, so every change is a git commit. An auto-fetcher pipeline is planned but not yet in place, since provider direct-pricing pages don’t expose APIs the way hyperscaler catalogues do.
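
Because the data is hand-edited JSON, a small validation step on load can catch typos before they reach the calculator. The schema below is purely hypothetical (the real layout of reasoning-models.json is not shown in this guide), as are the field names and the `load_models` helper.

```python
import json

# Hypothetical shape for one entry; the real file's schema may differ.
SAMPLE = """
[
  {
    "id": "example-model",
    "provider": "example-provider",
    "as_of": "2026-05-13",
    "usd_per_1m": {"input": 2.0, "cached_input": 0.5, "output": 8.0},
    "effort_multiplier": {"low": 2, "medium": 5, "high": 12}
  }
]
"""

REQUIRED_RATES = {"input", "cached_input", "output"}


def load_models(text):
    """Parse pricing data, rejecting entries with missing or negative rates."""
    models = {}
    for entry in json.loads(text):
        rates = entry["usd_per_1m"]
        missing = REQUIRED_RATES - set(rates)
        if missing:
            raise ValueError(f"{entry['id']}: missing rates {sorted(missing)}")
        if any(rates[key] < 0 for key in REQUIRED_RATES):
            raise ValueError(f"{entry['id']}: negative rate")
        models[entry["id"]] = entry
    return models


models = load_models(SAMPLE)
print(sorted(models))
```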

Limitations

  • Effort multipliers are calibrated estimates. Actual reasoning usage varies wildly by prompt complexity. A “medium” coding question might be 3× while a “medium” math proof is 15×.
  • Cached-input pricing is assumed at each provider’s documented rate. Anthropic’s tiered cache-hit and cache-write pricing isn’t fully modeled.
  • Direct-API only. Routing via Bedrock/Azure/Vertex changes both pricing and reasoning availability for some models.
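
Given how much the multiplier can move, it may be safer to budget with a range than a single figure. The sketch below sweeps the multiplier over the 3× to 15× spread mentioned above for "medium" prompts; the rates, token counts, volume, and 30-day month are hypothetical assumptions.

```python
def monthly_cost(input_tokens, output_tokens, multiplier,
                 input_rate, output_rate, requests_per_day, days=30):
    """Monthly cost in dollars; rates are dollars per 1M tokens."""
    # Visible output plus estimated reasoning, both billed at the output rate.
    billed_output = output_tokens * (1 + multiplier)
    per_request = (input_tokens * input_rate
                   + billed_output * output_rate) / 1_000_000
    return per_request * requests_per_day * days


# Same hypothetical workload at both ends of the multiplier range.
low = monthly_cost(1_000, 500, 3, 2.0, 8.0, requests_per_day=100)
high = monthly_cost(1_000, 500, 15, 2.0, 8.0, requests_per_day=100)
print(f"monthly bill between ${low:.2f} and ${high:.2f}")
```

With these placeholder inputs the same workload lands anywhere from about $54 to about $198 per month, depending only on how hard the prompts make the model think.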

For informational purposes only. Not financial, medical, or legal advice. You are solely responsible for how you use these tools.