What It Does

Extended-thinking models (OpenAI o3, GPT-5 thinking, Claude 4 extended-thinking, DeepSeek R1, Grok 4 thinking, Gemini 2.5/3 Pro thinking) charge you for two kinds of output tokens:

Visible output — what the model actually shows the user
Reasoning tokens — hidden chain-of-thought the model “thought” before answering

Every major provider bills reasoning tokens at the model’s normal output rate. Most cost calculators (including the official ones) ignore them, so real bills can run 2-30× higher than the estimate.

This tool fixes that.
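
As a rough illustration of the gap, the sketch below compares an estimate that ignores reasoning tokens with one that bills them at the output rate. The prices are hypothetical placeholders (not any provider's real rates), and the function names are made up for this example.

```python
# Hypothetical prices in USD per million tokens; NOT any provider's real rates.
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00


def naive_cost(input_tokens: int, visible_tokens: int) -> float:
    """Estimate that ignores reasoning tokens entirely."""
    return (input_tokens * INPUT_PRICE_PER_M
            + visible_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def actual_cost(input_tokens: int, visible_tokens: int,
                reasoning_multiplier: float) -> float:
    """Estimate that bills reasoning tokens at the normal output rate."""
    reasoning_tokens = visible_tokens * reasoning_multiplier
    billed_output = visible_tokens + reasoning_tokens
    return (input_tokens * INPUT_PRICE_PER_M
            + billed_output * OUTPUT_PRICE_PER_M) / 1_000_000


naive = naive_cost(1_000, 500)
actual = actual_cost(1_000, 500, reasoning_multiplier=8)
print(f"naive ${naive:.4f}, actual ${actual:.4f}, ratio {actual / naive:.1f}x")
# prints: naive $0.0060, actual $0.0380, ratio 6.3x
```

With these placeholder numbers, an 8× reasoning multiplier makes the request roughly six times more expensive than the naive estimate suggests.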

How To Use It

Pick your reasoning model
Enter typical input tokens + visible-output tokens per request
Pick reasoning effort (or supply a custom multiplier)
See the breakdown: input + output + hidden reasoning + cached
Scale to your real request volume (10/day to 100k/day)
Compare across all reasoning-capable models in one table
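
The breakdown in the steps above could be computed along these lines. This is a minimal sketch, not the tool's actual code: the `Prices` class, the `cost_breakdown` function, and the sample rates are all hypothetical, and it assumes cached tokens are a subset of the input tokens.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """USD per million tokens. Values used below are placeholders, not real rates."""
    input: float
    cached_input: float
    output: float


def cost_breakdown(prices: Prices, input_tokens: int, cached_tokens: int,
                   visible_tokens: int, multiplier: float) -> dict[str, float]:
    """Split one request's cost into input, cached, output, and reasoning parts."""
    per_m = 1_000_000
    fresh_tokens = input_tokens - cached_tokens  # assumption: cached is part of input
    reasoning_tokens = visible_tokens * multiplier
    parts = {
        "input": fresh_tokens * prices.input / per_m,
        "cached": cached_tokens * prices.cached_input / per_m,
        "output": visible_tokens * prices.output / per_m,
        # Reasoning tokens are billed at the output rate.
        "reasoning": reasoning_tokens * prices.output / per_m,
    }
    parts["total"] = sum(parts.values())
    return parts


example = cost_breakdown(Prices(input=2.0, cached_input=0.5, output=8.0),
                         input_tokens=2_000, cached_tokens=1_000,
                         visible_tokens=500, multiplier=5)
for name, usd in example.items():
    print(f"{name:>9}: ${usd:.4f}")
```

In this example the reasoning line dominates the total, which is the pattern the calculator is meant to surface.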

Why Reasoning Effort Multipliers Vary By Model

Each model reasons differently. The effort multipliers in the dropdown are calibrated estimates of the typical ratio of reasoning tokens to visible-output tokens:

Model	Low	Medium	High
o3	3×	8×	20×
GPT-5 thinking	1×	3×	8×
Claude Opus 4 thinking	2×	5×	12×
DeepSeek R1	5×	15×	30×
Grok 4 thinking	2×	5×	12×
Gemini 2.5/3 Pro thinking	1×	3×	8×

DeepSeek R1 is famously reasoning-heavy: its hidden traces can be 15-30× the visible output (the medium and high values in the table above). GPT-5 thinking is the opposite extreme; its thinking is short and efficient.

What’s In Scope

✅ Direct-API pricing for OpenAI, Anthropic, Google, DeepSeek, xAI
✅ Reasoning-token cost calculation per provider
✅ Cached input tokens (Anthropic prompt caching, OpenAI cached input, Google context caching)
✅ Cost scaling: per request → daily → monthly → annual
✅ Side-by-side comparison across all reasoning models
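
The scaling step is plain multiplication. A minimal sketch follows, assuming a 30-day month and a 365-day year; the tool itself may use different conventions, and `scale_cost` is a hypothetical name.

```python
def scale_cost(cost_per_request: float, requests_per_day: int,
               days_per_month: int = 30, days_per_year: int = 365) -> dict[str, float]:
    """Scale a per-request cost to daily, monthly, and annual totals.

    The 30-day month and 365-day year are assumptions made for this sketch.
    """
    daily = cost_per_request * requests_per_day
    return {
        "per_request": cost_per_request,
        "daily": daily,
        "monthly": daily * days_per_month,
        "annual": daily * days_per_year,
    }


scaled = scale_cost(cost_per_request=0.038, requests_per_day=10_000)
for period, usd in scaled.items():
    print(f"{period:>11}: ${usd:,.2f}")
```

At volume, a per-request figure of a few cents turns into a six-figure annual number, which is why the hidden reasoning share matters.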

What’s Not In Scope

❌ Hyperscaler routing (Bedrock / Azure / Vertex) — that’s what App #51 Hyperscaler Pricing Comparison covers
❌ Fine-tuning, embeddings, image/audio generation — different cost dimensions
❌ Non-reasoning models — for those use LLM Cost Calculator

Where the Numbers Come From

Direct-API pricing pages from each provider, as of 2026-05. The data lives in apps/web/src/content/data/reasoning-models.json — version-controlled, every change is a git commit. An auto-fetcher pipeline is planned (provider direct-pricing pages don’t expose APIs the way hyperscaler catalogues do).
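
For readers who want to consume the data file directly, the sketch below shows one way to parse and validate it. The schema here (`models`, `id`, `input_per_m`, `output_per_m`, `multipliers`) is entirely hypothetical, since the real layout of reasoning-models.json is not described in this guide, and the sample values are placeholders.

```python
import json

# Hypothetical sample; the real file's schema and values may differ.
SAMPLE = """
{
  "models": [
    {
      "id": "example-reasoner",
      "input_per_m": 2.0,
      "output_per_m": 8.0,
      "multipliers": {"low": 2, "medium": 5, "high": 12}
    }
  ]
}
"""

REQUIRED_KEYS = ("id", "input_per_m", "output_per_m", "multipliers")


def parse_models(text: str) -> dict[str, dict]:
    """Parse pricing JSON and index the entries by model id."""
    data = json.loads(text)
    models = {}
    for entry in data["models"]:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"model entry missing keys: {missing}")
        models[entry["id"]] = entry
    return models


models = parse_models(SAMPLE)
print(sorted(models))
```

In practice the text would come from reading the file on disk rather than from an inline string; validating required keys up front keeps a bad commit from silently producing zero-cost estimates.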

Related Tools

Hyperscaler Pricing Comparison — Bedrock vs Foundry vs Vertex for the same models
LLM Cost Calculator — non-reasoning model costs
Prompt Token Counter — tokenize before you estimate
Context Window Visualizer — see where context is going

Limitations

Effort multipliers are calibrated estimates. Actual reasoning usage varies wildly by prompt complexity. A “medium” coding question might be 3× while a “medium” math proof is 15×.
Cached-input pricing is assumed at each provider’s documented rate. Anthropic’s tiered cache hit / write pricing isn’t fully modeled.
Direct-API only. Routing via Bedrock/Azure/Vertex changes both pricing and reasoning availability for some models.
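
Because the multipliers are estimates, it can help to budget with a range rather than a single figure. The sketch below uses the 3× and 15× figures from the first limitation as bounds; the prices are hypothetical placeholders and `cost_range` is an illustrative name.

```python
def cost_range(input_tokens: int, visible_tokens: int,
               low_multiplier: float, high_multiplier: float,
               input_price_per_m: float, output_price_per_m: float) -> tuple[float, float]:
    """Return (low, high) per-request cost for a range of reasoning multipliers."""
    def cost(multiplier: float) -> float:
        # Visible output plus reasoning tokens, both billed at the output rate.
        billed_output = visible_tokens * (1 + multiplier)
        return (input_tokens * input_price_per_m
                + billed_output * output_price_per_m) / 1_000_000

    return cost(low_multiplier), cost(high_multiplier)


# Placeholder prices (USD per million tokens), not real provider rates.
low, high = cost_range(1_000, 500, low_multiplier=3, high_multiplier=15,
                       input_price_per_m=2.0, output_price_per_m=8.0)
print(f"per request: ${low:.4f} to ${high:.4f}")
```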
