What It Does

Prompt caching is often the largest cost-reduction opportunity in LLM apps, but the math is non-obvious because:

Each provider has different discounts, minimums, and surcharges
A low cache hit rate can make caching more expensive than not caching on Anthropic, because every cache write carries a surcharge
The cacheable prefix has minimum sizes that vary by provider

This tool answers: “If I cache my system prompt, how much will I save?”

Paste your system prompt + a typical user message, set your volume, and see:

Monthly + annual savings
Per-request cost in three scenarios (no cache / cache hit / cache miss)
Recommendations: prefix too short? hit rate too low for the write surcharge? wrong provider for your usage pattern?
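
The three scenarios and the savings figure can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the `CacheRates` class, the function names, and the $3 per million tokens price in the usage note are all hypothetical, and it assumes every cache miss pays the write surcharge on the whole prefix.

```python
from dataclasses import dataclass


@dataclass
class CacheRates:
    """Hypothetical container for one provider's input pricing."""
    input_price_per_mtok: float  # base input price, USD per million tokens
    read_discount: float         # 0.90 means cached reads cost 10% of base
    write_multiplier: float      # 1.25 means cache writes cost 125% of base


def per_request_costs(prefix_tokens: int, dynamic_tokens: int, rates: CacheRates) -> dict:
    """Input cost of one request in the three scenarios (output tokens ignored)."""
    base = rates.input_price_per_mtok / 1_000_000
    dynamic = dynamic_tokens * base  # per-request content is never cached
    return {
        "no_cache": prefix_tokens * base + dynamic,
        "cache_hit": prefix_tokens * base * (1.0 - rates.read_discount) + dynamic,
        "cache_miss": prefix_tokens * base * rates.write_multiplier + dynamic,
    }


def monthly_savings(prefix_tokens: int, dynamic_tokens: int, rates: CacheRates,
                    requests_per_month: int, hit_rate: float) -> float:
    """Savings versus no caching; negative means caching costs more."""
    costs = per_request_costs(prefix_tokens, dynamic_tokens, rates)
    # Assumption: each miss re-writes the prefix and pays the write multiplier.
    blended = hit_rate * costs["cache_hit"] + (1.0 - hit_rate) * costs["cache_miss"]
    return (costs["no_cache"] - blended) * requests_per_month
```

With a hypothetical $3 per million input tokens, a 2,000-token prefix, a 200-token user message, 100,000 requests per month, and an 80% hit rate, this sketch gives roughly $402 per month in savings. Annual savings are twelve times the monthly figure.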

Provider Differences

Provider	Cache-read discount	Write surcharge	Min cache size (tokens)	TTL
Anthropic	90% off	1.25× input on write	1,024 (Haiku: 2,048)	5 min / 1 hr
OpenAI	90% off	None	1,024	~5-10 min
Google	75% off	None	4,096	Explicit, paid storage
DeepSeek	75% off	None	1,024	Standard

The fields are pre-filled with your provider’s published rates. Override any field if you have negotiated pricing.
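
To see why a write surcharge matters, it helps to compute the hit rate at which caching stops losing money. The sketch below is an illustration under one assumption: every miss pays the write multiplier on the prefix, and storage fees are ignored. The function name is hypothetical.

```python
def break_even_hit_rate(read_discount: float, write_multiplier: float) -> float:
    """Minimum hit rate at which caching costs no more than not caching.

    Solves h * (1 - read_discount) + (1 - h) * write_multiplier = 1 for h,
    where 1 is the uncached cost of the prefix.
    """
    read_multiplier = 1.0 - read_discount
    if write_multiplier <= 1.0:
        # No write surcharge: caching never costs more than the uncached price.
        return 0.0
    return (write_multiplier - 1.0) / (write_multiplier - read_multiplier)
```

With a 90% read discount and a 1.25× write multiplier, the break-even hit rate is 0.25 / 1.15, or about 22%. With no write surcharge it is zero, so any hit rate saves money before storage costs.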

When To Use This

Before adopting caching: confirm the savings actually beat the implementation effort
Choosing a provider: a 90% Anthropic discount with poor hit rate may net less than 75% Google with perfect caching
Architecting prompts: move persistent content (knowledge docs, examples) ABOVE the user’s per-request content to maximise cacheable prefix
Budget projections: realistic annual cost estimates for finance review
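
The prompt-ordering advice can be illustrated by measuring how much leading text two requests share. This is a rough sketch: real caches match on tokens rather than characters, and the `KNOWLEDGE` and `QUESTIONS` values are made-up stand-ins.

```python
import os


def shared_prefix_chars(prompt_a: str, prompt_b: str) -> int:
    """Length in characters of the leading text two prompts have in common."""
    # os.path.commonprefix compares character by character, so it works
    # on arbitrary strings, not only on file paths.
    return len(os.path.commonprefix([prompt_a, prompt_b]))


KNOWLEDGE = "Product manual: " + "lorem ipsum " * 300  # stand-in for persistent docs
QUESTIONS = ["How do I reset my password?", "What is the refund policy?"]

# Persistent content first: the whole manual is a shared, cacheable prefix.
static_first = [KNOWLEDGE + "\n\nUser: " + q for q in QUESTIONS]
# Per-request content first: the prompts diverge almost immediately.
dynamic_first = ["User: " + q + "\n\n" + KNOWLEDGE for q in QUESTIONS]

print(shared_prefix_chars(*static_first))
print(shared_prefix_chars(*dynamic_first))
```

In the first ordering the shared prefix covers the entire manual. In the second it covers only the `User: ` label, so nothing useful can be cached.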

Token Estimation Method

The tool uses a coarse 4-characters-per-token heuristic, which is good enough for cost projections (typical error ±10%). For precise token counts before sending, use Prompt Token Counter.
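
A minimal sketch of the heuristic is shown below. Rounding up is an assumption here, since the text does not say how the tool rounds, and the helper names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from character length (rounding up is an assumption)."""
    return math.ceil(len(text) / chars_per_token)


def meets_cache_minimum(text: str, min_tokens: int) -> bool:
    """Check an estimated prefix size against a provider's minimum cache size."""
    return estimate_tokens(text) >= min_tokens
```

For example, a 4,096-character system prompt is estimated at 1,024 tokens, which just reaches a 1,024-token minimum. Because the estimate can be off by around 10%, prefixes close to the minimum are worth checking with an exact tokenizer.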

Limitations

Doesn’t model Anthropic’s TTL-dependent cache write pricing (a 1-hour cache write costs 2× input, versus 1.25× for the 5-minute cache; cache reads are priced the same for both)
Google Gemini context caching also charges hourly storage at $1/M tokens/hour — not modelled here
Output token cost isn’t modelled — caching only affects input. For full cost see LLM Cost Calculator or Reasoning Cost Calculator.
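
Since storage is not modelled, a quick side calculation can show whether it matters for a given workload. The sketch below uses the $1 per million tokens per hour figure quoted above as its default. The function name is hypothetical, and the rate should be checked against current pricing for the specific model.

```python
def storage_cost(cached_tokens: int, hours: float,
                 usd_per_mtok_hour: float = 1.0) -> float:
    """Storage fee for keeping a cached prefix alive for a number of hours.

    The default rate is the $1/M tokens/hour figure from the text above;
    treat it as an input to override, not a guaranteed price.
    """
    return cached_tokens / 1_000_000 * hours * usd_per_mtok_hour
```

Keeping a 50,000-token prefix cached around the clock for a 30-day month (720 hours) comes to about $36 at that rate. That amount should be subtracted from the savings the tool reports.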
