gekro
GitHub LinkedIn
AI

Prompt Cache Optimizer

Analyze your prompt → find the optimal cache split point → see projected monthly savings across Anthropic / OpenAI / Google / DeepSeek caching.

Provider

Pricing Standard rates

Anthropic charges 1.25× input rate on first write to a cache block. OpenAI: free. Set 1.0 to disable the surcharge.

System prompt + persistent context

— tokens

Per-request user content

— tokens

Expected traffic 1000 req/day

Cache hit rate = fraction of requests that find the prefix already cached. 85% is a reasonable assumption for high-volume agents with consistent system prompts; 50% is more realistic for occasional traffic where cache entries TTL out.

Monthly savings

$—

—% reduction vs no caching

Without caching$—
With caching$—
Daily savings$—
Annual savings$—

Per request

System tokens (cached)
User tokens (uncached)
No-cache cost$—
Cache-hit cost$—
Cache-miss cost (write)$—

Recommendations

  • Fill in your prompt to see recommendations.

🔒 Prompts analysed in your browser. Nothing transmitted.

As-is, no warranty. These apps are free under their listed license and run entirely in your browser. Use at your own risk — don't blame me if your PC catches fire, your dog runs away, or the math turns out wrong. Verify anything that actually matters. None of this is professional financial, medical, legal, or engineering advice.

© 2026 Rohit Burani · MIT · Built at gekro.com · View source ↗

Guide

What It Does

Prompt caching is the single biggest cost-reduction opportunity in modern LLM apps, but the math is non-obvious because:

  • Each provider has different discounts, minimums, and surcharges
  • A bad cache hit rate can make caching more expensive on Anthropic (write surcharge)
  • The cacheable prefix has minimum sizes that vary by provider

This tool answers: “If I cache my system prompt, how much will I save?”

Paste your system prompt + a typical user message, set your volume, and see:

  • Monthly + annual savings
  • Per-request cost in three scenarios (no cache / cache hit / cache miss)
  • Recommendations: prefix too short? hit rate too low for the write surcharge? wrong provider for your usage pattern?

Provider Differences

ProviderDiscountWrite surchargeMin cache sizeTTL
Anthropic90% off1.25× input on write1024 (Haiku: 2048)5min / 1hr
OpenAI90% offNone1024~5-10min
Google75% offNone4096Explicit, paid storage
DeepSeek75% offNone1024Standard

The defaults pre-fill your provider’s published rates. Override any field if you have negotiated pricing.

When To Use This

  • Before adopting caching: confirm the savings actually beat the implementation effort
  • Choosing a provider: a 90% Anthropic discount with poor hit rate may net less than 75% Google with perfect caching
  • Architecting prompts: move persistent content (knowledge docs, examples) ABOVE the user’s per-request content to maximise cacheable prefix
  • Budget projections: realistic annual cost estimates for finance review

Token Estimation Method

Coarse 4-chars-per-token heuristic. Good enough for cost-projection purposes (typical error ±10%). For precise token counts before sending, use Prompt Token Counter.

Limitations

  • Doesn’t model Anthropic’s tiered cache hit pricing (5-minute vs 1-hour read rates differ)
  • Google Gemini context caching also charges hourly storage at $1/M tokens/hour — not modelled here
  • Output token cost isn’t modelled — caching only affects input. For full cost see LLM Cost Calculator or Reasoning Cost Calculator.

For informational purposes only. Not financial, medical, or legal advice. You are solely responsible for how you use these tools.