Analyze your prompt → find the optimal cache split point → see projected monthly savings across Anthropic / OpenAI / Google / DeepSeek caching.
Provider
Anthropic charges 1.25× input rate on first write to a cache block. OpenAI: free. Set 1.0 to disable the surcharge.
System prompt + persistent context
— tokens
Per-request user content
— tokens
Cache hit rate = fraction of requests that find the prefix already cached. 85% is a reasonable assumption for high-volume agents with consistent system prompts; 50% is more realistic for occasional traffic where cache entries TTL out.
Monthly savings
$—
—% reduction vs no caching
Per request
Recommendations
As-is, no warranty. These apps are free under their listed license and run entirely in your browser. Use at your own risk — don't blame me if your PC catches fire, your dog runs away, or the math turns out wrong. Verify anything that actually matters. None of this is professional financial, medical, legal, or engineering advice.
Prompt caching is the single biggest cost-reduction opportunity in modern LLM apps, but the math is non-obvious because:
This tool answers: “If I cache my system prompt, how much will I save?”
Paste your system prompt + a typical user message, set your volume, and see:
| Provider | Discount | Write surcharge | Min cache size | TTL |
|---|---|---|---|---|
| Anthropic | 90% off | 1.25× input on write | 1024 (Haiku: 2048) | 5min / 1hr |
| OpenAI | 90% off | None | 1024 | ~5-10min |
| 75% off | None | 4096 | Explicit, paid storage | |
| DeepSeek | 75% off | None | 1024 | Standard |
The defaults pre-fill your provider’s published rates. Override any field if you have negotiated pricing.
Coarse 4-chars-per-token heuristic. Good enough for cost-projection purposes (typical error ±10%). For precise token counts before sending, use Prompt Token Counter.
For informational purposes only. Not financial, medical, or legal advice. You are solely responsible for how you use these tools.