Estimate token count and API cost for any prompt across all major models
Estimate uses 4 chars ≈ 1 token (±10% for English). Actual count varies by tokenizer and content type.
Input pricing per model
| Model | Provider | Input rate / 1M tokens |
|---|---|---|
| Claude Opus 4.7 | Anthropic | $5.00 |
| Claude Sonnet 4.6 | Anthropic | $3.00 |
| Claude Haiku 4.5 | Anthropic | $1.00 |
| GPT-5.4 | OpenAI | $2.50 |
| GPT-5.4 mini | OpenAI | $0.75 |
| Gemini 2.5 Pro | Google | $1.25 |
| Gemini 2.5 Flash | Google | $0.30 |
| GPT-4o | OpenAI | $2.50 |
| GPT-4o mini | OpenAI | $0.15 |
| DeepSeek V3 | DeepSeek | $0.27 |
| DeepSeek R1 | DeepSeek | $0.55 |
Prices from official provider pages. Last verified: 2026-04-19.
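The per-request cost in the table follows directly from the chars ÷ 4 heuristic and a model's per-million-token rate. A minimal sketch (the function name and example figures are illustrative, not part of the tool):

```python
def estimate_input_cost(char_count: int, rate_per_million: float) -> float:
    """Estimate input cost in USD for one request.

    Uses the 4-chars-per-token heuristic; a real tokenizer count
    can differ by 10% or more depending on content type.
    """
    est_tokens = char_count / 4
    return est_tokens / 1_000_000 * rate_per_million

# A 6,000-character prompt at Claude Haiku's $1.00/M input rate:
estimate_input_cost(6_000, 1.00)  # ~1,500 tokens -> $0.0015
```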
The Prompt Token Counter gives you an instant estimate of how many tokens your prompt contains and what it costs to send as input across Claude, GPT, and Gemini model families. It also shows context window utilization — what percentage of each model’s maximum context your prompt already occupies.
This is a pre-flight tool for AI engineering. Before scripting a large-context call, running a batch job, or finalizing a system prompt for production, pasting it here tells you two things: whether it fits, and what it costs. Both matter more than most people realize until they hit a context limit at runtime or find an unexpected charge on their API bill.
For system prompts specifically: measure the system prompt alone first, then measure a typical user turn. The sum is your per-call input token floor.
The 4-characters-per-token approximation is the industry standard for quick estimation. Most tokenizers used by major LLM providers (BPE variants, cl100k, the Gemini SentencePiece variants) average close to 4 English characters per token for prose text. It’s not exact — it’s a heuristic that’s accurate enough for planning.
Accuracy varies by content type:
| Content type | Chars/token | Accuracy |
|---|---|---|
| English prose | ~4 | ±10% |
| Code (Python, JS) | ~3–3.5 | ±15% |
| JSON / structured data | ~3–4 | ±15% |
| Non-Latin scripts (Chinese, Japanese) | ~1.5–2 | ±25% |
| Whitespace-heavy content | ~5–6 | ±20% |
Code tokenizes more densely (fewer characters per token), so a code-heavy prompt costs more per character than prose: identifiers, keywords, and operators each tend to map to their own tokens. Non-Latin scripts tokenize more expensively still: a single Chinese character often occupies 1.5–2 tokens, which can make multilingual prompts substantially more expensive than the character count suggests.
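The content-type table above can be turned into a slightly smarter estimator. A sketch, assuming the midpoint ratios from the table (the dictionary keys and function are hypothetical, not a real tokenizer):

```python
# Approximate characters-per-token ratios, taken from the table above.
CHARS_PER_TOKEN = {
    "english_prose": 4.0,
    "code": 3.25,
    "json": 3.5,
    "cjk": 1.75,
    "whitespace_heavy": 5.5,
}

def estimate_tokens(text: str, content_type: str = "english_prose") -> int:
    """Heuristic token estimate; use a real tokenizer for exact counts."""
    ratio = CHARS_PER_TOKEN[content_type]
    return round(len(text) / ratio)

prose = "The quick brown fox jumps over the lazy dog." * 10
estimate_tokens(prose)          # 440 chars of prose -> ~110 tokens
estimate_tokens(prose, "code")  # same text priced as code -> ~135 tokens
```

The spread between the two calls is the point: the same character count can differ by 20%+ in estimated tokens depending on what the characters are.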
For exact counts, use the provider’s own tooling: Anthropic’s token counting API and OpenAI’s tiktoken library are the authoritative sources.
Token count governs both cost and quality, and most people underestimate both risks.
The cost angle is obvious but often mistracked. A system prompt that’s 6,000 tokens, sent with 1,000 user tokens per call, means 7,000 input tokens per API call. At 10,000 calls/day, that’s 70M input tokens daily, or about 2.1B per month. At the rates above, the gap between Claude Haiku ($1.00/M) and Claude Sonnet ($3.00/M) at that volume is roughly $4,200/month, and trimming 30% of boilerplate from the system prompt saves proportionally.
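The scale arithmetic is worth making explicit. A sketch using the per-call figures above and the table’s input rates (the function is illustrative):

```python
def monthly_input_cost(tokens_per_call: int, calls_per_day: int,
                       rate_per_million: float, days: int = 30) -> float:
    """Monthly input spend in USD at a flat per-token input rate."""
    monthly_tokens = tokens_per_call * calls_per_day * days
    return monthly_tokens / 1_000_000 * rate_per_million

haiku = monthly_input_cost(7_000, 10_000, 1.00)   # $2,100/month
sonnet = monthly_input_cost(7_000, 10_000, 3.00)  # $6,300/month
sonnet - haiku                                    # $4,200/month difference
```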
The quality angle is less obvious. Models degrade as context fills up. The phenomenon — sometimes called “lost in the middle” — is well-documented: information in the middle of a long context window is retrieved less reliably than information at the start or end. A prompt that’s using 80% of a model’s context window is not getting 80% of peak performance. For RAG pipelines that stuff retrieved chunks into context, this matters a lot. Measuring utilization before production deployment is a basic quality check, not an optimization.
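Utilization itself is a one-line ratio. A sketch, assuming commonly published context limits (verify against each provider’s current docs; the model names and limits here are assumptions):

```python
CONTEXT_WINDOWS = {  # assumed limits; check each provider's documentation
    "claude-sonnet": 200_000,
    "gpt-4o": 128_000,
    "gemini-2.5-pro": 1_000_000,
}

def context_utilization(prompt_tokens: int, model: str) -> float:
    """Fraction of the model's context window the prompt occupies."""
    return prompt_tokens / CONTEXT_WINDOWS[model]

context_utilization(96_000, "gpt-4o")  # 0.75 -> worth flagging for review
```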
System prompt optimization is the highest-leverage use. A well-written system prompt might start at 3,000 tokens of instructions. After a few rounds of tightening — removing redundant phrasing, collapsing examples, eliminating defensive hedging — it might reach 1,800 tokens. Paste both versions here and see the cost difference across your projected call volume.
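The payoff from trimming compounds with call volume. A sketch using the 3,000 → 1,800 token example above (the call volume and rate are illustrative assumptions):

```python
def trimming_savings(old_tokens: int, new_tokens: int,
                     calls_per_month: int, rate_per_million: float) -> float:
    """USD saved per month by shrinking a system prompt."""
    saved_tokens = (old_tokens - new_tokens) * calls_per_month
    return saved_tokens / 1_000_000 * rate_per_million

# 3,000 -> 1,800 token system prompt, 300k calls/month at $3.00/M input:
trimming_savings(3_000, 1_800, 300_000, 3.00)  # $1,080/month saved
```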
Pairs well with the LLM Cost Calculator for projecting full monthly spend once you know your token volumes.
For informational purposes only. Not financial, medical, or legal advice. You are solely responsible for how you use these tools.