What It Does

Vision-capable LLMs (GPT-5, Claude 4 family, Gemini 2.5/3 Pro) charge for images as tokens. Each provider has a different formula:

OpenAI (GPT-5, GPT-5 mini, GPT-4o): resizes image to fit 2048² then short-side to 768, then tiles into 512² blocks. Each tile = 170 tokens + 85 base. “Low detail” mode = flat 85 tokens, no tiling.
Anthropic (Claude Opus/Sonnet/Haiku 4): tokens ≈ (width × height) / 750, capped at 1600 per image.
Google (Gemini 2.5/3 Pro): ≤384px in any dimension → 258 tokens flat. Larger → tiled at 768², 258 tokens per tile.

Drop an image. The tool reads dimensions client-side and computes tokens + cost for every model side-by-side.

When To Use This

OCR pipelines / document AI: high-volume images amortise differently across providers
Vision-RAG: thumbnail vs full-res tradeoff — see the cost difference
Side-by-side evals: same image, see which provider charges more for the same context
Budget projections: enter your daily image volume, get monthly/annual estimates

Privacy

The image never leaves your browser. Only the natural width and height are read via the Image() element. Even the file name isn’t sent anywhere.

Limitations

Formulas reflect each provider’s published documentation as of 2026-05. Actual billed tokens can vary ±5%.
Multi-image messages aren’t modelled — multiply costs by image count.
Video / PDF tokens use different formulas not covered here.
“Auto” detail in OpenAI varies based on image size — this tool assumes the high-detail path for the calculation.

Multi-modal Token Counter

Guide

What It Does

When To Use This

Privacy

Limitations

Related Tools