Paste any OpenAI, Anthropic, or Gemini API response and extract content, tokens, and cost
Paste any raw JSON API response from OpenAI, Anthropic Claude, or Google Gemini and this tool will extract the content (including extended-thinking blocks and tool calls), the stop reason, model metadata, token usage, and an estimated cost.
Everything runs client-side. Your response data is never sent anywhere.
If the model field matches a known model ID (e.g. gpt-4o-2024-08-06, claude-3-7-sonnet-20250219), the cost estimate appears automatically.

The three major LLM providers use meaningfully different JSON schemas. Here is what each looks like:
OpenAI’s chat completion response wraps output in a choices array. Each choice has a message (for non-streaming) or delta (for streaming chunks). The main content lives at choices[0].message.content.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "gpt-4o-2024-08-06",
  "system_fingerprint": "fp_abc123",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The answer is 42.",
      "tool_calls": null
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 7,
    "total_tokens": 25,
    "prompt_tokens_details": { "cached_tokens": 0 }
  }
}
Tool calls appear as choices[0].message.tool_calls — an array of objects with function.name and function.arguments (a JSON string).
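As a sketch of how these fields map out in practice, a minimal parser for the non-streaming shape above might look like this (field names follow the example; the function name and output keys are my own):

```python
import json

def parse_openai(raw: str) -> dict:
    """Extract the main fields from a non-streaming OpenAI chat
    completion response (schema as in the example above)."""
    resp = json.loads(raw)
    choice = resp["choices"][0]
    msg = choice["message"]
    usage = resp.get("usage", {})
    return {
        "content": msg.get("content"),
        # tool_calls is null when the model answered in plain text
        "tool_calls": msg.get("tool_calls") or [],
        "stop_reason": choice.get("finish_reason"),
        "model": resp.get("model"),
        "input_tokens": usage.get("prompt_tokens"),
        "output_tokens": usage.get("completion_tokens"),
        "cached_tokens": usage.get("prompt_tokens_details", {}).get("cached_tokens", 0),
    }
```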
Anthropic’s Messages API uses a content array at the top level. Each block has a type: "text" for main content, "thinking" for extended thinking blocks, "tool_use" for function calls. This flat content array is more extensible than OpenAI’s single-field design.
{
  "id": "msg_01XyZ",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-7-sonnet-20250219",
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me work through this step by step..."
    },
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 18,
    "output_tokens": 7,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0
  }
}
Tool calls appear as content blocks with "type": "tool_use", containing name and input (an object, not a JSON string).
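The equivalent extraction for Anthropic walks the top-level content array and dispatches on each block's type (a sketch; the helper name and output shape are my own):

```python
import json

def parse_anthropic(raw: str) -> dict:
    """Split an Anthropic Messages response into text, thinking,
    and tool_use blocks (schema as in the example above)."""
    resp = json.loads(raw)
    text, thinking, tool_calls = [], [], []
    for block in resp.get("content", []):
        if block["type"] == "text":
            text.append(block["text"])
        elif block["type"] == "thinking":
            thinking.append(block["thinking"])
        elif block["type"] == "tool_use":
            # input is already a parsed object, not a JSON string
            tool_calls.append({"name": block["name"], "input": block["input"]})
    usage = resp.get("usage", {})
    return {
        "content": "".join(text),
        "thinking": "".join(thinking),
        "tool_calls": tool_calls,
        "stop_reason": resp.get("stop_reason"),
        "input_tokens": usage.get("input_tokens"),
        "output_tokens": usage.get("output_tokens"),
        "cached_tokens": usage.get("cache_read_input_tokens", 0),
    }
```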
Google’s Gemini API uses candidates at the top level. Each candidate has a content.parts array where text lives in parts[0].text. Function calls appear as parts with a functionCall key instead of text. Usage metadata is a separate top-level field.
{
  "candidates": [{
    "content": {
      "parts": [{ "text": "The answer is 42." }],
      "role": "model"
    },
    "finishReason": "STOP",
    "index": 0
  }],
  "usageMetadata": {
    "promptTokenCount": 18,
    "candidatesTokenCount": 7,
    "totalTokenCount": 25
  },
  "modelVersion": "gemini-2.0-flash-001"
}
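The Gemini shape can be flattened the same way, collecting text parts and functionCall parts separately (a sketch; the helper name and output shape are my own):

```python
import json

def parse_gemini(raw: str) -> dict:
    """Pull text, function calls, and usage out of a Gemini
    generateContent response (schema as in the example above)."""
    resp = json.loads(raw)
    cand = resp["candidates"][0]
    parts = cand.get("content", {}).get("parts", [])
    return {
        # text may be spread across several parts; join them
        "content": "".join(p["text"] for p in parts if "text" in p),
        "tool_calls": [p["functionCall"] for p in parts if "functionCall" in p],
        "stop_reason": cand.get("finishReason"),
        "model": resp.get("modelVersion"),
        "input_tokens": resp.get("usageMetadata", {}).get("promptTokenCount"),
        "output_tokens": resp.get("usageMetadata", {}).get("candidatesTokenCount"),
    }
```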
Debugging agent pipelines. When an agent makes an unexpected decision, the first step is reading the raw API response. Was it a tool call gone wrong? Did it hit max_tokens? Was content filtered? This tool answers those questions in seconds without needing jq, Python, or opening a console.
Understanding token usage. Token counts drive cost and latency. Developers often build intuitions like “a typical request costs N tokens” without ever measuring. Pasting real responses here builds calibration quickly.
Catching streaming vs. non-streaming issues. Streaming responses ("object": "chat.completion.chunk" for OpenAI) have different schemas than non-streaming ones. If you accidentally paste a streaming chunk instead of a complete response, the parser will flag the content as empty — which tells you something went wrong in your client code.
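A cheap guard against this mistake is to check for the streaming markers before parsing further (a heuristic sketch based on the OpenAI shapes described above, not an official check):

```python
import json

def looks_like_stream_chunk(raw: str) -> bool:
    """Heuristic: a single OpenAI streaming chunk reports object
    "chat.completion.chunk" and carries a delta instead of a message."""
    try:
        resp = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if resp.get("object") == "chat.completion.chunk":
        return True
    choices = resp.get("choices") or []
    return bool(choices) and "delta" in choices[0]
```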
Comparing providers on the same workload. Run the same prompt through OpenAI and Anthropic, paste both responses here, and see the token counts and costs side by side. The schema differences become obvious and the relative cost/quality tradeoffs surface immediately.
Every provider signals why generation ended via a stop reason field. The field name and values differ by provider:
| OpenAI finish_reason | Anthropic stop_reason | Gemini finishReason | Meaning |
|---|---|---|---|
| stop | end_turn | STOP | Model finished naturally; the response is complete |
| length | max_tokens | MAX_TOKENS | Hit the token limit; the response may be truncated |
| tool_calls | tool_use | (see functionCall part) | Model wants to call a function |
| content_filter | (varies) | SAFETY | Output blocked by content policy |
| null | stop_sequence | — | Matched a user-specified stop sequence |
length / max_tokens is the most dangerous stop reason in production. It means the model was mid-sentence when it hit your token ceiling. If your application treats the response as complete in this case, you will silently serve truncated output to users. Always handle this explicitly in your code.
tool_calls / tool_use is not an error — it means the model is requesting a tool call. Your agent loop should detect this, execute the tool, and send the result back as a follow-up message.
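One way to handle both cases explicitly is to normalize the provider-specific values from the table above into a small set of outcomes (a sketch; the map and category names are my own):

```python
# Provider-specific stop reasons -> normalized outcome (my own naming).
STOP_REASONS = {
    "stop": "complete", "end_turn": "complete", "STOP": "complete",
    "stop_sequence": "complete",  # matched a user-specified stop sequence
    "length": "truncated", "max_tokens": "truncated", "MAX_TOKENS": "truncated",
    "tool_calls": "tool_call", "tool_use": "tool_call",
    "content_filter": "filtered", "SAFETY": "filtered",
}

def check_stop(reason):
    """Fail loudly on truncation instead of silently serving a cut-off reply."""
    kind = STOP_REASONS.get(reason, "unknown")
    if kind == "truncated":
        raise RuntimeError(f"response truncated ({reason}); raise the token limit or retry")
    return kind
```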
Cost is computed as:
cost = (input_tokens / 1,000,000) × price_per_million_input
+ (output_tokens / 1,000,000) × price_per_million_output
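In code, the same formula looks like this (the rate table here is an illustrative placeholder, not the tool's actual pricing data):

```python
# (input, output) USD per 1M tokens -- placeholder rates for illustration only.
RATES_PER_M = {"example-model": (2.50, 10.00)}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Apply the per-million-token formula above at standard rates."""
    rate_in, rate_out = RATES_PER_M[model]
    return (input_tokens / 1_000_000) * rate_in + (output_tokens / 1_000_000) * rate_out
```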
Pricing data is hardcoded in the tool and updated manually. The table covers OpenAI (o3, o4-mini, GPT-4o, GPT-4o mini, GPT-4 Turbo, GPT-3.5 Turbo), Anthropic (Claude Opus 4, Claude 3.7 Sonnet, Claude 3.5 Sonnet/Haiku, Claude 3 family), Google (Gemini 2.5 Pro, 2.0 Flash, 1.5 Pro/Flash), DeepSeek (R1, V3), and Mistral (Large, Small, Mixtral 8x22B).
Cached token discounts are not applied automatically. If a provider offers prompt caching at a discounted rate (Anthropic’s cache read tokens are billed at ~10% of standard input rates), the displayed cost uses standard rates for all input tokens. The cached token count is shown separately so you can estimate the discount manually.
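To estimate the discount manually, bill the cached tokens at the reduced multiplier and the rest at the standard rate (a sketch; it assumes the cached count is a subset of the input count, as in OpenAI's prompt_tokens_details, and uses 10% as an example cache-read multiplier):

```python
def cost_with_cache_discount(input_tokens: int, cached_tokens: int,
                             rate_in_per_m: float,
                             cache_read_multiplier: float = 0.1) -> float:
    """Input cost with cache-read tokens billed at a fraction of the standard rate."""
    uncached = input_tokens - cached_tokens
    discounted = cached_tokens * rate_in_per_m * cache_read_multiplier
    return (uncached * rate_in_per_m + discounted) / 1_000_000
```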
Batch API discounts are not applied. OpenAI and Anthropic both offer ~50% discounts for async batch requests. If your response came from a batch job, the displayed cost is approximately 2× the actual bill.
Self-hosted models show no cost. Open-weight models (Llama, Mistral self-hosted) are excluded from the pricing table because cost depends entirely on your hardware, electricity, and amortization assumptions. Use the LLM Cost Calculator for that.
The pricing table matches canonical model IDs only, not fine-tuned IDs (e.g. ft:gpt-4o:org:custom:abc123) or provider-specific aliases; for those, select the model family manually to estimate cost. If your response arrives wrapped by a gateway or SDK, extract the inner result.message object before pasting.

For informational purposes only. Not financial, medical, or legal advice. You are solely responsible for how you use these tools.