Estimate cost per request, day, and month across Claude, GPT-5, Gemini, and friends. Toggle prompt caching and batch mode to see real-world savings. Pricing snapshot: 2026-04-29.
Get the techniques we use to cut AI bills by 50 to 80% - prompt caching patterns, batch pipelines, model routing, and eval-driven downsizing. Free, no fluff.

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.
Per-request cost is (input tokens / 1M) * input price + (output tokens / 1M) * output price; multiply by requests per day for the daily figure. Output tokens are typically 3 to 5x more expensive than input.
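The formula above can be sketched in a few lines. The workload numbers and prices below are placeholders for illustration, not any provider's real rates:

```python
def daily_cost(input_tokens, output_tokens, input_price_per_m,
               output_price_per_m, requests_per_day):
    """Daily spend given per-request token counts and per-million-token prices."""
    per_request = (input_tokens / 1_000_000) * input_price_per_m \
                + (output_tokens / 1_000_000) * output_price_per_m
    return per_request * requests_per_day

# Example: 2,000 input / 500 output tokens per request at $3 / $15 per million,
# 10,000 requests per day -> roughly $135/day.
print(round(daily_cost(2_000, 500, 3.0, 15.0, 10_000), 2))
```

Note how the output side dominates here: 500 output tokens cost more than 2,000 input tokens, which is why trimming verbose completions often saves more than trimming prompts.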
Anthropic gives a 90% discount on cached input tokens. Cache writes cost 25% more than base input, so caching pays off when context is reused enough times. On stable system prompts you can see 50 to 80% total savings.
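The break-even math behind that claim can be sketched with the two multipliers from the text: cache writes at 1.25x base input price, cached reads at 0.10x. The prompt size and price below are illustrative assumptions:

```python
def prompt_cost_with_cache(prompt_tokens, reuses, input_price_per_m):
    """One cache write at 1.25x, then each reuse reads the cache at 0.10x."""
    base = prompt_tokens / 1_000_000 * input_price_per_m
    return 1.25 * base + reuses * 0.10 * base

def prompt_cost_without_cache(prompt_tokens, reuses, input_price_per_m):
    """Full input price on every request that includes the prompt."""
    return (1 + reuses) * (prompt_tokens / 1_000_000 * input_price_per_m)

# A 50k-token system prompt sent 100 times at an assumed $3/M input price:
# uncached ~$15.00 vs. cached ~$1.67 for the prompt portion.
print(round(prompt_cost_without_cache(50_000, 99, 3.0), 2))
print(round(prompt_cost_with_cache(50_000, 99, 3.0), 2))
```

Because a read saves 0.90x base while the write only adds 0.25x, caching pays for itself after a single reuse; the savings then approach 90% of the prompt's input cost as reuse count grows.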
An async API that returns results within 24 hours at 50% off both input and output. Ideal for evals, backfills, summarization, and offline pipelines.
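Since batch mode discounts input and output equally, estimating it is a flat 50% multiplier on the standard cost. A minimal sketch, using a made-up $135/day eval pipeline as the baseline:

```python
def batch_cost(standard_cost):
    """Batch API: 50% off both input and output, so half the standard cost."""
    return 0.5 * standard_cost

# A hypothetical $135/day eval workload drops to $67.50 when moved to batch.
print(batch_cost(135.0))
```

The trade-off is latency: results arrive within 24 hours, which is fine for evals and backfills but rules out interactive traffic.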
Snapshot from 2026-04-29. Verify against provider pricing pages before committing to a contract.