Prompt Caching - Claude Code
Automatic reuse of cached context for substantial cost reduction.
Prompt caching reuses the large, stable parts of your prompt across turns so you don't pay full price to process them again every time.
What it does
Claude Code marks static context - system prompts, CLAUDE.md, loaded files - as cacheable. Subsequent turns that reuse the same prefix pay a fraction of the normal per-token cost. This is why per-turn cost in long sessions doesn't grow linearly as context accumulates.
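To make the arithmetic concrete, here is a rough cost sketch. The multipliers are assumptions based on typical published caching ratios (cache writes around 1.25x base input price, cache reads around 0.1x), not Claude Code's exact numbers - check the model config doc for real pricing:

```python
# Hypothetical cost comparison for a session that re-sends a large,
# stable prefix every turn. All prices and multipliers are assumed
# for illustration, not actual Claude Code pricing.

BASE = 3.00 / 1_000_000   # assumed $ per input token
WRITE_MULT = 1.25         # first turn: writing the prefix into the cache
READ_MULT = 0.10          # later turns: reading the cached prefix

def session_cost(prefix_tokens: int, turns: int, cached: bool) -> float:
    """Cost of re-sending the same prefix across `turns` turns."""
    if not cached:
        return turns * prefix_tokens * BASE
    first = prefix_tokens * BASE * WRITE_MULT          # cache write
    rest = (turns - 1) * prefix_tokens * BASE * READ_MULT  # cache hits
    return first + rest

uncached = session_cost(200_000, 10, cached=False)
cached = session_cost(200_000, 10, cached=True)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```

Under these assumed ratios, the cached session costs roughly a fifth of the uncached one, and the gap widens with every additional turn.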
When to use it
- Any session with meaningful CLAUDE.md or rule files - caching is already on by default.
- Heavy repos where large file reads recur turn after turn.
- Long debugging sessions where you want predictable costs.
- API-integrated workflows where per-turn cost matters.
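For the API-integrated case, this is roughly what marking a cacheable prefix looks like at the request level, assuming the Anthropic Messages API's `cache_control` content-block field. Claude Code handles this for you; the sketch only shows the shape of the payload, and the model name is a placeholder:

```python
# Sketch: constructing a Messages API payload with a cacheable system
# prefix. Assumes the `cache_control` / "ephemeral" content-block field
# from the Anthropic API; no request is actually sent here.

def build_request(system_text: str, user_text: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_text,  # stable prefix: rules, CLAUDE.md, etc.
                # Marks the end of the cacheable prefix:
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }

req = build_request("You are a careful coding agent.", "Explain this stack trace.")
```

Everything up to and including the block carrying `cache_control` is treated as the reusable prefix; the per-turn user message stays outside it.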
Gotchas
- Cache hits require the prefix to be byte-identical. Small CLAUDE.md edits invalidate the cache.
- Cached entries expire - after a long enough gap between turns, the next turn pays full price again.
- Caching is configured per model. Check the model config doc if your numbers look off.
Official docs: https://code.claude.com/docs/en/model-config.md#prompt-caching-configuration