Improve your cache hit rate
Reuse context across turns instead of re-pasting it — cached reads are ~10× cheaper.
A cache hit is when the model reuses context it has already seen this session instead of reprocessing it as fresh input. Cached reads cost roughly 10× less than fresh input tokens.
Why it matters
A low hit rate means you’re paying full price for context the model effectively already has. The usual causes are restarting sessions for a single task and re-pasting large files the agent already read.
How to improve
- Keep one session going for a task instead of restarting — restarts throw away the warm cache.
- Don’t re-paste large files. Once the agent has read a file, let the cache carry it across turns; point at it by path instead of pasting it again.
- Avoid frequent context compaction. If your context keeps getting auto-compacted, it’s too big — delegate breadth to subagents so the heavy reading stays out of your main session.