coaching

Improve your cache hit rate

Reuse context across turns instead of re-pasting it — cached reads are ~10× cheaper.

A cache hit is when the model reuses context it has already seen this session instead of reprocessing it as fresh input. Cached reads cost roughly 10× less than fresh input tokens.

Why it matters

A low hit rate means you’re paying full price for context the model effectively already has. The usual causes are restarting sessions for a single task and re-pasting large files the agent already read.

How to improve

  • Keep one session going for a task instead of restarting — restarts throw away the warm cache.
  • Don’t re-paste large files. Once the agent has read a file, let the cache carry it across turns; point at it by path instead of pasting it again.
  • Avoid frequent context compaction. If your context keeps getting auto-compacted, it’s too big — delegate breadth to subagents so the heavy reading stays out of your main session.