$1/$5
Route to Haiku 4.5
Classification, routing, ticket enrichment, build diagnosis โ use the cheapest model. 3x cheaper than Sonnet.
โ 67% saving on classification
~90%
Prompt caching โ 1h TTL
Large system prompts cost ~90% less when cached. 1-hour TTL in 2026 keeps long agent sessions warm.
โ 90% on cached input tokens
75%
Execution-focused prompts
"List 3 issues as: ISSUE|LINE|FIX" generates 5x fewer tokens than "Please provide a comprehensive review..."
โ 60โ75% on output tokens
50%
Batch API for offline work
All non-interactive tasks (nightly summaries, bulk analysis, doc generation) โ Batch API, same models, half price.
โ 50% on async workloads
20ร
@file not @Codebase
When you know the file, use @file. @Codebase loads 5โ20 files via semantic search โ 20x more tokens than a targeted @file reference.
โ 5โ20x per targeted query
40%
Adaptive thinking, effort:low
Simple tasks at effort:low skip thinking entirely. Old budget_tokens allocated tokens even for trivial queries. Now deprecated on Sonnet 4.6+.
โ 30โ40% on thinking tokens