Runbook — LLM cost spike¶
Trigger: LLM cost or token rate significantly above baseline (surfaced via dashboard or alert).
Symptoms¶
- Sudden jump in premium model usage or token volume.
- Specific agents or workloads dominating spend.
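As a rough illustration of the detection logic behind the alert (the function name, window, and multiplier here are hypothetical, not the actual dashboard config), a spike check can compare the current token rate against a rolling baseline:

```python
from statistics import mean

def is_cost_spike(recent_rates, current_rate, multiplier=3.0):
    """Flag a spike when the current token rate exceeds the rolling
    baseline by the given multiplier. Thresholds are illustrative."""
    baseline = mean(recent_rates)
    return current_rate > baseline * multiplier

# Baseline ~1000 tokens/min; 5000 tokens/min trips the check.
print(is_cost_spike([950, 1020, 980, 1050], 5000))  # True
print(is_cost_spike([950, 1020, 980, 1050], 1100))  # False
```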
What operators do (summary)¶
- Identify whether the traffic is legitimate (a launch, a batch job) or the result of abuse or a bug.
- Throttle: apply a model-tier downgrade, concurrency limits, or per-agent quotas per the internal ops guide (specific env/config changes are not listed here).
- Cache: confirm the response cache is healthy so repeated prompts aren’t re-billed.
- Verify that spend returns to baseline; escalate to product if a policy change is needed.
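The throttling step above can be sketched as a per-agent token quota. This is a minimal illustration only: class and parameter names are hypothetical, and real enforcement lives in the internal ops tooling referenced in the note below.

```python
from collections import defaultdict

class AgentQuota:
    """Minimal per-agent token quota sketch; limits are illustrative."""

    def __init__(self, limit_tokens):
        self.limit = limit_tokens
        self.used = defaultdict(int)

    def try_consume(self, agent_id, tokens):
        # Reject (or downgrade tier) when the request would exceed quota.
        if self.used[agent_id] + tokens > self.limit:
            return False
        self.used[agent_id] += tokens
        return True

quota = AgentQuota(limit_tokens=10_000)
print(quota.try_consume("agent-a", 8_000))  # True: within quota
print(quota.try_consume("agent-a", 5_000))  # False: would exceed 10k
```

A rejected request can be queued, downgraded to a cheaper tier, or dropped, depending on policy.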
Note
Exact env vars and kubectl commands are internal-only.
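For the cache-health step, the signal to watch is the hit rate: a falling rate means repeated prompts are being re-billed. A sketch of the computation (metric names and the healthy threshold are assumptions, not the actual dashboard values):

```python
def cache_hit_rate(hits, misses):
    """Hit rate of the response cache over a window; illustrative only."""
    total = hits + misses
    return hits / total if total else 0.0

# For repeat-heavy traffic a healthy cache might sit well above 0.5.
rate = cache_hit_rate(hits=720, misses=280)
print(round(rate, 2))  # 0.72
```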