Skip to content

Runbook — LLM cost spike¶

Trigger: LLM cost or token rate far above baseline (dashboard / alert).

Symptoms¶

Sudden jump in premium model usage or token volume.
Specific agents or workloads dominating spend.

What operators do (summary)¶

Identify whether traffic is legitimate (launch, batch job) or abuse / bug.
Throttle: apply tier downgrade, concurrency limits, or per-agent quotas per internal ops guide (env / config changes not listed here).
Cache: confirm response cache is healthy so repeat prompts aren’t re-billed.
Verify spend returns to baseline; escalate product if policy change is needed.

Note

Exact env vars and kubectl commands are internal-only.