Runbook: Redis failure
Trigger: Redis is unreachable or read-only, and stream or cache operations fail.
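The trigger condition can be sketched as a small probe. This is a minimal illustration, not this platform's actual health check: it assumes an unauthenticated, non-TLS Redis endpoint and uses only the generic `INFO replication` command and its `role` field. The function names (`fetch_replication_info`, `parse_info`, `classify`) are hypothetical. Treating a replica as "read-only" is a heuristic; a primary can also reject writes for other reasons that this probe does not detect.

```python
import socket


def fetch_replication_info(host, port, timeout=2.0):
    """Return the text of `INFO replication`, or None if Redis cannot be reached.

    Assumption: no AUTH and no TLS. An error reply (for example NOAUTH)
    is reported as None, the same as a connection failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"INFO replication\r\n")  # inline command form
            reader = sock.makefile("rb")
            header = reader.readline()  # bulk reply header: b"$<length>\r\n"
            if not header.startswith(b"$"):
                return None
            length = int(header[1:].strip())
            return reader.read(length).decode("utf-8", "replace")
    except (OSError, ValueError):
        return None


def parse_info(text):
    """Parse the 'key:value' lines of an INFO reply into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def classify(info_text):
    """Map an INFO reply to 'unreachable', 'writable', 'read-only', or 'unknown'."""
    if info_text is None:
        return "unreachable"
    role = parse_info(info_text).get("role")
    if role == "master":
        return "writable"
    if role == "slave":  # replicas reject writes by default
        return "read-only"
    return "unknown"
```

A caller would run `classify(fetch_replication_info(host, port))` against each node to help scope the outage; the host and port come from the deployment's own configuration.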
Impact (conceptual)
Dispatch and worker coordination degrade; cached reads may error or miss. Durable task rows in Postgres are not wiped by Redis loss alone, but in-flight dispatch and locks are affected.
What operators do (summary)
- Scope the outage (single node vs cluster).
- Fail over or restore Redis per provider / platform runbook.
- Restart schedulers and workers in a safe order after Redis is healthy.
- Watch queue depth and cache stampede on recovery.
- Escalate to platform SRE if the incident reaches SEV1 scope.
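The recovery steps above (wait for Redis to be healthy, then restart in order while limiting a stampede) can be sketched as follows. This is an illustration under stated assumptions, not the platform's procedure: `wait_until_stable` and `restart_in_order` are hypothetical helpers, the `probe` and `restart` callables must be supplied by the operator's own tooling, and the thresholds and delays are placeholder values. The sketch does not decide what the safe order is; it only applies whatever order it is given.

```python
import random
import time


def wait_until_stable(probe, required=3, interval=5.0, max_attempts=60,
                      sleep=time.sleep):
    """Return True once probe() has succeeded `required` times in a row.

    A single failed probe resets the streak, so a flapping Redis does not
    count as healthy. Returns False if `max_attempts` probes are used up.
    """
    streak = 0
    for _ in range(max_attempts):
        streak = streak + 1 if probe() else 0
        if streak >= required:
            return True
        sleep(interval)
    return False


def restart_in_order(groups, restart, base_delay=10.0, jitter=5.0,
                     sleep=time.sleep, rng=random.random):
    """Restart each group in the given order, pausing between groups.

    The pause is base_delay plus a random share of `jitter`, so consumers
    do not all reconnect and refill caches at the same instant.
    """
    restarted = []
    for index, group in enumerate(groups):
        if index:  # no pause before the first group
            sleep(base_delay + rng() * jitter)
        restart(group)
        restarted.append(group)
    return restarted
```

A caller might gate the restart with `if wait_until_stable(probe): restart_in_order(order, restart)`, where `order` is the sequence taken from the private documentation. Queue depth should still be watched separately during and after the restarts.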
Note
Key names, stream names, and CLI commands are intentionally omitted here; consult the private documentation for them.
For a conceptual overview only, see the Redis keys reference.