Metrics
Observability follows the RED (rate, errors, duration) and USE (utilization, saturation, errors) methods. Prometheus-style metrics cover tasks, workers, LLM usage, scheduler queue depth, and actors. Exact metric names, dashboard JSON, and alert rule files are internal to the Astra repo (PRD §17).
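As a rough illustration of the RED pattern, the sketch below counts requests, errors, and cumulative duration per operation, then renders them in a Prometheus-like text form. It uses only the Python standard library. The class `RedRecorder` and the `example_*` metric names are hypothetical placeholders, since the real metric names are internal to the Astra repo.

```python
import time
from collections import defaultdict


class RedRecorder:
    """Tiny in-memory RED recorder (illustrative only, not the Astra implementation)."""

    def __init__(self):
        self.requests = defaultdict(int)        # rate: total calls per operation
        self.errors = defaultdict(int)          # errors: failed calls per operation
        self.duration_sum = defaultdict(float)  # duration: total seconds per operation

    def observe(self, operation, func, *args, **kwargs):
        """Run func, recording one request, its duration, and any failure."""
        start = time.perf_counter()
        self.requests[operation] += 1
        try:
            return func(*args, **kwargs)
        except Exception:
            self.errors[operation] += 1
            raise
        finally:
            # Runs on both success and failure, so duration is always recorded.
            self.duration_sum[operation] += time.perf_counter() - start

    def render(self):
        """Render the counters as Prometheus-style text lines (hypothetical names)."""
        lines = []
        for op in sorted(self.requests):
            label = f'{{operation="{op}"}}'
            lines.append(f"example_requests_total{label} {self.requests[op]}")
            lines.append(f"example_errors_total{label} {self.errors[op]}")
            lines.append(
                f"example_duration_seconds_sum{label} {self.duration_sum[op]:.6f}"
            )
        return "\n".join(lines)
```

A caller would wrap each unit of work, for example `recorder.observe("task", run_task)`, and expose `recorder.render()` on a scrape endpoint. A production setup would more likely use an established Prometheus client library with histograms rather than a plain duration sum.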
Themes
| Theme | Examples |
|---|---|
| Task path | Latency, success/failure counts |
| Workers | Heartbeats, availability |
| Scheduler | Queue depth per shard |
| LLM | Tokens, cost |
| Inference | Active backend (CPU vs accelerated) |
Tracing
Distributed traces link requests, task runs, and tool calls via shared correlation IDs. For details of the tracing stack, see PRD §17.
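The idea of a shared correlation ID can be sketched with Python's standard `contextvars` module: an ID is set once when a request arrives, and every span created while handling that request picks it up implicitly. The function names and span dictionary layout here are assumptions for illustration and do not reflect the actual tracing stack.

```python
import contextvars
import uuid

# Holds the correlation ID for the request currently being handled.
_correlation_id = contextvars.ContextVar("correlation_id", default=None)


def start_request():
    """Generate a new correlation ID and bind it to the current context."""
    cid = uuid.uuid4().hex
    _correlation_id.set(cid)
    return cid


def span(name):
    """Create a span record tagged with the current correlation ID."""
    return {"name": name, "correlation_id": _correlation_id.get()}


def handle_request():
    """Simulate one request that triggers a task run and a tool call."""
    cid = start_request()
    spans = [span("request"), span("task_run"), span("tool_call")]
    return cid, spans
```

Because all three spans carry the same ID, a trace backend can group them into a single trace. Across process boundaries the ID would have to travel in a request header or message field, which this single-process sketch does not show.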
Dashboards
A gateway-hosted platform dashboard and Grafana views summarise health, cost, and throughput. See the PRD for the available surfaces.