Skip to content

Metrics

Observability follows RED (rate, errors, duration) and USE (utilization, saturation, errors) patterns. Prometheus-style metrics cover tasks, workers, LLM usage, scheduling depth, and actors. Exact metric names, dashboard JSON, and alert rule files are internal to the Astra repo (PRD §17).

Themes

Theme Examples
Task path Latency, success/failure counts
Workers Heartbeats, availability
Scheduler Queue depth per shard
LLM Tokens, cost
Inference Active backend (CPU vs accelerated)

Tracing

Distributed traces link requests, task runs, and tool calls via shared correlation IDs. Stack details: PRD §17.

Dashboards

Platform dashboard (gateway-hosted) and Grafana views summarise health, cost, and throughput — see PRD for surfaces.