SLAs & Acceptance Criteria

Production SLAs

| SLA | Target | Measurement |
| --- | --- | --- |
| Control plane API availability | 99.9% | Uptime checks + gateway health |
| API read response time (p99) | ≤ 10ms | Histogram on cached read paths |
| Task scheduling latency (median) | ≤ 50ms | Ready detection → dispatch |
| Task scheduling latency (p95) | ≤ 500ms | End-to-end scheduling path |
| Task execution correctness | ≥ 99% pass rate | Task success / (success + failure) |
| Worker failure detection | ≤ 30s | Heartbeat stream gap |
| Event durability | ≤ 1s | Async path to durable audit log |
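
As a rough illustration of how the latency and correctness rows could be evaluated from raw measurements, here is a minimal Python sketch. The names `SLA_TARGETS`, `percentile`, `pass_rate`, and `check_slas` are hypothetical and do not come from the Astra codebase. The nearest-rank percentile is one common choice, not necessarily what the production histograms use.

```python
import math

# Targets copied from the SLA table above; the key names are hypothetical.
SLA_TARGETS = {
    "read_p99_ms": 10.0,
    "sched_median_ms": 50.0,
    "sched_p95_ms": 500.0,
    "min_pass_rate": 0.99,
}


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def pass_rate(success, failure):
    """Task success / (success + failure), as in the Measurement column."""
    total = success + failure
    if total == 0:
        raise ValueError("no completed tasks")
    return success / total


def check_slas(read_ms, sched_ms, success, failure):
    """Return a dict of SLA name -> bool, where True means the target is met."""
    return {
        "read_p99_ms": percentile(read_ms, 99) <= SLA_TARGETS["read_p99_ms"],
        "sched_median_ms": percentile(sched_ms, 50) <= SLA_TARGETS["sched_median_ms"],
        "sched_p95_ms": percentile(sched_ms, 95) <= SLA_TARGETS["sched_p95_ms"],
        "min_pass_rate": pass_rate(success, failure) >= SLA_TARGETS["min_pass_rate"],
    }
```

Calling `check_slas(read_latencies, scheduling_latencies, successes, failures)` returns one boolean per row, which makes it straightforward to report exactly which target was missed.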

The 10ms read SLA is the hardest constraint in the system, and it is the reason the cache architecture exists. Any code path that reads from Postgres synchronously on a hot API endpoint is a bug: not a performance issue, a correctness issue.
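
To make the rule concrete, the sketch below shows one way a hot read path can avoid touching Postgres at all. It is an assumption-laden illustration, not the actual cache architecture. The class `HotReadCache`, its `refresh_queue`, and the choice to return `None` on a miss are all hypothetical, and the source does not say how misses are handled in the real system.

```python
import queue


class HotReadCache:
    """In-process cache for hot API reads; the hot path never calls Postgres."""

    def __init__(self):
        self._data = {}
        # Keys that missed. A background worker (not shown) would drain this
        # queue and load the rows from Postgres off the request path.
        self.refresh_queue = queue.Queue()

    def put(self, key, value):
        """Populate the cache; called by the write path or a background refresher."""
        self._data[key] = value

    def get(self, key):
        """Hot-path read: return the cached value, or None on a miss.

        A miss schedules an asynchronous refresh instead of blocking on the database.
        """
        try:
            return self._data[key]
        except KeyError:
            self.refresh_queue.put_nowait(key)
            return None
```

The point of the design is that `get` contains no database call, so a synchronous Postgres read cannot slip into the request path unnoticed.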

MVP functional acceptance

| Criterion | Phase delivered |
| --- | --- |
| Spawn and run a persistent agent | Phase 1 |
| Planner produces task DAGs from a goal | Phase 4 |
| Scheduler detects ready tasks and dispatches to workers | Phase 1 |
| Worker executes tasks and returns results persisted in Postgres | Phase 1/2 |
| Task state transitions emit events to events table | Phase 1 |
| Observability traces visible for each task execution | Phase 5 |
| Tool runtime can run sandboxed command and return artifact | Phase 2 |

Scale targets

| Target | Value |
| --- | --- |
| Concurrent agents | Millions |
| Tasks per day | 100M+ |
| No single API call | > 10ms |
| Worker failure detection | ≤ 30s |
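
The worker failure detection target is measured as a gap in the heartbeat stream. A minimal sketch of such a gap check follows. The function `find_failed_workers` and its parameters are hypothetical, and timestamps are assumed to be seconds from a single monotonic clock.

```python
def find_failed_workers(last_heartbeat, now, gap_threshold_s):
    """Return sorted IDs of workers whose latest heartbeat is older than gap_threshold_s.

    last_heartbeat maps worker ID -> timestamp (seconds) of the latest heartbeat.
    now is the current time on the same clock.
    """
    return sorted(
        worker_id
        for worker_id, seen_at in last_heartbeat.items()
        if now - seen_at > gap_threshold_s
    )
```

Because a failure is only noticed on the next sweep, the gap threshold plus the sweep interval would need to stay within 30s to meet the target. A threshold of 20s with a sweep every 5s is one combination that would fit, though these numbers are illustrative only.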

These are design targets. Load-testing procedures live in the Astra repo.