SLAs & Acceptance Criteria

Production SLAs

| SLA | Target | Measurement |
| --- | --- | --- |
| Control plane API availability | 99.9% | Uptime checks + gateway health |
| API read response time (p99) | ≤ 10ms | Histogram on cached read paths |
| Task scheduling latency (median) | ≤ 50ms | Ready detection → dispatch |
| Task scheduling latency (p95) | ≤ 500ms | End-to-end scheduling path |
| Task execution correctness | ≥ 99% pass rate | Task success / (success + failure) |
| Worker failure detection | ≤ 30s | Heartbeat stream gap |
| Event durability | ≤ 1s | Async path to durable audit log |
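
As a rough illustration of how the latency and correctness targets could be checked against raw measurements, the sketch below computes nearest-rank percentiles and the pass rate `success / (success + failure)`. The function names and the sample data are hypothetical and are not taken from the Astra codebase; the production measurement uses histograms, which this sketch does not model.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def pass_rate(success, failure):
    """Task execution correctness: success / (success + failure)."""
    total = success + failure
    if total == 0:
        raise ValueError("no completed tasks")
    return success / total

# Hypothetical measurements, in milliseconds (illustrative only).
read_latencies_ms = [2.0] * 98 + [8.0, 9.5]
sched_latencies_ms = [20.0] * 60 + [45.0] * 36 + [300.0] * 4

# Each entry compares one measurement with the target from the SLA list above.
report = {
    "api_read_p99_ok": percentile(read_latencies_ms, 99) <= 10.0,
    "sched_median_ok": percentile(sched_latencies_ms, 50) <= 50.0,
    "sched_p95_ok": percentile(sched_latencies_ms, 95) <= 500.0,
    "correctness_ok": pass_rate(success=995, failure=5) >= 0.99,
}
```

With these sample values every entry in `report` is `True`; a single slow read above 10ms among 100 samples would still pass the p99 check, while two would not.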

The 10ms p99 read SLA is the hardest constraint in the system, and it is the reason the cache architecture exists. Any code path that reads from Postgres synchronously on a hot API endpoint is a bug: treat it as a correctness issue, not a performance issue.
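
One way to picture this rule is a read path that only ever touches the cache. The sketch below is a minimal illustration under stated assumptions: `CachedReadPath`, `CacheMissError`, and the `enqueue_refresh` callback are hypothetical names, and the miss policy shown (hand the key to an asynchronous refresher and fail fast) is an assumption, since this page does not say how the real system handles misses.

```python
from typing import Callable, Dict

class CacheMissError(LookupError):
    """Raised when a hot read cannot be served from the cache."""

class CachedReadPath:
    """Hot-path reader that never queries the database synchronously (hypothetical)."""

    def __init__(self, cache: Dict[str, dict], enqueue_refresh: Callable[[str], None]):
        self._cache = cache                      # stand-in for the real cache client
        self._enqueue_refresh = enqueue_refresh  # hands the key to an async refresher

    def get(self, key: str) -> dict:
        value = self._cache.get(key)
        if value is None:
            # Do not fall back to a blocking Postgres read here.
            # Schedule a background refill and let the caller decide what to return.
            self._enqueue_refresh(key)
            raise CacheMissError(key)
        return value

# Usage with an in-memory dict and a list standing in for the refresh queue.
pending_refreshes = []
reads = CachedReadPath(
    cache={"agent:1": {"status": "running"}},
    enqueue_refresh=pending_refreshes.append,
)
hit = reads.get("agent:1")
```

The point of the design is that the database round trip is structurally absent from the request path, so a slow query cannot silently push a read past the 10ms budget.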

MVP Milestone Map

| Phase | Capability | Status |
| --- | --- | --- |
| Phase 0 | Prep: repo scaffolding, infra, migrations | COMPLETE |
| Phase 1 | Kernel MVP: actors, state, messaging, task graph, scheduler | COMPLETE |
| Phase 2 | Workers & Tool Runtime: execution, Docker sandbox, worker manager | COMPLETE |
| Phase 3 | Memory & LLM Routing: pgvector, LLM router, Memcached caching | COMPLETE |
| Phase 4 | Orchestration, Eval, Security: planner, goal-service, identity, access-control, approvals | COMPLETE |
| Phase 5 | Scale & Production Hardening: load tests, Grafana, alerts, runbooks, cost tracking | COMPLETE |
| Phase 6 | SDK & Applications: AgentContext, MemoryClient, ToolClient, examples | COMPLETE |
| Phase 7 | Security Compliance: gRPC/HTTP TLS, Vault integration | COMPLETE |
| Phase 8 | Platform Dashboard: embedded UI, snapshot API, auto-refresh | COMPLETE |
| Phase 9 | Agent Profile & Context: system_prompt, agent_documents, context propagation | COMPLETE |
| Phase 10 | Chat Agents: WebSocket streaming, sessions, tool invocation | COMPLETE |
| P0-P2 | Platform Stability: agent restore, dead-letter, circuit breakers, idempotency, sharding | COMPLETE |
| Phase 11 | Multi-tenancy: orgs, teams, RBAC, visibility, data isolation | In progress |
| Phase 12 | Slack integration: adapter, proactive posting, platform secrets | Partial |

MVP functional acceptance

| Criterion | Phase delivered |
| --- | --- |
| Spawn and run a persistent agent | Phase 1 |
| Planner produces task DAGs from a goal | Phase 4 |
| Scheduler detects ready tasks and dispatches to workers | Phase 1 |
| Worker executes tasks and returns results persisted in Postgres | Phase 1/2 |
| Task state transitions emit events to events table | Phase 1 |
| Observability traces visible for each task execution | Phase 5 |
| Tool runtime can run sandboxed command and return artifact | Phase 2 |
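
To make the "scheduler detects ready tasks" criterion concrete, here is a minimal sketch of ready detection over a task DAG: a task is ready when it is still pending and every task it depends on has completed. The `ready_tasks` function, the status strings, and the example graph are all hypothetical and only illustrate the idea; they do not describe the actual scheduler.

```python
from typing import Dict, List, Set

def ready_tasks(deps: Dict[str, Set[str]], status: Dict[str, str]) -> List[str]:
    """Return pending tasks whose dependencies have all completed, in sorted order."""
    return sorted(
        task
        for task, parents in deps.items()
        if status.get(task) == "pending"
        and all(status.get(parent) == "completed" for parent in parents)
    )

# Hypothetical DAG: each task maps to the set of tasks it depends on.
deps = {
    "fetch": set(),
    "parse": {"fetch"},
    "summarize": {"parse"},
    "index": {"fetch"},
}
status = {
    "fetch": "completed",
    "parse": "pending",
    "summarize": "pending",
    "index": "pending",
}

ready = ready_tasks(deps, status)  # ["index", "parse"]
```

Here `summarize` stays blocked until `parse` completes, while `parse` and `index` can be dispatched to workers in parallel.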

Scale targets

| Target | Value |
| --- | --- |
| Concurrent agents | Millions |
| Tasks per day | 100M+ |
| No single API call | > 10ms |
| Worker failure detection | ≤ 30s |

These are design targets. Load-testing procedures live in the Astra repo.
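
For a sense of scale, the targets above can be turned into back-of-the-envelope numbers. The sketch below converts the daily task target into an average per-second rate and shows a simple heartbeat-gap check against the 30s detection target. Both helper names are hypothetical, and the numeric timestamps stand in for whatever clock the heartbeat stream actually uses.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rate_per_second(per_day: int) -> float:
    """Average rate implied by a daily volume; peak load will be higher than this."""
    return per_day / SECONDS_PER_DAY

def stale_workers(last_heartbeat: dict, now: float, max_gap_s: float = 30.0) -> list:
    """Workers whose most recent heartbeat is older than max_gap_s seconds."""
    return sorted(
        worker for worker, seen_at in last_heartbeat.items()
        if now - seen_at > max_gap_s
    )

# 100M tasks per day is roughly 1,157 tasks per second on average.
tasks_per_second = average_rate_per_second(100_000_000)

# Hypothetical heartbeat timestamps, in seconds.
heartbeats = {"worker-a": 100.0, "worker-b": 65.0}
failed = stale_workers(heartbeats, now=100.0)  # ["worker-b"], a gap of 35s
```

If the check runs periodically rather than continuously, the gap threshold plus the check interval has to stay within 30s for the detection target to hold.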