Astra

Astra is an operating system for autonomous agents. It comprises a microkernel (actors, task graph, scheduler, messaging, state), sixteen microservices, sandboxed tools, layered memory, LLM routing, real-time chat, and platform stability infrastructure. It is not a single-model chat wrapper or a UI product.

Executive summary: Persistent agents submit goals; the planner materialises each goal as a DAG of tasks; the scheduler dispatches work via Redis Streams; workers run tasks inside sandboxes. Postgres is the source of truth, and Redis and Memcached serve cached reads to keep the hot read path at or under 10ms. External clients authenticate with JWT, services talk to each other over mTLS, and dangerous work passes policy and approval gates. Agents can have profiles (system prompts, attached documents) that propagate through planning to execution, and users can interact via WebSocket chat or Slack.

flowchart LR
  U[Client] --> GW[API gateway]
  GW --> GS[goal-service]
  GS --> PL[planner]
  PL --> TS[task-service]
  TS --> SCH[scheduler]
  SCH --> R[(Redis)]
  R --> EW[execution-worker]
  EW --> TR[tool-runtime]
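
The goal-to-task flow above relies on the planner producing a DAG that the scheduler can dispatch in dependency order. As a minimal sketch of that idea (not Astra's actual scheduler), the following Go program groups tasks into dispatch waves with Kahn's algorithm. The `Task` type, its fields, and the `DispatchWaves` function are hypothetical names introduced for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Task is a hypothetical node in a goal's task graph; Astra's real schema may differ.
type Task struct {
	ID        string
	DependsOn []string
}

// DispatchWaves groups tasks into waves using Kahn's algorithm: every task in
// a wave has all of its dependencies in earlier waves. It returns an error for
// duplicate IDs, unknown dependencies, or a cycle.
func DispatchWaves(tasks []Task) ([][]string, error) {
	indegree := make(map[string]int, len(tasks))
	dependents := make(map[string][]string, len(tasks))
	for _, t := range tasks {
		if _, dup := indegree[t.ID]; dup {
			return nil, fmt.Errorf("duplicate task ID %q", t.ID)
		}
		indegree[t.ID] = 0
	}
	for _, t := range tasks {
		for _, dep := range t.DependsOn {
			if _, ok := indegree[dep]; !ok {
				return nil, fmt.Errorf("task %q depends on unknown task %q", t.ID, dep)
			}
			indegree[t.ID]++
			dependents[dep] = append(dependents[dep], t.ID)
		}
	}

	// Tasks with no unmet dependencies form the first wave.
	var ready []string
	for id, n := range indegree {
		if n == 0 {
			ready = append(ready, id)
		}
	}

	var waves [][]string
	done := 0
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic order within a wave
		waves = append(waves, ready)
		done += len(ready)
		var next []string
		for _, id := range ready {
			for _, child := range dependents[id] {
				indegree[child]--
				if indegree[child] == 0 {
					next = append(next, child)
				}
			}
		}
		ready = next
	}
	if done != len(tasks) {
		return nil, errors.New("task graph contains a cycle")
	}
	return waves, nil
}

func main() {
	waves, err := DispatchWaves([]Task{
		{ID: "fetch"},
		{ID: "parse", DependsOn: []string{"fetch"}},
		{ID: "summarise", DependsOn: []string{"parse"}},
		{ID: "index", DependsOn: []string{"parse"}},
	})
	fmt.Println(waves, err)
}
```

Tasks in the same wave have no dependencies on each other, so in principle they could be handed to workers in parallel. A real scheduler would more likely release tasks one by one as their dependencies complete rather than in whole waves.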

Goals

Goal	Target
Scale	Millions of agents, 100M+ tasks/day (PRD §1).
Latency	Hot-path API reads ≤10ms at p99; scheduling latency ≤50ms median, ≤500ms at p95 (PRD §25).
Safety	Sandboxed tools, RBAC, approvals, secrets in Vault, mTLS between all services.
Operability	Metrics, traces, runbooks, rolling upgrades with backward-compatible schema.
Resilience	Agent restore on startup, dead-letter tasks, circuit breakers, consumer retry, goal idempotency (PRD §21, P0-P2).
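
The latency targets in the table are percentile bounds, so checking them means computing a percentile over measured samples. The sketch below shows one way to do that in Go using the nearest-rank method. It is an illustration only: `Percentile`, `WithinTarget`, and `msSamples` are hypothetical helpers, the sample data is made up, and Astra's own metrics pipeline may compute percentiles differently (for example from histograms).

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// msSamples converts whole-millisecond values into durations (example helper).
func msSamples(values ...int) []time.Duration {
	out := make([]time.Duration, len(values))
	for i, v := range values {
		out[i] = time.Duration(v) * time.Millisecond
	}
	return out
}

// Percentile returns the nearest-rank p-th percentile (0 < p <= 100) of
// samples. It returns 0 for an empty slice and does not modify its input.
func Percentile(samples []time.Duration, p float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	// Nearest rank: the smallest rank r such that r/n >= p/100.
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// WithinTarget reports whether the p-th percentile of samples is at or below limit.
func WithinTarget(samples []time.Duration, p float64, limit time.Duration) bool {
	return Percentile(samples, p) <= limit
}

func main() {
	// Made-up read latencies with a single slow outlier.
	reads := msSamples(2, 3, 3, 4, 4, 5, 6, 7, 9, 40)
	fmt.Println("read p99:", Percentile(reads, 99))
	fmt.Println("read target met:", WithinTarget(reads, 99, 10*time.Millisecond))
}
```

The same helper could be applied to scheduling latency with `p` set to 50 and 95 and limits of 50ms and 500ms respectively.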

Core capabilities (PRD v3.0)

Platform stability (P0-P2): Agent restore on startup, task dead-letter queue, Redis consumer retry/reclaim, readiness vs liveness probes, gateway circuit breakers, goal idempotency, configurable task-stream sharding, supervisor wiring, mailbox-full handling.
Agent profile & context (Phase 9): System prompts, attached documents (rules, skills, context docs, references), context propagation through the planning and execution pipeline.
Real-time chat agents (Phase 10): WebSocket streaming with tool invocation, session management, message injection.
Slack integration (Phase 12): Connect chat agents to Slack workspaces, proactive posting, platform-configurable Slack app secrets.
Hardware acceleration: Metal/Neural Engine on macOS, CUDA on Linux, graceful CPU fallback. macOS is a supported production target.
Olympus application layer: External agent adapter framework, webhook ingest, goal-level dependencies, agent-to-agent goal posting, dual-approval, trust scores.
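
Several of the stability items above are well-known patterns. As one example, a gateway circuit breaker can be sketched in a few lines of Go. This is a generic consecutive-failure breaker written under stated assumptions, not Astra's gateway code: `Breaker`, `NewBreaker`, `ErrOpen`, `errUpstream`, and `defaultCooldown` are hypothetical names, and the threshold and cooldown values are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOpen is returned when the breaker rejects a call without running it.
var ErrOpen = errors.New("circuit open")

// errUpstream and defaultCooldown exist only for the example below.
var errUpstream = errors.New("upstream unavailable")

const defaultCooldown = 30 * time.Second

// Breaker opens after maxFailures consecutive failures and rejects calls
// until cooldown has elapsed.
type Breaker struct {
	mu          sync.Mutex
	maxFailures int
	cooldown    time.Duration
	failures    int
	openedAt    time.Time
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn unless the breaker is open. Once the cooldown has passed, calls
// are let through again as trials: a success closes the breaker, a failure
// restarts the cooldown. This sketch does not limit concurrent trial calls.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	open := b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown
	b.mu.Unlock()
	if open {
		return ErrOpen
	}

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0
	return nil
}

func main() {
	b := NewBreaker(2, defaultCooldown)
	flaky := func() error { return errUpstream }
	for i := 0; i < 3; i++ {
		fmt.Println(b.Call(flaky))
	}
}
```

In the usage above, the first two calls reach the failing upstream and the third is rejected by the breaker without being attempted, which is the behaviour that protects a struggling backend from further load.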

Non-goals

Not building foundation models: Astra integrates providers.
Not replacing every data platform: it composes Postgres, Redis, object storage, etc.
Not embedding app logic in the kernel: there is a strict kernel/SDK/app boundary (PRD §2).
Not only a chat product: chat is one surface on the gateway; the core is the task graph and actor runtime.

Who should read what

Role	Start with
New contributor	Getting started → Glossary → Architecture overview
Backend engineer	Kernel, Task graph, Services
Operator	Operations, Deployment, runbooks
Security reviewer	Security, PRD §18

Reading order

This page → Glossary if terms are unfamiliar.
Architecture overview — layers and goal→task flow.
Services — all sixteen services.
Reference — contracts, schema, Redis, APIs.
Operations when you’re on call.

At a glance

Dimension	Value
Language	Go (primary), Python tooling
Kernel	Microkernel + actor runtime
Tasks	Distributed DAG, transactional state
Bus	Redis Streams + consumer groups
Source of truth	Postgres (+ pgvector)
Hot cache	Redis, Memcached
Sandboxing	WASM / Docker / Firecracker
Services	16 canonical microservices
Spec	PRD in the Astra repo
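
The bus row above, together with the configurable task-stream sharding listed under core capabilities, suggests that tasks are spread over several Redis streams. How Astra picks a shard is not described on this page, so the following Go sketch shows just one plausible scheme: hashing an agent ID with FNV-1a and taking it modulo the shard count. The function `ShardStream`, the `astra:tasks` key prefix, and the choice of agent ID as the shard key are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ShardStream maps an agent ID to one of `shards` stream keys of the form
// "<prefix>:<n>". A shard count of 0 is treated as 1.
func ShardStream(prefix, agentID string, shards uint32) string {
	if shards == 0 {
		shards = 1
	}
	h := fnv.New32a()
	h.Write([]byte(agentID)) // Write on a hash.Hash never returns an error
	return fmt.Sprintf("%s:%d", prefix, h.Sum32()%shards)
}

func main() {
	// "astra:tasks" is a hypothetical key prefix, not a documented stream name.
	for _, id := range []string{"agent-1", "agent-2", "agent-3"} {
		fmt.Println(id, "->", ShardStream("astra:tasks", id, 4))
	}
}
```

Hashing keeps every task for a given agent on the same stream, which preserves per-agent ordering within a shard. The trade-off of plain modulo hashing is that changing the shard count remaps most agents to different streams.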

Maturity

Astra tracks Engineering Specification v3.0 (PRD). Phases 0–10 are complete; Phase 11 (multi-tenancy) is in progress; Phase 12 (Slack) is partial. When in doubt, read the PRD section cited on each page.

Sections

Getting started: Repo layout, prerequisites, local paths.
Glossary: Terms and acronyms.
Security: mTLS, JWT, sandbox, Vault, approvals.
Architecture: Kernel, actors, scheduler, memory, LLM routing.
Reference: gRPC, schema, Redis, APIs, metrics, SLAs.
Operations: Runbooks and incident flow.
Deployment: K8s, local, macOS, GCP.