ORCHESTRATION-SPEC — Multi-Agent Orchestration Specification
Status: live · Version: 1.0.0 · Camada: 9 (AI Infrastructure)
Data source: `docs/reference/registries/_orchestrators-registry.yaml` (12 orchestrators)
Sibling: `docs/reference/templates/AGENT-MANIFEST.md` (per-agent deep spec)
Purpose
Authoritative specification for how requests flow through the 5-layer orchestration hierarchy. Where the registry is the lean index, this spec is the prose explaining WHY the hierarchy exists and HOW transitions happen.
Layer responsibilities
Layer 1 — Cortex (global orchestrator)
- One instance, always-on
- Reads: ROUTING-COMPASS, CONSTITUTION, COST-POLICY
- Writes: capsules to Layer 2
- Never invokes a skill directly
- Failure mode: cortex_fallback (a simpler classifier)
Layer 2 — Domain orchestrators (10)
- frontend_orch, backend_orch, ai_ml_orch, devops_orch, security_orch, qa_orch, finance_trading_orch, integrations_orch, iot_orch, meta_orch
- Always-on; one per coherent macro-domain
- Reads: domain-specific skills + ADRs
- Writes: capsules to Layer 3 (task orchestrators)
- Veto power: security_orch and qa_orch can block releases unilaterally
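
The veto rule can be sketched as a simple check over per-domain approvals. This is an illustration only: the `votes` mapping and both function names are hypothetical, and only the two veto holders come from this spec.

```python
VETO_HOLDERS = frozenset({"security_orch", "qa_orch"})  # named in this spec


def blocking_vetoes(votes: dict) -> list:
    """Return the veto holders that voted against the release, sorted by name.

    `votes` maps an orchestrator id to True (approve) or False (reject);
    this shape is an assumption made for the example.
    """
    return sorted(
        orch for orch, approved in votes.items()
        if orch in VETO_HOLDERS and not approved
    )


def release_allowed(votes: dict) -> bool:
    # One "no" from a veto holder blocks the release; a "no" from any
    # other domain orchestrator does not block it on its own.
    return not blocking_vetoes(votes)
```

Under these assumptions, a rejection from `frontend_orch` alone leaves the release allowed, while a rejection from `qa_orch` blocks it.
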
Layer 3 — Task orchestrators (ephemeral)
- Spawned per-story, reaped on completion
- TTL: 1800s max (enforced)
- Reads: parent capsule + memory pointers
- Writes: capsules to Layer 4 (specialists)
- Emits: checkpoints per the parent capsule's `checkpoint_policy`
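
One way the 1800 s TTL ceiling might be enforced is sketched below. The `TaskOrchestrator` class, its fields, and `reap_expired` are hypothetical names, and clamping an oversized TTL (rather than rejecting it) is an assumption: the spec only says the limit is enforced.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

MAX_TTL_SECONDS = 1800  # ceiling stated in this spec


@dataclass
class TaskOrchestrator:
    """Hypothetical stand-in for an ephemeral Layer 3 task orchestrator."""
    story_id: str
    ttl_seconds: int = MAX_TTL_SECONDS
    spawned_at: float = field(default_factory=time.monotonic)

    def __post_init__(self) -> None:
        # Assumption: a requested TTL above the ceiling is clamped, not rejected.
        self.ttl_seconds = min(self.ttl_seconds, MAX_TTL_SECONDS)

    def is_expired(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self.spawned_at >= self.ttl_seconds


def reap_expired(orchestrators, now: Optional[float] = None):
    """Return only the orchestrators whose TTL has not yet elapsed."""
    return [o for o in orchestrators if not o.is_expired(now)]
```
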
Layer 4 — Specialists (20)
- Stateless; one invocation = one task
- Reads: the inbound capsule + its own memory namespace
- Writes: artifact + outbound capsule (back to L3)
- Skills: bound by `preferred_skills` ∪ `skills_allowed` from capsule
- Quality threshold from capsule (default 0.85)
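
The skill binding and quality threshold could look roughly like this. The capsule is modelled as a plain dict; the `quality_threshold` key name and the `>=` comparison are assumptions, while `preferred_skills`, `skills_allowed`, and the 0.85 default come from this spec.

```python
DEFAULT_QUALITY_THRESHOLD = 0.85  # default stated in this spec


def bound_skills(capsule: dict) -> set:
    """Union of the two skill lists carried by the capsule."""
    preferred = set(capsule.get("preferred_skills", []))
    allowed = set(capsule.get("skills_allowed", []))
    return preferred | allowed


def may_use(capsule: dict, skill: str) -> bool:
    return skill in bound_skills(capsule)


def meets_threshold(capsule: dict, score: float) -> bool:
    # "quality_threshold" is a hypothetical key name; the comparison
    # being inclusive is also an assumption.
    return score >= capsule.get("quality_threshold", DEFAULT_QUALITY_THRESHOLD)
```
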
Layer 5 — Workers (5)
- Atomic; one capability per worker
- code-writer-worker, file-operator-worker, api-caller-worker, test-runner-worker, git-worker
- Reads: explicit single-action instruction
- Writes: result blob
- No autonomy; obey the parent specialist
Routing transitions

    User intent
        │
        ▼
    Cortex (L1) ──► intent_classification ──► routing_rule lookup ──► capsule_to(L2)
                                                                            │
        ┌───────────────────────────────────────────────────────────────────┘
        ▼
    L2 domain orch ──► decompose into stories
                           │
                           ▼
                       spawn L3 task_orch_<story_id>
                           │
                           ▼
                       sequence L4 specialists
                           │ ▲
                           ▼ │
                       delegate to L5 workers as needed
                           │
                           ▼
                       L3 aggregates result, returns to L2
                           │
                           ▼
                       L2 returns to L1 (or to next L2 if cross-domain)
                           │
                           ▼
                       L1 returns to user

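
The first hop of this flow (classify, look up a routing rule, emit a capsule to Layer 2) can be sketched as follows. Everything here is illustrative: the keyword classifier, the intent labels, the `ROUTING_RULES` table, and the capsule keys are hypothetical and do not reflect the real ROUTING-COMPASS content or capsule schema. Sending unroutable intents to `cortex_fallback` is likewise an assumption.

```python
# Hypothetical routing table: intent label -> Layer 2 orchestrator id.
ROUTING_RULES = {
    "ui_change": "frontend_orch",
    "api_change": "backend_orch",
}


def classify_intent(text: str) -> str:
    """Toy keyword classifier standing in for Cortex's intent_classification."""
    lowered = text.lower()
    if "button" in lowered or "page" in lowered:
        return "ui_change"
    if "endpoint" in lowered or "api" in lowered:
        return "api_change"
    return "unknown"


def capsule_to_l2(text: str) -> dict:
    """Build a minimal capsule addressed to a Layer 2 orchestrator."""
    intent = classify_intent(text)
    # Assumption: intents without a routing rule go to cortex_fallback.
    target = ROUTING_RULES.get(intent, "cortex_fallback")
    return {"from": "cortex", "to": target, "intent": intent, "payload": text}
```
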
Cross-domain orchestration
When an intent spans multiple Layer-2 domains (e.g., "build a fullstack feature"):
- Cortex selects a cross_domain_pattern from ROUTING-COMPASS
- Pattern declares a sequence: e.g., backend_orch.design_api → frontend_orch.implement_ui → qa_orch.tests → devops_orch.deploy
- Each step waits for the previous (or runs in parallel if pattern allows)
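
A cross-domain pattern of this kind might be executed as sketched below. The pattern shape (a list of stages, where steps inside one stage may run in parallel) and the `dispatch` callable are assumptions for illustration; the actual pattern format lives in ROUTING-COMPASS.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical encoding of the example sequence: one step per stage,
# so every step waits for the previous one.
FULLSTACK_PATTERN = [
    ["backend_orch.design_api"],
    ["frontend_orch.implement_ui"],
    ["qa_orch.tests"],
    ["devops_orch.deploy"],
]


def run_pattern(pattern, dispatch):
    """Run stages in order; steps within a stage run concurrently.

    `dispatch` is a caller-supplied function taking one step name.
    """
    results = []
    with ThreadPoolExecutor() as pool:
        for stage in pattern:
            # Consuming pool.map blocks until the whole stage is done
            # and yields results in the order the steps were listed.
            results.extend(pool.map(dispatch, stage))
    return results
```

A pattern that allows parallelism would simply list several steps in the same stage, for example `[["backend_orch.design_api", "frontend_orch.implement_ui"], ["qa_orch.tests"]]`.
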
Fallback chains
Every orchestrator declares a fallback_chain field. On failure:
1. First fallback (often a simpler version of itself)
2. Then cortex (global fallback)
3. Then human_escalation (last resort)
Fallback entries are intentional values rather than orchestrator IDs; the audit allowlist that covers them is documented in `_audit_semantic.py`.
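
The three-step order above can be sketched as a loop over the chain. The `handlers` mapping and the function signature are hypothetical; only the `fallback_chain` field and the `human_escalation` last resort come from this spec.

```python
HUMAN_ESCALATION = "human_escalation"


def run_with_fallbacks(primary, fallback_chain, handlers, request):
    """Try the primary target, then each fallback in order.

    Returns a (handled_by, result) tuple. `handlers` maps a target name
    to a callable; this registry shape is an assumption for the example.
    """
    for target in [primary, *fallback_chain]:
        if target == HUMAN_ESCALATION:
            break  # last resort: nothing automated is left to try
        handler = handlers.get(target)
        if handler is None:
            continue  # no handler registered for this entry; skip it
        try:
            return target, handler(request)
        except Exception:
            continue  # this target failed; move down the chain
    return HUMAN_ESCALATION, None
```
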
Failure semantics
| Failure type | Handled by | Action |
|---|---|---|
| Capsule schema invalid | Receiver | Reject; return error capsule |
| Budget exceeded | CIRCUIT-BREAKER (Camada 9, future) | HALT; emit cost alert |
| Constitutional violation | Auditor-haiku | HALT; emit INCIDENT |
| Skill unavailable | Specialist | Try fallback skill; if none → escalate to L3 |
| Deadline missed | Task orchestrator | Renegotiate or terminate story |
| Veto from security/qa | Veto-holder | HALT; require ADR to proceed |
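
The table above can also be held as a lookup so that callers can ask whether a failure halts the flow. The enum, the `(handler, action)` tuple shape, and the short action labels are hypothetical encodings of the table rows, not an existing API.

```python
from enum import Enum


class Failure(Enum):
    CAPSULE_SCHEMA_INVALID = "capsule_schema_invalid"
    BUDGET_EXCEEDED = "budget_exceeded"
    CONSTITUTIONAL_VIOLATION = "constitutional_violation"
    SKILL_UNAVAILABLE = "skill_unavailable"
    DEADLINE_MISSED = "deadline_missed"
    VETO = "veto"


# failure -> (handled by, first action), mirroring the table rows.
FAILURE_POLICY = {
    Failure.CAPSULE_SCHEMA_INVALID: ("receiver", "reject"),
    Failure.BUDGET_EXCEEDED: ("circuit_breaker", "halt"),
    Failure.CONSTITUTIONAL_VIOLATION: ("auditor_haiku", "halt"),
    Failure.SKILL_UNAVAILABLE: ("specialist", "try_fallback_skill"),
    Failure.DEADLINE_MISSED: ("task_orchestrator", "renegotiate_or_terminate"),
    Failure.VETO: ("veto_holder", "halt"),
}


def is_halting(failure: Failure) -> bool:
    """True when the table's action for this failure starts with HALT."""
    return FAILURE_POLICY[failure][1] == "halt"
```
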
SLOs at the orchestration layer
(Will land in AGENT-METRICS — Camada 9, planned.)
Indicative targets:
- Cortex classification latency: p95 < 200 ms
- Domain orchestrator decomposition: p95 < 1 s
- Task orchestrator overhead per specialist call: p95 < 50 ms
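
A p95 check against these targets might be computed as sketched here. The metric key names are hypothetical, and the nearest-rank percentile definition is an assumption, since the spec does not say how p95 is to be calculated.

```python
# Targets from this spec, in milliseconds; the key names are made up.
SLO_P95_MS = {
    "cortex_classification": 200,
    "domain_decomposition": 1000,
    "task_overhead_per_call": 50,
}


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    # ceil(0.95 * n) in integer arithmetic, avoiding float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_slo(metric: str, samples_ms) -> bool:
    return p95(samples_ms) < SLO_P95_MS[metric]
```

With nearest-rank, a single slow outlier among 100 samples does not breach the target, because the 95th-ranked sample is still a fast one.
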
What this spec does NOT cover
- Per-agent prompt content → `docs/reference/templates/AGENT-MANIFEST.md` (template) + per-agent manifests (instances)
- Skill catalog → `_skills-registry.yaml`
- Routing rules in detail → `ROUTING-COMPASS.yaml`
- Capsule schema → `docs/reference/contracts/handoff-capsule.schema.json`
- Checkpoint schema → `docs/reference/contracts/checkpoint.schema.json`
Evolution
| Version | Date | Change |
|---|---|---|
| 1.0.0 | 2026-05-23 | Initial spec; matches _orchestrators-registry.yaml v1.0.1 |
| (future) | — | Add Camada 13 autonomy integration (WORLD-STATE + EVENT-BUS) |
| (future) | — | Add Camada 18 cognitive layer hooks (mirror learning between L4 specialists) |