How-to: run the full multi-agent stack end-to-end (headless LLM mode)
Note (Session #13): The primary mode for using this framework is Claude Code in VS Code as the executor; see Work with Claude Code. The headless mode below requires an Anthropic API key and is meant for autonomous runs without a human driver.
After Sprint 1 (orchestrators landed), one command takes a plain-text user request, classifies the intent (Layer 1 Cortex), dispatches to the right specialist (Layer 2 Domain orchestrator), runs the 7-phase pipeline (Phase 0→6), and writes memory + audit + metrics + checkpoints.
Install
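The [llm] extra activates real LLM mode. Without it, the runner falls back
to the keyword classifier and has no executor: Session #13 removed the mock
fallback, so Phase 4 halts with no_executor_configured. That is still useful
for smoke-testing the wiring up to Phase 4.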
One-shot CLI
export ANTHROPIC_API_KEY=sk-ant-...
python -m src.orchestrators.runner "criar componente React de login com email"
Sample output:
Mode: LLM (Anthropic)
Request: criar componente React de login com email
────────────────────────────────────────────────────────────
Intent: build_feature (confidence=0.95)
Route: → frontend_orch → frontend-specialist
Pipeline: ✅ success
cycles_used=1
Tokens: 1842 in / 612 out
Cost: $0.014420
Duration: 4.21s
Programmatic usage
from src.orchestrators import EndToEndRunner
runner = EndToEndRunner(
    api_key="sk-ant-...",  # or omit → reads ANTHROPIC_API_KEY env
    project_id="my-project",
    budget_dollars=2.0,
    budget_tokens=50_000,
    cortex_model="claude-haiku-4-5",  # cheap classifier
    executor_model="claude-sonnet-4-6",  # balanced executor
    max_tokens_per_call=4096,
)
result = runner.run("Implementar autenticação OAuth2 no FastAPI")
if result.needs_clarification:
    for q in result.classification.clarification_questions:
        print("Q:", q)
elif result.is_ok():
    print(result.artifact)
    print(f"Cost: ${result.total_cost_usd:.4f}")
else:
    print("Error:", result.error or result.pipeline_result.halt_reason)
What runs under the hood
EndToEndRunner.run(user_request)
  ① CortexOrchestrator.route()
     • classifies intent (LLM Haiku, ~$0.001 per call)
     • builds Capsule with: story_id, AC[5+], budget, deadline
  ② DomainOrchestrator.dispatch(capsule)
     • loads system prompt for the specialist
     • delegates to PipelineOrchestrator.run(executor=anthropic_call)
  ③ PipelineOrchestrator: Phases 0-6
     Phase 0 Reception → emit `capsule.received` on EventBus
     Phase 1 Reality Anchor
     Phase 2 Planning → Rule 35 SDD
     Phase 3 Gates → 21 absolute rules (PII, budget, ...)
     Phase 4 Execution → real LLM call via Sonnet
     Phase 5 Review → auditor-haiku
     Phase 6 Handoff → write episodic memory + emit completed
Side-effects under _framework/:
• audit/chain.jsonl — HMAC-signed audit chain
• observability/agent_metrics.jsonl — per-invocation telemetry
• memory/episodic/episodic.jsonl — feed for MirrorLearner
• checkpoints/<id>.json — Phase 4 + 6 snapshots
• events.jsonl — EventBus stream
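The audit file is described above as an HMAC-signed chain, but this guide does not show its record layout. The following is only a minimal sketch of how such a chain can work in general, assuming a hypothetical secret key and hypothetical fields `payload`, `prev`, and `sig`; the framework's actual format and key handling may differ.

```python
import hashlib
import hmac
import json

# Hypothetical key and record layout, for illustration only.
SECRET = b"demo-secret"
GENESIS = "0" * 64


def sign(prev_sig: str, payload: dict) -> str:
    """HMAC-SHA256 over the previous signature plus the canonical payload."""
    body = prev_sig.encode() + json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a record whose signature covers its predecessor's signature."""
    prev = chain[-1]["sig"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "sig": sign(prev, payload)})


def verify(chain: list) -> bool:
    """Recompute every signature; an edited or reordered record breaks the chain."""
    prev = GENESIS
    for record in chain:
        expected = sign(prev, record["payload"])
        if record["prev"] != prev or not hmac.compare_digest(record["sig"], expected):
            return False
        prev = record["sig"]
    return True


chain = []
append(chain, {"phase": 4, "event": "executed"})
append(chain, {"phase": 6, "event": "handoff"})
print(verify(chain))               # True
chain[0]["payload"]["phase"] = 5   # tamper with the first record
print(verify(chain))               # False
```

Because each signature covers the previous one, changing any earlier record invalidates that record during verification, which is what makes the log tamper-evident rather than merely signed.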
Without an API key
If ANTHROPIC_API_KEY is absent:
- Cortex falls back to keyword classifier (regex-based, no cost)
- Domain dispatch reaches Phase 4, but Phase 4 halts with
no_executor_configured (Session #13 removed the mock fallback)
- Memory + audit still record the failure (MirrorLearner needs it)
runner = EndToEndRunner() # no api_key
print(runner.has_llm) # False
result = runner.run("criar landing page")
print(result.pipeline_result.halt_reason) # 'no_executor_configured'
To use the framework without an API key, switch to the bridge-driven flow: Work with Claude Code.
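For readers unfamiliar with regex-based intent routing, here is a minimal sketch of what a keyword classifier of this kind might look like. The patterns and the `fix_bug` intent are hypothetical (only `build_feature` appears in this guide), and the framework's real rules are not shown here.

```python
import re

# Hypothetical intent names and patterns, for illustration only.
RULES = [
    ("build_feature", re.compile(r"\b(criar|create|build|implement\w*)\b", re.IGNORECASE)),
    ("fix_bug", re.compile(r"\b(fix|bug|corrigir|erro)\b", re.IGNORECASE)),
]


def classify(request: str) -> str:
    """Return the first intent whose pattern matches, else 'unknown'."""
    for intent, pattern in RULES:
        if pattern.search(request):
            return intent
    return "unknown"


print(classify("criar landing page"))   # build_feature
print(classify("fix the login bug"))    # fix_bug
print(classify("hello"))                # unknown
```

Rule order matters in a first-match design like this one: a request that matches several patterns gets the intent listed first, so more specific rules would normally go at the top.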
Cost guardrails
- budget_dollars (default 2.0): hard cap per capsule. BudgetExceededError raised at the boundary.
- budget_tokens (default 50,000): token hard cap.
- Cortex uses Haiku 4.5 (cheap) by default for classification.
- Executor uses Sonnet 4.6 (balanced). Override to
claude-haiku-4-5 for ~5x cost reduction on simple tasks.
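The two hard caps can be pictured as a small accumulator that refuses any call that would cross either limit. This is a sketch under assumptions, not the framework's code: `BudgetGuard` and `charge` are hypothetical names, and `BudgetExceededError` is redefined locally as a stand-in for the framework's error of the same name.

```python
class BudgetExceededError(Exception):
    """Local stand-in for the framework's error of the same name."""


class BudgetGuard:
    """Tracks cumulative spend for one capsule and refuses to pass either cap."""

    def __init__(self, budget_dollars: float = 2.0, budget_tokens: int = 50_000):
        self.budget_dollars = budget_dollars
        self.budget_tokens = budget_tokens
        self.spent_dollars = 0.0
        self.spent_tokens = 0

    def charge(self, tokens: int, dollars: float) -> None:
        """Record one call's usage; raise before updating totals if a cap would be crossed."""
        if self.spent_tokens + tokens > self.budget_tokens:
            raise BudgetExceededError("token budget exceeded")
        if self.spent_dollars + dollars > self.budget_dollars:
            raise BudgetExceededError("dollar budget exceeded")
        self.spent_tokens += tokens
        self.spent_dollars += dollars


# Usage figures taken from the sample output: 1842 + 612 tokens, $0.014420.
guard = BudgetGuard(budget_dollars=0.02, budget_tokens=5_000)
guard.charge(tokens=2_454, dollars=0.014420)       # first run fits
try:
    guard.charge(tokens=2_454, dollars=0.014420)   # second run would exceed $0.02
except BudgetExceededError as exc:
    print("halted:", exc)                          # halted: dollar budget exceeded
```

Checking before updating the totals keeps the recorded spend at or below the cap, which matches the idea of an error raised at the boundary.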
Tuning
| Knob | Default | When to change |
|---|---|---|
| cortex_model | claude-haiku-4-5 | Sonnet for niche intents |
| executor_model | claude-sonnet-4-6 | Opus 4.7 for complex coding; Haiku for simple Q&A |
| budget_dollars | 2.0 | Tighten for high-volume |
| max_tokens_per_call | 4096 | 16K+ for long-form; stream above 16K |
Validating with the smoke test
The smoke test runs three tests against the real API, at a total cost of about $0.02.
Connecting to the dashboard
The runner writes events to _framework/events.jsonl. The dashboard backend
(make dashboard) reads from the same paths. To see the runner's events live:
# Terminal 1 — dashboard backend
make dashboard
# Terminal 2 — runner
python -m src.orchestrators.runner "criar componente React"
Then open http://localhost:8000 to watch the WorldState update in real time as the capsule runs through Phase 0→6.
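To inspect the event stream without the dashboard, the JSONL file can also be read directly. The sketch below assumes each line is one JSON object and uses a hypothetical `type` field to tally events; the actual event schema is not documented in this guide, so adjust the key to whatever the records contain.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def read_jsonl(path: Path) -> list:
    """Parse one JSON object per non-empty line; skip lines that do not parse."""
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line the runner is still writing
        if isinstance(value, dict):
            records.append(value)
    return records


def count_by(records: list, key: str) -> Counter:
    """Tally records by the value of `key`; records without it count under None."""
    return Counter(r.get(key) for r in records)


# Demo on a throwaway file; point `path` at _framework/events.jsonl in a real project.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "events.jsonl"
    path.write_text(
        '{"type": "capsule.received"}\n{"type": "completed"}\n{"type": "completed"}\n',
        encoding="utf-8",
    )
    print(count_by(read_jsonl(path), "type"))
```

Skipping unparseable lines is a deliberate choice here: a reader that runs while the runner is appending may see a partially written last line.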
Limitations of v1.x
- Specialists are still generic: each specialist has a 2-3 sentence
default system prompt. Sprint 2 promotes them to full
PROMPT.md files.
- Skills are still markdown only: the SKILL.md content is sent as prompt prefix; no tool-use binding yet. Sprint 2 introduces executable skills.
- No real tool-use in Phase 4: the executor calls the LLM with the task description; the LLM returns text. For real shell/git/HTTP actions, wait for Sprint 2's tool-use integration.