How-to: run the full multi-agent stack end-to-end (headless LLM mode)

Note (Session #13): The primary way to use this framework is with Claude Code in VS Code as the executor; see Work with Claude Code. The headless mode below requires an Anthropic API key and is intended for autonomous runs without a human driver.

After Sprint 1 (orchestrators landed), one command takes a plain-text user request, classifies the intent (Layer 1 Cortex), dispatches to the right specialist (Layer 2 Domain orchestrator), runs the 7-phase pipeline (Phase 0→6), and writes memory + audit + metrics + checkpoints.

Install

pip install -e ".[llm,dashboard,observability]"

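The [llm] extra activates real LLM mode. Without it, the runner falls back to the keyword classifier and has no executor: the mock executor was removed in Session #13, so Phase 4 halts with no_executor_configured. That is still enough to smoke-test the wiring up to Phase 4.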

One-shot CLI

export ANTHROPIC_API_KEY=sk-ant-...
python -m src.orchestrators.runner "criar componente React de login com email"

Sample output:

Mode: LLM (Anthropic)
Request: criar componente React de login com email
────────────────────────────────────────────────────────────
Intent: build_feature (confidence=0.95)
Route: → frontend_orch → frontend-specialist
Pipeline: ✅ success
  cycles_used=1
Tokens: 1842 in / 612 out
Cost: $0.014420
Duration: 4.21s

Programmatic usage

from src.orchestrators import EndToEndRunner

runner = EndToEndRunner(
    api_key="sk-ant-...",          # or omit → reads ANTHROPIC_API_KEY env
    project_id="my-project",
    budget_dollars=2.0,
    budget_tokens=50_000,
    cortex_model="claude-haiku-4-5",       # cheap classifier
    executor_model="claude-sonnet-4-6",    # balanced executor
    max_tokens_per_call=4096,
)

result = runner.run("Implementar autenticação OAuth2 no FastAPI")

if result.needs_clarification:
    for q in result.classification.clarification_questions:
        print("Q:", q)
elif result.is_ok():
    print(result.artifact)
    print(f"Cost: ${result.total_cost_usd:.4f}")
else:
    print("Error:", result.error or result.pipeline_result.halt_reason)

What runs under the hood

EndToEndRunner.run(user_request)

  ① CortexOrchestrator.route()
      • classifies intent (LLM Haiku — ~$0.001 per call)
      • builds Capsule with: story_id, AC[5+], budget, deadline

  ② DomainOrchestrator.dispatch(capsule)
      • loads system prompt for the specialist
      • delegates to PipelineOrchestrator.run(executor=anthropic_call)

  ③ PipelineOrchestrator — Phases 0-6
      Phase 0 Reception     → emit `capsule.received` on EventBus
      Phase 1 Reality Anchor
      Phase 2 Planning      → Rule 35 SDD
      Phase 3 Gates         → 21 absolute rules (PII, budget, ...)
      Phase 4 Execution     → real LLM call via Sonnet
      Phase 5 Review        → auditor-haiku
      Phase 6 Handoff       → write episodic memory + emit completed

  Side-effects under _framework/:
   • audit/chain.jsonl                — HMAC-signed audit chain
   • observability/agent_metrics.jsonl — per-invocation telemetry
   • memory/episodic/episodic.jsonl   — feed for MirrorLearner
   • checkpoints/<id>.json            — Phase 4 + 6 snapshots
   • events.jsonl                     — EventBus stream
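The audit chain listed above is described as HMAC-signed. The record layout is not documented here, so the following is only a minimal sketch of how such a chain can be built and verified, assuming each record stores a payload, the previous record's signature, and its own signature. The field names `payload`, `prev_sig`, and `sig`, and the helpers `sign`, `append_record`, and `verify_chain`, are hypothetical and not part of the framework's API.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, prev_sig: str, payload: dict) -> str:
    """HMAC-SHA256 over the previous signature plus the canonical JSON payload."""
    body = prev_sig.encode() + json.dumps(payload, sort_keys=True).encode()
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def append_record(chain: list, secret: bytes, payload: dict) -> None:
    """Append a record whose signature also covers the previous signature."""
    prev_sig = chain[-1]["sig"] if chain else ""
    chain.append({
        "payload": payload,
        "prev_sig": prev_sig,  # hypothetical field name
        "sig": sign(secret, prev_sig, payload),
    })


def verify_chain(chain: list, secret: bytes) -> bool:
    """Return False if any record was edited, reordered, or removed mid-chain."""
    prev_sig = ""
    for record in chain:
        expected = sign(secret, prev_sig, record["payload"])
        if record["prev_sig"] != prev_sig:
            return False
        if not hmac.compare_digest(expected, record["sig"]):
            return False
        prev_sig = record["sig"]
    return True
```

Because each signature covers the one before it, changing an earlier record invalidates every later one, which is what makes the log tamper-evident rather than merely signed.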

Without an API key

If ANTHROPIC_API_KEY is absent:

  • Cortex falls back to the keyword classifier (regex-based, no cost).
  • Domain dispatch reaches Phase 4, but Phase 4 halts with no_executor_configured (Session #13 removed the mock fallback).
  • Memory and audit still record the failure (MirrorLearner needs it).

runner = EndToEndRunner()       # no api_key
print(runner.has_llm)           # False
result = runner.run("criar landing page")
print(result.pipeline_result.halt_reason)   # 'no_executor_configured'
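To make the regex-based fallback more concrete, here is a minimal sketch of what a keyword classifier of this kind might look like. It is not the framework's implementation: the pattern table, the `fix_bug` and `unknown` intent names, the fixed confidence values, and the `classify` function are all assumptions for illustration. Only the `build_feature` intent name appears in the sample output earlier on this page.

```python
import re
from typing import NamedTuple


class Classification(NamedTuple):
    intent: str
    confidence: float


# Hypothetical keyword table: first matching pattern wins.
_PATTERNS = [
    ("build_feature", re.compile(r"\b(criar|create|build|implement\w*)\b", re.IGNORECASE)),
    ("fix_bug", re.compile(r"\b(fix|bug|corrigir|erro)\b", re.IGNORECASE)),
]


def classify(request: str) -> Classification:
    """Return the first intent whose keywords appear in the request."""
    for intent, pattern in _PATTERNS:
        if pattern.search(request):
            # A keyword hit is weaker evidence than an LLM classification,
            # so this sketch reports a fixed, modest confidence.
            return Classification(intent, 0.6)
    return Classification("unknown", 0.0)
```

A classifier like this costs nothing to run, but it cannot ask clarifying questions or handle requests that use none of the listed keywords.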

To use the framework without an API key, switch to the bridge-driven flow: Work with Claude Code.

Cost guardrails

  • budget_dollars (default 2.0): hard cap per capsule. A BudgetExceededError is raised when the cap is reached.
  • budget_tokens (default 50,000): token hard cap.
  • Cortex uses Haiku 4.5 (cheap) by default for classification.
  • Executor uses Sonnet 4.6 (balanced). Override to claude-haiku-4-5 for ~5x cost reduction on simple tasks.
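The two hard caps above can be sketched as a small guard that is charged after every LLM call. This is an illustration under stated assumptions, not the framework's code: `BudgetGuard` and its `charge` method are hypothetical, the `BudgetExceededError` class below is a local stand-in for the framework's exception of the same name, and the choice to reject a charge that would cross the cap (rather than allow it and fail on the next one) is also an assumption.

```python
class BudgetExceededError(RuntimeError):
    """Local stand-in for the framework's exception of the same name."""


class BudgetGuard:
    """Tracks spend for one capsule against dollar and token caps."""

    def __init__(self, budget_dollars: float = 2.0, budget_tokens: int = 50_000) -> None:
        self.budget_dollars = budget_dollars
        self.budget_tokens = budget_tokens
        self.spent_dollars = 0.0
        self.spent_tokens = 0

    def charge(self, tokens: int, dollars: float) -> None:
        """Record one call's usage, or raise if it would cross either cap."""
        new_tokens = self.spent_tokens + tokens
        new_dollars = self.spent_dollars + dollars
        if new_tokens > self.budget_tokens:
            raise BudgetExceededError(f"token cap {self.budget_tokens} exceeded")
        if new_dollars > self.budget_dollars:
            raise BudgetExceededError(f"dollar cap {self.budget_dollars} exceeded")
        # Commit only after both checks pass, so a rejected call is not counted.
        self.spent_tokens = new_tokens
        self.spent_dollars = new_dollars
```

With the defaults, the sample run shown earlier (1842 + 612 tokens, $0.014420) fits comfortably; a tighter guard such as `BudgetGuard(budget_dollars=0.02, budget_tokens=3000)` would accept that run and reject a second one of similar size.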

Tuning

Knob                 Default            When to change
cortex_model         claude-haiku-4-5   Sonnet for niche intents
executor_model       claude-sonnet-4-6  Opus 4.7 for complex coding; Haiku for simple Q&A
budget_dollars       2.0                Tighten for high-volume
max_tokens_per_call  4096               16K+ for long-form; stream above 16K

Validating with the smoke test

export ANTHROPIC_API_KEY=sk-ant-...
pytest tests/smoke/test_real_llm_e2e.py -m network -v

3 tests run against the real API. Total cost: ~$0.02.

Connecting to the dashboard

The runner writes events to _framework/events.jsonl. The dashboard backend (make dashboard) reads from the same paths. To see the runner's events live:

# Terminal 1 — dashboard backend
make dashboard

# Terminal 2 — runner
python -m src.orchestrators.runner "criar componente React"

Then open http://localhost:8000 to watch the WorldState update in real time as the capsule moves through Phases 0 to 6.
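If you want to follow the event stream without the dashboard, a small poller over the JSONL file is enough. The sketch below assumes only that _framework/events.jsonl holds one JSON object per line; the `read_new_events` helper is hypothetical, and the event fields themselves are not documented here, so the code treats each event as an opaque dict.

```python
import json
from pathlib import Path


def read_new_events(path: Path, offset: int = 0):
    """Return (events appended after byte `offset`, new offset)."""
    if not path.exists():
        return [], offset
    events = []
    with path.open("rb") as handle:
        handle.seek(offset)
        for raw in handle:
            if not raw.endswith(b"\n"):
                # Partial line still being written; pick it up on the next poll.
                break
            offset += len(raw)
            line = raw.strip()
            if line:
                events.append(json.loads(line))
    return events, offset
```

Call it in a loop with a short sleep, passing the returned offset back in each time, for example `events, offset = read_new_events(Path("_framework/events.jsonl"), offset)`.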

Limitations of v1.x

  • Specialists are still generic — each specialist has a 2-3 sentence default system prompt. Sprint 2 promotes them to full PROMPT.md files.
  • Skills are still markdown only — the SKILL.md content is sent as prompt prefix; no tool-use binding yet. Sprint 2 introduces executable skills.
  • No real tool-use in Phase 4 — the executor calls the LLM with the task description; the LLM returns text. For real shell/git/HTTP actions, wait for Sprint 2's tool-use integration.

See also