CIRCUIT-BREAKER — Cascading Failure Protection

Status: live · Version: 1.0.0 · Layer (Camada): 9

Purpose

Halt operations BEFORE they cascade into expensive or destructive states. Breakers are applied both per agent and per system.

Breaker categories

| Category | Trigger | Action |
|---|---|---|
| Budget breaker | invocation exceeds hard_cap | HALT this agent for N min; emit incident |
| Failure-rate breaker | success_rate < 0.5 over last 20 calls | HALT agent; fall back to next in chain |
| Latency breaker | p95 > 3× baseline for 5 min | reduce load; alert |
| Constitutional breaker | absolute rule violation detected | HALT immediately; INCIDENT SEV1 |
| External-dep breaker | LLM provider rate-limit / outage | switch to fallback model per MODEL-MANAGEMENT |
| Resource-lock breaker | lock contention > 60 s | re-queue; alert |
| Story-cost breaker | story spend > 1.5× capsule budget | pause story (CONTINUATION-PROTOCOL) |
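
As an illustration of how one of these triggers might be evaluated, here is a minimal sketch of the failure-rate check (success_rate < 0.5 over the last 20 calls). The class name `FailureRateWindow` and its methods are hypothetical, and waiting for a full window before tripping is an assumption that the table does not state.

```python
from collections import deque


class FailureRateWindow:
    """Rolling record of recent call outcomes (hypothetical helper)."""

    def __init__(self, window_calls: int = 20, threshold: float = 0.5):
        self.window_calls = window_calls
        self.threshold = threshold
        # A deque with maxlen silently drops the oldest outcome when full.
        self.outcomes = deque(maxlen=window_calls)

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    def should_trip(self) -> bool:
        # Assumption: do not trip until the window holds window_calls outcomes.
        if len(self.outcomes) < self.window_calls:
            return False
        success_rate = sum(self.outcomes) / len(self.outcomes)
        return success_rate < self.threshold
```

With exactly 10 successes in 20 calls the success rate is 0.5, which does not satisfy the strict `< 0.5` condition, so the breaker stays closed until one more failure pushes a success out of the window.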

States

closed (normal) → tripping (about to break) → open (broken; reject) → half-open (testing recovery) → closed

Each state transition emits an event.
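
The lifecycle above could be modelled as a small state machine that emits one event per transition. This is a sketch only: the `Breaker` class, the `emit` callback, and the event fields are hypothetical. The half-open to open edge follows the Half-open testing section (a failed test reopens the breaker); the tripping to closed edge is an assumption.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    TRIPPING = "tripping"
    OPEN = "open"
    HALF_OPEN = "half-open"


# Allowed transitions. TRIPPING -> CLOSED is an assumption (recovery before
# the breaker opens); HALF_OPEN -> OPEN is the failed-test case.
ALLOWED = {
    State.CLOSED: {State.TRIPPING},
    State.TRIPPING: {State.OPEN, State.CLOSED},
    State.OPEN: {State.HALF_OPEN},
    State.HALF_OPEN: {State.CLOSED, State.OPEN},
}


class Breaker:
    def __init__(self, name: str, emit: Callable[[dict], None]):
        self.name = name
        self.state = State.CLOSED
        self._emit = emit  # hypothetical event sink, e.g. a queue publisher

    def transition(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        event = {
            "breaker": self.name,
            "from": self.state.value,
            "to": new_state.value,
        }
        self.state = new_state
        self._emit(event)  # every state transition emits an event
```

For example, `Breaker("failure_rate", events.append)` collects the emitted events in a plain list, which is convenient for tests.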

Configuration

breakers:
  budget:
    enabled: true
    window_seconds: 60
    threshold: 1.0          # at 100% of hard_cap
    cooldown_seconds: 600   # 10 min in open state
  failure_rate:
    enabled: true
    window_calls: 20
    threshold: 0.5          # 50% failure rate
    cooldown_seconds: 300
  latency:
    enabled: true
    baseline_p95_ms: 90000
    threshold_multiplier: 3.0
    sample_window_seconds: 300
    cooldown_seconds: 180
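
To make the latency settings concrete: with baseline_p95_ms of 90000 and threshold_multiplier of 3.0, the trip limit works out to 270000 ms (4.5 minutes). The sketch below shows one way that check might look. The function names are hypothetical, and it assumes `samples_ms` already contains only the latencies observed within sample_window_seconds.

```python
import statistics

# Mirrors the latency section of the configuration above.
LATENCY_CFG = {
    "baseline_p95_ms": 90000,
    "threshold_multiplier": 3.0,
    "sample_window_seconds": 300,
    "cooldown_seconds": 180,
}


def p95(samples_ms):
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    # Requires at least two samples.
    return statistics.quantiles(samples_ms, n=20)[-1]


def latency_limit_ms(cfg=LATENCY_CFG):
    return cfg["baseline_p95_ms"] * cfg["threshold_multiplier"]


def latency_breaker_trips(samples_ms, cfg=LATENCY_CFG):
    # Assumption: the caller has filtered samples to the sample window.
    return p95(samples_ms) > latency_limit_ms(cfg)
```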

Half-open testing

After the cooldown, the breaker enters half-open and allows exactly one test invocation. If the test succeeds, the breaker closes; if it fails, the breaker reopens with a longer cooldown (exponential backoff, capped at 1 h).
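
A minimal sketch of the backoff calculation follows. The doubling factor and the function name `next_cooldown` are assumptions; the text above only specifies exponential growth with a 1 h ceiling.

```python
def next_cooldown(
    base_seconds: int,
    failed_tests: int,
    cap_seconds: int = 3600,  # 1 h ceiling from the text above
    factor: int = 2,          # assumption: cooldown doubles per failed test
) -> int:
    """Cooldown to apply after `failed_tests` consecutive failed half-open tests."""
    return min(base_seconds * factor ** failed_tests, cap_seconds)
```

Starting from the failure_rate cooldown of 300 s, successive failed tests would give 300, 600, 1200, 2400, and then 3600 s, where the cap takes over.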

Coordination across agents

When an agent's breaker trips:

  • Orchestrator routes new capsules to the agent's fallback (per AGENT-MANIFEST.fallback_chain)
  • Story owners notified
  • WORLD-STATE updated (C13 future)
  • Recovery monitored continuously
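
The routing step might be sketched as below. The `pick_agent` function, the shape of the fallback chain (a mapping from agent name to an ordered list of fallbacks), and the agent names are all hypothetical; the actual AGENT-MANIFEST format is not shown in this document.

```python
from typing import Dict, List, Optional, Set


def pick_agent(
    agent: str,
    fallback_chain: Dict[str, List[str]],
    open_breakers: Set[str],
) -> Optional[str]:
    """Return the first agent in [agent, *fallbacks] whose breaker is not open."""
    candidates = [agent] + fallback_chain.get(agent, [])
    for candidate in candidates:
        if candidate not in open_breakers:
            return candidate
    # Every candidate is tripped; the caller decides whether to queue or alert.
    return None


# Hypothetical manifest data for illustration only.
chain = {"writer-a": ["writer-b", "writer-c"]}
print(pick_agent("writer-a", chain, open_breakers={"writer-a"}))
```

Returning `None` when the whole chain is tripped leaves the escalation policy to the orchestrator rather than hiding it inside the routing helper.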

Failure modes (of the breaker itself)

| Mode | Mitigation |
|---|---|
| Breaker flapping | exponential backoff |
| False positive (breaker too sensitive) | tune thresholds per quarter |
| False negative (cascade slipped through) | postmortem; add new breaker if needed |
| Breaker config out of sync with reality | per-quarter review |

Cross-references

  • AGENT-METRICS (sibling) — what feeds the breakers
  • TOKEN-BUDGET-PROTOCOL (C7)
  • MODEL-MANAGEMENT (sibling) — fallback model swap
  • COST-THERMOSTAT (C15 planned) — companion: regulates cost via model choice
  • INCIDENT.md template (C4)