CIRCUIT-BREAKER: Cascading Failure Protection
Status: live · Version: 1.0.0 · Layer: 9
Purpose
Halt operations before they cascade into expensive or destructive states. Breakers exist at two scopes: per agent and per system.
Breaker categories
| Category | Trigger | Action |
|---|---|---|
| Budget breaker | invocation exceeds hard_cap | HALT this agent for N min; emit incident |
| Failure-rate breaker | success_rate < 0.5 over last 20 calls | HALT agent; fall back to next in chain |
| Latency breaker | p95 > 3× baseline for 5min | reduce load; alert |
| Constitutional breaker | absolute rule violation detected | HALT immediately; INCIDENT SEV1 |
| External-dep breaker | LLM provider rate-limit / outage | switch to fallback model per MODEL-MANAGEMENT |
| Resource-lock breaker | lock contention > 60s | re-queue; alert |
| Story-cost breaker | story spend > 1.5× capsule budget | pause story (CONTINUATION-PROTOCOL) |
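
As an illustration of the failure-rate row, the trigger "success_rate < 0.5 over last 20 calls" could be evaluated with a fixed-size sliding window. This is a minimal sketch, not the project's implementation: the class name `FailureRateWindow` and the choice to wait until the window is full before tripping are assumptions.

```python
from collections import deque


class FailureRateWindow:
    """Hypothetical helper: tracks outcomes of the most recent calls."""

    def __init__(self, window_calls: int = 20, threshold: float = 0.5):
        # deque with maxlen drops the oldest outcome automatically
        self.outcomes = deque(maxlen=window_calls)
        self.threshold = threshold

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    def should_trip(self) -> bool:
        # Assumption: never trip on a partially filled window
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        success_rate = sum(self.outcomes) / len(self.outcomes)
        return success_rate < self.threshold
```

With the defaults, 9 successes in the last 20 calls trips the breaker, while exactly 10 does not, because the comparison in the table is strict.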
States
closed (normal) → tripping (about to break) → open (broken; reject) → half-open (testing recovery) → closed
Each state transition emits an event.
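
The state cycle above can be sketched as a small transition table with an event hook. The names `State`, `ALLOWED`, and `Breaker`, and the shape of the emitted event, are assumptions made for illustration; the half-open to open edge comes from the half-open testing rule (a failed test reopens the breaker).

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    TRIPPING = "tripping"
    OPEN = "open"
    HALF_OPEN = "half-open"


# Allowed transitions, following the cycle described in this section
ALLOWED = {
    State.CLOSED: {State.TRIPPING},
    State.TRIPPING: {State.OPEN},
    State.OPEN: {State.HALF_OPEN},
    State.HALF_OPEN: {State.CLOSED, State.OPEN},
}


class Breaker:
    def __init__(self, name, emit):
        self.name = name
        self.state = State.CLOSED
        self._emit = emit  # hypothetical event sink, e.g. list.append

    def transition(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        old, self.state = self.state, new_state
        # Every state transition emits an event
        self._emit({"breaker": self.name, "from": old.value, "to": new_state.value})
```

Passing `events.append` as `emit` collects one event per transition, which is enough to check the rule in a unit test.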
Configuration
breakers:
  budget:
    enabled: true
    window_seconds: 60
    threshold: 1.0 # at 100% of hard_cap
    cooldown_seconds: 600 # 10 min in open state
  failure_rate:
    enabled: true
    window_calls: 20
    threshold: 0.5 # 50% failure rate
    cooldown_seconds: 300
  latency:
    enabled: true
    baseline_p95_ms: 90000
    threshold_multiplier: 3.0
    sample_window_seconds: 300
    cooldown_seconds: 180
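
To show how the latency settings might be applied, here is a hedged sketch that compares the p95 of a sample window against `threshold_multiplier × baseline_p95_ms`. The names `LatencyConfig`, `p95`, and `latency_breached` are hypothetical, the nearest-rank percentile method is an assumption, and the samples are assumed to already cover the `sample_window_seconds` window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyConfig:
    baseline_p95_ms: float = 90000
    threshold_multiplier: float = 3.0


def p95(samples_ms):
    """Nearest-rank 95th percentile (assumed method)."""
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_breached(samples_ms, cfg: LatencyConfig) -> bool:
    if not samples_ms:
        return False  # assumption: no data means no breach
    return p95(samples_ms) > cfg.threshold_multiplier * cfg.baseline_p95_ms
```

With the values above, the breaker condition is met once the window's p95 exceeds 270000 ms (3.0 × 90000).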
Half-open testing
After the cooldown, the breaker enters half-open and allows one test invocation. If the test succeeds, the breaker closes; if it fails, the breaker reopens with a longer cooldown (exponential backoff, capped at 1 h).
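
The backoff rule can be sketched as a single function. The doubling factor is an assumption (the text only says "exponential"), and `next_cooldown` is a hypothetical name; the 3600-second cap reflects the 1-hour limit stated above.

```python
def next_cooldown(base_seconds: int, failed_tests: int, cap_seconds: int = 3600) -> int:
    """Cooldown after `failed_tests` consecutive failed half-open tests.

    Assumption: the cooldown doubles on each failed test, up to the cap.
    """
    return min(base_seconds * 2 ** failed_tests, cap_seconds)
```

Starting from the failure-rate breaker's 300-second cooldown, successive failed tests would give 600, 1200, and 2400 seconds, then stay at 3600.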
Coordination across agents
When an agent's breaker trips:
- Orchestrator routes new capsules to the agent's fallback (per AGENT-MANIFEST.fallback_chain)
- Story owners notified
- WORLD-STATE updated (C13 future)
- Recovery monitored continuously
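
The routing step might look like the sketch below. It assumes that `fallback_chain` is an ordered list of agent names per agent and that the orchestrator knows which agents currently have an open breaker; both the data shapes and the function name `route` are assumptions, not the documented AGENT-MANIFEST format.

```python
def route(agent, fallback_chain, open_breakers):
    """Return the first agent in [agent, *fallbacks] whose breaker is not open.

    fallback_chain: dict mapping agent name -> ordered list of fallbacks (assumed shape)
    open_breakers:  set of agent names whose breaker is currently open
    """
    for candidate in [agent, *fallback_chain.get(agent, [])]:
        if candidate not in open_breakers:
            return candidate
    return None  # every candidate is tripped; the caller decides what to do
```

Returning `None` when the whole chain is tripped leaves the escalation decision to the caller rather than silently picking a broken agent.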
Failure modes (of the breaker itself)
| Mode | Mitigation |
|---|---|
| Breaker flapping | exponential backoff |
| False positive (breaker too sensitive) | tune thresholds quarterly |
| False negative (cascade slipped through) | postmortem; add new breaker if needed |
| Breaker config out of sync with reality | quarterly review |
Cross-references
- AGENT-METRICS (sibling) — what feeds the breakers
- TOKEN-BUDGET-PROTOCOL (C7)
- MODEL-MANAGEMENT (sibling) — fallback model swap
- COST-THERMOSTAT (C15 planned) — companion: regulates cost via model choice
- INCIDENT.md template (C4)