MODEL-MANAGEMENT — LLM Versioning + Upgrade Policy¶
Status: live · Version: 1.0.0 · Camada: 9
Purpose¶
How we version, pin, evaluate, and upgrade LLMs across the agent fleet. LLM choice is consequential — wrong upgrade can regress quality silently.
Model registry (current)¶
| Agent layer | Model (current) | Notes |
|---|---|---|
| L1 Cortex | haiku-4-5 | cheap classifier |
| L2 Domain orchs | sonnet-4-6 | reasoning over decomposition |
| L3 Task orchs | sonnet-4-6 | per-story planning |
| L4 Specialists | sonnet-4-6 default | per-agent override allowed |
| L5 Workers | haiku-4-5 | atomic tasks |
| planner-opus | opus-4-7 | sparing use |
| auditor-haiku | haiku-4-5 | universal reviewer |
Upgrade triggers¶
| Trigger | Action |
|---|---|
| Provider releases new minor (4.6 → 4.7) | A/B eval on representative tasks; promote if delta-acc ≥ +1% AND delta-cost ≤ +30% |
| Provider releases major (4.x → 5.x) | RFC + ADR; full eval suite |
| Pricing change | re-evaluate cost-per-task; may downgrade fleet |
| Quality regression detected | rollback to previous pin |
Pinning¶
Each agent's runtime.model field in AGENT-MANIFEST is the source of truth. Changing it requires:
- ADR for major version
- DECISIONS log for minor
- Cost projection update in BUDGET.md
- 24h observation before promotion to all agents in layer
Evaluation suite¶
Per-layer eval task set (when implemented in Wave 4): - L1: intent classification accuracy on 500 samples - L2: routing decision quality - L4: skill outputs reviewed by planner-opus - Workers: deterministic task pass-rate
Rollback¶
1. Quality regression flagged (success_rate < threshold, or postmortem identifies cause)
2. Manifest `runtime.model` reverts to last_known_good
3. CHANGELOG entry; ADR if it was an active choice
4. Investigation: was it a model issue or our prompt?
Multi-provider fallback (planned)¶
Primary: Anthropic. Fallback: OpenAI / Vertex (when configured).
Switch triggers: - Anthropic outage > 5min - Anthropic rate-limit response - Cost spike beyond threshold (auto-throttle to cheaper)
Fallback is per-capsule decision via cortex; logged in audit.
Cross-references¶
- AGENT-MANIFEST (C4 template)
- BUDGET (C1 template) — cost projections
- COST-THERMOSTAT (C15 planned) — automated cost-based fallback
- MODEL-THROTTLE-PROTOCOL (C15 planned)
- TECHNOLOGY-RADAR (C16 planned) — scans new model releases