MODEL-MANAGEMENT: LLM Versioning + Upgrade Policy

Status: live · Version: 1.0.0 · Layer: 9

Purpose

How we version, pin, evaluate, and upgrade LLMs across the agent fleet. LLM choice is consequential: a wrong upgrade can silently regress quality.

Model registry (current)

| Agent layer | Model (current) | Notes |
|---|---|---|
| L1 Cortex | haiku-4-5 | cheap classifier |
| L2 Domain orchs | sonnet-4-6 | reasoning over decomposition |
| L3 Task orchs | sonnet-4-6 | per-story planning |
| L4 Specialists | sonnet-4-6 default | per-agent override allowed |
| L5 Workers | haiku-4-5 | atomic tasks |
| planner-opus | opus-4-7 | sparing use |
| auditor-haiku | haiku-4-5 | universal reviewer |
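The layer defaults and the L4 override rule can be sketched as a small lookup. This is a minimal illustration, not the fleet's actual resolver: `LAYER_DEFAULTS`, `OVERRIDABLE_LAYERS`, and `resolve_model` are hypothetical names, and ignoring overrides on layers other than L4 is an assumption (the table only states that L4 allows them).

```python
from typing import Optional

# Layer defaults copied from the registry table above.
LAYER_DEFAULTS = {
    "L1": "haiku-4-5",
    "L2": "sonnet-4-6",
    "L3": "sonnet-4-6",
    "L4": "sonnet-4-6",
    "L5": "haiku-4-5",
}

# Assumption: only L4 specialists may carry a per-agent override.
OVERRIDABLE_LAYERS = {"L4"}


def resolve_model(layer: str, override: Optional[str] = None) -> str:
    """Return the model for an agent, honoring an override where allowed."""
    default = LAYER_DEFAULTS[layer]  # raises KeyError for an unknown layer
    if override is not None and layer in OVERRIDABLE_LAYERS:
        return override
    return default
```

For example, `resolve_model("L4", "opus-4-7")` returns the override, while `resolve_model("L5", "opus-4-7")` falls back to the layer default under the assumption stated above.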

Upgrade triggers

| Trigger | Action |
|---|---|
| Provider releases new minor (4.6 → 4.7) | A/B eval on representative tasks; promote if delta-acc ≥ +1% AND delta-cost ≤ +30% |
| Provider releases major (4.x → 5.x) | RFC + ADR; full eval suite |
| Pricing change | re-evaluate cost-per-task; may downgrade fleet |
| Quality regression detected | rollback to previous pin |
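The minor-release promotion gate in the table above could be expressed roughly as follows. This is a sketch under stated assumptions: the accuracy delta is read as an absolute difference (percentage points) and the cost delta as a relative increase over the baseline; the policy does not say which interpretation is intended. The function and parameter names are hypothetical.

```python
def should_promote(
    base_acc: float,
    cand_acc: float,
    base_cost: float,
    cand_cost: float,
    min_acc_gain: float = 0.01,       # +1%, assumed absolute (percentage points)
    max_cost_increase: float = 0.30,  # +30%, assumed relative to baseline cost
) -> bool:
    """Return True if the candidate model passes the A/B promotion gate."""
    if base_cost <= 0:
        raise ValueError("base_cost must be positive")
    delta_acc = cand_acc - base_acc
    delta_cost = (cand_cost - base_cost) / base_cost
    # Both conditions must hold (the table uses AND).
    return delta_acc >= min_acc_gain and delta_cost <= max_cost_increase
```

A candidate that gains two points of accuracy at 20% higher cost passes; one that gains five points at 50% higher cost does not, because the cost condition fails.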

Pinning

Each agent's `runtime.model` field in AGENT-MANIFEST is the source of truth. Changing it requires:

- ADR for major version
- DECISIONS log for minor
- Cost projection update in BUDGET.md
- 24h observation before promotion to all agents in layer
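The change requirements above can be sketched as a helper that lists what a given pin change needs. It assumes model identifiers end in `-<major>-<minor>` (as in `sonnet-4-6`) and compares only version numbers; how a change of model family at the same version should be treated is not covered by the policy and is not handled here. `parse_version` and `required_steps` are hypothetical names.

```python
import re
from typing import List, Tuple


def parse_version(model: str) -> Tuple[int, int]:
    """Extract (major, minor) from an identifier such as 'sonnet-4-6'."""
    match = re.search(r"-(\d+)-(\d+)$", model)
    if match is None:
        raise ValueError(f"cannot parse version from {model!r}")
    return int(match.group(1)), int(match.group(2))


def required_steps(old_model: str, new_model: str) -> List[str]:
    """List the artifacts needed before changing runtime.model."""
    old_major, _ = parse_version(old_model)
    new_major, _ = parse_version(new_model)
    # Major changes need an ADR; minor changes need a DECISIONS log entry.
    record = "ADR" if new_major != old_major else "DECISIONS log entry"
    return [
        record,
        "Cost projection update in BUDGET.md",
        "24h observation before promotion to all agents in layer",
    ]
```

Under these assumptions, `required_steps("sonnet-4-6", "sonnet-4-7")` starts with a DECISIONS log entry, while a move to a `5-0` model starts with an ADR.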

Evaluation suite

Per-layer eval task set (when implemented in Wave 4):

- L1: intent classification accuracy on 500 samples
- L2: routing decision quality
- L4: skill outputs reviewed by planner-opus
- Workers: deterministic task pass-rate
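Since the suite is not implemented yet, the following is only an illustration of how the L1 accuracy metric might be computed over labeled samples. The `accuracy` helper, the `(text, expected_intent)` sample shape, and the toy classifier are all assumptions made for the example; a real run would call the pinned model instead of the stub.

```python
from typing import Callable, Iterable, Tuple


def accuracy(
    classify: Callable[[str], str],
    samples: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of samples where classify(text) equals the expected label."""
    total = 0
    correct = 0
    for text, expected in samples:
        total += 1
        if classify(text) == expected:
            correct += 1
    if total == 0:
        raise ValueError("no samples provided")
    return correct / total


def toy_classifier(text: str) -> str:
    """Stand-in for the L1 model call (hypothetical, keyword-based)."""
    if "refund" in text or "invoice" in text:
        return "billing"
    return "other"


SAMPLES = [
    ("refund please", "billing"),
    ("reset password", "account"),
    ("hello", "other"),
    ("invoice", "billing"),
]
score = accuracy(toy_classifier, SAMPLES)  # 3 of 4 correct -> 0.75
```

The same helper would serve as a pass-rate for deterministic worker tasks if `classify` is replaced by a task runner and the label by the expected output.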

Rollback

1. Quality regression flagged (success_rate < threshold, or postmortem identifies cause)
2. Manifest `runtime.model` reverts to last_known_good
3. CHANGELOG entry; ADR if it was an active choice
4. Investigation: was it a model issue or our prompt?
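Steps 1 and 2 above can be sketched as a single check. The `Runtime` dataclass and `maybe_rollback` function are hypothetical; the policy does not state a threshold value, so it is passed in by the caller. The CHANGELOG entry and the investigation (steps 3 and 4) remain manual and are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Runtime:
    """Assumed shape of the manifest's runtime section."""
    model: str
    last_known_good: str


def maybe_rollback(runtime: Runtime, success_rate: float, threshold: float) -> bool:
    """Revert runtime.model to last_known_good if quality regressed.

    Returns True if a rollback was performed.
    """
    if success_rate >= threshold:
        return False  # no regression flagged
    if runtime.model == runtime.last_known_good:
        return False  # already on the last known good pin
    runtime.model = runtime.last_known_good
    return True
```

Returning a boolean lets the caller decide whether a CHANGELOG entry is due without inspecting the manifest again.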

Multi-provider fallback (planned)

Primary: Anthropic. Fallback: OpenAI / Vertex (when configured).

Switch triggers:

- Anthropic outage > 5min
- Anthropic rate-limit response
- Cost spike beyond threshold (auto-throttle to cheaper)

Fallback is a per-capsule decision made via cortex and is logged in audit.
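Because the fallback is still planned, the following only sketches how the three switch triggers might map to an action for a single capsule. `ProviderHealth`, `fallback_action`, and the action strings are hypothetical; how an outage, a rate-limit response, or a cost spike is detected is outside this sketch.

```python
from dataclasses import dataclass

OUTAGE_LIMIT_SECONDS = 5 * 60  # "outage > 5min" from the trigger list


@dataclass
class ProviderHealth:
    """Assumed snapshot of the primary provider's state for one capsule."""
    outage_seconds: float  # how long the primary has been unreachable
    rate_limited: bool     # last response was a rate-limit response
    cost_spike: bool       # spend exceeded the configured threshold


def fallback_action(health: ProviderHealth) -> str:
    """Map the switch triggers to an action name."""
    if health.outage_seconds > OUTAGE_LIMIT_SECONDS or health.rate_limited:
        return "switch_provider"
    if health.cost_spike:
        # The policy says a cost spike auto-throttles to a cheaper model.
        return "throttle_to_cheaper"
    return "stay_on_primary"
```

Availability triggers are checked before cost here, on the assumption that an unreachable primary should take precedence over a budget concern.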

Cross-references

AGENT-MANIFEST (C4 template)
BUDGET (C1 template) — cost projections
COST-THERMOSTAT (C15 planned) — automated cost-based fallback
MODEL-THROTTLE-PROTOCOL (C15 planned)
TECHNOLOGY-RADAR (C16 planned) — scans new model releases