AI infrastructure (Layer 8)

ModelManager

ModelManager keeps one fallback ladder per task tier. Each model is guarded by its own CircuitBreaker; when a model's breaker trips, the call falls through to the next model on the ladder.

from src.ai import ModelManager

mm = ModelManager.default()
# Default ladders:
#   simple   : haiku → sonnet
#   standard : haiku → sonnet → opus
#   complex  : sonnet → opus
#   critical : opus → sonnet

result = mm.call("Summarize the last meeting.", tier="standard")
print(result["model"], result["tokens_in"], result["tokens_out"])

In production, inject an invoker callable with the signature (model_id, prompt) -> (text, tokens_in, tokens_out).

FallbackError is raised when every model on the ladder is unavailable; the message includes which breakers were open vs which raised exceptions.
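
The ladder walk described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the ModelManager implementation: `call_with_fallback`, `LadderExhausted` (a stand-in for FallbackError), `fake_invoker`, and the `open_breakers` set are all hypothetical names, and the real class presumably consults CircuitBreaker objects rather than a plain set. Only the invoker signature and the shape of the result dict are taken from the text.

```python
from typing import Callable, Dict, List, Set, Tuple

# Signature from the text: (model_id, prompt) -> (text, tokens_in, tokens_out)
Invoker = Callable[[str, str], Tuple[str, int, int]]


class LadderExhausted(Exception):
    """Hypothetical stand-in for FallbackError."""


def call_with_fallback(prompt: str, ladder: List[str], invoker: Invoker,
                       open_breakers: Set[str]) -> Dict[str, object]:
    """Try each model in ladder order; skip open breakers, fall through on errors."""
    skipped_open: List[str] = []   # models skipped because their breaker was open
    raised: Dict[str, str] = {}    # models that were tried and raised an exception
    for model_id in ladder:
        if model_id in open_breakers:
            skipped_open.append(model_id)
            continue
        try:
            text, tokens_in, tokens_out = invoker(model_id, prompt)
        except Exception as exc:
            raised[model_id] = type(exc).__name__
            continue
        return {"model": model_id, "text": text,
                "tokens_in": tokens_in, "tokens_out": tokens_out}
    raise LadderExhausted(
        f"all models unavailable; breakers open: {skipped_open}; raised: {raised}"
    )


def fake_invoker(model_id: str, prompt: str) -> Tuple[str, int, int]:
    """Toy invoker: 'haiku' always times out, every other model echoes the prompt."""
    if model_id == "haiku":
        raise TimeoutError("simulated timeout")
    return (f"[{model_id}] {prompt}", len(prompt.split()), 3)


result = call_with_fallback("Summarize the last meeting.",
                            ["haiku", "sonnet", "opus"],
                            fake_invoker, open_breakers=set())
print(result["model"], result["tokens_in"], result["tokens_out"])
```

Here the first rung fails and the call lands on `sonnet`. Keeping the "breaker open" and "raised" lists separate is what lets the final error message distinguish the two failure modes.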

CircuitBreaker

One breaker per resource (agent, skill, or model), so a failing resource is isolated bulkhead-style. State transitions:

CLOSED  → (failure_threshold hits) → OPEN
OPEN    → (cooldown_seconds elapsed) → HALF_OPEN
HALF_OPEN → (success_threshold_half consecutive) → CLOSED
HALF_OPEN → (any failure) → OPEN

Defaults: failure_threshold=5, success_threshold_half=2, cooldown_seconds=60. Thread-safe (threading.Lock).
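
The state machine above can be sketched as follows. This is an illustrative class, not the project's CircuitBreaker: the name `SketchBreaker`, its method names, and the injectable `clock` parameter are assumptions, and so is the choice to count consecutive failures in CLOSED (the text does not say whether a success resets the count). The defaults and the use of threading.Lock follow the text.

```python
import threading
import time
from typing import Callable


class SketchBreaker:
    """Illustrative circuit breaker; hypothetical API."""

    def __init__(self, failure_threshold: int = 5, success_threshold_half: int = 2,
                 cooldown_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold_half = success_threshold_half
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock              # injectable so examples need not sleep
        self._lock = threading.Lock()
        self._state = "CLOSED"
        self._failures = 0
        self._half_successes = 0
        self._opened_at = 0.0

    def _refresh(self) -> None:
        # Caller must hold the lock. OPEN -> HALF_OPEN once the cooldown elapses.
        if (self._state == "OPEN"
                and self._clock() - self._opened_at >= self.cooldown_seconds):
            self._state = "HALF_OPEN"
            self._half_successes = 0

    def _trip(self) -> None:
        # Caller must hold the lock. Any state -> OPEN, restarting the cooldown.
        self._state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0
        self._half_successes = 0

    @property
    def state(self) -> str:
        with self._lock:
            self._refresh()
            return self._state

    def record_success(self) -> None:
        with self._lock:
            self._refresh()
            if self._state == "HALF_OPEN":
                self._half_successes += 1
                if self._half_successes >= self.success_threshold_half:
                    self._state = "CLOSED"
                    self._failures = 0
            elif self._state == "CLOSED":
                self._failures = 0       # assumption: a success resets the count

    def record_failure(self) -> None:
        with self._lock:
            self._refresh()
            if self._state == "HALF_OPEN":
                self._trip()             # any failure while probing reopens
            elif self._state == "CLOSED":
                self._failures += 1
                if self._failures >= self.failure_threshold:
                    self._trip()


fake_now = [0.0]
breaker = SketchBreaker(clock=lambda: fake_now[0])
for _ in range(5):
    breaker.record_failure()
print(breaker.state)    # OPEN
fake_now[0] = 60.0      # cooldown elapsed
print(breaker.state)    # HALF_OPEN
breaker.record_success()
breaker.record_success()
print(breaker.state)    # CLOSED
```

Injecting the clock is purely a convenience for the example; a real breaker could call `time.monotonic()` directly.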

Memory stores

v1.0 ships three JSONL-backed stores (Postgres + pgvector from v1.5 onward):

Store	Default TTL	Use case
`EpisodicStore`	90 days	Time-bounded events (task traces, decisions)
`SemanticStore`	none	Long-lived facts (project knowledge, glossary)
`ProceduralStore`	180 days	How-to traces; used by Mirror Learning (Layer 18)

All three share the same _JsonlStore base: record() appends a record, search() performs a linear scan with an optional tag filter, and all() returns the full list of records.
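
To make the record() / search() / all() contract concrete, here is a minimal sketch of a JSONL-backed store. It is not the actual _JsonlStore: the class name `SketchJsonlStore`, the record fields (`ts`, `text`, `tags`), the substring match in search(), and the decision to apply the TTL at read time are all assumptions made for illustration.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Dict, List, Optional


class SketchJsonlStore:
    """Illustrative append-only store: one JSON object per line."""

    def __init__(self, path: Path, ttl_days: Optional[float] = None) -> None:
        self.path = path
        self.ttl_seconds = None if ttl_days is None else ttl_days * 86400

    def record(self, text: str, tags: Optional[List[str]] = None) -> None:
        # Append a single line; existing lines are never rewritten.
        entry = {"ts": time.time(), "text": text, "tags": tags or []}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")

    def all(self) -> List[Dict[str, Any]]:
        # Read every line, dropping entries older than the TTL (assumed behavior).
        if not self.path.exists():
            return []
        cutoff = None if self.ttl_seconds is None else time.time() - self.ttl_seconds
        entries: List[Dict[str, Any]] = []
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                entry = json.loads(line)
                if cutoff is None or entry["ts"] >= cutoff:
                    entries.append(entry)
        return entries

    def search(self, needle: str, tag: Optional[str] = None) -> List[Dict[str, Any]]:
        # Linear scan: case-insensitive substring match plus optional tag filter.
        needle = needle.lower()
        return [e for e in self.all()
                if needle in e["text"].lower() and (tag is None or tag in e["tags"])]


with tempfile.TemporaryDirectory() as tmp:
    store = SketchJsonlStore(Path(tmp) / "episodic.jsonl", ttl_days=90)
    store.record("Chose JSONL for v1.0 storage", tags=["decision"])
    store.record("Ran nightly indexing task", tags=["trace"])
    total = len(store.all())
    decisions = store.search("jsonl", tag="decision")
    print(total, len(decisions))
```

A linear scan is O(n) per query, which is acceptable for small stores and is one reason a move to an indexed backend makes sense as data grows.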

RAG (file-based scaffold)

RAGIndex chunks Markdown files and ranks chunks by Jaccard overlap with the query. It is a scaffold meant to be swapped for pgvector or Chroma. Basic usage:

from src.ai import RAGIndex, RAGQuery

idx = RAGIndex()
idx.index_directory("docs/reference/specs", "**/*.md")    # 122 chunks
hits = idx.query(RAGQuery(text="reception phase", top_k=5))
for h in hits:
    print(f"{h.score:.3f}  {h.path}#{h.chunk_id}")
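
For readers unfamiliar with Jaccard-overlap retrieval, the scoring idea is: tokenize the query and each chunk into sets, then score a chunk as |intersection| / |union|. The sketch below shows that idea in isolation; the helper names (`tokenize`, `jaccard`, `top_k`) and the tokenization rule are assumptions and need not match how RAGIndex does it.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase and split on non-alphanumerics (an assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k(query: str, chunks: List[str], k: int = 5) -> List[Tuple[float, int]]:
    """Return up to k (score, chunk_index) pairs, best first, skipping zero scores."""
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(c)), i) for i, c in enumerate(chunks)]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))   # ties broken by chunk order
    return scored[:k]


chunks = [
    "The reception phase validates incoming requests.",
    "Circuit breakers isolate failing models.",
    "During the reception phase, tasks are assigned a tier.",
]
for score, idx in top_k("reception phase", chunks, k=2):
    print(f"{score:.3f}  chunk {idx}")
```

Because the union grows with chunk length, Jaccard favors short chunks that contain the query terms, which is why the first chunk outranks the third here.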

AgentMetrics

AgentMetrics appends each event to a JSONL log and keeps running statistics in memory. Each MetricEvent records agent_id, invocation_id, success, token counts, cost, duration, and error class.

from src.ai import AgentMetrics

m = AgentMetrics()
m.emit(agent_id="cortex", invocation_id="inv1", success=True,
       tokens_in=100, tokens_out=50, cost_dollars=0.0001, duration_ms=42)
print(m.stats_by_agent("cortex"))
# {'agent_id': 'cortex', 'count': 1, 'success_rate': 1.0, ...}
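
As a rough illustration of the per-agent aggregation, the sketch below computes only the three keys visible in the output above (agent_id, count, success_rate). The dataclass `SketchEvent`, its `error_class` field name, and the standalone `stats_by_agent` function are hypothetical; the real MetricEvent and AgentMetrics may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class SketchEvent:
    """Hypothetical event shape mirroring the fields listed in the text."""
    agent_id: str
    invocation_id: str
    success: bool
    tokens_in: int
    tokens_out: int
    cost_dollars: float
    duration_ms: int
    error_class: Optional[str] = None   # assumed field name for "error class"


def stats_by_agent(events: List[SketchEvent], agent_id: str) -> Dict[str, object]:
    """Count one agent's events and compute the fraction that succeeded."""
    mine = [e for e in events if e.agent_id == agent_id]
    count = len(mine)
    ok = sum(1 for e in mine if e.success)
    return {"agent_id": agent_id, "count": count,
            "success_rate": ok / count if count else 0.0}


events = [
    SketchEvent("cortex", "inv1", True, 100, 50, 0.0001, 42),
    SketchEvent("cortex", "inv2", False, 80, 0, 0.0, 17, error_class="TimeoutError"),
    SketchEvent("other", "inv3", True, 10, 5, 0.0, 9),
]
jsonl_line = json.dumps(asdict(events[0]))   # one line of an append-only log
print(stats_by_agent(events, "cortex"))
```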

Rate card (token_budget)

Model	Input ($ / 1M tok)	Output ($ / 1M tok)
`claude-haiku-4-5`	1.0	5.0
`claude-sonnet-4-6`	3.0	15.0
`claude-opus-4-7`	15.0	75.0
`mock`	0.0	0.0

In production, override these rates via COST-POLICY.yaml or environment variables.
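
Since the rates are quoted per million tokens, the cost of one call works out to (tokens_in × input rate + tokens_out × output rate) / 1,000,000. The sketch below applies that arithmetic to the table above; the `RATE_CARD` dict and the `cost_dollars` helper are illustrative names, not the token_budget module's API.

```python
from typing import Dict, Tuple

# (input $ per 1M tokens, output $ per 1M tokens), copied from the rate card above.
RATE_CARD: Dict[str, Tuple[float, float]] = {
    "claude-haiku-4-5": (1.0, 5.0),
    "claude-sonnet-4-6": (3.0, 15.0),
    "claude-opus-4-7": (15.0, 75.0),
    "mock": (0.0, 0.0),
}


def cost_dollars(model_id: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one call at the listed per-million-token rates."""
    rate_in, rate_out = RATE_CARD[model_id]
    return (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000


# 100 input + 50 output tokens on sonnet: (100 * 3.0 + 50 * 15.0) / 1e6
print(f"{cost_dollars('claude-sonnet-4-6', 100, 50):.5f}")   # 0.00105
```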