# AI infrastructure (Camada 8)

## ModelManager
`ModelManager` routes each call through a fallback ladder chosen by task tier. Every model has its own `CircuitBreaker`; when a model's breaker is open or its call fails, the next model on the ladder is tried.
```python
from src.ai import ModelManager

mm = ModelManager.default()

# Default ladders (tried left to right):
#   simple   : haiku → sonnet
#   standard : haiku → sonnet → opus
#   complex  : sonnet → opus
#   critical : opus → sonnet
result = mm.call("Summarize the last meeting.", tier="standard")
print(result["model"], result["tokens_in"], result["tokens_out"])
```
For production, inject an invoker callable with the signature `(model_id, prompt) -> (text, tokens_in, tokens_out)`.

`FallbackError` is raised when every model on the ladder is unavailable. Its message lists which models were skipped because their breakers were open and which raised exceptions.
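To make the ladder walk concrete, here is a minimal, self-contained sketch. It is not `ModelManager`'s implementation: `call_with_fallback`, `SimpleBreaker` (a stripped-down breaker that only counts consecutive failures), `flaky_invoker`, and the local `FallbackError` are hypothetical stand-ins. Only the invoker signature `(model_id, prompt) -> (text, tokens_in, tokens_out)` comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Invoker signature from the docs: (model_id, prompt) -> (text, tokens_in, tokens_out)
Invoker = Callable[[str, str], Tuple[str, int, int]]


class FallbackError(RuntimeError):
    """Local stand-in for the project's FallbackError (hypothetical)."""


@dataclass
class SimpleBreaker:
    """Hypothetical minimal breaker: open after `threshold` consecutive failures."""
    threshold: int = 5
    failures: int = 0

    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1


def call_with_fallback(ladder: List[str], prompt: str, invoke: Invoker,
                       breakers: Dict[str, SimpleBreaker]) -> dict:
    open_models: List[str] = []
    errors: Dict[str, str] = {}
    for model_id in ladder:
        breaker = breakers.setdefault(model_id, SimpleBreaker())
        if breaker.is_open():
            open_models.append(model_id)  # skipped without calling the model
            continue
        try:
            text, tokens_in, tokens_out = invoke(model_id, prompt)
        except Exception as exc:  # any failure counts against this model's breaker
            breaker.record(False)
            errors[model_id] = type(exc).__name__
            continue
        breaker.record(True)
        return {"model": model_id, "text": text,
                "tokens_in": tokens_in, "tokens_out": tokens_out}
    # Every rung was either open or raised: report both groups.
    raise FallbackError(f"breakers open: {open_models}; raised: {errors}")


def flaky_invoker(model_id: str, prompt: str) -> Tuple[str, int, int]:
    # Simulates an outage on the first rung of the ladder.
    if model_id == "haiku":
        raise TimeoutError("simulated outage")
    return f"[{model_id}] summary", len(prompt.split()), 3


breakers: Dict[str, SimpleBreaker] = {}
result = call_with_fallback(["haiku", "sonnet", "opus"],
                            "Summarize the last meeting.", flaky_invoker, breakers)
print(result["model"], result["tokens_in"], result["tokens_out"])  # sonnet 4 3
```

Keeping the breakers in a dict passed in by the caller lets their state persist across calls, which is what makes a repeatedly failing model get skipped instead of retried every time.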
## CircuitBreaker
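Each resource (agent, skill, or model) gets its own breaker, so a failing resource is isolated from the rest (bulkhead pattern). State transitions:

- `CLOSED` → `OPEN` after `failure_threshold` failures
- `OPEN` → `HALF_OPEN` once `cooldown_seconds` have elapsed
- `HALF_OPEN` → `CLOSED` after `success_threshold_half` consecutive successes
- `HALF_OPEN` → `OPEN` on any failure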
Defaults: `failure_threshold=5`, `success_threshold_half=2`, `cooldown_seconds=60`. The breaker is thread-safe; its state is guarded by a `threading.Lock`.
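The state machine can be sketched as follows. This is an illustrative stand-in, not the project's `CircuitBreaker`: the method names (`allow`, `record_success`, `record_failure`), the injectable `clock`, and resetting the failure count on success (i.e. treating `failure_threshold` as consecutive failures) are all assumptions.

```python
import threading
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Sketch of the documented state machine (hypothetical API)."""

    def __init__(self, failure_threshold: int = 5, success_threshold_half: int = 2,
                 cooldown_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold_half = success_threshold_half
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so tests can fake the passage of time
        self._lock = threading.Lock()
        self._state = State.CLOSED
        self._failures = 0
        self._half_successes = 0
        self._opened_at = 0.0

    def _maybe_half_open(self) -> None:
        # OPEN -> HALF_OPEN once the cooldown has elapsed (caller holds the lock).
        if (self._state is State.OPEN
                and self._clock() - self._opened_at >= self.cooldown_seconds):
            self._state = State.HALF_OPEN
            self._half_successes = 0

    def _trip(self) -> None:
        self._state = State.OPEN
        self._opened_at = self._clock()
        self._failures = 0

    @property
    def state(self) -> State:
        with self._lock:
            self._maybe_half_open()
            return self._state

    def allow(self) -> bool:
        return self.state is not State.OPEN

    def record_success(self) -> None:
        with self._lock:
            self._maybe_half_open()
            if self._state is State.HALF_OPEN:
                self._half_successes += 1
                if self._half_successes >= self.success_threshold_half:
                    self._state = State.CLOSED  # HALF_OPEN -> CLOSED
            elif self._state is State.CLOSED:
                self._failures = 0  # assumption: count consecutive failures only

    def record_failure(self) -> None:
        with self._lock:
            self._maybe_half_open()
            if self._state is State.HALF_OPEN:
                self._trip()  # any failure while probing reopens the breaker
            elif self._state is State.CLOSED:
                self._failures += 1
                if self._failures >= self.failure_threshold:
                    self._trip()


now = [0.0]  # fake clock for the demo
cb = CircuitBreaker(failure_threshold=2, cooldown_seconds=60, clock=lambda: now[0])
cb.record_failure()
cb.record_failure()
print(cb.state.name)  # OPEN
now[0] = 61.0
print(cb.state.name)  # HALF_OPEN
cb.record_success()
cb.record_success()
print(cb.state.name)  # CLOSED
```

The `OPEN` → `HALF_OPEN` transition is evaluated lazily whenever the state is read, so no background timer thread is needed.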
## Memory stores
Three JSONL-backed stores ship in v1.0; Postgres + pgvector backends are planned for v1.5+.
| Store | Default TTL | Use case |
|---|---|---|
| `EpisodicStore` | 90 days | Time-bounded events (task traces, decisions) |
| `SemanticStore` | none | Long-lived facts (project knowledge, glossary) |
| `ProceduralStore` | 180 days | How-to traces; used by Mirror Learning (Camada 18) |
All three share the `_JsonlStore` base: `record()` appends a record, `search()` does a linear scan with an optional tag filter, and `all()` returns every record as a list.
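A minimal sketch of what a `_JsonlStore`-style base might look like. The class name `JsonlStore`, the record fields (`ts`, `text`, `tags`), the substring match in `search()`, and enforcing the TTL at read time are assumptions for illustration; the real stores may differ.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import List, Optional


class JsonlStore:
    """Hypothetical stand-in for _JsonlStore: one JSON object per line."""

    def __init__(self, path: Path, ttl_days: Optional[float] = None) -> None:
        self.path = Path(path)
        # None means "no TTL", as for SemanticStore.
        self.ttl_seconds = None if ttl_days is None else ttl_days * 86400
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.touch(exist_ok=True)

    def record(self, text: str, tags: Optional[List[str]] = None,
               ts: Optional[float] = None) -> dict:
        rec = {"ts": time.time() if ts is None else ts,
               "text": text, "tags": list(tags or [])}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(rec) + "\n")  # append-only write
        return rec

    def all(self) -> List[dict]:
        now = time.time()
        records = []
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                rec = json.loads(line)
                # Assumption: TTL is enforced at read time; expired lines stay on disk.
                if self.ttl_seconds is not None and now - rec["ts"] > self.ttl_seconds:
                    continue
                records.append(rec)
        return records

    def search(self, query: str, tag: Optional[str] = None) -> List[dict]:
        # Linear scan: case-insensitive substring match plus optional tag filter.
        q = query.lower()
        return [r for r in self.all()
                if q in r["text"].lower() and (tag is None or tag in r["tags"])]


with tempfile.TemporaryDirectory() as tmp:
    episodic = JsonlStore(Path(tmp) / "episodic.jsonl", ttl_days=90)
    episodic.record("Chose JSONL storage for v1.0", tags=["decision"])
    episodic.record("Old task trace", tags=["trace"], ts=time.time() - 100 * 86400)
    live = episodic.all()  # the 100-day-old record is past the 90-day TTL
    hits = episodic.search("jsonl", tag="decision")
print(len(live), hits[0]["text"])  # 1 Chose JSONL storage for v1.0
```

A linear scan is fine at v1.0 scale; it is exactly the cost that an indexed Postgres backend would remove.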
## RAG (file-based scaffold)
`RAGIndex` chunks Markdown files and ranks the chunks by Jaccard overlap with the query. Its interface is the swap-in point for pgvector or Chroma:
```python
from src.ai import RAGIndex, RAGQuery

idx = RAGIndex()
idx.index_directory("docs/reference/specs", "**/*.md")  # 122 chunks
hits = idx.query(RAGQuery(text="reception phase", top_k=5))
for h in hits:
    print(f"{h.score:.3f} {h.path}#{h.chunk_id}")
```
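Jaccard-overlap retrieval scores a chunk as |Q ∩ C| / |Q ∪ C| over the token sets of the query and the chunk. The sketch below assumes lowercase `\w+` tokens and blank-line chunking; `Hit`, `chunk_markdown`, and `query` are hypothetical names, and `RAGIndex`'s actual tokenizer and chunk boundaries may differ.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Hit:
    score: float
    path: str
    chunk_id: int
    text: str


def tokens(text: str) -> Set[str]:
    # Assumption: lowercase word tokens; punctuation and Markdown symbols are ignored.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def chunk_markdown(text: str) -> List[str]:
    # Assumption: split on blank lines; RAGIndex may chunk by heading or size instead.
    return [c.strip() for c in re.split(r"\n\s*\n", text) if c.strip()]


def query(docs: Dict[str, str], text: str, top_k: int = 5) -> List[Hit]:
    q = tokens(text)
    hits = [Hit(jaccard(q, tokens(chunk)), path, i, chunk)
            for path, body in docs.items()
            for i, chunk in enumerate(chunk_markdown(body))]
    hits = [h for h in hits if h.score > 0]  # drop chunks with no overlap
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits[:top_k]


docs = {
    "specs/reception.md": "# Reception phase\n\nThe reception phase validates input.\n\nUnrelated notes.",
    "specs/billing.md": "Billing rules.",
}
for h in query(docs, "reception phase", top_k=5):
    print(f"{h.score:.3f} {h.path}#{h.chunk_id}")
# 1.000 specs/reception.md#0
# 0.400 specs/reception.md#1
```

Because Jaccard divides by the union, short chunks that contain the query terms outrank longer chunks with the same matches; an embedding backend behaves differently, which is one reason to keep the swap-in point.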
## AgentMetrics
`AgentMetrics` appends each event to a JSONL file and keeps running stats in memory. Each `MetricEvent` records `agent_id`, `invocation_id`, `success`, tokens, cost, duration, and error class.
```python
from src.ai import AgentMetrics

m = AgentMetrics()
m.emit(agent_id="cortex", invocation_id="inv1", success=True,
       tokens_in=100, tokens_out=50, cost_dollars=0.0001, duration_ms=42)
print(m.stats_by_agent("cortex"))
# {'agent_id': 'cortex', 'count': 1, 'success_rate': 1.0, ...}
```
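The in-memory side amounts to filtering events by agent and aggregating them. In this hypothetical sketch, `Event` mirrors the documented `MetricEvent` fields (the names `tokens_in`, `tokens_out`, `cost_dollars`, and `duration_ms` follow the `emit()` call; `error_class` is assumed). `to_jsonl` and `agent_stats` are illustrative names, and every stats key beyond `agent_id`, `count`, and `success_rate` is invented, since the real output is elided.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical mirror of MetricEvent's documented fields."""
    agent_id: str
    invocation_id: str
    success: bool
    tokens_in: int
    tokens_out: int
    cost_dollars: float
    duration_ms: int
    error_class: Optional[str] = None


def to_jsonl(event: Event) -> str:
    # One line per event, suitable for appending to a .jsonl file.
    return json.dumps(asdict(event))


def agent_stats(events: List[Event], agent_id: str) -> dict:
    mine = [e for e in events if e.agent_id == agent_id]
    n = len(mine)
    return {
        "agent_id": agent_id,
        "count": n,
        "success_rate": sum(e.success for e in mine) / n if n else 0.0,
        # Illustrative extra keys; the real stats dict is elided in the docs.
        "total_cost_dollars": sum(e.cost_dollars for e in mine),
        "avg_duration_ms": sum(e.duration_ms for e in mine) / n if n else 0.0,
    }


events = [
    Event("cortex", "inv1", True, 100, 50, 0.0001, 42),
    Event("cortex", "inv2", False, 80, 0, 0.0, 10, "TimeoutError"),
]
print(to_jsonl(events[1]))
print(agent_stats(events, "cortex")["success_rate"])  # 0.5
```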
## Rate card (`token_budget`)
| Model | Input ($ / 1M tokens) | Output ($ / 1M tokens) |
|---|---:|---:|
| `claude-haiku-4-5` | 1.0 | 5.0 |
| `claude-sonnet-4-6` | 3.0 | 15.0 |
| `claude-opus-4-7` | 15.0 | 75.0 |
| `mock` | 0.0 | 0.0 |
In production, override these rates via `COST-POLICY.yaml` or environment variables.
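Costs follow directly from the rate card: `cost = (tokens_in × input_rate + tokens_out × output_rate) / 1,000,000`. For example, 2,000 input and 500 output tokens on `claude-sonnet-4-6` cost (2,000 × 3 + 500 × 15) / 1,000,000 = $0.0135. A minimal sketch; `RATES` and `cost_dollars` are hypothetical names, and the `rates` argument stands in for whatever override `COST-POLICY.yaml` or the environment supplies (loading those is not shown).

```python
from typing import Dict, Optional, Tuple

# $ per 1M tokens as (input, output), taken from the rate card.
RATES: Dict[str, Tuple[float, float]] = {
    "claude-haiku-4-5": (1.0, 5.0),
    "claude-sonnet-4-6": (3.0, 15.0),
    "claude-opus-4-7": (15.0, 75.0),
    "mock": (0.0, 0.0),
}


def cost_dollars(model: str, tokens_in: int, tokens_out: int,
                 rates: Optional[Dict[str, Tuple[float, float]]] = None) -> float:
    # `rates` stands in for overrides from COST-POLICY.yaml or the environment.
    rate_in, rate_out = (rates or RATES)[model]
    return (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000


print(f"{cost_dollars('claude-sonnet-4-6', 2_000, 500):.4f}")  # 0.0135
```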