AI infrastructure (Camada 8)

ModelManager

Fallback ladder per task tier. Each model has a CircuitBreaker; when one trips, the next model on the ladder is tried.

from src.ai import ModelManager

mm = ModelManager.default()
# Default ladders:
#   simple   : haiku → sonnet
#   standard : haiku → sonnet → opus
#   complex  : sonnet → opus
#   critical : opus → sonnet

result = mm.call("Summarize the last meeting.", tier="standard")
print(result["model"], result["tokens_in"], result["tokens_out"])

For production, inject an invoker callable with the signature (model_id, prompt) -> (text, tokens_in, tokens_out).

FallbackError is raised when every model on the ladder is unavailable; the message lists which breakers were open and which models raised exceptions.
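The ladder walk and the invoker contract can be sketched in isolation. This is a minimal illustration, not the ModelManager implementation: `fake_invoker`, `call_with_ladder`, and `LadderExhausted` are hypothetical names, and how the real ModelManager accepts the invoker is not specified here. Only the invoker signature and the result keys come from the text above.

```python
from typing import Callable, Dict, List, Tuple

# Invoker signature from the text: (model_id, prompt) -> (text, tokens_in, tokens_out)
Invoker = Callable[[str, str], Tuple[str, int, int]]


class LadderExhausted(Exception):
    """Hypothetical stand-in for FallbackError."""


def fake_invoker(model_id: str, prompt: str) -> Tuple[str, int, int]:
    """Hypothetical invoker: 'haiku' always fails, other models echo the prompt."""
    if model_id == "haiku":
        raise RuntimeError("haiku is down")
    text = f"[{model_id}] {prompt}"
    # Whitespace word counts stand in for real token counts.
    return text, len(prompt.split()), len(text.split())


def call_with_ladder(ladder: List[str], prompt: str, invoker: Invoker) -> Dict:
    """Try each model in ladder order and return the first successful result."""
    errors = {}
    for model_id in ladder:
        try:
            text, tokens_in, tokens_out = invoker(model_id, prompt)
        except Exception as exc:  # a real manager would also consult the breaker
            errors[model_id] = repr(exc)
            continue
        return {"model": model_id, "text": text,
                "tokens_in": tokens_in, "tokens_out": tokens_out}
    raise LadderExhausted(f"all models failed: {errors}")


result = call_with_ladder(["haiku", "sonnet", "opus"],
                          "Summarize the last meeting.", fake_invoker)
print(result["model"], result["tokens_in"], result["tokens_out"])
```

Because the invoker is a plain callable, a test double like `fake_invoker` can replace a network client without touching the ladder logic.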

CircuitBreaker

Bulkhead pattern per resource (agent / skill / model). State transitions:

CLOSED  → (failure_threshold hits) → OPEN
OPEN    → (cooldown_seconds elapsed) → HALF_OPEN
HALF_OPEN → (success_threshold_half consecutive) → CLOSED
HALF_OPEN → (any failure) → OPEN

Defaults: failure_threshold=5, success_threshold_half=2, cooldown_seconds=60. Thread-safe (threading.Lock).
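The transitions above can be expressed as a small state machine. The following is a sketch under assumptions, not the CircuitBreaker source: the class name `SketchBreaker`, its method names, and the injectable `clock` are hypothetical, and it assumes the failure threshold counts consecutive failures (the text does not say whether the count resets on success).

```python
import threading
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class SketchBreaker:
    """Hypothetical breaker using the documented defaults."""

    def __init__(self, failure_threshold=5, success_threshold_half=2,
                 cooldown_seconds=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold_half = success_threshold_half
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so tests need not sleep
        self._lock = threading.Lock()
        self._state = State.CLOSED
        self._failures = 0
        self._half_open_successes = 0
        self._opened_at = 0.0

    def _maybe_half_open(self):
        # OPEN -> HALF_OPEN once the cooldown has elapsed. Caller holds the lock.
        if (self._state is State.OPEN
                and self._clock() - self._opened_at >= self.cooldown_seconds):
            self._state = State.HALF_OPEN
            self._half_open_successes = 0

    def _trip(self):
        # Any path into OPEN restarts the cooldown. Caller holds the lock.
        self._state = State.OPEN
        self._opened_at = self._clock()
        self._failures = 0

    @property
    def state(self):
        with self._lock:
            self._maybe_half_open()
            return self._state

    def allow(self):
        """True when a call may be attempted (CLOSED or HALF_OPEN)."""
        return self.state is not State.OPEN

    def record_success(self):
        with self._lock:
            self._maybe_half_open()
            if self._state is State.HALF_OPEN:
                self._half_open_successes += 1
                if self._half_open_successes >= self.success_threshold_half:
                    self._state = State.CLOSED
                    self._failures = 0
            elif self._state is State.CLOSED:
                self._failures = 0  # assumption: only consecutive failures count

    def record_failure(self):
        with self._lock:
            self._maybe_half_open()
            if self._state is State.HALF_OPEN:
                self._trip()  # any failure while probing reopens the breaker
            elif self._state is State.CLOSED:
                self._failures += 1
                if self._failures >= self.failure_threshold:
                    self._trip()


# Demo with a fake clock so the cooldown can be advanced by hand.
now = [0.0]
breaker = SketchBreaker(failure_threshold=2, cooldown_seconds=10,
                        clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
print(breaker.state)   # State.OPEN
now[0] = 10.0
print(breaker.state)   # State.HALF_OPEN
```

Checking the cooldown lazily, whenever the state is read or updated, avoids a background timer thread; the single lock keeps each transition atomic.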

Memory stores

Three JSONL-backed stores at v1.0 (Postgres + pgvector in v1.5+):

| Store | Default TTL | Use case |
|---|---|---|
| EpisodicStore | 90 days | Time-bounded events (task traces, decisions) |
| SemanticStore | none | Long-lived facts (project knowledge, glossary) |
| ProceduralStore | 180 days | How-to traces; used by Mirror Learning (Camada 18) |

All three share the same _JsonlStore base: record() appends an entry, search() does a linear scan with an optional tag filter, and all() returns the full list.
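A JSONL-backed store with this record/search/all surface can be sketched in a few lines. Everything below is illustrative: `SketchJsonlStore`, the entry fields (`ts`, `text`, `tags`), the case-insensitive substring match, and applying the TTL at read time are assumptions, not the actual _JsonlStore behaviour.

```python
import json
import tempfile
import time
from pathlib import Path


class SketchJsonlStore:
    """Hypothetical append-only store: one JSON object per line."""

    def __init__(self, path, ttl_days=None):
        self.path = Path(path)
        # ttl_days=None means entries never expire (as for SemanticStore).
        self.ttl_seconds = None if ttl_days is None else ttl_days * 86400

    def record(self, text, tags=()):
        """Append one entry to the end of the file."""
        entry = {"ts": time.time(), "text": text, "tags": list(tags)}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")
        return entry

    def all(self):
        """Return every entry still inside the TTL window, oldest first."""
        if not self.path.exists():
            return []
        cutoff = None if self.ttl_seconds is None else time.time() - self.ttl_seconds
        entries = []
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                entry = json.loads(line)
                if cutoff is None or entry["ts"] >= cutoff:
                    entries.append(entry)
        return entries

    def search(self, query, tag=None):
        """Linear scan: substring match on text, optional exact tag filter."""
        needle = query.lower()
        return [e for e in self.all()
                if needle in e["text"].lower()
                and (tag is None or tag in e["tags"])]


store_path = Path(tempfile.mkdtemp()) / "episodic.jsonl"
store = SketchJsonlStore(store_path, ttl_days=90)
store.record("Decided to ship the reception phase first", tags=["decision"])
store.record("Task trace for invoice parsing", tags=["trace"])
hits = store.search("reception", tag="decision")
print(len(hits), hits[0]["text"])
```

A linear scan is O(n) per query, which is acceptable for small files and is the kind of cost an indexed database would remove.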

RAG (file-based scaffold)

RAGIndex chunks Markdown files and ranks chunks by Jaccard overlap with the query. It is a scaffold that pgvector or Chroma can later be swapped in for:

from src.ai import RAGIndex, RAGQuery

idx = RAGIndex()
idx.index_directory("docs/reference/specs", "**/*.md")    # 122 chunks
hits = idx.query(RAGQuery(text="reception phase", top_k=5))
for h in hits:
    print(f"{h.score:.3f}  {h.path}#{h.chunk_id}")
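Jaccard overlap itself is simple to illustrate: the score is the size of the intersection of two token sets divided by the size of their union. The sketch below is standalone and does not reflect RAGIndex internals; the tokenizer, the `top_k` helper, and the sample chunks are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens as a set (hypothetical tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def top_k(query, chunks, k=5):
    """Return up to k (score, chunk_index) pairs, best score first."""
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(c)), i) for i, c in enumerate(chunks)]
    scored = [(s, i) for s, i in scored if s > 0]  # drop chunks with no overlap
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


chunks = [
    "The reception phase validates incoming requests.",
    "Billing runs nightly.",
    "Reception phase",
]
hits = top_k("reception phase", chunks, k=2)
for score, idx in hits:
    print(f"{score:.3f}  chunk {idx}")
```

Jaccard scoring penalizes long chunks because every extra token enlarges the union, so the short exact-match chunk outranks the longer sentence that contains the same two words.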

AgentMetrics

Emits append-only JSONL and keeps in-memory stats. Each MetricEvent records agent_id, invocation_id, success, tokens, cost, duration, and error class.

from src.ai import AgentMetrics

m = AgentMetrics()
m.emit(agent_id="cortex", invocation_id="inv1", success=True,
       tokens_in=100, tokens_out=50, cost_dollars=0.0001, duration_ms=42)
print(m.stats_by_agent("cortex"))
# {'agent_id': 'cortex', 'count': 1, 'success_rate': 1.0, ...}
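The emit-then-aggregate flow can be approximated as follows. This is a sketch, not the AgentMetrics code: `SketchEvent`, the module-level `emit`, and the `total_cost_dollars` and `avg_duration_ms` keys are hypothetical; only `agent_id`, `count`, and `success_rate` appear in the output shown above. An in-memory buffer stands in for the JSONL file.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SketchEvent:
    """Hypothetical event shape mirroring the fields listed for MetricEvent."""
    agent_id: str
    invocation_id: str
    success: bool
    tokens_in: int
    tokens_out: int
    cost_dollars: float
    duration_ms: float
    error_class: Optional[str] = None


events = []            # in-memory copy used for stats
sink = io.StringIO()   # stands in for the append-only JSONL file


def emit(event):
    """Keep the event in memory and append it as one JSON line."""
    events.append(event)
    sink.write(json.dumps(asdict(event)) + "\n")


def stats_by_agent(all_events, agent_id):
    """Aggregate count, success rate, cost, and mean duration for one agent."""
    own = [e for e in all_events if e.agent_id == agent_id]
    count = len(own)
    if count == 0:
        return {"agent_id": agent_id, "count": 0, "success_rate": 0.0}
    return {
        "agent_id": agent_id,
        "count": count,
        "success_rate": sum(e.success for e in own) / count,
        "total_cost_dollars": sum(e.cost_dollars for e in own),
        "avg_duration_ms": sum(e.duration_ms for e in own) / count,
    }


emit(SketchEvent("cortex", "inv1", True, 100, 50, 0.0001, 42))
emit(SketchEvent("cortex", "inv2", False, 80, 0, 0.0, 17,
                 error_class="TimeoutError"))
stats = stats_by_agent(events, "cortex")
print(stats)
```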

Rate card (token_budget)

| Model | Input ($ / 1M tok) | Output ($ / 1M tok) |
|---|---|---|
| claude-haiku-4-5 | 1.0 | 5.0 |
| claude-sonnet-4-6 | 3.0 | 15.0 |
| claude-opus-4-7 | 15.0 | 75.0 |
| mock | 0.0 | 0.0 |

In production, override these rates via COST-POLICY.yaml or environment variables.
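Applying the rate card is a matter of scaling token counts by the per-million rates. The helper below is a hypothetical sketch (`RATES` and `cost_dollars` are illustrative names, not the token_budget API); the rates are copied from the table above.

```python
# (input $, output $) per 1M tokens, copied from the rate card.
RATES = {
    "claude-haiku-4-5": (1.0, 5.0),
    "claude-sonnet-4-6": (3.0, 15.0),
    "claude-opus-4-7": (15.0, 75.0),
    "mock": (0.0, 0.0),
}


def cost_dollars(model, tokens_in, tokens_out):
    """Cost of one call: tokens times the per-1M rate, divided by 1,000,000."""
    rate_in, rate_out = RATES[model]
    return (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000


# 100 input + 50 output tokens on sonnet:
# (100 * 3.0 + 50 * 15.0) / 1e6 = 0.00105 dollars
print(f"{cost_dollars('claude-sonnet-4-6', 100, 50):.5f}")
```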