Skip to content

How-to: connect the Anthropic API

Plug the real Anthropic API into the runtime so pipelines invoke claude-haiku-4-5, claude-sonnet-4-6, or claude-opus-4-7 instead of the mock invoker.

Install

pip install -e ".[llm]"     # adds anthropic>=0.50

The anthropic SDK is an optional dependency. The pipeline works without it (using the mock invoker); install it only when you want real LLM calls.

Set the API key

export ANTHROPIC_API_KEY="sk-ant-..."          # Linux/macOS
$env:ANTHROPIC_API_KEY="sk-ant-..."            # PowerShell

Or pass api_key=... explicitly to AnthropicInvoker(api_key=...).

Quickstart — one call

from src.ai import ModelManager

mm, invoker = ModelManager.with_anthropic()
result = mm.call("Explain RAG in one sentence.", tier="simple", invoker=invoker)

print(result["response"])           # actual text from claude-haiku-4-5
print(result["tokens_in"], "in /", result["tokens_out"], "out")
print(result["model"])              # claude-haiku-4-5

tier="simple" routes to Haiku first, falling back to Sonnet if Haiku's circuit opens. Other tiers: standard (Haiku→Sonnet→Opus), complex (Sonnet→Opus), critical (Opus→Sonnet).

Plug into a full pipeline run

from src.ai import AgentMetrics, ModelManager
from src.pipeline import (
    AgentRef, BudgetTracker, Capsule, CheckpointStore, Constraints,
    ContinuationStore, ExpectedOutput, LockManager, PipelineOrchestrator,
    TaskSpec, VerificationAgent,
)
from src.privacy import AuditChain

# 1. Real Anthropic-backed model manager
mm, anthropic_invoker = ModelManager.with_anthropic(max_tokens=4096)

# 2. Executor that turns the capsule into a prompt and runs it
def llm_executor(state):
    prompt = f"Task: {state.capsule.task.description}\n\n"
    prompt += "Acceptance criteria:\n"
    for ac in state.capsule.task.acceptance_criteria:
        prompt += f"- {ac}\n"
    result = mm.call(prompt, tier="complex", invoker=anthropic_invoker)
    state.budget.record_call(
        model=result["model"],
        tokens_in=result["tokens_in"],
        tokens_out=result["tokens_out"],
    )
    return {"artifact_main": result["response"]}

# 3. Wire orchestrator with full governance
orch = PipelineOrchestrator(
    budget=BudgetTracker(capsule_id="...", limit_tokens=50_000, limit_dollars=2.0),
    locks=LockManager(),
    verifier=VerificationAgent(),
    audit_chain=AuditChain(),
    metrics=AgentMetrics(),
    checkpoint_store=CheckpointStore(),
    continuation_store=ContinuationStore(),
)

# 4. Run
result = orch.run(capsule, executor=llm_executor)

How retry + fallback compose

Two layers protect every call:

Layer What it handles Configured by
SDK auto-retry (per-model) Transient 429 / 408 / 409 / 5xx with exponential backoff (default 3 retries) AnthropicInvoker(max_retries=N)
ModelManager fallback ladder (across models) Persistent failure on a model → next route in the ladder; trips CircuitBreaker on repeated failures ModelManager.default() / with_anthropic()

If Haiku's API returns 429 once → SDK retries with backoff → call succeeds. If Haiku is consistently failing → CircuitBreaker opens → ModelManager.select() returns Sonnet instead.

Cost tracking

Every call's tokens are captured. To estimate cost:

from src.pipeline.token_budget import estimate_cost

cost = estimate_cost(
    model=result["model"],
    tokens_input=result["tokens_in"],
    tokens_output=result["tokens_out"],
)
print(f"${cost:.6f}")

Pricing per million tokens (current rate card):

Model Input Output
claude-haiku-4-5 $1.00 $5.00
claude-sonnet-4-6 $3.00 $15.00
claude-opus-4-7 $5.00 $25.00

BudgetTracker.record_call() raises BudgetExceededError when the per-capsule cap is hit — wire this into state.budget.record_call(...) inside your executor for hard caps.

Network smoke tests

Real API call test lives in tests/smoke/test_llm_roundtrip.py (marked @pytest.mark.network). It's skipped automatically when ANTHROPIC_API_KEY is not set.

To run it:

export ANTHROPIC_API_KEY="sk-ant-..."
python -m pytest tests/smoke/test_llm_roundtrip.py -m network

Expected cost: < $0.001 per run (single Haiku call, 50–100 tokens).

Configure AnthropicInvoker in detail

Parameter Default Notes
api_key None → reads ANTHROPIC_API_KEY env Pass explicit for multi-tenant scenarios
max_tokens 4096 Per-call output cap
max_retries 3 SDK auto-retry on 429/408/409/5xx with exp backoff
timeout_seconds 60.0 Per-request timeout (SDK default is 600)
system_prompt None Optional system prompt prepended every call

Troubleshooting

Symptom Likely cause Fix
AnthropicNotInstalledError anthropic package missing pip install -e ".[llm]"
ValueError: ANTHROPIC_API_KEY not set No env var, no explicit key export ANTHROPIC_API_KEY=... or pass api_key=
anthropic.AuthenticationError Invalid key Verify the key in Console
anthropic.RateLimitError (after retries) Tier limits exhausted Add a slower tier in the ladder, or upgrade plan
FallbackError: all models on 'X' unavailable All routes have OPEN circuits Wait for cooldown (default 60s) or check upstream

See also