
Privacy (Layer 9)

Enforces Rules 21-24 of the constitution: privacy-by-design, no raw PII in LLM context, data classification, inviolable audit chain.

Modules

Module        Responsibility
pii_detector  Regex + structural detection of email, CPF, CNPJ, credit_card, SSN, phone_br, IP, password key-value pairs, api_key
classifier    Tier 0 (Public) / 1 (Internal) / 2 (Confidential) / 3 (Restricted)
permissions   RBAC + purpose binding (DEBUG / FEATURE_BUILD / ANALYTICS / AUDIT / DSAR)
audit_chain   HMAC-chained append-only log; tampering is detectable via verify()
dsar          LGPD/GDPR Data Subject Access Request handler
deletion      Cascade deletion across all stores
retention     TTL-driven sweep across episodic / semantic / procedural stores
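
To make the regex half of pii_detector concrete, here is a minimal sketch covering four of the listed kinds. The patterns, the PII_PATTERNS name, and the detect_pii function are hypothetical illustrations, not the module's actual API, and the structural checks (for example CPF check digits) are left out.

```python
import re

# Hypothetical patterns for illustration; the real pii_detector may differ.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "cpf": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),
    "cnpj": re.compile(r"\b\d{2}\.\d{3}\.\d{3}/\d{4}-\d{2}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect_pii(text: str) -> dict:
    """Return every match per PII kind; kinds with no match are omitted."""
    found = {}
    for kind, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[kind] = matches
    return found


print(detect_pii("Contact ana@example.com, CPF 123.456.789-09"))
```

Regex alone tends to over-match (any dotted quad looks like an IP), which is presumably why the module pairs it with structural detection.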

Tier model

Tier 0 Public        — docs, open-source, ToS-bound
Tier 1 Internal      — default; operational metadata; no PII
Tier 2 Confidential  — business secrets, project keys; no PII
Tier 3 Restricted    — any PII, financial, health

Classification precedence:

  1. If any PII regex matches → RESTRICTED
  2. If text contains a _CONFIDENTIAL_HINTS marker (api_key=, secret=, -----BEGIN, client_secret, internal-only, do not share, confidential) → CONFIDENTIAL
  3. If hints.explicit_tier provided → use it
  4. If hints.source == "public_docs" or hints.license in (MIT, Apache-2.0, BSD-3) → PUBLIC
  5. Default → INTERNAL
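
The precedence above can be sketched as a chain of early returns. This is an illustration under assumptions, not the classifier's real code: the Tier enum, the classify signature, passing hints as a dict, the email-only PII stand-in, and case-insensitive marker matching are all hypothetical.

```python
import re
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Stand-in for the real PII detector: email only, for illustration.
_PII_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
_CONFIDENTIAL_HINTS = ("api_key=", "secret=", "-----BEGIN", "client_secret",
                       "internal-only", "do not share", "confidential")
_OPEN_LICENSES = {"MIT", "Apache-2.0", "BSD-3"}


def classify(text: str, hints: Optional[dict] = None) -> Tier:
    hints = hints or {}
    # 1. PII always wins, so an explicit tier can never downgrade it.
    if _PII_RE.search(text):
        return Tier.RESTRICTED
    # 2. Confidential markers (case-insensitive here, which is an assumption).
    lowered = text.lower()
    if any(marker.lower() in lowered for marker in _CONFIDENTIAL_HINTS):
        return Tier.CONFIDENTIAL
    # 3. Caller-supplied tier.
    if hints.get("explicit_tier") is not None:
        return Tier(hints["explicit_tier"])
    # 4. Public sources and permissive licenses.
    if hints.get("source") == "public_docs" or hints.get("license") in _OPEN_LICENSES:
        return Tier.PUBLIC
    # 5. Default.
    return Tier.INTERNAL
```

Because the content checks run before the hints are consulted, a caller cannot mark text containing PII or secrets as PUBLIC by passing explicit_tier.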

Audit chain (Rule 24)

Every audit event includes:

  • prev_hmac: the hmac_ of the previous event (this is what chains the log)
  • hmac_: HMAC of (event_blob + prev_hmac), keyed with the AUDIT_HMAC_KEY environment variable

AuditChain.verify() recomputes the chain and reports any mismatch, catching:

  • Modification: the HMAC recomputed over an edited line no longer matches its stored hmac_
  • Insertion: a forged line's prev_hmac doesn't match the previous line's hmac_
  • Deletion: the next line's prev_hmac points to a predecessor that is no longer there
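
A minimal sketch of such a chain, using only the standard library, shows why each of these cases surfaces. The function names, the in-memory list, the JSON serialization of the event, SHA-256 as the digest, and the all-zero genesis value are assumptions for illustration; the real AuditChain may differ on every one of them.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed prev_hmac for the very first event


def _sign(key: bytes, event: dict, prev_hmac: str) -> str:
    # HMAC over (event_blob + prev_hmac); sort_keys keeps the blob stable.
    blob = json.dumps(event, sort_keys=True).encode()
    return hmac.new(key, blob + prev_hmac.encode(), hashlib.sha256).hexdigest()


def append(log: list, key: bytes, event: dict) -> None:
    prev = log[-1]["hmac_"] if log else GENESIS
    log.append({"event": event, "prev_hmac": prev, "hmac_": _sign(key, event, prev)})


def verify(log: list, key: bytes) -> list:
    """Return the indexes of lines whose link or signature does not check out."""
    bad = []
    expected_prev = GENESIS
    for i, line in enumerate(log):
        link_ok = line["prev_hmac"] == expected_prev
        recomputed = _sign(key, line["event"], line["prev_hmac"])
        sig_ok = hmac.compare_digest(line["hmac_"], recomputed)
        if not (link_ok and sig_ok):
            bad.append(i)
        expected_prev = line["hmac_"]
    return bad
```

Usage: after appending a few events, verify(log, key) returns an empty list; editing any event field or removing a middle line makes it return the index where the chain breaks.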

In production, store AUDIT_HMAC_KEY in a secrets vault, never in a committed .env file. Rotate it quarterly.

DSAR workflow

from src.privacy import (
    AuditChain, DeletionCascade, DSARHandler,
)
from src.privacy.dsar import DSARAction
from src.ai import EpisodicStore, SemanticStore, ProceduralStore

ep, sm, pr = EpisodicStore(), SemanticStore(), ProceduralStore()
ac = AuditChain()
dc = DeletionCascade(episodic=ep, semantic=sm, procedural=pr, audit=ac)
dh = DSARHandler(episodic=ep, semantic=sm, procedural=pr, audit=ac, deletion=dc)

# ACCESS: scan only
req = dh.new_request(subject="user@example.com", action=DSARAction.ACCESS)
resp = dh.handle(req)
print(resp.found_records, resp.status)

# ERASURE: scan + delete (delegates to DeletionCascade)
req = dh.new_request(subject="user@example.com", action=DSARAction.ERASURE)
resp = dh.handle(req)
# Every deletion appends a DELETE event to the audit chain with the identifier_hash payload

ERASURE without an injected DeletionCascade fails fast with erasure_requires_deletion_cascade rather than silently logging intent (a bug we caught in Sprint 1).

Permissions matrix (default)

Role           Max tier      Allowed purposes
operator       RESTRICTED    all
auditor-haiku  RESTRICTED    AUDIT, DSAR
cortex         CONFIDENTIAL  FEATURE_BUILD, ANALYTICS
code-writer    INTERNAL      FEATURE_BUILD, DEBUG
test-runner    INTERNAL      FEATURE_BUILD
file-operator  CONFIDENTIAL  FEATURE_BUILD, DEBUG

PermissionManager.enforce(role, tier, purpose) raises PermissionDenied if denied. check() returns (allowed, reason) without raising.
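
The check/enforce split can be sketched as follows, with the default matrix encoded as a dict. This is an illustration only: the module-level functions, the string purposes, and the reason strings are hypothetical, and the real PermissionManager may represent all of them differently.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


ALL_PURPOSES = frozenset({"DEBUG", "FEATURE_BUILD", "ANALYTICS", "AUDIT", "DSAR"})

# role -> (max tier, allowed purposes), mirroring the default matrix
MATRIX = {
    "operator": (Tier.RESTRICTED, ALL_PURPOSES),
    "auditor-haiku": (Tier.RESTRICTED, frozenset({"AUDIT", "DSAR"})),
    "cortex": (Tier.CONFIDENTIAL, frozenset({"FEATURE_BUILD", "ANALYTICS"})),
    "code-writer": (Tier.INTERNAL, frozenset({"FEATURE_BUILD", "DEBUG"})),
    "test-runner": (Tier.INTERNAL, frozenset({"FEATURE_BUILD"})),
    "file-operator": (Tier.CONFIDENTIAL, frozenset({"FEATURE_BUILD", "DEBUG"})),
}


class PermissionDenied(Exception):
    pass


def check(role: str, tier: Tier, purpose: str) -> tuple:
    """Return (allowed, reason) without raising. Reason strings are made up."""
    if role not in MATRIX:
        return False, "unknown_role"
    max_tier, purposes = MATRIX[role]
    if tier > max_tier:
        return False, "tier_exceeds_role_max"
    if purpose not in purposes:
        return False, "purpose_not_allowed"
    return True, "ok"


def enforce(role: str, tier: Tier, purpose: str) -> None:
    allowed, reason = check(role, tier, purpose)
    if not allowed:
        raise PermissionDenied(reason)
```

Both conditions must hold: a role with a high enough tier is still denied when the stated purpose is outside its list, which is the point of binding access to purpose.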

Retention

RetentionScheduler.sweep() runs nightly and can also be invoked manually. For each record:

  • Use record.ttl_days if present
  • Otherwise fall back to the store's policy default (episodic=90, procedural=180, semantic=None)
  • Compare the resulting TTL against the age computed from record.created_at; delete the record if it has expired

Every deletion writes a DELETE_TTL audit event whose payload carries ttl_days and age_days.
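
The sweep logic can be sketched as below. Several details are assumptions rather than documented behavior: the Record dataclass, treating a None TTL as "never expires", counting age in whole days, and using a strict age > ttl comparison. The audit events are returned as plain dicts instead of being appended to the real audit chain.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Policy defaults from the text; None is assumed to mean "no expiry".
DEFAULT_TTL_DAYS = {"episodic": 90, "procedural": 180, "semantic": None}


@dataclass
class Record:
    store: str
    created_at: datetime
    ttl_days: Optional[int] = None


def sweep(records: list, now: datetime) -> tuple:
    """Split records into kept ones and DELETE_TTL audit payloads."""
    kept, audit_events = [], []
    for record in records:
        # A per-record TTL overrides the store's policy default.
        ttl = record.ttl_days if record.ttl_days is not None else DEFAULT_TTL_DAYS[record.store]
        age_days = (now - record.created_at).days
        if ttl is not None and age_days > ttl:
            audit_events.append({"type": "DELETE_TTL", "ttl_days": ttl, "age_days": age_days})
        else:
            kept.append(record)
    return kept, audit_events


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    Record("episodic", now - timedelta(days=100)),
    Record("semantic", now - timedelta(days=1000)),
    Record("procedural", now - timedelta(days=30), ttl_days=7),
]
kept, events = sweep(records, now)
print(len(kept), events)
```

Passing now in explicitly keeps the sweep deterministic, which makes it easy to test without waiting for the nightly run.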