Privacy (Layer 9)

This layer enforces Rules 21-24 of the constitution: privacy by design, no raw PII in LLM context, data classification, and an inviolable audit chain.

Modules

| Module | Responsibility |
|---|---|
| `pii_detector` | Regex + structural detection of email, CPF, CNPJ, `credit_card`, SSN, `phone_br`, IP, password key-value pairs, `api_key` |
| `classifier` | Assigns Tier 0 (Public), 1 (Internal), 2 (Confidential), or 3 (Restricted) |
| `permissions` | RBAC plus purpose binding (`DEBUG` / `FEATURE_BUILD` / `ANALYTICS` / `AUDIT` / `DSAR`) |
| `audit_chain` | HMAC-chained, append-only log; tampering is detectable via `verify()` |
| `dsar` | LGPD/GDPR Data Subject Access Request (DSAR) handler |
| `deletion` | Cascade deletion across all stores |
| `retention` | TTL-driven sweep across the episodic, semantic, and procedural stores |
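The table describes `pii_detector` as combining regex matching with structural detection. One common form of structural check, assumed here rather than taken from the module, is a checksum that filters out digit runs that merely look like card numbers. The sketch pairs a loose card-number regex with the Luhn checksum; `CARD_CANDIDATE`, `luhn_valid`, and `find_credit_cards` are hypothetical names, and the real detector's patterns are not shown in this document.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally separated by spaces or hyphens.
# The actual pii_detector patterns are not documented here.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_credit_cards(text: str) -> list[str]:
    """Regex proposes candidates; the checksum discards random digit runs."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

The regex alone would flag any long number (order IDs, timestamps); the checksum step rejects roughly nine out of ten such false positives at negligible cost.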

Tier model

Tier 0 Public        — docs, open-source, ToS-bound
Tier 1 Internal      — default; operational metadata; no PII
Tier 2 Confidential  — business secrets, project keys; no PII
Tier 3 Restricted    — any PII, financial, health

Classification precedence (rules are checked in order; the first match wins):

1. If any PII regex matches → RESTRICTED
2. If the text contains a `_CONFIDENTIAL_HINTS` marker (`api_key=`, `secret=`, `-----BEGIN`, `client_secret`, `internal-only`, `do not share`, `confidential`) → CONFIDENTIAL
3. If `hints.explicit_tier` is provided → use it
4. If `hints.source == "public_docs"` or `hints.license` is one of MIT, Apache-2.0, BSD-3 → PUBLIC
5. Default → INTERNAL
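The precedence rules translate directly into an ordered chain of early returns. This is a minimal sketch, not the `classifier` module itself: the two PII regexes are simplified stand-ins for `pii_detector`, `hints` is assumed to be a plain dict (the rules above read it as an object with `explicit_tier`, `source`, and `license`), and case-insensitive marker matching is an assumption.

```python
import re
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Simplified stand-ins for the pii_detector patterns (email and formatted CPF only).
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),        # email
    re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),   # CPF, e.g. 123.456.789-09
]

# Marker list from the precedence rules, lowercased for case-insensitive matching (assumption).
CONFIDENTIAL_HINTS = (
    "api_key=", "secret=", "-----begin", "client_secret",
    "internal-only", "do not share", "confidential",
)
PUBLIC_LICENSES = {"MIT", "Apache-2.0", "BSD-3"}


def classify(text: str, hints: Optional[dict] = None) -> Tier:
    hints = hints or {}
    # 1. Any PII match wins outright.
    if any(p.search(text) for p in PII_PATTERNS):
        return Tier.RESTRICTED
    # 2. Confidentiality markers.
    lowered = text.lower()
    if any(marker in lowered for marker in CONFIDENTIAL_HINTS):
        return Tier.CONFIDENTIAL
    # 3. Explicit tier from the caller (0 / PUBLIC is a valid value, hence `is not None`).
    if hints.get("explicit_tier") is not None:
        return Tier(hints["explicit_tier"])
    # 4. Public provenance.
    if hints.get("source") == "public_docs" or hints.get("license") in PUBLIC_LICENSES:
        return Tier.PUBLIC
    # 5. Fallback.
    return Tier.INTERNAL
```

Because the explicit-tier check comes third, a caller cannot use `explicit_tier` to downgrade text that has already matched a PII pattern or a confidentiality marker.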

Audit chain (Rule 24)

Every audit event includes:

- `prev_hmac`: the `hmac_` of the previous event, which chains the log
- `hmac_`: HMAC of (`event_blob` + `prev_hmac`), keyed with the `AUDIT_HMAC_KEY` environment variable

`AuditChain.verify()` recomputes the chain and reports any mismatch, which catches:

- Modification: a field of an existing line was changed, so the line no longer matches its `hmac_`
- Insertion: a forged line cannot carry a valid `hmac_` without the key, and the following line's `prev_hmac` no longer matches its predecessor
- Deletion: the next line's `prev_hmac` points to a predecessor that no longer exists
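A minimal sketch of the chaining and verification logic using only the standard library. It assumes SHA-256 as the HMAC digest, canonical JSON as the `event_blob` serialization, and an all-zeros `prev_hmac` for the first event; the key is passed in directly instead of being read from `AUDIT_HMAC_KEY`. `append`, `verify`, and `GENESIS` are hypothetical names and may differ from `AuditChain`'s actual API.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical prev_hmac for the first event


def _mac(key: bytes, event: dict, prev_hmac: str) -> str:
    # Canonical JSON so the same event always serializes to the same bytes.
    blob = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, (blob + prev_hmac).encode(), hashlib.sha256).hexdigest()


def append(log: list, key: bytes, event: dict) -> None:
    """Append an event whose HMAC covers both its content and the previous HMAC."""
    prev = log[-1]["hmac_"] if log else GENESIS
    log.append({"event": event, "prev_hmac": prev, "hmac_": _mac(key, event, prev)})


def verify(log: list, key: bytes) -> list:
    """Return the indices of lines that break the chain (empty list means intact)."""
    bad = []
    expected_prev = GENESIS
    for i, line in enumerate(log):
        recomputed = _mac(key, line["event"], line["prev_hmac"])
        link_ok = line["prev_hmac"] == expected_prev
        mac_ok = hmac.compare_digest(recomputed, line["hmac_"])
        if not (link_ok and mac_ok):
            bad.append(i)
        expected_prev = line["hmac_"]
    return bad
```

Each tampering pattern surfaces as a flagged index: a modified line fails its own HMAC, while a deleted or inserted line breaks the `prev_hmac` link of the line after it. One limit of this simple scheme: dropping lines from the end of the log leaves a shorter chain that still verifies, so detecting truncation needs the latest `hmac_` to be recorded somewhere outside the log.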

In production, keep `AUDIT_HMAC_KEY` in a secrets vault (never in a committed `.env` file) and rotate it quarterly.

DSAR workflow

```python
from src.privacy import (
    AuditChain, DeletionCascade, DSARHandler,
)
from src.privacy.dsar import DSARAction
from src.ai import EpisodicStore, SemanticStore, ProceduralStore

# One set of stores and one audit chain, shared by the cascade and the handler
ep, sm, pr = EpisodicStore(), SemanticStore(), ProceduralStore()
ac = AuditChain()
dc = DeletionCascade(episodic=ep, semantic=sm, procedural=pr, audit=ac)
dh = DSARHandler(episodic=ep, semantic=sm, procedural=pr, audit=ac, deletion=dc)

# ACCESS: scan only, nothing is deleted
req = dh.new_request(subject="user@example.com", action=DSARAction.ACCESS)
resp = dh.handle(req)
print(resp.found_records, resp.status)

# ERASURE: scan + delete (delegates to DeletionCascade)
req = dh.new_request(subject="user@example.com", action=DSARAction.ERASURE)
resp = dh.handle(req)
print(resp.status)
# Every deletion appends a DELETE event to the audit chain with an identifier_hash payload
```

An `ERASURE` request handled without an injected `DeletionCascade` fails fast with `erasure_requires_deletion_cascade` instead of silently logging the intent to erase without deleting anything (a bug we caught in Sprint 1).
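The guard can be sketched as a check at the top of the erasure path. This is a hypothetical, stripped-down handler, not `DSARHandler`: it assumes the failure surfaces as an exception carrying the `erasure_requires_deletion_cascade` code (the real handler might report it through the response status instead), and `delete_subject`, `SketchDSARHandler`, and `ErasureNotConfigured` are invented names.

```python
from enum import Enum
from typing import Optional, Protocol


class DSARAction(Enum):  # local stand-in for src.privacy.dsar.DSARAction
    ACCESS = "access"
    ERASURE = "erasure"


class Deleter(Protocol):  # hypothetical interface standing in for DeletionCascade
    def delete_subject(self, subject: str) -> int: ...


class ErasureNotConfigured(RuntimeError):
    pass


class SketchDSARHandler:
    def __init__(self, deletion: Optional[Deleter] = None):
        self.deletion = deletion

    def handle(self, subject: str, action: DSARAction) -> str:
        if action is DSARAction.ERASURE:
            # Fail fast: refusing is safer than logging an erasure that never happened.
            if self.deletion is None:
                raise ErasureNotConfigured("erasure_requires_deletion_cascade")
            removed = self.deletion.delete_subject(subject)
            return f"erased:{removed}"
        return "scanned"  # ACCESS only reads, so it needs no deletion dependency
```

Raising at this point means a misconfigured deployment can never report an erasure that did not happen, while ACCESS requests keep working without the cascade.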

Permissions matrix (default)

| Role | Max tier | Allowed purposes |
|---|---|---|
| `operator` | RESTRICTED | all |
| `auditor-haiku` | RESTRICTED | AUDIT, DSAR |
| `cortex` | CONFIDENTIAL | FEATURE_BUILD, ANALYTICS |
| `code-writer` | INTERNAL | FEATURE_BUILD, DEBUG |
| `test-runner` | INTERNAL | FEATURE_BUILD |
| `file-operator` | CONFIDENTIAL | FEATURE_BUILD, DEBUG |

`PermissionManager.enforce(role, tier, purpose)` raises `PermissionDenied` when the request is denied; `check()` returns an `(allowed, reason)` tuple without raising.
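A minimal sketch of how the default matrix could back `check()` and `enforce()`. It encodes the table as data, treats "all" as every member of the `Purpose` enum, denies unknown roles, and assumes `check()` takes the same arguments as `enforce()`. The enums, reason strings, and module-level functions are assumptions, not `PermissionManager`'s actual implementation.

```python
from enum import Enum, IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


class Purpose(Enum):
    DEBUG = "debug"
    FEATURE_BUILD = "feature_build"
    ANALYTICS = "analytics"
    AUDIT = "audit"
    DSAR = "dsar"


class PermissionDenied(Exception):
    pass


ALL = frozenset(Purpose)  # "all" in the table, assumed to mean every purpose
# Default matrix: role -> (max tier, allowed purposes)
MATRIX = {
    "operator": (Tier.RESTRICTED, ALL),
    "auditor-haiku": (Tier.RESTRICTED, frozenset({Purpose.AUDIT, Purpose.DSAR})),
    "cortex": (Tier.CONFIDENTIAL, frozenset({Purpose.FEATURE_BUILD, Purpose.ANALYTICS})),
    "code-writer": (Tier.INTERNAL, frozenset({Purpose.FEATURE_BUILD, Purpose.DEBUG})),
    "test-runner": (Tier.INTERNAL, frozenset({Purpose.FEATURE_BUILD})),
    "file-operator": (Tier.CONFIDENTIAL, frozenset({Purpose.FEATURE_BUILD, Purpose.DEBUG})),
}


def check(role: str, tier: Tier, purpose: Purpose) -> tuple[bool, str]:
    if role not in MATRIX:
        return False, f"unknown role {role!r}"  # deny by default
    max_tier, purposes = MATRIX[role]
    if tier > max_tier:
        return False, f"{role} is capped at {max_tier.name}"
    if purpose not in purposes:
        return False, f"{role} may not act for {purpose.name}"
    return True, "ok"


def enforce(role: str, tier: Tier, purpose: Purpose) -> None:
    allowed, reason = check(role, tier, purpose)
    if not allowed:
        raise PermissionDenied(reason)
```

Keeping `enforce()` as a thin wrapper over `check()` means both share one decision path, so a caller that only wants the reason (for example, to log a refusal) gets the same answer `enforce()` would act on.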

Retention

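`RetentionScheduler.sweep()` runs nightly (or on demand). For each record it:

1. Uses `record.ttl_days` if present.
2. Otherwise falls back to the per-store policy default: episodic = 90 days, procedural = 180 days, semantic = `None` (no default expiry).
3. Compares the TTL against the record's age since `record.created_at` and deletes the record if it has expired.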

Every deletion writes a `DELETE_TTL` audit event whose payload includes `ttl_days` and `age_days`.
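The per-record logic of `sweep()` can be sketched as follows. The `Record` dataclass, its `store` field, the list used as an audit sink, and the rule that a record expires once its age in whole days reaches its TTL are all assumptions for illustration; only the per-store defaults and the `DELETE_TTL` payload fields come from the policy described in this section.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Per-store defaults from the retention policy; None means no default expiry.
DEFAULT_TTL_DAYS = {"episodic": 90, "procedural": 180, "semantic": None}


@dataclass
class Record:  # hypothetical record shape
    id: str
    store: str
    created_at: datetime
    ttl_days: Optional[int] = None


def sweep(records: list[Record], now: datetime, audit: list) -> list[Record]:
    """Return surviving records; append a DELETE_TTL event for each expired one."""
    kept = []
    for rec in records:
        # An explicit per-record TTL overrides the store default.
        ttl = rec.ttl_days if rec.ttl_days is not None else DEFAULT_TTL_DAYS.get(rec.store)
        age_days = (now - rec.created_at).days
        if ttl is not None and age_days >= ttl:
            audit.append({"type": "DELETE_TTL", "record_id": rec.id,
                          "ttl_days": ttl, "age_days": age_days})
        else:
            kept.append(rec)
    return kept
```

Passing `now` in explicitly, rather than reading the clock inside `sweep()`, keeps the expiry decision deterministic and easy to test.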