Governance & Classification
After data is ingested, the platform automatically classifies it, detects PII, and generates governance recommendations.
Post-Ingestion Intelligence Pipeline
Airbyte sync completes → catalog-service registers tables
↓
TABLE_DISCOVERED event → [catalog-events Kafka topic]
↓
┌─────────────────────────┬──────────────────────┬────────────────────┬──────────────────┐
│ Auto-Classification │ Auto-Ontology │ Auto-Semantic │ Auto-Quality │
│ (PII detection + level) │ (entity extraction) │ (DRAFT models) │ (profiling) │
└─────────────────────────┴──────────────────────┴────────────────────┴──────────────────┘
↓
DATA_CLASSIFIED event → governance-service
↓
RLS suggestions + masking rules
Auto-Classification
When a new table is discovered via ingestion:
- PII Detection — scans column names and sample values for patterns (SSN, email, phone, credit card, addresses)
- Risk Level — assigns NONE / LOW / MEDIUM / HIGH / CRITICAL based on PII types found
- Classification Level — PUBLIC / INTERNAL / CONFIDENTIAL / RESTRICTED based on data sensitivity
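The detection and risk steps above can be sketched as follows. This is a minimal illustration, not the platform's classifier: the regex patterns, the `detect_pii` and `risk_level` helpers, and the mapping from PII type to risk level are all assumptions made for the example, and address detection is omitted.

```python
import re

# Hypothetical column-name hints; a real classifier would need more robust rules.
NAME_HINTS = {
    "SSN": re.compile(r"ssn|social_security", re.I),
    "EMAIL": re.compile(r"e?mail", re.I),
    "PHONE": re.compile(r"phone|mobile", re.I),
    "CREDIT_CARD": re.compile(r"card_number|credit_card|cc_num", re.I),
}

# Hypothetical sample-value patterns (US-style formats only).
VALUE_PATTERNS = {
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "PHONE": re.compile(r"^\(\d{3}\) \d{3}-\d{4}$"),
    "CREDIT_CARD": re.compile(r"^\d{4}-\d{4}-\d{4}-\d{4}$"),
}

# Assumed mapping from PII type to risk level; the actual rules are not documented here.
RISK_BY_TYPE = {
    "EMAIL": "MEDIUM",
    "PHONE": "MEDIUM",
    "SSN": "CRITICAL",
    "CREDIT_CARD": "CRITICAL",
}
RISK_ORDER = ["NONE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def detect_pii(column_name, sample_values):
    """Return the set of PII types suggested by a column's name or sample values."""
    found = set()
    for pii_type, pattern in NAME_HINTS.items():
        if pattern.search(column_name):
            found.add(pii_type)
    for pii_type, pattern in VALUE_PATTERNS.items():
        if any(pattern.match(value) for value in sample_values):
            found.add(pii_type)
    return found


def risk_level(pii_types):
    """Return the highest risk level among the detected PII types (NONE if empty)."""
    levels = [RISK_BY_TYPE.get(t, "LOW") for t in pii_types]
    return max(levels, key=RISK_ORDER.index, default="NONE")
```

Checking both the column name and the sample values means a column with an uninformative name such as `contact` can still be flagged when its values look like email addresses.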
RLS Auto-Suggestions
For tables with a HIGH or CRITICAL risk level:
| PII Type | Suggested Policy |
|---|---|
| tenant_id column | Tenant-scoped RLS: WHERE tenant_id = current_tenant() |
| SSN | Column restriction: only PII_VIEWER role sees full value |
| Email | Masking: ***@domain.com for non-privileged users |
| Phone | Masking: (XXX) XXX-1234 for non-privileged users |
| Credit Card | PCI-DSS masking: XXXX-XXXX-XXXX-1234 |
Suggestions are created as DRAFT policies that require human approval before activation.
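One way to picture the suggestion step is a lookup from PII type to suggested policy, with every result created in DRAFT status. The `PolicySuggestion` class and `suggest_policies` function below are hypothetical names for this sketch; only the policy descriptions come from the table above.

```python
from dataclasses import dataclass

# Policy descriptions taken from the suggestion table; the keys are assumed type names.
SUGGESTIONS = {
    "SSN": "Column restriction: only PII_VIEWER role sees full value",
    "EMAIL": "Masking: ***@domain.com for non-privileged users",
    "PHONE": "Masking: (XXX) XXX-1234 for non-privileged users",
    "CREDIT_CARD": "PCI-DSS masking: XXXX-XXXX-XXXX-1234",
}


@dataclass
class PolicySuggestion:
    table: str
    column: str
    description: str
    status: str = "DRAFT"  # suggestions stay inactive until a human approves them


def suggest_policies(table, risk, pii_columns, column_names):
    """Build DRAFT suggestions for a table whose risk level is HIGH or CRITICAL.

    pii_columns maps column name -> detected PII type.
    """
    if risk not in ("HIGH", "CRITICAL"):
        return []
    suggestions = []
    if "tenant_id" in column_names:
        suggestions.append(PolicySuggestion(
            table, "tenant_id",
            "Tenant-scoped RLS: WHERE tenant_id = current_tenant()"))
    for column, pii_type in pii_columns.items():
        if pii_type in SUGGESTIONS:
            suggestions.append(
                PolicySuggestion(table, column, SUGGESTIONS[pii_type]))
    return suggestions
```

For example, `suggest_policies("customers", "HIGH", {"ssn": "SSN"}, ["tenant_id", "ssn"])` would yield two DRAFT suggestions, one for tenant scoping and one for the SSN column.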
Dynamic Masking
Trino masking expressions are automatically generated for each PII type:
| PII Type | Masking Expression |
|---|---|
| SSN | 'XXX-XX-' || SUBSTR(column, -4) |
| Email | '***@' || SPLIT_PART(column, '@', 2) |
| Phone | '(XXX) XXX-' || SUBSTR(column, -4) |
| Credit Card | 'XXXX-XXXX-XXXX-' || SUBSTR(column, -4) |
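As a rough illustration of how such expressions might be generated, the sketch below fills a per-type template with a quoted column identifier. The `MASK_TEMPLATES` dictionary and `masking_expression` helper are assumptions for this example; the templates themselves mirror the table above.

```python
# Templates mirror the masking table; {col} is replaced by the quoted column name.
MASK_TEMPLATES = {
    "SSN": "'XXX-XX-' || SUBSTR({col}, -4)",
    "EMAIL": "'***@' || SPLIT_PART({col}, '@', 2)",
    "PHONE": "'(XXX) XXX-' || SUBSTR({col}, -4)",
    "CREDIT_CARD": "'XXXX-XXXX-XXXX-' || SUBSTR({col}, -4)",
}


def masking_expression(pii_type, column):
    """Return a Trino SQL masking expression for the given PII type and column."""
    # Double-quote the identifier and escape embedded quotes so the column
    # name cannot break out of the generated SQL.
    quoted = '"' + column.replace('"', '""') + '"'
    return MASK_TEMPLATES[pii_type].format(col=quoted)
```

Calling `masking_expression("EMAIL", "email")` produces `'***@' || SPLIT_PART("email", '@', 2)`, which keeps the domain visible while hiding the local part.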
Accuracy Metrics
Three accuracy services run after each sync:
Freshness SLA
- Tracks last_sync_at vs configured SLA (default: 24 hours)
- Alerts on breach via notification-service
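The freshness check reduces to comparing the age of the last sync against the SLA window. A minimal sketch, assuming a hypothetical `is_sla_breached` helper and timezone-aware timestamps:

```python
from datetime import datetime, timedelta, timezone

DEFAULT_SLA = timedelta(hours=24)  # default SLA stated above


def is_sla_breached(last_sync_at, now, sla=DEFAULT_SLA):
    """Return True when the time since the last sync exceeds the SLA."""
    return now - last_sync_at > sla
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.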
Schema Drift Detection
- Compares column schemas between consecutive syncs
- Detects: added columns, removed columns, type changes
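Drift detection can be thought of as a diff between two column-to-type mappings. The `diff_schemas` function and its return shape below are illustrative assumptions, not the service's actual interface:

```python
def diff_schemas(previous, current):
    """Compare two schemas given as {column_name: type_name} dictionaries."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # Columns present in both syncs whose declared type differs.
    type_changes = {
        column: (previous[column], current[column])
        for column in previous.keys() & current.keys()
        if previous[column] != current[column]
    }
    return {"added": added, "removed": removed, "type_changes": type_changes}
```

For instance, comparing `{"id": "bigint", "name": "varchar"}` with `{"id": "varchar", "email": "varchar"}` reports `email` as added, `name` as removed, and a type change on `id`.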
Row Count Validation
- Flags >20% row count drops as anomalies (possible data loss)
- Flags >500% spikes as unusual (possible data explosion)
- Critical alert on drop to zero
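The thresholds above might be applied as in the following sketch. The `check_row_count` name and the returned labels are hypothetical, and the code assumes that ">500% spike" means an increase of more than 500% over the previous count; the source does not say how a first sync with no baseline is handled, so it is treated as OK here.

```python
def check_row_count(previous, current):
    """Classify the row count change between two consecutive syncs."""
    if previous > 0 and current == 0:
        return "CRITICAL"  # drop to zero
    if previous == 0:
        return "OK"  # assumption: no baseline to compare against
    change = (current - previous) / previous
    if change < -0.20:
        return "ANOMALY_DROP"  # more than a 20% drop, possible data loss
    if change > 5.00:
        return "UNUSUAL_SPIKE"  # more than a 500% increase, possible data explosion
    return "OK"
```

The zero check comes first so that a complete wipe is reported as critical rather than as an ordinary drop.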
RBAC
| Operation | Permission | Roles |
|---|---|---|
| View classification | catalog:read | DATA_ENGINEER, DATA_ANALYST, DATA_SCIENTIST |
| View RLS suggestions | governance:read | DATA_ENGINEER, PLATFORM_ADMIN |
| Apply RLS policies | governance:write | DATA_ENGINEER, PLATFORM_ADMIN |
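The table above maps naturally onto a permission-to-roles lookup. The sketch below assumes a hypothetical `is_allowed` helper; the permission and role names are taken from the table.

```python
# Permission -> roles that hold it, as listed in the RBAC table.
ROLE_PERMISSIONS = {
    "catalog:read": {"DATA_ENGINEER", "DATA_ANALYST", "DATA_SCIENTIST"},
    "governance:read": {"DATA_ENGINEER", "PLATFORM_ADMIN"},
    "governance:write": {"DATA_ENGINEER", "PLATFORM_ADMIN"},
}


def is_allowed(user_roles, permission):
    """Return True if any of the user's roles grants the permission."""
    allowed_roles = ROLE_PERMISSIONS.get(permission, set())
    return bool(allowed_roles & set(user_roles))
```

Unknown permissions resolve to an empty role set, so the check denies by default.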