Governance Service
The Governance Service is a Python/FastAPI application that enforces data access policies, manages data contracts, and ensures compliance with organizational governance requirements. It integrates with Apache Polaris for Iceberg table governance and the Open Policy Agent (OPA) for fine-grained policy evaluation.
Service Architecture
| Property | Value |
|---|---|
| Language | Python 3.11 |
| Framework | FastAPI |
| Port | 8080 |
| Namespace | matih-data-plane |
| Policy engine | Open Policy Agent (OPA) |
| Catalog governance | Apache Polaris |
Component Layout
+------------------------------------------------------------------+
|                        Governance Service                        |
|                                                                  |
| +-------------------+  +--------------------+  +---------------+ |
| | FastAPI Routes    |  | Policy Engine      |  | Contract Mgr  | |
| | - Access policies |  | - OPA integration  |  | - Data        | |
| | - RLS evaluation  |  | - Rego policies    |  |   contracts   | |
| | - Masking rules   |  | - Policy versioning|  | - SLA rules   | |
| | - Compliance      |  | - Audit logging    |  | - Violations  | |
| +-------------------+  +--------------------+  +---------------+ |
|                                                                  |
| +---------------------+  +--------------------+                  |
| | Polaris Integration |  | Compliance Engine  |                  |
| | - Namespace RBAC    |  | - Retention rules  |                  |
| | - Table grants      |  | - Access reviews   |                  |
| | - Credential vend   |  | - Consent mgmt     |                  |
| +---------------------+  +--------------------+                  |
+------------------------------------------------------------------+
             |                                  |
    +--------v---------+                +-------v--------+
    | OPA Sidecar      |                | Apache Polaris |
    | (Policy Engine)  |                | (Iceberg Gov)  |
    +------------------+                +----------------+
OPA Integration
The Governance Service uses the Open Policy Agent (OPA) as the primary policy evaluation engine. OPA evaluates policies written in the Rego language against structured input data.
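As a rough sketch of how the service might hand a request to the sidecar, the Python below posts an input document to OPA's Data API (`POST /v1/data/<path>`) and reads the `result` field of the response. The sidecar address, the decision path `matih/rls/filter`, and all function names are assumptions for illustration, not the service's actual code. The transport is injectable so the logic can be exercised without a running OPA.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# Assumed values: the sidecar address and decision path are illustrative only.
OPA_URL = "http://localhost:8181"
RLS_DECISION_PATH = "matih/rls/filter"

# A transport takes (url, request body) and returns the raw response body.
Transport = Callable[[str, bytes], bytes]


def http_post(url: str, body: bytes) -> bytes:
    """POST a JSON body using only the standard library."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=2) as response:
        return response.read()


def build_opa_input(user: Dict[str, Any], resource: Dict[str, Any],
                    action: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the request facts in the {"input": ...} envelope OPA expects."""
    return {"input": {"user": user, "resource": resource,
                      "action": action, "context": context}}


def evaluate_rls(user: Dict[str, Any], resource: Dict[str, Any], action: str,
                 context: Dict[str, Any],
                 transport: Transport = http_post) -> Dict[str, Any]:
    """Ask OPA for the RLS decision and return the decision document."""
    url = f"{OPA_URL}/v1/data/{RLS_DECISION_PATH}"
    payload = json.dumps(build_opa_input(user, resource, action, context))
    raw = transport(url, payload.encode("utf-8"))
    decision = json.loads(raw).get("result")
    if decision is None:
        # OPA omits "result" when the queried rule is undefined: fail closed.
        raise PermissionError("OPA returned no decision")
    return decision
```

Treating a missing `result` as a denial is a design choice in this sketch: failing closed avoids returning unfiltered data when a policy is missing or misnamed.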
Policy Evaluation Flow
1. Query Engine requests RLS evaluation
|
2. Governance Service constructs OPA input
|
3. OPA evaluates Rego policy against input
|
4. OPA returns decision (allow/deny + filters)
|
5. Governance Service returns filter to Query Engine
OPA Input Structure
{
  "input": {
    "user": {
      "id": "user-456",
      "email": "analyst@acme.com",
      "roles": ["analyst", "sales-team"],
      "attributes": {
        "department": "sales",
        "region": "US-EAST",
        "clearance_level": 2
      }
    },
    "resource": {
      "type": "TABLE",
      "fqn": "analytics.public.orders",
      "classification": "CONFIDENTIAL",
      "owner": "data-engineering-team",
      "tags": ["transactional", "finance"]
    },
    "action": "SELECT",
    "context": {
      "tenant_id": "tenant-123",
      "timestamp": "2026-02-12T10:30:00Z",
      "client_ip": "10.0.1.50",
      "session_id": "session-789"
    }
  }
}
Rego Policy Examples
Region-based RLS policy:
package matih.rls

# Default: no filter applied
default filter = {"requires_filtering": false}

# Apply region filter for non-admin users
filter = result {
    not is_admin
    user_region := input.user.attributes.region
    result := {
        "requires_filtering": true,
        "where_clause": sprintf("region = '%s'", [user_region]),
        "applied_policies": ["region_restriction"]
    }
}

is_admin {
    input.user.roles[_] == "admin"
}
Department-based data access policy:
package matih.access

# Default: deny access
default allow = false

# Allow access if the user's department matches one of the table's tags
# (tags stand in for the table's domain here)
allow {
    table_domain := input.resource.tags[_]
    user_dept := input.user.attributes.department
    table_domain == user_dept
}

# Admins can access everything
allow {
    input.user.roles[_] == "admin"
}

# Data stewards can access for governance purposes (read-only)
allow {
    input.user.roles[_] == "data_steward"
    input.action == "SELECT"
}
Classification-based masking policy:
package matih.masking

# Determine masking level based on classification and user clearance.
# Expects input.columns: a list of objects with name and classification
# fields (not part of the sample input shown earlier).

# RESTRICTED columns are partially masked below clearance level 3
masking_rules[rule] {
    column := input.columns[_]
    column.classification == "RESTRICTED"
    input.user.attributes.clearance_level < 3
    rule := {
        "column": column.name,
        "strategy": "PARTIAL",
        "config": {"show_last": 4, "mask_char": "*"}
    }
}

# SECRET columns are fully redacted below clearance level 4
masking_rules[rule] {
    column := input.columns[_]
    column.classification == "SECRET"
    input.user.attributes.clearance_level < 4
    rule := {
        "column": column.name,
        "strategy": "REDACT",
        "config": {"replacement": "***REDACTED***"}
    }
}
Policy Versioning
Policies are stored in a version-controlled repository and deployed to OPA via the bundle mechanism:
| Component | Description |
|---|---|
| Policy repository | Git repository containing Rego policies |
| Bundle server | HTTP server that packages policies into OPA bundles |
| OPA sidecar | Pulls bundles on a configurable interval (default: 60s) |
| Policy cache | OPA caches compiled policies in memory |
Policy changes follow a deployment pipeline:
Edit Rego policy -> Unit test with opa test -> Review -> Merge -> Bundle build -> OPA reload
Polaris Integration
Apache Polaris provides governance for Apache Iceberg tables, including namespace-level access control, table grants, and credential vending.
Polaris RBAC Model
Polaris Principal (mapped from MATIH user)
  |
  +-- PrincipalRole (mapped from MATIH role)
        |
        +-- CatalogRole (Polaris-specific)
              |
              +-- Grants on Namespace/Table
                    |
                    +-- Privileges: READ, WRITE, CREATE, DROP, MANAGE
Namespace Isolation
Each MATIH tenant maps to a Polaris namespace:
Polaris Catalog: matih
  |
  +-- Namespace: tenant_acme
  |     +-- Table: orders
  |     +-- Table: customers
  |
  +-- Namespace: tenant_globex
        +-- Table: orders
        +-- Table: inventory
Grant Management
POST /v1/governance/polaris/grants
Request:
{
  "principal": "analyst-role",
  "namespace": "tenant_acme",
  "table": "orders",
  "privileges": ["TABLE_READ_DATA", "TABLE_LIST"],
  "grantOption": false
}
Credential Vending
When a query needs to read Iceberg data files, the Governance Service coordinates with Polaris to vend scoped, short-lived credentials:
1. Query Engine requests data access for table "tenant_acme.orders"
2. Governance Service evaluates access policy
3. If allowed, Polaris vends temporary S3/Azure credentials
4. Credentials are scoped to the specific table's data directory
5. Credentials expire after 1 hour
Data Contracts
The Governance Service manages data contracts that define expectations between data producers and consumers:
Contract Structure
{
  "id": "contract-sales-to-bi",
  "name": "Sales Data Contract for BI Service",
  "version": "2.1.0",
  "producer": {
    "team": "data-engineering",
    "service": "pipeline-service"
  },
  "consumer": {
    "team": "analytics",
    "service": "bi-service"
  },
  "dataset": "analytics.public.daily_sales",
  "schema": {
    "fields": [
      {"name": "date", "type": "DATE", "required": true},
      {"name": "region", "type": "VARCHAR", "required": true},
      {"name": "total_sales", "type": "DECIMAL(12,2)", "required": true},
      {"name": "order_count", "type": "INTEGER", "required": true}
    ]
  },
  "sla": {
    "freshness": {"maxDelayMinutes": 60},
    "availability": {"uptimePercentage": 99.9},
    "quality": {"minScore": 0.95},
    "completeness": {"maxNullPercentage": 1.0}
  },
  "validFrom": "2026-01-01",
  "validUntil": "2027-01-01",
  "status": "ACTIVE"
}
Contract Enforcement
The Governance Service monitors active contracts and flags a violation whenever a measured value crosses the contract's threshold:
| SLA Metric | Check Method | Alert Threshold |
|---|---|---|
| Freshness | Compare last update timestamp with SLA | > maxDelayMinutes |
| Availability | Track table accessibility over rolling window | < uptimePercentage |
| Quality | Integrate with Data Quality Service scores | < minScore |
| Completeness | Monitor null ratios from profiling | > maxNullPercentage |
| Schema | Compare current schema with contract schema | Any deviation |
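The checks in the table above can be sketched as a single comparison function. This is a minimal illustration, not the service's implementation: the `Observation` type, its field names, and `find_violations` are hypothetical, and the measured values are assumed to arrive from profiling and the Data Quality Service. Only the contract keys (`sla`, `schema.fields`, and so on) come from the contract structure shown earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass
class Observation:
    """Hypothetical snapshot of measured metrics for one dataset."""
    last_updated: datetime
    uptime_percentage: float
    quality_score: float
    null_percentage: float
    schema_fields: Dict[str, str]  # field name -> type


def find_violations(contract: dict, obs: Observation, now: datetime) -> List[str]:
    """Return the names of the SLA metrics that the observation violates."""
    sla = contract["sla"]
    violations = []
    max_delay = timedelta(minutes=sla["freshness"]["maxDelayMinutes"])
    if now - obs.last_updated > max_delay:
        violations.append("freshness")
    if obs.uptime_percentage < sla["availability"]["uptimePercentage"]:
        violations.append("availability")
    if obs.quality_score < sla["quality"]["minScore"]:
        violations.append("quality")
    if obs.null_percentage > sla["completeness"]["maxNullPercentage"]:
        violations.append("completeness")
    # Schema: any difference in field names or types counts as a deviation.
    expected = {f["name"]: f["type"] for f in contract["schema"]["fields"]}
    if obs.schema_fields != expected:
        violations.append("schema")
    return violations


# Trimmed-down contract using the same keys as the full example.
contract = {
    "schema": {"fields": [
        {"name": "date", "type": "DATE", "required": True},
        {"name": "region", "type": "VARCHAR", "required": True},
    ]},
    "sla": {
        "freshness": {"maxDelayMinutes": 60},
        "availability": {"uptimePercentage": 99.9},
        "quality": {"minScore": 0.95},
        "completeness": {"maxNullPercentage": 1.0},
    },
}
now = datetime(2026, 2, 12, 10, 30, tzinfo=timezone.utc)
obs = Observation(
    last_updated=now - timedelta(minutes=90),  # 90 minutes old, limit is 60
    uptime_percentage=99.95,
    quality_score=0.97,
    null_percentage=0.4,
    schema_fields={"date": "DATE", "region": "VARCHAR"},
)
print(find_violations(contract, obs, now))  # ['freshness']
```

The function returns metric names rather than raising, on the assumption that a caller would turn each name into an alert or a recorded violation.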
Access Review
The Governance Service supports periodic access reviews, in which a reviewer confirms or revokes a subject's existing grants:
GET /v1/governance/access-reviews?status=PENDING
Response:
{
  "reviews": [
    {
      "id": "review-001",
      "type": "QUARTERLY_ACCESS_REVIEW",
      "reviewer": "data-steward@acme.com",
      "subject": "analyst-role",
      "resource": "analytics.public.customers",
      "currentAccess": ["TABLE_READ_DATA", "TABLE_LIST"],
      "lastAccessedAt": "2026-02-10T15:30:00Z",
      "accessFrequency": "12 queries in last 30 days",
      "recommendation": "MAINTAIN",
      "dueDate": "2026-02-28",
      "status": "PENDING"
    }
  ]
}
Compliance Engine
The compliance engine tracks regulatory requirements and maps them to governance controls:
| Regulation | Controls |
|---|---|
| GDPR | Data subject access requests, right to erasure, consent tracking, data minimization |
| HIPAA | PHI access controls, audit logging, minimum necessary access, encryption verification |
| SOC 2 | Access reviews, change management, incident response, monitoring |
| PCI DSS | Cardholder data masking, access restrictions, key management, audit trails |
Retention Policies
{
  "policy": "gdpr-personal-data-retention",
  "classification": "RESTRICTED",
  "tags": ["PII"],
  "retentionPeriod": "730 days",
  "action": "DELETE",
  "exceptions": [
    {"condition": "legal_hold = true", "override": "RETAIN"}
  ],
  "notifyBeforeDays": 30
}
Audit Logging
Every governance decision is logged to an immutable audit trail:
{
  "event": "access_policy_evaluated",
  "timestamp": "2026-02-12T10:30:00Z",
  "tenantId": "tenant-123",
  "userId": "user-456",
  "resource": "analytics.public.orders",
  "action": "SELECT",
  "decision": "ALLOW_WITH_FILTER",
  "filter": "region = 'US-EAST'",
  "policies": ["region_restriction"],
  "evaluationTimeMs": 8,
  "sessionId": "session-789",
  "clientIp": "10.0.1.50"
}
Related Sections
- Row-Level Security -- RLS filter injection in the Query Engine
- Data Masking -- Masking rules driven by governance policies
- Classification -- Classification tags used in policy evaluation
- Data Quality -- Quality monitoring for data contracts
- API Reference -- Governance Service endpoints