Governance Service

The Governance Service is a Python/FastAPI application that enforces data access policies, manages data contracts, and ensures compliance with organizational governance requirements. It integrates with Apache Polaris for Iceberg table governance and the Open Policy Agent (OPA) for fine-grained policy evaluation.

Service Architecture

Property	Value
Language	Python 3.11
Framework	FastAPI
Port	8080
Namespace	`matih-data-plane`
Policy engine	Open Policy Agent (OPA)
Catalog governance	Apache Polaris

Component Layout

+--------------------------------------------------------------------+
|                         Governance Service                         |
|                                                                    |
|  +--------------------+  +--------------------+  +---------------+ |
|  | FastAPI Routes     |  | Policy Engine      |  | Contract Mgr  | |
|  | - Access policies  |  | - OPA integration  |  | - Data        | |
|  | - RLS evaluation   |  | - Rego policies    |  |   contracts   | |
|  | - Masking rules    |  | - Policy versioning|  | - SLA rules   | |
|  | - Compliance       |  | - Audit logging    |  | - Violations  | |
|  +--------------------+  +--------------------+  +---------------+ |
|                                                                    |
|  +--------------------+  +--------------------+                    |
|  | Polaris Integration|  | Compliance Engine  |                    |
|  | - Namespace RBAC   |  | - Retention rules  |                    |
|  | - Table grants     |  | - Access reviews   |                    |
|  | - Credential vend  |  | - Consent mgmt     |                    |
|  +--------------------+  +--------------------+                    |
+--------------------------------------------------------------------+
         |                         |
+--------v---------+       +-------v--------+
|    OPA Sidecar   |       | Apache Polaris |
|  (Policy Engine) |       | (Iceberg Gov)  |
+------------------+       +----------------+

OPA Integration

The Governance Service uses OPA as its primary policy evaluation engine. OPA evaluates policies written in the Rego language against a structured JSON input document.

Policy Evaluation Flow

1. Query Engine requests RLS evaluation
    |
2. Governance Service constructs OPA input
    |
3. OPA evaluates Rego policy against input
    |
4. OPA returns decision (allow/deny + filters)
    |
5. Governance Service returns filter to Query Engine
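
Steps 2 through 5 of the flow above can be sketched in Python as follows. This is a minimal illustration, not the service's actual code: the function names are hypothetical, the sidecar address assumes OPA's default port 8181 on localhost, and the URL path assumes the `filter` rule of the `matih.rls` package is queried through OPA's Data API.

```python
import json
import urllib.request
from typing import Any, Callable

# Assumption: the OPA sidecar listens on OPA's default port on localhost.
OPA_URL = "http://localhost:8181/v1/data/matih/rls/filter"


def query_opa(payload: dict[str, Any], url: str = OPA_URL) -> dict[str, Any]:
    """POST the input document to OPA's Data API and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=2) as response:
        return json.load(response)


def evaluate_rls(
    user: dict[str, Any],
    resource: dict[str, Any],
    action: str,
    context: dict[str, Any],
    transport: Callable[[dict[str, Any]], dict[str, Any]] = query_opa,
) -> dict[str, Any]:
    """Build the OPA input, ask OPA for a decision, and return the filter."""
    payload = {
        "input": {
            "user": user,
            "resource": resource,
            "action": action,
            "context": context,
        }
    }
    reply = transport(payload)
    # OPA omits "result" when the queried rule is undefined; fail closed.
    if "result" not in reply:
        raise PermissionError("OPA returned no decision")
    return reply["result"]
```

The `transport` parameter exists only so the flow can be exercised without a running sidecar, for example by passing a function that returns a canned decision.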

OPA Input Structure

{
  "input": {
    "user": {
      "id": "user-456",
      "email": "analyst@acme.com",
      "roles": ["analyst", "sales-team"],
      "attributes": {
        "department": "sales",
        "region": "US-EAST",
        "clearance_level": 2
      }
    },
    "resource": {
      "type": "TABLE",
      "fqn": "analytics.public.orders",
      "classification": "CONFIDENTIAL",
      "owner": "data-engineering-team",
      "tags": ["transactional", "finance"]
    },
    "action": "SELECT",
    "context": {
      "tenant_id": "tenant-123",
      "timestamp": "2026-02-12T10:30:00Z",
      "client_ip": "10.0.1.50",
      "session_id": "session-789"
    }
  }
}

Rego Policy Examples

Region-based RLS policy:

package matih.rls

# Default: no filter applied
default filter = {"requires_filtering": false}

# Apply region filter for non-admin users
filter = result {
    not is_admin
    user_region := input.user.attributes.region
    result := {
        "requires_filtering": true,
        "where_clause": sprintf("region = '%s'", [user_region]),
        "applied_policies": ["region_restriction"]
    }
}

is_admin {
    input.user.roles[_] == "admin"
}
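
To illustrate how a caller might consume the decision returned by this policy, the sketch below wraps a query in a subquery and applies the `where_clause`. The function name and the `rls_scoped` alias are hypothetical, and the Query Engine's real injection logic may work differently.

```python
from typing import Any


def apply_rls_filter(sql: str, decision: dict[str, Any]) -> str:
    """Restrict a query's rows using the where_clause from an RLS decision."""
    if not decision.get("requires_filtering", False):
        return sql
    where_clause = decision["where_clause"]
    # Wrapping the statement as a subquery avoids having to parse it.
    inner = sql.rstrip().rstrip(";")
    return f"SELECT * FROM ({inner}) AS rls_scoped WHERE {where_clause}"
```

Because the policy interpolates `input.user.attributes.region` directly into the clause with `sprintf`, this approach is only safe if that attribute comes from a trusted identity source rather than from user-supplied input.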

Department-based data access policy:

package matih.access

# Default: deny access
default allow = false

# Allow access if the user's department matches one of the table's tags
allow {
    table_domain := input.resource.tags[_]
    user_dept := input.user.attributes.department
    table_domain == user_dept
}

# Admins can access everything
allow {
    input.user.roles[_] == "admin"
}

# Data stewards can access for governance purposes
allow {
    input.user.roles[_] == "data_steward"
    input.action == "SELECT"
}
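
To make the semantics of the three `allow` rules concrete, here is a Python mirror of the same logic. It is only an illustration for reasoning about outcomes; the real decision is made by OPA, and the function name is hypothetical.

```python
from typing import Any


def allow(input_doc: dict[str, Any]) -> bool:
    """Return True if any of the three allow rules would match."""
    user = input_doc["user"]
    roles = user.get("roles", [])
    department = user.get("attributes", {}).get("department")
    tags = input_doc["resource"].get("tags", [])

    # Rule 1: the user's department appears among the table's tags.
    if department is not None and department in tags:
        return True
    # Rule 2: admins can access everything.
    if "admin" in roles:
        return True
    # Rule 3: data stewards get read-only access.
    if "data_steward" in roles and input_doc.get("action") == "SELECT":
        return True
    return False
```

Applied to the sample input shown earlier (department `sales`, tags `transactional` and `finance`, roles `analyst` and `sales-team`), none of the rules match, so the default `allow = false` applies.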

Classification-based masking policy:

package matih.masking

# Determine masking level based on classification and user clearance
masking_rules[rule] {
    column := input.columns[_]
    column.classification == "RESTRICTED"
    input.user.attributes.clearance_level < 3
    rule := {
        "column": column.name,
        "strategy": "PARTIAL",
        "config": {"show_last": 4, "mask_char": "*"}
    }
}

masking_rules[rule] {
    column := input.columns[_]
    column.classification == "SECRET"
    input.user.attributes.clearance_level < 4
    rule := {
        "column": column.name,
        "strategy": "REDACT",
        "config": {"replacement": "***REDACTED***"}
    }
}
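
A rule produced by this policy still has to be applied to column values. The following sketch shows one plausible interpretation of the two strategies; the function name is hypothetical, and the reading of `show_last` (keep the last N characters, mask the rest) is an assumption based on the config keys.

```python
from typing import Any


def apply_mask(value: str, rule: dict[str, Any]) -> str:
    """Mask a single value according to a masking rule from the policy."""
    strategy = rule["strategy"]
    config = rule.get("config", {})
    if strategy == "PARTIAL":
        show_last = config.get("show_last", 4)
        mask_char = config.get("mask_char", "*")
        # Assumption: values too short to mask partially are masked entirely.
        if show_last <= 0 or len(value) <= show_last:
            return mask_char * len(value)
        return mask_char * (len(value) - show_last) + value[-show_last:]
    if strategy == "REDACT":
        return config.get("replacement", "***REDACTED***")
    raise ValueError(f"unknown masking strategy: {strategy}")
```

For example, a 16-character card number masked with the `PARTIAL` rule above keeps only its last four characters visible.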

Policy Versioning

Policies are stored in a version-controlled repository and deployed to OPA via the bundle mechanism:

Component	Description
Policy repository	Git repository containing Rego policies
Bundle server	HTTP server that packages policies into OPA bundles
OPA sidecar	Pulls bundles on a configurable interval (default: 60s)
Policy cache	OPA caches compiled policies in memory

Policy changes follow a deployment pipeline:

Edit Rego policy -> Unit test with `opa test` -> Review -> Merge -> Bundle build -> OPA reload
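
The bundle build step can be pictured with the sketch below. An OPA bundle is a gzipped tarball of policy files with an optional `.manifest` file carrying a `revision` and `roots`. In practice bundles are usually produced with `opa build`, so this is only an illustration of the archive layout; the function name, file paths, and the choice of `matih` as the bundle root are assumptions.

```python
import io
import json
import tarfile


def build_bundle(policies: dict[str, str], revision: str) -> bytes:
    """Package Rego sources and a .manifest into a gzipped tarball."""
    # Assumption: all policies live under the "matih" package root.
    manifest = {"revision": revision, "roots": ["matih"]}
    files = {".manifest": json.dumps(manifest)} | policies

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

Using the Git commit hash as the `revision` would let each loaded bundle be traced back to the merge that produced it.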

Polaris Integration

Apache Polaris provides governance for Apache Iceberg tables, including namespace-level access control, table grants, and credential vending.

Polaris RBAC Model

Polaris Principal (mapped from MATIH user)
    |
    +-- PrincipalRole (mapped from MATIH role)
          |
          +-- CatalogRole (Polaris-specific)
                |
                +-- Grants on Namespace/Table
                      |
                      +-- Privileges: READ, WRITE, CREATE, DROP, MANAGE

Namespace Isolation

Each MATIH tenant maps to a Polaris namespace:

Polaris Catalog: matih
  |
  +-- Namespace: tenant_acme
  |     +-- Table: orders
  |     +-- Table: customers
  |
  +-- Namespace: tenant_globex
        +-- Table: orders
        +-- Table: inventory

Grant Management

POST /v1/governance/polaris/grants

Request:
{
  "principal": "analyst-role",
  "namespace": "tenant_acme",
  "table": "orders",
  "privileges": ["TABLE_READ_DATA", "TABLE_LIST"],
  "grantOption": false
}
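
A client could assemble this request as in the sketch below. The helper name and the base URL are hypothetical (only port 8080 comes from the service table), and authentication headers are omitted because this page does not describe them.

```python
import json
import urllib.request


def build_grant_request(
    base_url: str,
    principal: str,
    namespace: str,
    table: str,
    privileges: list[str],
    grant_option: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a Polaris grant."""
    body = {
        "principal": principal,
        "namespace": namespace,
        "table": table,
        "privileges": privileges,
        "grantOption": grant_option,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/governance/polaris/grants",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once the appropriate credentials have been added.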

Credential Vending

When a query requires access to Iceberg data files, the Governance Service coordinates with Polaris to vend scoped credentials:

1. Query Engine requests data access for table "tenant_acme.orders"
2. Governance Service evaluates access policy
3. If allowed, Polaris vends temporary S3/Azure credentials
4. Credentials are scoped to the specific table's data directory
5. Credentials expire after 1 hour
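
The ordering and scoping in these steps can be modeled as follows. This is a simplified sketch: the real credentials are issued by Polaris, and the class, the function, and the two callables (`is_allowed`, `table_location`) are hypothetical stand-ins for the policy check and the table's data directory lookup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

CREDENTIAL_TTL = timedelta(hours=1)  # step 5: credentials expire after 1 hour


@dataclass(frozen=True)
class ScopedCredentials:
    table: str
    path_prefix: str  # step 4: scope is limited to the table's data directory
    expires_at: datetime

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at


def vend_credentials(
    table: str,
    is_allowed: Callable[[str], bool],
    table_location: Callable[[str], str],
    now: Optional[datetime] = None,
) -> ScopedCredentials:
    """Evaluate the access policy first, then issue table-scoped credentials."""
    now = now or datetime.now(timezone.utc)
    if not is_allowed(table):  # steps 2 and 3: vend only if access is allowed
        raise PermissionError(f"access to {table} denied")
    return ScopedCredentials(table, table_location(table), now + CREDENTIAL_TTL)
```

Evaluating the policy before any credentials are requested means a denied query never receives storage access, even briefly.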

Data Contracts

The Governance Service manages data contracts that define expectations between data producers and consumers:

Contract Structure

{
  "id": "contract-sales-to-bi",
  "name": "Sales Data Contract for BI Service",
  "version": "2.1.0",
  "producer": {
    "team": "data-engineering",
    "service": "pipeline-service"
  },
  "consumer": {
    "team": "analytics",
    "service": "bi-service"
  },
  "dataset": "analytics.public.daily_sales",
  "schema": {
    "fields": [
      {"name": "date", "type": "DATE", "required": true},
      {"name": "region", "type": "VARCHAR", "required": true},
      {"name": "total_sales", "type": "DECIMAL(12,2)", "required": true},
      {"name": "order_count", "type": "INTEGER", "required": true}
    ]
  },
  "sla": {
    "freshness": {"maxDelayMinutes": 60},
    "availability": {"uptimePercentage": 99.9},
    "quality": {"minScore": 0.95},
    "completeness": {"maxNullPercentage": 1.0}
  },
  "validFrom": "2026-01-01",
  "validUntil": "2027-01-01",
  "status": "ACTIVE"
}

Contract Enforcement

The Governance Service monitors active contracts and flags violations:

SLA Metric	Check Method	Alert Threshold
Freshness	Compare last update timestamp with SLA	> maxDelayMinutes
Availability	Track table accessibility over rolling window	< uptimePercentage
Quality	Integrate with Data Quality Service scores	< minScore
Completeness	Monitor null ratios from profiling	> maxNullPercentage
Schema	Compare current schema with contract schema	Any deviation
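
The checks in this table amount to simple threshold comparisons against the contract's `sla` and `schema` blocks. The sketch below assumes a hypothetical `observed` dictionary of current measurements; its key names are illustrative and do not come from the service.

```python
from typing import Any


def check_contract(contract: dict[str, Any], observed: dict[str, Any]) -> list[str]:
    """Return the names of the SLA metrics that the observed values violate."""
    sla = contract["sla"]
    violations: list[str] = []

    if observed["delay_minutes"] > sla["freshness"]["maxDelayMinutes"]:
        violations.append("freshness")
    if observed["uptime_percentage"] < sla["availability"]["uptimePercentage"]:
        violations.append("availability")
    if observed["quality_score"] < sla["quality"]["minScore"]:
        violations.append("quality")
    if observed["null_percentage"] > sla["completeness"]["maxNullPercentage"]:
        violations.append("completeness")

    # Schema: any difference in field names or types counts as a deviation.
    expected = {(f["name"], f["type"]) for f in contract["schema"]["fields"]}
    actual = {(f["name"], f["type"]) for f in observed["schema_fields"]}
    if expected != actual:
        violations.append("schema")
    return violations
```

An empty list means the contract is currently being met; each returned name corresponds to one row of the table above.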

Access Review

The Governance Service supports periodic access reviews:

GET /v1/governance/access-reviews?status=PENDING

Response:
{
  "reviews": [
    {
      "id": "review-001",
      "type": "QUARTERLY_ACCESS_REVIEW",
      "reviewer": "data-steward@acme.com",
      "subject": "analyst-role",
      "resource": "analytics.public.customers",
      "currentAccess": ["TABLE_READ_DATA", "TABLE_LIST"],
      "lastAccessedAt": "2026-02-10T15:30:00Z",
      "accessFrequency": "12 queries in last 30 days",
      "recommendation": "MAINTAIN",
      "dueDate": "2026-02-28",
      "status": "PENDING"
    }
  ]
}

Compliance Engine

The compliance engine tracks regulatory requirements and maps them to governance controls:

Regulation	Controls
GDPR	Data subject access requests, right to erasure, consent tracking, data minimization
HIPAA	PHI access controls, audit logging, minimum necessary access, encryption verification
SOC 2	Access reviews, change management, incident response, monitoring
PCI DSS	Cardholder data masking, access restrictions, key management, audit trails

Retention Policies

{
  "policy": "gdpr-personal-data-retention",
  "classification": "RESTRICTED",
  "tags": ["PII"],
  "retentionPeriod": "730 days",
  "action": "DELETE",
  "exceptions": [
    {"condition": "legal_hold = true", "override": "RETAIN"}
  ],
  "notifyBeforeDays": 30
}
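
One way to read this policy is as a decision function over a record's age. The sketch below is an interpretation, not the engine's implementation: the function name and the `NOTIFY` result are hypothetical, and the `legal_hold` exception is passed in as a boolean instead of being parsed from the `condition` string.

```python
from datetime import date, timedelta
from typing import Any


def retention_action(
    policy: dict[str, Any],
    created_on: date,
    today: date,
    legal_hold: bool = False,
) -> str:
    """Decide what the retention policy requires for one record today."""
    # Assumption: retentionPeriod is always of the form "<N> days".
    days = int(policy["retentionPeriod"].split()[0])
    expires_on = created_on + timedelta(days=days)

    if legal_hold:
        return "RETAIN"  # the legal_hold exception overrides the action
    if today >= expires_on:
        return policy["action"]
    if today >= expires_on - timedelta(days=policy["notifyBeforeDays"]):
        return "NOTIFY"  # hypothetical label for the pre-deletion notice window
    return "RETAIN"
```

With the policy above, a record enters the notification window 700 days after creation and becomes eligible for deletion at 730 days unless a legal hold applies.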

Audit Logging

Every governance decision is logged to an immutable audit trail:

{
  "event": "access_policy_evaluated",
  "timestamp": "2026-02-12T10:30:00Z",
  "tenantId": "tenant-123",
  "userId": "user-456",
  "resource": "analytics.public.orders",
  "action": "SELECT",
  "decision": "ALLOW_WITH_FILTER",
  "filter": "region = 'US-EAST'",
  "policies": ["region_restriction"],
  "evaluationTimeMs": 8,
  "sessionId": "session-789",
  "clientIp": "10.0.1.50"
}

Related Sections

Row-Level Security -- RLS filter injection in the Query Engine
Data Masking -- Masking rules driven by governance policies
Classification -- Classification tags used in policy evaluation
Data Quality -- Quality monitoring for data contracts
API Reference -- Governance Service endpoints
