MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Governance Service

The Governance Service is a Python/FastAPI application that enforces data access policies, manages data contracts, and ensures compliance with organizational governance requirements. It integrates with Apache Polaris for Iceberg table governance and the Open Policy Agent (OPA) for fine-grained policy evaluation.


Service Architecture

| Property | Value |
|---|---|
| Language | Python 3.11 |
| Framework | FastAPI |
| Port | 8080 |
| Namespace | matih-data-plane |
| Policy engine | Open Policy Agent (OPA) |
| Catalog governance | Apache Polaris |

Component Layout

+--------------------------------------------------------------------+
|                         Governance Service                         |
|                                                                    |
|  +-------------------+  +---------------------+  +---------------+ |
|  | FastAPI Routes    |  | Policy Engine       |  | Contract Mgr  | |
|  | - Access policies |  | - OPA integration   |  | - Data        | |
|  | - RLS evaluation  |  | - Rego policies     |  |   contracts   | |
|  | - Masking rules   |  | - Policy versioning |  | - SLA rules   | |
|  | - Compliance      |  | - Audit logging     |  | - Violations  | |
|  +-------------------+  +---------------------+  +---------------+ |
|                                                                    |
|  +---------------------+  +---------------------+                  |
|  | Polaris Integration |  | Compliance Engine   |                  |
|  | - Namespace RBAC    |  | - Retention rules   |                  |
|  | - Table grants      |  | - Access reviews    |                  |
|  | - Credential vend   |  | - Consent mgmt      |                  |
|  +---------------------+  +---------------------+                  |
+--------------------------------------------------------------------+
         |                         |
+--------v---------+       +-------v--------+
|    OPA Sidecar   |       | Apache Polaris |
|  (Policy Engine) |       | (Iceberg Gov)  |
+------------------+       +----------------+

OPA Integration

The Governance Service uses the Open Policy Agent (OPA) as the primary policy evaluation engine. OPA evaluates policies written in the Rego language against structured input data.

Policy Evaluation Flow

1. Query Engine requests RLS evaluation
    |
2. Governance Service constructs OPA input
    |
3. OPA evaluates Rego policy against input
    |
4. OPA returns decision (allow/deny + filters)
    |
5. Governance Service returns filter to Query Engine

OPA Input Structure

{
  "input": {
    "user": {
      "id": "user-456",
      "email": "analyst@acme.com",
      "roles": ["analyst", "sales-team"],
      "attributes": {
        "department": "sales",
        "region": "US-EAST",
        "clearance_level": 2
      }
    },
    "resource": {
      "type": "TABLE",
      "fqn": "analytics.public.orders",
      "classification": "CONFIDENTIAL",
      "owner": "data-engineering-team",
      "tags": ["transactional", "finance"]
    },
    "action": "SELECT",
    "context": {
      "tenant_id": "tenant-123",
      "timestamp": "2026-02-12T10:30:00Z",
      "client_ip": "10.0.1.50",
      "session_id": "session-789"
    }
  }
}

Rego Policy Examples

Region-based RLS policy:

package matih.rls

import rego.v1

# Default: no filter applied
default filter := {"requires_filtering": false}

# Apply a region filter for non-admin users
filter := result if {
    not is_admin
    user_region := input.user.attributes.region
    result := {
        "requires_filtering": true,
        "where_clause": sprintf("region = '%s'", [user_region]),
        "applied_policies": ["region_restriction"]
    }
}

is_admin if {
    "admin" in input.user.roles
}
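For readers less familiar with Rego, the region rule can be mirrored in plain Python to show which inputs produce which decision. This is illustrative only (OPA remains the evaluator), and the function name rls_filter is hypothetical. Applied to the sample input above (a non-admin analyst in US-EAST), it yields the region = 'US-EAST' filter.

```python
def rls_filter(opa_input: dict) -> dict:
    """Python mirror of package matih.rls (illustrative; OPA is the real evaluator)."""
    user = opa_input["user"]
    is_admin = "admin" in user.get("roles", [])
    region = user.get("attributes", {}).get("region")
    # As in the Rego rule: admins, and users with no region attribute,
    # fall through to the default decision (no filtering).
    if is_admin or region is None:
        return {"requires_filtering": False}
    return {
        "requires_filtering": True,
        "where_clause": f"region = '{region}'",
        "applied_policies": ["region_restriction"],
    }


sample_input = {
    "user": {
        "id": "user-456",
        "roles": ["analyst", "sales-team"],
        "attributes": {"department": "sales", "region": "US-EAST", "clearance_level": 2},
    },
    "action": "SELECT",
}
print(rls_filter(sample_input)["where_clause"])  # region = 'US-EAST'
```

The mirror makes one property of the default visible: a non-admin user whose attributes carry no region receives no filter at all.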

Department-based data access policy:

package matih.access

import rego.v1

# Default: deny access
default allow := false

# Allow access if one of the table's tags matches the user's department
allow if {
    some tag in input.resource.tags
    tag == input.user.attributes.department
}

# Admins can access everything
allow if {
    "admin" in input.user.roles
}

# Data stewards can read (SELECT only) for governance purposes
allow if {
    "data_steward" in input.user.roles
    input.action == "SELECT"
}
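The three allow rules combine as a logical OR over a default deny. A Python mirror (illustrative only; access_allowed is a hypothetical name) shows how the sample input fares: it is denied, because neither of the table's tags (transactional, finance) equals the user's department (sales), and the user holds neither the admin nor the data_steward role.

```python
def access_allowed(opa_input: dict) -> bool:
    """Python mirror of package matih.access (illustrative only)."""
    user = opa_input["user"]
    roles = set(user.get("roles", []))
    department = user.get("attributes", {}).get("department")
    tags = opa_input.get("resource", {}).get("tags", [])
    if department is not None and department in tags:
        return True  # a table tag matches the user's department
    if "admin" in roles:
        return True  # admins can access everything
    if "data_steward" in roles and opa_input.get("action") == "SELECT":
        return True  # data stewards may read
    return False  # default deny


sample_input = {
    "user": {"roles": ["analyst", "sales-team"], "attributes": {"department": "sales"}},
    "resource": {"fqn": "analytics.public.orders", "tags": ["transactional", "finance"]},
    "action": "SELECT",
}
print(access_allowed(sample_input))  # False
```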

Classification-based masking policy (it reads input.columns, a list of column objects with name and classification fields):

package matih.masking

import rego.v1

# Partially mask RESTRICTED columns for users below clearance level 3
masking_rules contains rule if {
    some column in input.columns
    column.classification == "RESTRICTED"
    input.user.attributes.clearance_level < 3
    rule := {
        "column": column.name,
        "strategy": "PARTIAL",
        "config": {"show_last": 4, "mask_char": "*"}
    }
}

# Fully redact SECRET columns for users below clearance level 4
masking_rules contains rule if {
    some column in input.columns
    column.classification == "SECRET"
    input.user.attributes.clearance_level < 4
    rule := {
        "column": column.name,
        "strategy": "REDACT",
        "config": {"replacement": "***REDACTED***"}
    }
}
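The policy only decides which strategy applies to which column; the values still have to be rewritten somewhere. A minimal Python sketch of the two strategies, assuming PARTIAL keeps the last show_last characters and replaces the rest with mask_char, and REDACT substitutes the configured replacement. The helper names and the column names card_number and ssn are hypothetical, and in MATIH the rewrite may happen in the query engine rather than in Python.

```python
def apply_mask(value: str, strategy: str, config: dict) -> str:
    """Apply one masking rule to a single column value (hypothetical helper)."""
    if strategy == "PARTIAL":
        keep = config["show_last"]
        # Mask everything except the last `keep` characters;
        # values no longer than `keep` are returned unchanged.
        hidden = max(len(value) - keep, 0)
        return config["mask_char"] * hidden + value[hidden:]
    if strategy == "REDACT":
        return config["replacement"]
    raise ValueError(f"unknown masking strategy: {strategy}")


def mask_row(row: dict, rules: list[dict]) -> dict:
    """Return a copy of `row` with every matching masking rule applied."""
    masked = dict(row)
    for rule in rules:
        column = rule["column"]
        if column in masked and masked[column] is not None:
            masked[column] = apply_mask(str(masked[column]), rule["strategy"], rule["config"])
    return masked


rules = [
    {"column": "card_number", "strategy": "PARTIAL", "config": {"show_last": 4, "mask_char": "*"}},
    {"column": "ssn", "strategy": "REDACT", "config": {"replacement": "***REDACTED***"}},
]
row = {"card_number": "4111111111111111", "ssn": "123-45-6789", "region": "US-EAST"}
print(mask_row(row, rules))
```

Here the card number comes back as twelve asterisks followed by 1111, the ssn becomes ***REDACTED***, and region, which no rule targets, is left as-is.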

Policy Versioning

Policies are stored in a version-controlled repository and deployed to OPA via the bundle mechanism:

| Component | Description |
|---|---|
| Policy repository | Git repository containing Rego policies |
| Bundle server | HTTP server that packages policies into OPA bundles |
| OPA sidecar | Pulls bundles on a configurable interval (default: 60s) |
| Policy cache | OPA caches compiled policies in memory |

Policy changes follow a deployment pipeline:

Edit Rego policy -> Unit test with opa test -> Review -> Merge -> Bundle build -> OPA reload

Polaris Integration

Apache Polaris provides governance for Apache Iceberg tables, including namespace-level access control, table grants, and credential vending.

Polaris RBAC Model

Polaris Principal (mapped from MATIH user)
    |
    +-- PrincipalRole (mapped from MATIH role)
          |
          +-- CatalogRole (Polaris-specific)
                |
                +-- Grants on Namespace/Table
                      |
                      +-- Privileges: READ, WRITE, CREATE, DROP, MANAGE
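To make the chain concrete, here is a small Python model of how a principal's effective privileges on a table could be resolved by walking principal, principal roles, and catalog roles. It is a sketch, not Polaris's implementation: the class names, the grant keys ("namespace" or "namespace.table"), and the rule that a namespace-level grant also covers the tables inside it are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRole:
    name: str
    # Privileges granted per securable, keyed by "namespace" or "namespace.table".
    grants: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class PrincipalRole:
    name: str
    catalog_roles: list[CatalogRole] = field(default_factory=list)


@dataclass
class Principal:
    name: str
    principal_roles: list[PrincipalRole] = field(default_factory=list)


def effective_privileges(principal: Principal, namespace: str, table: str) -> set[str]:
    """Union of privileges reachable via principal -> principal role -> catalog role.

    Assumes a grant on a namespace also applies to every table in it.
    """
    result: set[str] = set()
    for principal_role in principal.principal_roles:
        for catalog_role in principal_role.catalog_roles:
            result |= catalog_role.grants.get(namespace, set())
            result |= catalog_role.grants.get(f"{namespace}.{table}", set())
    return result


reader = CatalogRole("acme_reader", {"tenant_acme": {"READ"}})
orders_writer = CatalogRole("acme_orders_writer", {"tenant_acme.orders": {"WRITE"}})
analyst = PrincipalRole("analyst", [reader, orders_writer])
user = Principal("user-456", [analyst])
print(sorted(effective_privileges(user, "tenant_acme", "orders")))  # ['READ', 'WRITE']
```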

Namespace Isolation

Each MATIH tenant maps to a Polaris namespace:

Polaris Catalog: matih
  |
  +-- Namespace: tenant_acme
  |     +-- Table: orders
  |     +-- Table: customers
  |
  +-- Namespace: tenant_globex
        +-- Table: orders
        +-- Table: inventory
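Because both tenants above own a table named orders, every table reference has to be resolved inside the caller's namespace before it reaches Polaris. A minimal sketch of that check, assuming a hypothetical in-memory mapping from tenant to namespace (the real mapping source is not described here):

```python
# Hypothetical mapping from MATIH tenant to Polaris namespace.
TENANT_NAMESPACES = {"acme": "tenant_acme", "globex": "tenant_globex"}


def resolve_table(tenant: str, table_ref: str) -> str:
    """Qualify a table reference within the tenant's namespace and reject cross-tenant refs."""
    namespace = TENANT_NAMESPACES[tenant]
    if "." not in table_ref:
        # Unqualified names resolve inside the caller's own namespace.
        return f"{namespace}.{table_ref}"
    requested_namespace, _, _ = table_ref.rpartition(".")
    if requested_namespace != namespace:
        raise PermissionError(f"{table_ref} is outside namespace {namespace}")
    return table_ref


print(resolve_table("acme", "orders"))    # tenant_acme.orders
print(resolve_table("globex", "orders"))  # tenant_globex.orders
```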

Grant Management

POST /v1/governance/polaris/grants

Request:
{
  "principal": "analyst-role",
  "namespace": "tenant_acme",
  "table": "orders",
  "privileges": ["TABLE_READ_DATA", "TABLE_LIST"],
  "grantOption": false
}
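A client could build the grant request above with the Python standard library as follows. This is a sketch: the in-cluster base URL (a service named governance-service in the matih-data-plane namespace on port 8080) is an assumption, the helper name is hypothetical, and authentication headers are omitted.

```python
import json
import urllib.request

# Assumption: in-cluster address of the Governance Service.
GOVERNANCE_URL = "http://governance-service.matih-data-plane:8080"


def build_grant_request(principal: str, namespace: str, table: str,
                        privileges: list[str], grant_option: bool = False) -> urllib.request.Request:
    """Build (without sending) POST /v1/governance/polaris/grants."""
    payload = {
        "principal": principal,
        "namespace": namespace,
        "table": table,
        "privileges": privileges,
        "grantOption": grant_option,
    }
    return urllib.request.Request(
        f"{GOVERNANCE_URL}/v1/governance/polaris/grants",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_grant_request("analyst-role", "tenant_acme", "orders",
                              ["TABLE_READ_DATA", "TABLE_LIST"])
# urllib.request.urlopen(request) would send it.
```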

Credential Vending

When a query requires access to Iceberg data files, the governance service coordinates with Polaris to vend scoped credentials:

1. Query Engine requests data access for table "tenant_acme.orders"
2. Governance Service evaluates access policy
3. If allowed, Polaris vends temporary S3/Azure credentials
4. Credentials are scoped to the specific table's data directory
5. Credentials expire after 1 hour
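Steps 4 and 5 amount to two conditions on every storage access: the object must sit under the table's data directory, and the credential must not have expired. The sketch below models only those rules; in practice the cloud provider enforces them through the policy attached to the vended credential. The class, the vend helper, and the s3://matih-lake/... paths are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

CREDENTIAL_TTL = timedelta(hours=1)  # the 1-hour expiry from step 5


@dataclass(frozen=True)
class ScopedCredential:
    table: str
    prefix: str  # storage location of the table's data directory
    expires_at: datetime

    def permits(self, object_path: str, now: datetime) -> bool:
        """True if the credential is unexpired and the path lies inside the table's directory."""
        # The trailing slash keeps a sibling directory such as orders_archive/ out of scope.
        inside = object_path.startswith(self.prefix.rstrip("/") + "/")
        return inside and now < self.expires_at


def vend(table: str, prefix: str, now: datetime) -> ScopedCredential:
    """Hypothetical stand-in for step 3: issue a credential scoped to one table."""
    return ScopedCredential(table, prefix, now + CREDENTIAL_TTL)


issued = datetime(2026, 2, 12, 10, 30, tzinfo=timezone.utc)
cred = vend("tenant_acme.orders", "s3://matih-lake/tenant_acme/orders/", issued)
print(cred.permits("s3://matih-lake/tenant_acme/orders/data/part-0.parquet", issued))  # True
```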

Data Contracts

The Governance Service manages data contracts that define expectations between data producers and consumers:

Contract Structure

{
  "id": "contract-sales-to-bi",
  "name": "Sales Data Contract for BI Service",
  "version": "2.1.0",
  "producer": {
    "team": "data-engineering",
    "service": "pipeline-service"
  },
  "consumer": {
    "team": "analytics",
    "service": "bi-service"
  },
  "dataset": "analytics.public.daily_sales",
  "schema": {
    "fields": [
      {"name": "date", "type": "DATE", "required": true},
      {"name": "region", "type": "VARCHAR", "required": true},
      {"name": "total_sales", "type": "DECIMAL(12,2)", "required": true},
      {"name": "order_count", "type": "INTEGER", "required": true}
    ]
  },
  "sla": {
    "freshness": {"maxDelayMinutes": 60},
    "availability": {"uptimePercentage": 99.9},
    "quality": {"minScore": 0.95},
    "completeness": {"maxNullPercentage": 1.0}
  },
  "validFrom": "2026-01-01",
  "validUntil": "2027-01-01",
  "status": "ACTIVE"
}

Contract Enforcement

The governance service monitors active contracts and flags violations:

| SLA metric | Check method | Alert threshold |
|---|---|---|
| Freshness | Compare last update timestamp with SLA | > maxDelayMinutes |
| Availability | Track table accessibility over a rolling window | < uptimePercentage |
| Quality | Integrate with Data Quality Service scores | < minScore |
| Completeness | Monitor null ratios from profiling | > maxNullPercentage |
| Schema | Compare current schema with contract schema | Any deviation |
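The thresholds in the table translate directly into comparisons against the contract's sla block. A minimal Python sketch, assuming a hypothetical snapshot of observed metrics (the observed keys are not defined by the source) and a contract schema trimmed to one field for brevity:

```python
from datetime import datetime, timedelta, timezone


def check_sla(contract: dict, observed: dict) -> list[str]:
    """Return the names of the SLA checks that the observed metrics violate.

    `observed` is a hypothetical metrics snapshot; its keys are assumptions.
    """
    sla = contract["sla"]
    violations = []
    delay_minutes = (observed["now"] - observed["last_updated"]).total_seconds() / 60
    if delay_minutes > sla["freshness"]["maxDelayMinutes"]:
        violations.append("freshness")
    if observed["uptime_percentage"] < sla["availability"]["uptimePercentage"]:
        violations.append("availability")
    if observed["quality_score"] < sla["quality"]["minScore"]:
        violations.append("quality")
    if observed["null_percentage"] > sla["completeness"]["maxNullPercentage"]:
        violations.append("completeness")
    # "Any deviation": exact comparison, so added, dropped, retyped, or reordered fields all count.
    if observed["schema"] != contract["schema"]["fields"]:
        violations.append("schema")
    return violations


contract = {
    "sla": {
        "freshness": {"maxDelayMinutes": 60},
        "availability": {"uptimePercentage": 99.9},
        "quality": {"minScore": 0.95},
        "completeness": {"maxNullPercentage": 1.0},
    },
    "schema": {"fields": [{"name": "date", "type": "DATE", "required": True}]},
}
now = datetime(2026, 2, 12, 10, 30, tzinfo=timezone.utc)
observed = {
    "now": now,
    "last_updated": now - timedelta(minutes=90),  # 30 minutes past the freshness SLA
    "uptime_percentage": 99.95,
    "quality_score": 0.97,
    "null_percentage": 0.4,
    "schema": [{"name": "date", "type": "DATE", "required": True}],
}
print(check_sla(contract, observed))  # ['freshness']
```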

Access Review

The governance service supports periodic access reviews:

GET /v1/governance/access-reviews?status=PENDING

Response:
{
  "reviews": [
    {
      "id": "review-001",
      "type": "QUARTERLY_ACCESS_REVIEW",
      "reviewer": "data-steward@acme.com",
      "subject": "analyst-role",
      "resource": "analytics.public.customers",
      "currentAccess": ["TABLE_READ_DATA", "TABLE_LIST"],
      "lastAccessedAt": "2026-02-10T15:30:00Z",
      "accessFrequency": "12 queries in last 30 days",
      "recommendation": "MAINTAIN",
      "dueDate": "2026-02-28",
      "status": "PENDING"
    }
  ]
}
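A reviewer-side client might use this response to surface reviews that are past due. A minimal Python sketch over the response shape shown above; the helper name overdue_reviews is hypothetical, and treating the due date itself as not yet overdue is an assumption.

```python
from datetime import date


def overdue_reviews(response: dict, today: date) -> list[str]:
    """Return the IDs of PENDING reviews whose dueDate is before `today`."""
    return [
        review["id"]
        for review in response["reviews"]
        if review["status"] == "PENDING"
        and date.fromisoformat(review["dueDate"]) < today
    ]


response = {"reviews": [{"id": "review-001", "status": "PENDING", "dueDate": "2026-02-28"}]}
print(overdue_reviews(response, date(2026, 3, 2)))  # ['review-001']
```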

Compliance Engine

The compliance engine tracks regulatory requirements and maps them to governance controls:

| Regulation | Controls |
|---|---|
| GDPR | Data subject access requests, right to erasure, consent tracking, data minimization |
| HIPAA | PHI access controls, audit logging, minimum necessary access, encryption verification |
| SOC 2 | Access reviews, change management, incident response, monitoring |
| PCI DSS | Cardholder data masking, access restrictions, key management, audit trails |

Retention Policies

{
  "policy": "gdpr-personal-data-retention",
  "classification": "RESTRICTED",
  "tags": ["PII"],
  "retentionPeriod": "730 days",
  "action": "DELETE",
  "exceptions": [
    {"condition": "legal_hold = true", "override": "RETAIN"}
  ],
  "notifyBeforeDays": 30
}
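The policy above can be read as a small decision function per record. A Python sketch under stated assumptions: retentionPeriod has the form "<N> days" and counts from the record's creation date, the legal_hold exception is passed in as a boolean rather than parsed from its condition string, and the outcome labels NOTIFY and KEEP are hypothetical.

```python
from datetime import date, timedelta


def retention_action(policy: dict, created_on: date, today: date, legal_hold: bool) -> str:
    """Decide what the retention policy implies for one record today (illustrative)."""
    days = int(policy["retentionPeriod"].split()[0])  # assumes "<N> days"
    expires_on = created_on + timedelta(days=days)
    if legal_hold:
        return "RETAIN"  # the legal_hold exception overrides the policy action
    if today >= expires_on:
        return policy["action"]  # e.g. "DELETE"
    if today >= expires_on - timedelta(days=policy["notifyBeforeDays"]):
        return "NOTIFY"  # inside the notification window
    return "KEEP"


policy = {"retentionPeriod": "730 days", "action": "DELETE", "notifyBeforeDays": 30}
created = date(2024, 1, 1)
print(retention_action(policy, created, date(2026, 1, 5), legal_hold=False))  # DELETE
```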

Audit Logging

Every governance decision is logged to an immutable audit trail:

{
  "event": "access_policy_evaluated",
  "timestamp": "2026-02-12T10:30:00Z",
  "tenantId": "tenant-123",
  "userId": "user-456",
  "resource": "analytics.public.orders",
  "action": "SELECT",
  "decision": "ALLOW_WITH_FILTER",
  "filter": "region = 'US-EAST'",
  "policies": ["region_restriction"],
  "evaluationTimeMs": 8,
  "sessionId": "session-789",
  "clientIp": "10.0.1.50"
}
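The section does not say how the audit trail is made immutable. One common technique is hash chaining, which makes tampering detectable: each entry stores a SHA-256 over the previous entry's hash plus its own canonical JSON. The Python sketch below illustrates that technique only; it is not MATIH's implementation, and the class and field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no spaces) so the same record always hashes identically.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditTrail:
    """Append-only list of audit records, each chained to the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, record: dict) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append({"record": record, "prevHash": prev_hash,
                             "hash": _digest(prev_hash, record)})

    def verify(self) -> bool:
        """Recompute the chain and report whether every link still matches."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prevHash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


trail = AuditTrail()
trail.append({"event": "access_policy_evaluated", "decision": "ALLOW_WITH_FILTER",
              "filter": "region = 'US-EAST'"})
trail.append({"event": "access_policy_evaluated", "decision": "DENY"})
print(trail.verify())  # True
```

Editing any stored record, or removing or reordering an entry before the last one, makes verify() return False. Silently dropping the newest entry is not caught unless the latest hash is also recorded somewhere outside the trail.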

Related Sections