MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Part III: Data Platform Services

Chapter 10: Data Catalog & Governance

Comprehensive metadata management, data lineage tracking, governance policy enforcement, semantic modeling, and data classification across the MATIH platform.

Learning Objectives

  • Understand the Catalog Service architecture, metadata ingestion, and full-text search capabilities
  • Learn the Asset Service unified registry, versioning lifecycle, permission model, and governance workflows
  • Master data lineage tracking including column-level lineage, impact analysis, and OpenLineage integration
  • Learn governance policy management, ABAC, RLS, data masking, and compliance reporting
  • Build semantic models with dimensions, metrics, relationships, and natural language query translation

Details

Estimated Read Time: 90 minutes
Prerequisites:
  • Chapter 4: Installation & Configuration
  • Chapter 9: Query Engine
Related Chapters:
  • Ch. 9: Query Engine
  • Ch. 11: Pipelines & Data Engineering
  • Ch. 12: AI Service

The Data Catalog & Governance layer provides the metadata backbone of the MATIH platform. It encompasses four services that work together to catalog data assets, track data lineage, enforce governance policies, and define a semantic layer over raw data.


Service Architecture

API Layer: CatalogController, DataSourceController, AssetController, VersionLifecycleController, GovernanceController, SemanticModelController, LineageVisualizationController
Service Layer: CatalogService, MetadataIngestionService, AssetService, VersionLifecycleService, PolicyService, ClassificationService, SemanticModelService, WrenAiService
Domain Layer: LineageService, AssetPermissionService, AssetCloneService, AbacService, RlsService, DataMaskingService, AdvancedMetricService, QueryOptimizationService
Storage & Integration: PostgreSQL, Elasticsearch, OpenMetadata, OPA, WrenAI, Kafka, Redis

Services Overview

Service | Technology | Port | Responsibilities
Catalog Service | Java 21, Spring Boot 3.2 | 8086 | Metadata discovery, search, tagging, lineage, classification, data sources
Asset Service | Java 21, Spring Boot 3.2 | 8093 | Unified asset registry, versioning, lifecycle governance, permissions, cloning
Governance Service | Java 21, Spring Boot 3.2 | 8080 | Policies, ABAC, RLS, masking, query audit, compliance, sensitive data
Semantic Layer | Java 21, Spring Boot 3.2 | 8086 | Semantic models, metrics, dimensions, NL-to-query, query optimization
Data Quality Service | Python, FastAPI | 8000 | Validation rules, profiling, anomaly detection, quality scoring

Chapter Structure

Catalog Service

Section | Description
Architecture | Service internals, OpenMetadata integration, Elasticsearch search
Search | Full-text search, autocomplete suggestions, search tracking
Databases | Database listing, FQN lookup, data source filtering
Tables | Table listing, schema introspection, column details, tag filtering
Tags | Tagging system, categories, tag-based asset discovery
Data Sources | Data source registration, CRUD, configuration management
Metadata Ingestion | Async and sync ingestion, OpenMetadata synchronization
Statistics | Catalog coverage metrics, asset counts, health indicators
API Reference | Complete REST API for all catalog endpoints

Asset Service

Section | Description
Architecture | Service internals, asset types, component layout, deployment
Version Lifecycle | State machine, approval workflow, governance policies
Permission Model | Hierarchical RBAC, effective permissions, ownership transfer
Cloning | Cross-asset cloning with provenance tracking
API Endpoints | Complete REST API for assets, versions, lifecycle, permissions, clones
Prometheus Metrics | Custom business counters for asset operations

Data Lineage

Section | Description
Overview | Lineage architecture, edge model, OpenLineage protocol
Upstream Lineage | Source dependency tracking, traversal depth control
Downstream Lineage | Impact analysis, consumer identification
Full Lineage | Complete graph construction, bidirectional traversal
Column-Level Lineage | SQL parsing, column mapping extraction, batch processing
Visualization & Export | Graph rendering, path finding, JSON/CSV/GraphML/DOT export
Creating Lineage | Manual and automated lineage creation, OpenLineage ingestion
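
To make upstream and downstream traversal with depth control concrete, here is a minimal sketch in Python. It is not the Catalog Service implementation: the edge list, the asset names, and the `traverse` function are hypothetical, and the sketch assumes lineage can be treated as a list of directed (source, target) edges.

```python
from collections import deque

# Hypothetical edge list: (source, target) means data flows from source to target.
EDGES = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "marts.daily_revenue"),
    ("raw.customers", "staging.customers"),
    ("staging.customers", "marts.daily_revenue"),
    ("marts.daily_revenue", "dashboards.revenue"),
]


def traverse(edges, start, direction="upstream", max_depth=3):
    """Breadth-first walk from start; returns {asset: depth} within max_depth."""
    neighbors = {}
    for source, target in edges:
        if direction == "upstream":
            # Upstream: follow edges backwards, from a target to its sources.
            neighbors.setdefault(target, []).append(source)
        else:
            # Downstream: follow edges forwards, from a source to its consumers.
            neighbors.setdefault(source, []).append(target)

    seen = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not expand further
        for nxt in neighbors.get(node, []):
            if nxt not in seen and nxt != start:
                seen[nxt] = depth + 1
                queue.append((nxt, depth + 1))
    return seen


# Direct sources of a mart (depth 1) and full impact of a raw table (depth 3).
upstream = traverse(EDGES, "marts.daily_revenue", "upstream", max_depth=1)
downstream = traverse(EDGES, "raw.orders", "downstream", max_depth=3)
```

Under these assumptions, a full (bidirectional) lineage view is the union of one upstream and one downstream walk from the same asset.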

Governance

Section | Description
Overview | Governance architecture, policy engine, OPA integration
Policy Management | Policy CRUD, lifecycle (draft/active/suspended), rule evaluation
Data Classification | Manual and auto-classification, sensitivity levels, verification
Data Masking | Masking types, auto-mask by category, batch masking, detokenization
ABAC | Attribute-based access control, OPA Rego generation
Row-Level Security | RLS policy definition, WHERE clause injection, audit logging
Query Audit | Execution audit trail, slow/failed/anomalous query detection
Sensitive Data | Sensitive data access monitoring, PII/PHI/PCI discovery
Compliance | GDPR, HIPAA, PCI-DSS reporting, control mapping
API Reference | Complete REST API for all governance endpoints
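
The idea behind WHERE clause injection for row-level security can be sketched as follows. This is an illustration only, not the Governance Service's actual rewriting logic: the `apply_rls` function, the `rls_scoped` alias, the policy predicates, and the `:tenant_id` / `:region` bind-parameter names are all assumptions made for the example.

```python
def apply_rls(query, predicates):
    """Wrap a query so that every returned row must satisfy all RLS predicates."""
    base = query.strip().rstrip(";")
    if not predicates:
        return base
    # Each predicate is parenthesized so that OR conditions inside it stay scoped.
    where = " AND ".join(f"({p})" for p in predicates)
    return f"SELECT * FROM ({base}) AS rls_scoped WHERE {where}"


# Hypothetical policy: a user only sees rows for their own tenant and region.
# User attributes are passed as bind parameters, never concatenated into SQL.
policy = ["tenant_id = :tenant_id", "region = :region"]
secured = apply_rls(
    "SELECT order_id, amount, tenant_id, region FROM orders;", policy
)
```

Wrapping the original query as a subquery keeps the sketch independent of SQL parsing, at the cost of requiring the filtered columns to appear in the inner SELECT list. A production rewriter would more likely operate on the parsed query.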

Semantic Layer

Section | Description
Architecture | Semantic layer design, WrenAI integration, MDL compiler
Semantic Models | Model creation, dimensions, metrics, status lifecycle
Metric Queries | Metric query execution, preview, compiled SQL
Natural Language | NL-to-semantic query translation, ask, explain, validate
Query Optimization | Rewriting, caching, cost estimation, table statistics
Advanced Metrics | Cumulative, period comparison, moving average, percentile, CAGR
Metric Versioning | Version history, comparison, rollback
Relationships | Model relationships, join paths, relationship types
API Reference | Complete REST API for all semantic layer endpoints
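
For readers unfamiliar with the advanced metric types listed above, the following sketch shows the standard arithmetic behind three of them: cumulative totals, a trailing moving average, and compound annual growth rate (CAGR). The function names and sample values are hypothetical, and the Semantic Layer presumably compiles such metrics to SQL rather than computing them in Python.

```python
def cumulative(values):
    """Running total of a series."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out


def moving_average(values, window):
    """Trailing moving average; positions without a full window yield None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def cagr(start_value, end_value, periods):
    """Compound annual growth rate: (end / start) ** (1 / periods) - 1."""
    return (end_value / start_value) ** (1 / periods) - 1


# Hypothetical revenue figures for four consecutive periods.
revenue = [100, 120, 90, 150]
running_total = cumulative(revenue)
smoothed = moving_average(revenue, window=2)
growth = cagr(100, 121, periods=2)  # roughly 0.10, i.e. about 10% per period
```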

Key Source Files

Component | Location
Catalog Controller | data-plane/catalog-service/src/main/java/com/matih/catalog/controller/CatalogController.java
Data Source Controller | data-plane/catalog-service/src/main/java/com/matih/catalog/controller/DataSourceController.java
Discovery Controller | data-plane/catalog-service/src/main/java/com/matih/catalog/controller/CatalogDiscoveryController.java
Lineage Controller | data-plane/catalog-service/src/main/java/com/matih/catalog/lineage/LineageController.java
Column Lineage Controller | data-plane/catalog-service/src/main/java/com/matih/catalog/lineage/ColumnLineageController.java
Lineage Visualization | data-plane/catalog-service/src/main/java/com/matih/catalog/controller/LineageVisualizationController.java
Classification Controller | data-plane/catalog-service/src/main/java/com/matih/catalog/classification/ClassificationController.java
Asset Controller | data-plane/asset-service/src/main/java/com/matih/asset/controller/AssetController.java
Version Lifecycle Controller | data-plane/asset-service/src/main/java/com/matih/asset/controller/VersionLifecycleController.java
Asset Permission Controller | data-plane/asset-service/src/main/java/com/matih/asset/controller/AssetPermissionController.java
Governance Controller | data-plane/governance-service/src/main/java/com/matih/governance/controller/GovernanceController.java
ABAC Controller | data-plane/governance-service/src/main/java/com/matih/governance/abac/controller/AbacController.java
RLS Controller | data-plane/governance-service/src/main/java/com/matih/governance/rls/controller/RlsController.java
Query Audit Controller | data-plane/governance-service/src/main/java/com/matih/governance/controller/QueryAuditController.java
Semantic Model Controller | data-plane/semantic-layer/src/main/java/com/matih/semantic/controller/SemanticModelController.java
Advanced Metric Controller | data-plane/semantic-layer/src/main/java/com/matih/semantic/controller/AdvancedMetricController.java
NL Controller | data-plane/semantic-layer/src/main/java/com/matih/semantic/controller/NaturalLanguageController.java
Query Optimization Controller | data-plane/semantic-layer/src/main/java/com/matih/semantic/controller/QueryOptimizationController.java

Design Principles

  1. Single source of truth for metadata. OpenMetadata serves as the canonical metadata store. All metadata changes flow through the Catalog Service's synchronization layer.

  2. Lineage as infrastructure. Data lineage is not an afterthought but a core capability that informs impact analysis, debugging, and governance decisions.

  3. Policy as code. Governance policies are defined programmatically and support lifecycle management (draft, review, active, suspended).

  4. Quality is continuous. Data quality is monitored in real time through validation rules, anomaly detection, and quality scoring.

  5. Classification drives security. Sensitivity classifications assigned in the catalog directly control data masking and access policies throughout the platform.

  6. Semantic abstraction. The Semantic Layer translates business language into SQL, enabling non-technical users to query data through natural language.
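
As an illustration of principle 5, the sketch below shows one way a sensitivity classification could select the masking applied to each column. The sensitivity level names, the masking functions, and the `mask_row` helper are hypothetical; the platform's actual levels and masking types are described in the Governance sections.

```python
def redact(value):
    """Replace the whole value with a fixed placeholder."""
    return "****"


def partial(value):
    """Keep the last four characters and mask the rest."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


# Hypothetical mapping from sensitivity level to masking function (None = no masking).
MASKING_BY_SENSITIVITY = {
    "PUBLIC": None,
    "INTERNAL": None,
    "CONFIDENTIAL": partial,
    "RESTRICTED": redact,
}


def mask_row(row, classifications):
    """Mask each column according to its classification.

    Columns without a classification are treated as RESTRICTED (fail closed).
    """
    masked = {}
    for column, value in row.items():
        level = classifications.get(column, "RESTRICTED")
        mask = MASKING_BY_SENSITIVITY[level]
        masked[column] = value if mask is None else mask(value)
    return masked


row = {"name": "Ada", "email": "ada@example.com",
       "card_number": "4111111111111111", "country": "UK"}
classifications = {"name": "INTERNAL", "email": "RESTRICTED",
                   "card_number": "CONFIDENTIAL", "country": "PUBLIC"}
safe_row = mask_row(row, classifications)
```

Defaulting unclassified columns to the strictest level is a design choice made for this sketch, so that a missing classification hides data rather than exposing it.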


How This Chapter Connects

  • The Query Engine (Chapter 9) uses catalog metadata for query optimization, RLS policy evaluation, and data masking rules
  • The Asset Service stores versioned assets (queries, dashboards, pipelines, models) with lifecycle governance and cross-service permission control
  • The AI Service (Chapter 12) uses catalog metadata for text-to-SQL schema context and data understanding
  • The Pipeline Service (Chapter 11) publishes lineage events and consumes quality validation rules
  • The Semantic Layer provides the metric definitions used by the BI Service for dashboard creation

Begin with the Catalog Service Architecture to understand the metadata backbone, then explore the Asset Service for the unified asset registry.