MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Catalog Service Architecture

Status: Production. Covers the core catalog APIs, search, lineage, and classification.

The Catalog Service is a Java 21 / Spring Boot 3.2 application that serves as the metadata backbone of the MATIH platform. It provides APIs for browsing, searching, and managing catalog entities across all tenant data sources.


Service Overview

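| Property       | Value                                                                  |
|----------------|------------------------------------------------------------------------|
| Language       | Java 21                                                                |
| Framework      | Spring Boot 3.2                                                        |
| Port           | 8086                                                                   |
| Namespace      | matih-data-plane                                                       |
| Base paths     | /api/v1/catalog, /api/v1/datasources, /v1/lineage, /v1/classification  |
| Authentication | JWT, plus a required X-Tenant-ID header                                |
| Build tool     | Gradle                                                                 |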

High-Level Architecture

+-------------------------------------------------------------------+
|                       Catalog Service                              |
|                                                                    |
|  +------------------+  +------------------+  +-----------------+   |
|  | CatalogController|  | DataSourceCtrl   |  | DiscoveryCtrl   |   |
|  | /api/v1/catalog  |  | /api/v1/         |  | /api/v1/catalog |   |
|  |                  |  | datasources      |  | /discovery      |   |
|  +--------+---------+  +--------+---------+  +--------+--------+   |
|           |                      |                     |           |
|  +--------v----------------------v---------------------v--------+  |
|  |                    CatalogService                            |  |
|  |  - Database browsing    - Table management                   |  |
|  |  - Tag operations       - Statistics                         |  |
|  +----------------------------+---------------------------------+  |
|                               |                                    |
|  +----------------------------v---------------------------------+  |
|  |               Supporting Services                            |  |
|  |                                                              |  |
|  |  CatalogSearchService    MetadataIngestionService            |  |
|  |  ClassificationService   LineageService                      |  |
|  |  DataGlossaryService     GovernancePolicyService             |  |
|  +------+----------+----------+-----------+---------------------+  |
|         |          |          |           |                        |
|  +------v---+ +----v----+ +--v-------+ +-v-----------+            |
|  |PostgreSQL| |Elastic  | |OpenMeta  | |Kafka        |            |
|  |          | |Search   | |data      | |Events       |            |
|  +----------+ +---------+ +----------+ +-------------+            |
+-------------------------------------------------------------------+

Controller Overview

| Controller                     | Path                          | Responsibility                                                   |
|--------------------------------|-------------------------------|------------------------------------------------------------------|
| CatalogController              | /api/v1/catalog               | Search, databases, tables, tags, lineage, statistics             |
| DataSourceController           | /api/v1/datasources           | Data source registration, CRUD, ingestion triggers               |
| CatalogDiscoveryController     | /api/v1/catalog/discovery     | Trending assets, related assets, browse hierarchy, recommendations |
| LineageVisualizationController | /api/v1/lineage/visualization | Graph visualization, impact analysis, path finding, export       |
| LineageController              | /v1/lineage                   | Edge management, traversal, column lineage, OpenLineage          |
| ColumnLineageController        | /v1/catalog/lineage/column    | Column-level lineage extraction and queries                      |
| ClassificationController       | /v1/classification            | Table classification, PII/PHI/PCI discovery, rules               |

Entity Model

The Catalog Service manages five core entity types:

// CatalogDatabase - Represents a database in a data source
@Entity
public class CatalogDatabase {
    private UUID id;
    private UUID tenantId;
    private String name;
    private String fullyQualifiedName;
    private String description;
    private int tableCount;
    private UUID dataSourceId;
}
 
// CatalogTable - Represents a table with schema information
@Entity
public class CatalogTable {
    private UUID id;
    private UUID tenantId;
    private String name;
    private String fullyQualifiedName;
    private String schemaName;
    private String catalogName;
    private TableType tableType;
    private String description;
    private int columnCount;
    private List<String> tags;
}
 
// CatalogDataSource - Registered data source
@Entity
public class CatalogDataSource {
    private UUID id;
    private UUID tenantId;
    private String name;
    private String type;       // postgresql, mysql, snowflake, etc.
    private String connectionConfig;
    private boolean active;
}
 
// CatalogTag - Tag for asset organization
@Entity
public class CatalogTag {
    private UUID id;
    private UUID tenantId;
    private String name;
    private TagCategory category;  // BUSINESS, TECHNICAL, CLASSIFICATION, CUSTOM
}
 
// CatalogLineage - Lineage relationship between entities
@Entity
public class CatalogLineage {
    private UUID id;
    private UUID tenantId;
    private UUID sourceEntityId;
    private UUID targetEntityId;
    private String lineageType;
}
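
Because each CatalogLineage row is a single directed edge from a source entity to a target entity, downstream impact can be computed by walking those edges. The following is a minimal sketch of that idea, not the service's LineageService API: `LineageEdge` and `downstream` are hypothetical names, and the edges are held in memory rather than loaded from PostgreSQL.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

final class LineageTraversalSketch {

    // Hypothetical stand-in for a CatalogLineage row; only the fields needed for traversal.
    record LineageEdge(UUID tenantId, UUID sourceEntityId, UUID targetEntityId) {}

    // Breadth-first walk over source -> target edges, restricted to one tenant.
    // Returns every entity reachable from `start`, excluding `start` itself.
    static Set<UUID> downstream(List<LineageEdge> edges, UUID tenantId, UUID start) {
        Set<UUID> visited = new LinkedHashSet<>();
        Deque<UUID> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            UUID current = queue.poll();
            for (LineageEdge edge : edges) {
                if (edge.tenantId().equals(tenantId)
                        && edge.sourceEntityId().equals(current)
                        && !edge.targetEntityId().equals(start)
                        && visited.add(edge.targetEntityId())) { // add() is false for repeats, so cycles terminate
                    queue.add(edge.targetEntityId());
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        UUID tenant = UUID.randomUUID();
        UUID raw = UUID.randomUUID();
        UUID staged = UUID.randomUUID();
        UUID report = UUID.randomUUID();
        List<LineageEdge> edges = List.of(
                new LineageEdge(tenant, raw, staged),
                new LineageEdge(tenant, staged, report),
                // Edge owned by another tenant; it must not appear in the result.
                new LineageEdge(UUID.randomUUID(), raw, UUID.randomUUID()));
        System.out.println(downstream(edges, tenant, raw).size()); // prints 2
    }
}
```

The sketch scans the full edge list for every visited node, which is fine for illustration; a real implementation would more likely index edges by source ID or push the traversal into the database.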

Configuration

The Catalog Service is configured through Spring Boot application properties:

# application.yml
server:
  port: 8086
 
spring:
  datasource:
    url: jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}
    username: ${DB_USER}
    password: ${DB_PASSWORD}
  elasticsearch:
    uris: ${ELASTICSEARCH_URL:http://elasticsearch:9200}
  kafka:
    bootstrap-servers: ${KAFKA_BOOTSTRAP_SERVERS}
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
 
openmetadata:
  url: ${OPENMETADATA_URL:http://openmetadata:8585}
  auth-token: ${OPENMETADATA_TOKEN}
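
The `${NAME}` and `${NAME:default}` expressions above are Spring property placeholders: a value is taken from the environment when present, the text after the first colon is used as a fallback, and by default an unresolved placeholder with no fallback is treated as an error. The sketch below imitates that behaviour with a plain map so the resolution rules are easy to see. It is an illustration only, not Spring's implementation, and `PlaceholderSketch` and `resolve` are hypothetical names.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderSketch {

    // Matches ${NAME} or ${NAME:default}. The name stops at the first colon,
    // so a default such as http://elasticsearch:9200 keeps its own colons.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String text, Map<String, String> env) {
        return PLACEHOLDER.matcher(text).replaceAll(match -> {
            String name = match.group(1);
            String fallback = match.group(2); // null when there is no ":default" part
            String value = env.containsKey(name) ? env.get(name) : fallback;
            if (value == null) {
                throw new IllegalArgumentException("Could not resolve placeholder '" + name + "'");
            }
            // Quote the result so '$' or '\' in a value is not read as a group reference.
            return Matcher.quoteReplacement(value);
        });
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of("DB_HOST", "pg.internal", "DB_NAME", "catalog");
        // prints jdbc:postgresql://pg.internal:5432/catalog
        System.out.println(resolve("jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}", env));
        // prints http://elasticsearch:9200 (fallback, because ELASTICSEARCH_URL is unset)
        System.out.println(resolve("${ELASTICSEARCH_URL:http://elasticsearch:9200}", env));
    }
}
```

Read against the configuration above, this means ELASTICSEARCH_URL and OPENMETADATA_URL are optional, while DB_HOST, DB_NAME, DB_USER, DB_PASSWORD, KAFKA_BOOTSTRAP_SERVERS, and OPENMETADATA_TOKEN have no fallback and need to be supplied by the deployment.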

Multi-Tenancy

All catalog operations are tenant-scoped. The X-Tenant-ID header is required on every request, and controllers pass its value to the service layer so that each query returns only that tenant's data:

@GetMapping("/databases")
public ResponseEntity<Page<CatalogDatabase>> listDatabases(
        @RequestHeader("X-Tenant-ID") UUID tenantId,
        @RequestParam(defaultValue = "0") int page,
        @RequestParam(defaultValue = "20") int size) {
    Pageable pageable = PageRequest.of(page, size);
    Page<CatalogDatabase> databases = catalogService.listDatabases(tenantId, pageable);
    return ResponseEntity.ok(databases);
}
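
From the caller's side, a request to this endpoint might be built as follows. This is a sketch using the JDK's built-in HTTP client types, with some assumptions: the `/databases` mapping is taken to sit under the `/api/v1/catalog` base path, the host is a local instance on port 8086, and the JWT `Authorization` header is omitted for brevity. `CatalogRequestSketch` and its `listDatabases` helper are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

final class CatalogRequestSketch {

    // Builds (but does not send) a paged GET for the tenant's databases.
    static HttpRequest listDatabases(String baseUrl, UUID tenantId, int page, int size) {
        // Assumed full path: base path /api/v1/catalog plus the /databases mapping.
        URI uri = URI.create(baseUrl + "/api/v1/catalog/databases?page=" + page + "&size=" + size);
        return HttpRequest.newBuilder(uri)
                .header("X-Tenant-ID", tenantId.toString()) // bound to the UUID tenantId parameter
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = listDatabases("http://localhost:8086", UUID.randomUUID(), 0, 20);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("X-Tenant-ID").orElseThrow());
    }
}
```

Because the controller binds the header to a `UUID`, a missing or malformed X-Tenant-ID value would be expected to fail request binding before `catalogService` is called. The `page` and `size` query parameters fall back to 0 and 20 when omitted, as the `defaultValue` attributes show.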

Next Steps