MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Metadata Ingestion

The Catalog Service supports both asynchronous and synchronous metadata ingestion from registered data sources. Ingestion discovers databases, tables, columns, and schema information, populating the catalog for search and governance.


Asynchronous Ingestion

Trigger a background ingestion job that runs without blocking the caller.

POST /api/v1/datasources/{id}/ingest
curl -X POST "http://localhost:8086/api/v1/datasources/ds-001/ingest" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000"

Response (202 Accepted)

{
  "dataSourceId": "ds-001",
  "status": "STARTED",
  "message": "Metadata ingestion started asynchronously"
}

The async ingestion uses CompletableFuture for non-blocking execution:

CompletableFuture<MetadataIngestionService.IngestionResult> future =
    ingestionService.ingestMetadataAsync(id);
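
To illustrate the non-blocking pattern, here is a minimal sketch of how a trigger built on CompletableFuture might look. The IngestionResult record, the runIngestion stub, the thread pool size, and the AsyncIngestionSketch class are all hypothetical and stand in for the real MetadataIngestionService, whose internals are not shown here. Mapping a failure to a "FAILED" status is likewise an assumption.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical result type; the real IngestionResult fields may differ.
record IngestionResult(String dataSourceId, String status) {}

class AsyncIngestionSketch {
    // Dedicated pool so ingestion does not occupy request threads (size is arbitrary).
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    // Stand-in for the blocking ingestion work.
    IngestionResult runIngestion(String dataSourceId) {
        return new IngestionResult(dataSourceId, "COMPLETED");
    }

    // Returns immediately; the work itself runs on the executor.
    CompletableFuture<IngestionResult> ingestMetadataAsync(String dataSourceId) {
        return CompletableFuture
            .supplyAsync(() -> runIngestion(dataSourceId), executor)
            // Assumption: an exception is reported as a FAILED result.
            .exceptionally(ex -> new IngestionResult(dataSourceId, "FAILED"));
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        AsyncIngestionSketch sketch = new AsyncIngestionSketch();
        CompletableFuture<IngestionResult> future = sketch.ingestMetadataAsync("ds-001");
        // A caller could answer with 202 Accepted at this point and observe completion later.
        future.thenAccept(r -> System.out.println(r.dataSourceId() + " " + r.status())).join();
        sketch.shutdown();
    }
}
```

Because the future is returned before the work finishes, the HTTP handler can respond with the "STARTED" payload shown above while ingestion continues in the background.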

Synchronous Ingestion

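Use the synchronous endpoint for smaller data sources or when you need the results immediately. The call blocks until ingestion completes and returns the discovery counts in the response: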

POST /api/v1/datasources/{id}/ingest/sync
curl -X POST "http://localhost:8086/api/v1/datasources/ds-001/ingest/sync" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000"

Response (200 OK)

{
  "dataSourceId": "ds-001",
  "databasesDiscovered": 3,
  "tablesDiscovered": 145,
  "columnsDiscovered": 2340,
  "duration": "12.5s",
  "status": "COMPLETED"
}
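
For callers that prefer Java over curl, the following sketch builds the same POST request with the JDK's java.net.http client. The IngestionClientSketch class, the buildRequest helper, and the five-minute timeout are illustrative assumptions; only the paths and the X-Tenant-ID header come from the examples above. The data source id is concatenated without URL encoding, which is fine for ids like ds-001 but may not be for arbitrary values.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class IngestionClientSketch {
    // Builds the POST request; sync=true targets /ingest/sync, otherwise /ingest.
    static HttpRequest buildRequest(String baseUrl, String dataSourceId,
                                    String tenantId, boolean sync) {
        String path = "/api/v1/datasources/" + dataSourceId
            + (sync ? "/ingest/sync" : "/ingest");
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
            .header("X-Tenant-ID", tenantId)
            // Arbitrary timeout; synchronous ingestion may take a while on large sources.
            .timeout(Duration.ofMinutes(5))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("http://localhost:8086", "ds-001",
            "550e8400-e29b-41d4-a716-446655440000", true);
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        // Expect 200 for the sync endpoint and 202 for the async one.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```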

Ingestion Pipeline

The MetadataIngestionService follows a multi-phase pipeline:

  1. Connection validation -- verify data source connectivity
  2. Database discovery -- enumerate all databases
  3. Table discovery -- enumerate tables per database
  4. Schema extraction -- extract column definitions, types, constraints
  5. Statistics collection -- row counts, data sizes, freshness
  6. Index update -- update Elasticsearch search index
  7. Event publication -- publish catalog change events to Kafka
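
The phases above can be pictured as an ordered list of steps run one after another. The sketch below is not the MetadataIngestionService implementation: the IngestionPipelineSketch class, its phase and run methods, and the stop-at-first-failure behavior are assumptions made for illustration, and each phase body is a stub that simply reports success.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

class IngestionPipelineSketch {
    // LinkedHashMap preserves registration order, so phases run in sequence.
    private final Map<String, BooleanSupplier> phases = new LinkedHashMap<>();

    // Registers a named phase; the step returns true on success.
    IngestionPipelineSketch phase(String name, BooleanSupplier step) {
        phases.put(name, step);
        return this;
    }

    // Runs phases in order and returns the names of those that completed.
    // Assumption: the pipeline stops at the first failing phase.
    List<String> run() {
        List<String> completed = new ArrayList<>();
        for (var entry : phases.entrySet()) {
            if (!entry.getValue().getAsBoolean()) {
                break;
            }
            completed.add(entry.getKey());
        }
        return completed;
    }

    // The seven documented phases, each replaced by a stub that succeeds.
    static IngestionPipelineSketch stubPipeline() {
        return new IngestionPipelineSketch()
            .phase("Connection validation", () -> true)
            .phase("Database discovery", () -> true)
            .phase("Table discovery", () -> true)
            .phase("Schema extraction", () -> true)
            .phase("Statistics collection", () -> true)
            .phase("Index update", () -> true)
            .phase("Event publication", () -> true);
    }

    public static void main(String[] args) {
        // Prints the completed phase names in execution order.
        System.out.println(stubPipeline().run());
    }
}
```

Under this sketch, ordering matters: connection validation comes first so a bad connection fails fast, and index update and event publication come last so that search and downstream consumers only see metadata that was fully extracted.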

OpenMetadata Integration

The Catalog Service integrates with OpenMetadata for metadata synchronization:

Component               File
OpenMetadata client     OpenMetadataClient.java
Sync service            OpenMetadataSyncService.java
Sync scheduler          OpenMetadataSyncScheduler.java
Profiler integration    OpenMetadataProfilerService.java
Quality integration     OpenMetadataQualityService.java
Tag sync                OpenMetadataTagSyncService.java
Lineage sync            OpenMetadataLineageService.java
Webhook handler         OpenMetadataWebhookController.java

Source Reference

Component             File
Ingestion triggers    DataSourceController.java -- triggerIngestion(), triggerSyncIngestion()
Ingestion service     MetadataIngestionService.java
Catalog service       CatalogService.java