Metadata Ingestion
The Catalog Service supports both asynchronous and synchronous metadata ingestion from registered data sources. Ingestion discovers databases, tables, columns, and schema information, populating the catalog for search and governance.
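Both modes are triggered with a `POST` to the data source's ingest endpoint. The synchronous variant differs only by its `/sync` suffix. The Java sketch below builds both requests with the JDK's `java.net.http.HttpRequest` (Java 11+). The base URL `http://localhost:8086` and the `X-Tenant-ID` value are the example values used in this section. The class and method names (`IngestionRequests`, `ingestRequest`) are hypothetical. Building a request does not contact the service, so the sketch can be checked offline.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class IngestionRequests {
    // Example base URL and tenant ID from the curl examples in this section.
    static final String BASE_URL = "http://localhost:8086/api/v1/datasources/";
    static final String TENANT_ID = "550e8400-e29b-41d4-a716-446655440000";

    // Builds the ingestion trigger; sync=true targets the /ingest/sync variant.
    static HttpRequest ingestRequest(String dataSourceId, boolean sync) {
        String url = BASE_URL + dataSourceId + (sync ? "/ingest/sync" : "/ingest");
        return HttpRequest.newBuilder(URI.create(url))
                .header("X-Tenant-ID", TENANT_ID)
                .POST(HttpRequest.BodyPublishers.noBody()) // the endpoint takes no body
                .build();
    }

    public static void main(String[] args) {
        HttpRequest async = ingestRequest("ds-001", false);
        HttpRequest sync = ingestRequest("ds-001", true);
        System.out.println(async.method() + " " + async.uri());
        System.out.println(sync.method() + " " + sync.uri());
    }
}
```

To send a request against a running service, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Expect `202` from the async endpoint and `200` from the sync one.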
Asynchronous Ingestion
Trigger a background ingestion job that runs without blocking the caller.
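The non-blocking behaviour can be sketched with `CompletableFuture`. The job is handed to an executor, and the caller gets a `STARTED` acknowledgment straight away. This is a minimal illustration, not the service's implementation. `AsyncIngestionSketch`, `IngestionAck`, `runIngestion`, and the two-thread pool are hypothetical, and the simulated job simply sleeps before reporting `COMPLETED`. Records require Java 16 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncIngestionSketch {
    // Hypothetical result type; the real MetadataIngestionService.IngestionResult may differ.
    record IngestionResult(String dataSourceId, String status) {}

    // Hypothetical acknowledgment mirroring the 202 response body.
    record IngestionAck(String dataSourceId, String status, String message) {}

    // Dedicated pool so slow ingestion jobs do not tie up the common ForkJoinPool.
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    // Schedules the (simulated) ingestion on the pool and returns without waiting.
    CompletableFuture<IngestionResult> ingestMetadataAsync(String id) {
        return CompletableFuture.supplyAsync(() -> runIngestion(id), executor);
    }

    // Stand-in for the real discovery work.
    private IngestionResult runIngestion(String id) {
        try {
            Thread.sleep(100); // simulate slow discovery
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new IngestionResult(id, "COMPLETED");
    }

    // Controller-style trigger: start the job, attach logging, acknowledge at once.
    IngestionAck triggerIngestion(String id) {
        ingestMetadataAsync(id).whenComplete((result, error) -> {
            if (error != null) {
                System.err.println("Ingestion failed for " + id + ": " + error);
            } else {
                System.out.println("Ingestion finished: " + result);
            }
        });
        return new IngestionAck(id, "STARTED", "Metadata ingestion started asynchronously");
    }

    // Stops accepting jobs and waits (up to 5 s) for running ones to finish.
    void shutdownAndWait() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncIngestionSketch service = new AsyncIngestionSketch();
        IngestionAck ack = service.triggerIngestion("ds-001");
        // The acknowledgment is available without waiting for the job.
        System.out.println("Returned: " + ack);
        service.shutdownAndWait();
    }
}
```

The sketch uses a dedicated executor rather than the default common pool because real ingestion blocks on network and database I/O. Whether the Catalog Service configures its executor this way is not stated here.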
POST /api/v1/datasources/{id}/ingest

```bash
curl -X POST "http://localhost:8086/api/v1/datasources/ds-001/ingest" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000"
```

Response (202 Accepted)
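```json
{
  "dataSourceId": "ds-001",
  "status": "STARTED",
  "message": "Metadata ingestion started asynchronously"
}
```

The async response confirms only that the job has started; it carries no discovery counts. The controller delegates to `MetadataIngestionService.ingestMetadataAsync()`, which returns a `CompletableFuture` so the request thread is not blocked: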
```java
CompletableFuture<MetadataIngestionService.IngestionResult> future =
    ingestionService.ingestMetadataAsync(id);
```

Synchronous Ingestion
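Use synchronous ingestion for smaller data sources or when you need the results immediately. The request blocks until ingestion finishes and returns the discovery counts: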
POST /api/v1/datasources/{id}/ingest/sync

```bash
curl -X POST "http://localhost:8086/api/v1/datasources/ds-001/ingest/sync" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000"
```

Response (200 OK)
```json
{
  "dataSourceId": "ds-001",
  "databasesDiscovered": 3,
  "tablesDiscovered": 145,
  "columnsDiscovered": 2340,
  "duration": "12.5s",
  "status": "COMPLETED"
}
```

Ingestion Pipeline
The `MetadataIngestionService` runs each ingestion through a seven-phase pipeline:
- Connection validation -- verify data source connectivity
- Database discovery -- enumerate all databases
- Table discovery -- enumerate tables per database
- Schema extraction -- extract column definitions, types, constraints
- Statistics collection -- row counts, data sizes, freshness
- Index update -- update Elasticsearch search index
- Event publication -- publish catalog change events to Kafka
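The phase ordering can be sketched as a list of steps run in sequence against a shared context, with totals reported in the same shape as the synchronous response. Everything here is hypothetical. The in-memory `source` map stands in for a live connection, and the last three phases are empty stubs with no Elasticsearch or Kafka calls. Stopping at the first failing phase with a `FAILED` status is an assumption, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class IngestionPipelineSketch {
    // Shared state handed from phase to phase (hypothetical).
    static final class IngestionContext {
        final Map<String, Map<String, List<String>>> source; // db -> table -> columns (simulated)
        final List<String> databases = new ArrayList<>();
        final Map<String, List<String>> tablesByDatabase = new LinkedHashMap<>();
        int columnCount;

        IngestionContext(Map<String, Map<String, List<String>>> source) {
            this.source = source;
        }
    }

    record Phase(String name, Consumer<IngestionContext> action) {}

    record IngestionResult(String dataSourceId, int databasesDiscovered,
                           int tablesDiscovered, int columnsDiscovered, String status) {}

    // The seven phases in the documented order; the last three are stubs here.
    static final List<Phase> PHASES = List.of(
        new Phase("Connection validation", ctx -> {
            if (ctx.source == null) throw new IllegalStateException("data source unreachable");
        }),
        new Phase("Database discovery", ctx -> ctx.databases.addAll(ctx.source.keySet())),
        new Phase("Table discovery", ctx -> ctx.databases.forEach(db ->
            ctx.tablesByDatabase.put(db, new ArrayList<>(ctx.source.get(db).keySet())))),
        new Phase("Schema extraction", ctx -> ctx.tablesByDatabase.forEach((db, tables) ->
            tables.forEach(t -> ctx.columnCount += ctx.source.get(db).get(t).size()))),
        new Phase("Statistics collection", ctx -> { /* row counts, sizes, freshness */ }),
        new Phase("Index update", ctx -> { /* push documents to the search index */ }),
        new Phase("Event publication", ctx -> { /* emit catalog change events */ })
    );

    static IngestionResult ingest(String dataSourceId, Map<String, Map<String, List<String>>> source) {
        IngestionContext ctx = new IngestionContext(source);
        String status = "COMPLETED";
        for (Phase phase : PHASES) {
            try {
                phase.action().accept(ctx);
            } catch (RuntimeException e) {
                // Later phases depend on earlier output, so stop at the first failure.
                System.err.println(phase.name() + " failed: " + e.getMessage());
                status = "FAILED";
                break;
            }
        }
        int tables = ctx.tablesByDatabase.values().stream().mapToInt(List::size).sum();
        return new IngestionResult(dataSourceId, ctx.databases.size(), tables, ctx.columnCount, status);
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> source = Map.of(
            "sales", Map.of("orders", List.of("id", "customer_id", "total"),
                            "customers", List.of("id", "name")),
            "hr", Map.of("employees", List.of("id", "name")));
        System.out.println(ingest("ds-001", source));
        System.out.println(ingest("ds-002", null)); // fails at connection validation
    }
}
```

Keeping the phases as data (a `List<Phase>`) makes the documented order explicit in one place and lets a failure report exactly which phase broke.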
OpenMetadata Integration
The Catalog Service integrates with OpenMetadata for metadata synchronization:
| Component | File |
|---|---|
| OpenMetadata client | OpenMetadataClient.java |
| Sync service | OpenMetadataSyncService.java |
| Sync scheduler | OpenMetadataSyncScheduler.java |
| Profiler integration | OpenMetadataProfilerService.java |
| Quality integration | OpenMetadataQualityService.java |
| Tag sync | OpenMetadataTagSyncService.java |
| Lineage sync | OpenMetadataLineageService.java |
| Webhook handler | OpenMetadataWebhookController.java |
Source Reference
| Component | File |
|---|---|
| Ingestion triggers | DataSourceController.java -- triggerIngestion(), triggerSyncIngestion() |
| Ingestion service | MetadataIngestionService.java |
| Catalog service | CatalogService.java |