Data Lineage
End-to-end lineage tracking from source to consumption across tables, columns, and pipelines.
Production
Overview
The MATIH Lineage subsystem tracks how data flows through your organization, from raw source tables through transformations and aggregations to downstream reports. It supports both table-level and column-level lineage, with automatic extraction from SQL queries, OpenLineage event ingestion, and manual edge creation.
Lineage is managed by three primary controllers within the Catalog Service:
| Controller | Base Path | Purpose |
|---|---|---|
| LineageController | /v1/lineage | Edge management, traversal, OpenLineage ingestion |
| ColumnLineageController | /v1/catalog/lineage/column | Column-level extraction and graph queries |
| LineageVisualizationController | /api/v1/lineage/visualization | Graph rendering, impact analysis, export |
Architecture
                   +---------------------------+
                   |   Lineage Visualization   |
                   |        Controller         |
                   |    /api/v1/lineage/viz    |
                   +-------------+-------------+
                                 |
          +----------------------+----------------------+
          |                      |                      |
+---------v--------+   +---------v--------+   +---------v--------+
| LineageController|   | ColumnLineage    |   | Impact Analysis  |
| /v1/lineage      |   | Controller       |   | Service          |
|                  |   | /v1/catalog/     |   |                  |
+---------+--------+   | lineage/column   |   +---------+--------+
          |            +---------+--------+             |
          |                      |                      |
 +--------v----------------------v----------------------v--------+
 |                        LineageService                         |
 |  - Edge CRUD, traversal, path finding, statistics             |
 +--------+-----------------------------------+---------+--------+
          |                                   |         |
+---------v--------+              +-----------v-+   +---v--------+
| LineageRepository|              | Column      |   | OpenLineage|
| (PostgreSQL)     |              | Lineage Svc |   | Parser     |
+------------------+              +-------------+   +------------+
Lineage Edge Model
Each lineage relationship is stored as an edge connecting a source entity to a target entity:
// LineageEdge entity structure
// (LineageType, LineageSource, and ColumnLineageMapping are not shown here)
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class LineageEdge {
    private UUID id;
    private UUID tenantId;
    private UUID sourceEntityId;
    private String sourceEntityFqn;
    private String sourceEntityType;     // "table", "view", "pipeline"
    private UUID targetEntityId;
    private String targetEntityFqn;
    private String targetEntityType;
    private LineageType lineageType;     // DIRECT, INDIRECT, DERIVED
    private LineageSource lineageSource; // PIPELINE, QUERY_ANALYSIS, MANUAL, OPENMETADATA, INTEGRATION
    private UUID pipelineId;
    private UUID pipelineRunId;
    private String sqlQuery;
    private String description;
    private Double confidence;           // 0.0 to 1.0
    private boolean manual;
    private List<ColumnLineageMapping> columnLineage;
    private Map<String, Object> metadata;
    private List<String> tags;
    private String createdBy;
}
Lineage Types
| Type | Description |
|---|---|
| DIRECT | Data flows directly from source to target |
| INDIRECT | Data is derived through intermediate transformations |
| DERIVED | Target is computed from source with business logic |
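As a rough illustration of how the lineage type and the 0.0 to 1.0 confidence score fit together on an edge, the following is a minimal sketch. `EdgeSketch` is a hypothetical, trimmed-down stand-in for `LineageEdge`, and rejecting out-of-range confidence values is an assumption of this sketch, not documented behavior of the Catalog Service.

```java
import java.util.UUID;

// Mirrors the three lineage types listed in the table above.
enum LineageType { DIRECT, INDIRECT, DERIVED }

// Hypothetical, reduced version of LineageEdge used only for illustration.
final class EdgeSketch {
    final UUID id;
    final String sourceEntityFqn;
    final String targetEntityFqn;
    final LineageType lineageType;
    final double confidence;

    EdgeSketch(String sourceEntityFqn, String targetEntityFqn,
               LineageType lineageType, double confidence) {
        // The entity documents confidence as 0.0 to 1.0; enforcing the
        // range at construction time is an assumption of this sketch.
        if (Double.isNaN(confidence) || confidence < 0.0 || confidence > 1.0) {
            throw new IllegalArgumentException(
                "confidence must be in [0.0, 1.0]: " + confidence);
        }
        this.id = UUID.randomUUID();
        this.sourceEntityFqn = sourceEntityFqn;
        this.targetEntityFqn = targetEntityFqn;
        this.lineageType = lineageType;
        this.confidence = confidence;
    }
}
```

For example, `new EdgeSketch("raw.orders", "analytics.daily_orders", LineageType.DERIVED, 0.9)` would describe a target computed from its source with business logic, recorded with fairly high confidence. The table names are illustrative only.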
Lineage Sources
| Source | Description |
|---|---|
| PIPELINE | Extracted from Spark, Airflow, dbt, Flink pipelines |
| QUERY_ANALYSIS | Detected via SQL query parsing (Trino, Presto) |
| MANUAL | Manually created by data stewards |
| OPENMETADATA | Synced from OpenMetadata |
| INTEGRATION | Ingested via OpenLineage or other integrations |
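To make the INTEGRATION source more concrete, here is a minimal sketch of one plausible way to turn the inputs and outputs of an OpenLineage run event into table-level edges. OpenLineage datasets are identified by a namespace and a name; everything else here (`DatasetRef`, `SourcedEdge`, `OpenLineageMapper`, the `namespace.name` FQN format, and the every-input-to-every-output mapping) is a hypothetical illustration and may not match how the OpenLineage Parser actually works.

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors the lineage sources listed in the table above.
enum LineageSource { PIPELINE, QUERY_ANALYSIS, MANUAL, OPENMETADATA, INTEGRATION }

// A dataset as referenced in an OpenLineage event: namespace plus name.
final class DatasetRef {
    final String namespace;
    final String name;

    DatasetRef(String namespace, String name) {
        this.namespace = namespace;
        this.name = name;
    }

    // Assumption: the FQN is simply "namespace.name".
    String fqn() {
        return namespace + "." + name;
    }
}

// Hypothetical edge carrying only the fields this sketch needs.
final class SourcedEdge {
    final String sourceFqn;
    final String targetFqn;
    final LineageSource source;

    SourcedEdge(String sourceFqn, String targetFqn, LineageSource source) {
        this.sourceFqn = sourceFqn;
        this.targetFqn = targetFqn;
        this.source = source;
    }
}

final class OpenLineageMapper {
    // Links every input dataset to every output dataset of one run
    // and tags each resulting edge as INTEGRATION.
    static List<SourcedEdge> toEdges(List<DatasetRef> inputs, List<DatasetRef> outputs) {
        List<SourcedEdge> edges = new ArrayList<>();
        for (DatasetRef in : inputs) {
            for (DatasetRef out : outputs) {
                edges.add(new SourcedEdge(in.fqn(), out.fqn(), LineageSource.INTEGRATION));
            }
        }
        return edges;
    }
}
```

A run with two inputs and one output yields two edges under this scheme. Mapping inputs to outputs pairwise is coarse: it cannot tell which input actually feeds which output, which is the gap column-level lineage is meant to close.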
Section Contents
| Page | Description |
|---|---|
| Upstream Lineage | Trace data sources (where data comes from) |
| Downstream Lineage | Trace data consumers (where data goes) |
| Full Lineage | Complete graph traversal and path finding |
| Column Lineage | Column-level lineage extraction and tracing |
| Lineage Visualization | Graph rendering, impact analysis, export |
| Creating Lineage | Edge creation, OpenLineage ingestion, batch operations |
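The upstream and downstream pages listed above both come down to walking the edge graph in one direction. The following is a minimal in-memory sketch of that idea using a depth-limited breadth-first search. `LineageGraphSketch` and its method names are hypothetical and are not the `LineageService` API; the real service reads edges from the repository and is not shown here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory edge graph keyed by entity FQN.
final class LineageGraphSketch {
    // target FQN -> FQNs of its direct sources
    private final Map<String, List<String>> upstreamOf = new HashMap<>();
    // source FQN -> FQNs of its direct targets
    private final Map<String, List<String>> downstreamOf = new HashMap<>();

    void addEdge(String sourceFqn, String targetFqn) {
        upstreamOf.computeIfAbsent(targetFqn, k -> new ArrayList<>()).add(sourceFqn);
        downstreamOf.computeIfAbsent(sourceFqn, k -> new ArrayList<>()).add(targetFqn);
    }

    // Where the data comes from, up to maxDepth hops away.
    Set<String> upstream(String fqn, int maxDepth) {
        return walk(fqn, maxDepth, upstreamOf);
    }

    // Where the data goes, up to maxDepth hops away.
    Set<String> downstream(String fqn, int maxDepth) {
        return walk(fqn, maxDepth, downstreamOf);
    }

    // Level-by-level BFS; the visited set keeps cycles from looping forever.
    private static Set<String> walk(String start, int maxDepth,
                                    Map<String, List<String>> adjacency) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                for (String neighbor : adjacency.getOrDefault(node, List.of())) {
                    // Skip the start node and anything already reached.
                    if (!neighbor.equals(start) && visited.add(neighbor)) {
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }
}
```

Keeping two adjacency maps trades memory for symmetric lookups, so upstream and downstream queries share one traversal routine. The depth limit bounds the work on large graphs.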
Source Reference
| Component | File |
|---|---|
| Edge management & traversal | LineageController.java |
| Column lineage extraction | ColumnLineageController.java |
| Graph visualization & export | LineageVisualizationController.java |
| Core lineage service | LineageService.java |
| Column lineage service | ColumnLineageService.java |
| Visualization service | LineageVisualizationService.java |
| Impact analysis | VisualizationImpactAnalysisService.java |