MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Data Lineage

End-to-end lineage tracking from source to consumption across tables, columns, and pipelines.

Status: Production

Overview

The MATIH Lineage subsystem tracks how data flows through your organization, from raw source tables through transformations and aggregations to downstream reports. It supports both table-level and column-level lineage. Lineage can be captured in three ways: automatic extraction from SQL queries, ingestion of OpenLineage events, and manual edge creation.
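Automatic extraction from SQL comes down to reading the target table and the source tables out of a statement. The toy sketch below illustrates the idea for a single INSERT ... SELECT. It is not the Catalog Service's parser: the class, record, and method names are hypothetical, and a regex approach misfires on many real queries, for example on expressions such as EXTRACT(YEAR FROM order_date) or on FROM clauses that name a CTE rather than a table.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** HYPOTHETICAL toy extractor: handles only "INSERT INTO <t> SELECT ... FROM/JOIN <t>". */
final class NaiveSqlLineage {
    // Dotted identifiers such as catalog.schema.table
    private static final String IDENT = "([A-Za-z_]\\w*(?:\\.[A-Za-z_]\\w*)*)";
    private static final Pattern TARGET =
            Pattern.compile("\\bINSERT\\s+INTO\\s+" + IDENT, Pattern.CASE_INSENSITIVE);
    private static final Pattern SOURCE =
            Pattern.compile("\\b(?:FROM|JOIN)\\s+" + IDENT, Pattern.CASE_INSENSITIVE);

    /** One table-level edge: source FQN feeds target FQN. */
    record Edge(String sourceFqn, String targetFqn) {}

    static List<Edge> extract(String sql) {
        Matcher t = TARGET.matcher(sql);
        if (!t.find()) {
            return List.of(); // not an INSERT ... SELECT: nothing to record
        }
        String target = t.group(1);
        Set<String> sources = new LinkedHashSet<>(); // dedupe, keep first-seen order
        Matcher s = SOURCE.matcher(sql);
        while (s.find()) {
            sources.add(s.group(1));
        }
        List<Edge> edges = new ArrayList<>();
        for (String src : sources) {
            edges.add(new Edge(src, target));
        }
        return edges;
    }
}
```

For `INSERT INTO mart.daily_sales SELECT ... FROM raw.orders o JOIN raw.customers c ...` this yields two edges into mart.daily_sales, one from each source table.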

Lineage is managed by three controllers within the Catalog Service:

| Controller | Base Path | Purpose |
|---|---|---|
| LineageController | /v1/lineage | Edge management, traversal, OpenLineage ingestion |
| ColumnLineageController | /v1/catalog/lineage/column | Column-level extraction and graph queries |
| LineageVisualizationController | /api/v1/lineage/visualization | Graph rendering, impact analysis, export |
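OpenLineage ingestion goes through LineageController. The sketch below shows a client building, but not sending, such a request with the JDK's java.net.http client. The payload is a minimal OpenLineage RunEvent (a COMPLETE event with one input and one output dataset). Assumptions: the /v1/lineage/openlineage path, the base URL, and the producer URI are hypothetical placeholders, since only the /v1/lineage base path is documented here. Check which OpenLineage schema version your producer targets.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;
import java.util.UUID;

/**
 * Sketch: builds (does not send) a POST carrying a minimal OpenLineage RunEvent.
 * HYPOTHETICAL: the "/openlineage" suffix, base URL, and producer URI are placeholders.
 * Values are not JSON-escaped, so keep them to simple identifiers.
 */
final class OpenLineageRequestSketch {
    static String runEventJson(UUID runId, String jobNamespace, String jobName,
                               String datasetNamespace, String inputName, String outputName,
                               Instant eventTime) {
        return """
            {
              "eventType": "COMPLETE",
              "eventTime": "%s",
              "run": {"runId": "%s"},
              "job": {"namespace": "%s", "name": "%s"},
              "inputs": [{"namespace": "%s", "name": "%s"}],
              "outputs": [{"namespace": "%s", "name": "%s"}],
              "producer": "https://example.com/hypothetical-producer",
              "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"
            }
            """.formatted(eventTime, runId, jobNamespace, jobName,
                          datasetNamespace, inputName, datasetNamespace, outputName);
    }

    static HttpRequest buildRequest(String baseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/lineage/openlineage")) // hypothetical path
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```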

Architecture

```text
                      +-------------------------------+
                      |     Lineage Visualization     |
                      |          Controller           |
                      | /api/v1/lineage/visualization |
                      +--------------+----------------+
                                     |
              +----------------------+----------------------+
              |                      |                      |
    +---------v--------+   +---------v--------+   +---------v--------+
    | LineageController|   | ColumnLineage    |   | Impact Analysis  |
    | /v1/lineage      |   | Controller       |   | Service          |
    |                  |   | /v1/catalog/     |   |                  |
    +--------+---------+   | lineage/column   |   +--------+---------+
             |             +--------+---------+            |
             |                      |                      |
    +--------v----------------------v----------------------v--------+
    |                      LineageService                           |
    |  - Edge CRUD, traversal, path finding, statistics             |
    +--------+-----------------------------------+---------+--------+
             |                                   |         |
    +--------v---------+             +-----------v-+   +---v--------+
    | LineageRepository|             | Column      |   | OpenLineage|
    | (PostgreSQL)     |             | Lineage Svc |   | Parser     |
    +------------------+             +-------------+   +------------+
```
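LineageService handles traversal, but this page describes it only at the box level. As a rough illustration of depth-limited upstream and downstream traversal, here is an in-memory breadth-first sketch over entity FQNs. The class, record, and method names are hypothetical. A real implementation would read edges through LineageRepository (PostgreSQL) rather than hold them in maps.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** HYPOTHETICAL in-memory sketch of depth-limited lineage traversal. */
final class LineageGraphSketch {
    record Edge(String source, String target) {}

    private final Map<String, List<String>> down = new HashMap<>(); // source -> targets
    private final Map<String, List<String>> up = new HashMap<>();   // target -> sources

    LineageGraphSketch(List<Edge> edges) {
        for (Edge e : edges) {
            down.computeIfAbsent(e.source(), k -> new ArrayList<>()).add(e.target());
            up.computeIfAbsent(e.target(), k -> new ArrayList<>()).add(e.source());
        }
    }

    /** Entities that consume data from start, at most maxDepth hops away. */
    Set<String> downstream(String start, int maxDepth) { return bfs(start, maxDepth, down); }

    /** Entities that start reads from, at most maxDepth hops away. */
    Set<String> upstream(String start, int maxDepth) { return bfs(start, maxDepth, up); }

    private static Set<String> bfs(String start, int maxDepth, Map<String, List<String>> adj) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(start);
        Deque<String> frontier = new ArrayDeque<>(List.of(start));
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                for (String neighbor : adj.getOrDefault(node, List.of())) {
                    if (seen.add(neighbor)) {
                        next.add(neighbor); // visited set guards against cycles
                    }
                }
            }
            frontier = next;
        }
        seen.remove(start); // report related entities only, not the start node itself
        return seen;
    }
}
```

The depth limit keeps results bounded on wide graphs, and the visited set makes the walk terminate even if the edge set contains a cycle.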

Lineage Edge Model

Each lineage relationship is stored as an edge connecting a source entity to a target entity:

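```java
import java.util.List;
import java.util.Map;
import java.util.UUID;

// LineageEdge entity structure (fields only). LineageType, LineageSource,
// and ColumnLineageMapping are defined elsewhere and not shown here.
public class LineageEdge {
    private UUID id;
    private UUID tenantId;                // owning tenant
    private UUID sourceEntityId;
    private String sourceEntityFqn;       // fully qualified name of the source entity
    private String sourceEntityType;      // "table", "view", "pipeline"
    private UUID targetEntityId;
    private String targetEntityFqn;       // fully qualified name of the target entity
    private String targetEntityType;
    private LineageType lineageType;      // DIRECT, INDIRECT, DERIVED
    private LineageSource lineageSource;  // PIPELINE, QUERY_ANALYSIS, MANUAL, OPENMETADATA, INTEGRATION
    private UUID pipelineId;              // originating pipeline, if any
    private UUID pipelineRunId;
    private String sqlQuery;              // SQL the edge was derived from, if any
    private String description;
    private Double confidence;            // 0.0 to 1.0
    private boolean manual;
    private List<ColumnLineageMapping> columnLineage;  // column-level mappings for this edge
    private Map<String, Object> metadata;
    private List<String> tags;
    private String createdBy;
}
```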
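Each edge can carry column-level mappings in columnLineage. The fields of ColumnLineageMapping are not shown on this page, so the sketch below assumes a hypothetical ColumnMapping record that pairs one source column with one target column (each written as table FQN plus column name), with mappings flattened across all edges. It walks upstream from a target column to the root columns, meaning those with no further upstream mapping.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ColumnTraceSketch {
    /** HYPOTHETICAL stand-in for ColumnLineageMapping: one source column feeds one target column. */
    record ColumnMapping(String sourceColumn, String targetColumn) {}

    /** Root columns (no upstream mapping of their own) that ultimately feed targetColumn. */
    static Set<String> rootSources(String targetColumn, List<ColumnMapping> mappings) {
        Map<String, List<String>> upstream = new HashMap<>();
        for (ColumnMapping m : mappings) {
            upstream.computeIfAbsent(m.targetColumn(), k -> new ArrayList<>()).add(m.sourceColumn());
        }
        Set<String> roots = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(targetColumn);
        while (!stack.isEmpty()) {
            String column = stack.pop();
            if (!visited.add(column)) {
                continue; // already expanded; also stops on cyclic mappings
            }
            List<String> parents = upstream.get(column);
            if (parents == null) {
                if (!column.equals(targetColumn)) {
                    roots.add(column); // leaf: an original source column
                }
            } else {
                parents.forEach(stack::push);
            }
        }
        return roots;
    }
}
```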

Lineage Types

| Type | Description |
|---|---|
| DIRECT | Data flows directly from source to target |
| INDIRECT | Data is derived through intermediate transformations |
| DERIVED | Target is computed from source with business logic |
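When client code needs these types, one option is a Java enum that carries the descriptions from the table. This is a sketch: the description field and its accessor are assumptions, and the real LineageType enum may be declared differently.

```java
/** Sketch of LineageType; the description field is an illustrative assumption. */
enum LineageType {
    DIRECT("Data flows directly from source to target"),
    INDIRECT("Data is derived through intermediate transformations"),
    DERIVED("Target is computed from source with business logic");

    private final String description;

    LineageType(String description) {
        this.description = description;
    }

    String description() {
        return description;
    }
}
```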

Lineage Sources

| Source | Description |
|---|---|
| PIPELINE | Extracted from Spark, Airflow, dbt, Flink pipelines |
| QUERY_ANALYSIS | Detected via SQL query parsing (Trino, Presto) |
| MANUAL | Manually created by data stewards |
| OPENMETADATA | Synced from OpenMetadata |
| INTEGRATION | Ingested via OpenLineage or other integrations |
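Source names often arrive as loosely formatted strings, for example from query parameters or integration configs. The sketch below models the five sources as an enum with a lenient parser. The parse and isManual methods are hypothetical helpers, not part of the documented API.

```java
import java.util.Locale;

/** Sketch of LineageSource with HYPOTHETICAL helper methods. */
enum LineageSource {
    PIPELINE, QUERY_ANALYSIS, MANUAL, OPENMETADATA, INTEGRATION;

    /** Accepts "query-analysis", "Query Analysis", or "QUERY_ANALYSIS". */
    static LineageSource parse(String raw) {
        String normalized = raw.trim().replace('-', '_').replace(' ', '_').toUpperCase(Locale.ROOT);
        return valueOf(normalized); // throws IllegalArgumentException for unknown sources
    }

    /** True only for the MANUAL source (edges created by data stewards). */
    boolean isManual() {
        return this == MANUAL;
    }
}
```

Normalizing with Locale.ROOT avoids locale-specific case mapping surprises (such as the Turkish dotless i) when converting to the enum constant name.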

Section Contents

| Page | Description |
|---|---|
| Upstream Lineage | Trace data sources (where data comes from) |
| Downstream Lineage | Trace data consumers (where data goes) |
| Full Lineage | Complete graph traversal and path finding |
| Column Lineage | Column-level lineage extraction and tracing |
| Lineage Visualization | Graph rendering, impact analysis, export |
| Creating Lineage | Edge creation, OpenLineage ingestion, batch operations |
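Path finding, listed under Full Lineage, asks how one entity reaches another. As an illustration only, a breadth-first search over downstream edges returns the shortest chain of entity FQNs between two nodes. The names here are hypothetical, and this is not necessarily how LineageService finds paths.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LineagePathSketch {
    record Edge(String source, String target) {}

    /** Shortest downstream path from 'from' to 'to' (inclusive), or an empty list if unreachable. */
    static List<String> shortestPath(List<Edge> edges, String from, String to) {
        Map<String, List<String>> adj = new HashMap<>();
        for (Edge e : edges) {
            adj.computeIfAbsent(e.source(), k -> new ArrayList<>()).add(e.target());
        }
        Map<String, String> parent = new HashMap<>(); // child -> node it was reached from
        Set<String> seen = new HashSet<>(List.of(from));
        Deque<String> queue = new ArrayDeque<>(List.of(from));
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(to)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = to; n != null; n = parent.get(n)) {
                    path.addFirst(n); // walk parents back to 'from', which has no parent
                }
                return path;
            }
            for (String next : adj.getOrDefault(node, List.of())) {
                if (seen.add(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }
}
```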

Source Reference

| Component | File |
|---|---|
| Edge management & traversal | LineageController.java |
| Column lineage extraction | ColumnLineageController.java |
| Graph visualization & export | LineageVisualizationController.java |
| Core lineage service | LineageService.java |
| Column lineage service | ColumnLineageService.java |
| Visualization service | LineageVisualizationService.java |
| Impact analysis | VisualizationImpactAnalysisService.java |