MATIH Platform is in active MVP development. Documentation reflects current implementation status.
10. Data Catalog & Governance
Data Lineage
Column-Level Lineage

Column-Level Lineage

Column-level lineage tracks data flow at the individual column granularity, enabling precise impact analysis and data provenance. The ColumnLineageController provides SQL-based extraction, batch processing from query logs, and graph queries.


Extract Column Lineage from SQL

Parse a SQL query to automatically extract column-level lineage mappings:

POST /v1/catalog/lineage/column/extract
curl -X POST "http://localhost:8086/v1/catalog/lineage/column/extract" \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
  -d '{
    "sql": "INSERT INTO analytics.dim_orders SELECT o.order_id, o.amount * (1 - o.discount_pct) AS net_amount, c.name AS customer_name FROM raw.orders o JOIN raw.customers c ON o.customer_id = c.id",
    "targetTable": "analytics.dim_orders",
    "source": "SQL_PARSING",
    "queryId": "q-001",
    "executedBy": "550e8400-e29b-41d4-a716-446655440099"
  }'

Response (201 Created)

{
  "success": true,
  "lineagesCreated": 3,
  "lineages": [
    {
      "sourceTable": "raw.orders",
      "sourceColumn": "order_id",
      "targetTable": "analytics.dim_orders",
      "targetColumn": "order_id",
      "transformation": "DIRECT_COPY",
      "confidence": 1.0
    },
    {
      "sourceTable": "raw.orders",
      "sourceColumn": "amount",
      "targetTable": "analytics.dim_orders",
      "targetColumn": "net_amount",
      "transformation": "amount * (1 - discount_pct)",
      "confidence": 0.95
    },
    {
      "sourceTable": "raw.customers",
      "sourceColumn": "name",
      "targetTable": "analytics.dim_orders",
      "targetColumn": "customer_name",
      "transformation": "RENAME",
      "confidence": 1.0
    }
  ],
  "warnings": []
}

Supported Lineage Sources

The source field accepts the following CatalogLineage.LineageSource values:

SourceDescription
SQL_PARSINGExtracted by parsing SQL statements
QUERY_LOGExtracted from historical query logs
MANUALManually defined by a data steward
PIPELINEExtracted from pipeline metadata (Spark, dbt)
OPENMETADATASynced from OpenMetadata lineage

Batch Extract from Query Logs

Process multiple SQL queries from historical logs to build column lineage in bulk:

POST /v1/catalog/lineage/column/extract/batch
curl -X POST "http://localhost:8086/v1/catalog/lineage/column/extract/batch" \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
  -d '{
    "queryLogs": [
      {
        "sql": "CREATE TABLE analytics.monthly_revenue AS SELECT DATE_TRUNC(month, order_date) AS month, SUM(amount) AS total_revenue FROM raw.orders GROUP BY 1",
        "targetTable": "analytics.monthly_revenue",
        "executedAt": "2026-02-10T14:00:00Z",
        "executedBy": "550e8400-e29b-41d4-a716-446655440099"
      },
      {
        "sql": "INSERT INTO reporting.customer_summary SELECT customer_id, COUNT(*) AS order_count, AVG(amount) AS avg_order FROM raw.orders GROUP BY customer_id",
        "targetTable": "reporting.customer_summary",
        "executedAt": "2026-02-10T15:00:00Z"
      }
    ]
  }'

Response

{
  "totalQueries": 2,
  "successfulExtractions": 2,
  "failedExtractions": 0,
  "totalLineagesCreated": 5,
  "results": [
    {
      "queryIndex": 0,
      "success": true,
      "lineagesCreated": 2,
      "targetTable": "analytics.monthly_revenue"
    },
    {
      "queryIndex": 1,
      "success": true,
      "lineagesCreated": 3,
      "targetTable": "reporting.customer_summary"
    }
  ],
  "errors": []
}

Get Column Lineage Graph

Retrieve a full column lineage graph for a specific column, tracing both upstream and downstream:

GET /v1/catalog/lineage/column/graph?tableFqn={fqn}&columnName={col}&upstreamDepth=3&downstreamDepth=3
curl "http://localhost:8086/v1/catalog/lineage/column/graph?tableFqn=warehouse.analytics.dim_orders&columnName=net_amount&upstreamDepth=3&downstreamDepth=3" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000"

Response

{
  "rootTable": "warehouse.analytics.dim_orders",
  "rootColumn": "net_amount",
  "upstreamDepth": 3,
  "downstreamDepth": 3,
  "nodes": [
    {
      "tableFqn": "warehouse.raw.orders",
      "columnName": "amount",
      "direction": "UPSTREAM",
      "depth": 1
    },
    {
      "tableFqn": "warehouse.raw.orders",
      "columnName": "discount_pct",
      "direction": "UPSTREAM",
      "depth": 1
    },
    {
      "tableFqn": "warehouse.reporting.monthly_sales",
      "columnName": "total_net_revenue",
      "direction": "DOWNSTREAM",
      "depth": 1,
      "transformation": "SUM(net_amount)"
    }
  ],
  "edges": [
    {
      "sourceTable": "warehouse.raw.orders",
      "sourceColumn": "amount",
      "targetTable": "warehouse.analytics.dim_orders",
      "targetColumn": "net_amount",
      "transformation": "amount * (1 - discount_pct)"
    },
    {
      "sourceTable": "warehouse.raw.orders",
      "sourceColumn": "discount_pct",
      "targetTable": "warehouse.analytics.dim_orders",
      "targetColumn": "net_amount",
      "transformation": "amount * (1 - discount_pct)"
    },
    {
      "sourceTable": "warehouse.analytics.dim_orders",
      "sourceColumn": "net_amount",
      "targetTable": "warehouse.reporting.monthly_sales",
      "targetColumn": "total_net_revenue",
      "transformation": "SUM"
    }
  ]
}

Get Table Column Lineage

Retrieve all column lineage relationships for a table:

GET /v1/catalog/lineage/column/table/{tableFqn}
curl "http://localhost:8086/v1/catalog/lineage/column/table/warehouse.analytics.dim_orders" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000"

Search Column Lineage

Search for column lineage records matching specific patterns:

POST /v1/catalog/lineage/column/search?page=0&size=20
curl -X POST "http://localhost:8086/v1/catalog/lineage/column/search?page=0&size=20" \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
  -d '{
    "tableFqnPattern": "warehouse.analytics.%",
    "columnNamePattern": "%revenue%",
    "lineageSource": "SQL_PARSING"
  }'

Source Reference

ComponentFile
Column lineage extractionColumnLineageController.java -- extractLineage()
Batch extractionColumnLineageController.java -- batchExtractLineage()
Column graph queryColumnLineageController.java -- getColumnLineageGraph()
Table column lineageColumnLineageController.java -- getTableColumnLineage()
SearchColumnLineageController.java -- searchColumnLineage()
Column lineage serviceColumnLineageService.java