MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Upstream Lineage

Upstream lineage traces the origins of a data entity -- revealing which tables, views, and pipelines contribute data to a given target. This is essential for understanding data provenance and debugging data quality issues.


Get Upstream Lineage

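Retrieve the direct upstream edges of an entity (every edge whose target is that entity) via the LineageController. This endpoint takes the tenant as the tenantId query parameter: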

GET /v1/lineage/entity/{entityId}/upstream?tenantId={tenantId}
curl "http://localhost:8086/v1/lineage/entity/550e8400-e29b-41d4-a716-446655440001/upstream?tenantId=550e8400-e29b-41d4-a716-446655440000" \
  -H "Authorization: Bearer $TOKEN"
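The same call can be issued from Java. This is a minimal sketch using the JDK's java.net.http client (Java 11+). The class and method names are hypothetical, and the base URL is simply the one from the curl example; only the path, the tenantId query parameter, and the bearer header come from this page.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client helper (not part of the platform): builds the
// direct-upstream request shown in the curl example.
class UpstreamLineageClient {
    // Base URL taken from the curl example; adjust for your deployment.
    static final String BASE_URL = "http://localhost:8086";

    static HttpRequest upstreamRequest(String entityId, String tenantId, String token) {
        // This endpoint takes the tenant as the tenantId query parameter.
        URI uri = URI.create(BASE_URL + "/v1/lineage/entity/" + entityId
                + "/upstream?tenantId=" + tenantId);
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = upstreamRequest(
                "550e8400-e29b-41d4-a716-446655440001",
                "550e8400-e29b-41d4-a716-446655440000",
                System.getenv("TOKEN"));
        // Sends the request; requires a running service at BASE_URL.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // JSON array of upstream edges
    }
}
```

Building the request separately from sending it keeps the URL and header logic testable without a running service.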

Response

[
  {
    "id": "edge-001",
    "tenantId": "550e8400-...",
    "sourceEntityId": "tbl-raw-orders",
    "sourceEntityFqn": "warehouse.raw.orders",
    "sourceEntityType": "table",
    "targetEntityId": "tbl-dim-orders",
    "targetEntityFqn": "warehouse.analytics.dim_orders",
    "targetEntityType": "table",
    "lineageType": "DIRECT",
    "lineageSource": "PIPELINE",
    "pipelineId": "pipeline-etl-001",
    "confidence": 1.0,
    "description": "ETL pipeline: raw orders to dimension table",
    "createdBy": "airflow-integration"
  },
  {
    "id": "edge-002",
    "sourceEntityId": "tbl-raw-customers",
    "sourceEntityFqn": "warehouse.raw.customers",
    "sourceEntityType": "table",
    "targetEntityId": "tbl-dim-orders",
    "targetEntityFqn": "warehouse.analytics.dim_orders",
    "targetEntityType": "table",
    "lineageType": "DIRECT",
    "lineageSource": "QUERY_ANALYSIS",
    "sqlQuery": "INSERT INTO dim_orders SELECT o.*, c.name FROM raw.orders o JOIN raw.customers c ON o.customer_id = c.id",
    "confidence": 0.95
  }
]
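Note that fields vary by edge: the PIPELINE edge carries a pipelineId, while the QUERY_ANALYSIS edge carries the analysed sqlQuery and a confidence below 1.0. A client model therefore has to treat those fields as optional. The sketch below is a hypothetical Java 16+ record covering a subset of the response fields, plus two helpers that filter by confidence and group source tables by lineageSource. Parsing the JSON into these records (for example with a JSON library) is left out, and the threshold is the caller's choice, not a platform setting.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical client-side model of one upstream edge (subset of fields).
// pipelineId and sqlQuery may be null: the example omits each on one edge.
record LineageEdge(
        String id,
        String sourceEntityFqn,
        String targetEntityFqn,
        String lineageType,
        String lineageSource,  // e.g. PIPELINE, QUERY_ANALYSIS; assumed non-null
        String pipelineId,     // present on the PIPELINE edge in the example
        String sqlQuery,       // present on the QUERY_ANALYSIS edge in the example
        double confidence) {}

class UpstreamEdges {
    // Keep only edges at or above a caller-chosen confidence threshold.
    static List<LineageEdge> atLeast(List<LineageEdge> edges, double minConfidence) {
        return edges.stream()
                .filter(e -> e.confidence() >= minConfidence)
                .collect(Collectors.toList());
    }

    // Group upstream source FQNs by how each edge was captured.
    static Map<String, List<String>> sourcesByLineageSource(List<LineageEdge> edges) {
        return edges.stream().collect(Collectors.groupingBy(
                LineageEdge::lineageSource,
                Collectors.mapping(LineageEdge::sourceEntityFqn, Collectors.toList())));
    }
}
```

With the two example edges, a threshold of 1.0 keeps only the pipeline-derived edge, while 0.9 keeps both.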

Upstream Visualization Graph

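For a multi-hop view returned as a graph of nodes and edges, use the LineageVisualizationController. It follows upstream edges up to maxDepth hops and reads the tenant from the X-Tenant-ID header instead of a query parameter: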

GET /api/v1/lineage/visualization/graph/{entityId}/upstream?maxDepth=5
curl "http://localhost:8086/api/v1/lineage/visualization/graph/550e8400-e29b-41d4-a716-446655440001/upstream?maxDepth=5" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
  -H "Authorization: Bearer $TOKEN"

Response

{
  "graph": {
    "nodes": [
      {
        "id": "tbl-raw-orders",
        "name": "raw.orders",
        "type": "TABLE",
        "depth": 1
      },
      {
        "id": "tbl-raw-customers",
        "name": "raw.customers",
        "type": "TABLE",
        "depth": 1
      },
      {
        "id": "tbl-source-crm",
        "name": "crm.contacts",
        "type": "TABLE",
        "depth": 2
      }
    ],
    "edges": [
      {
        "sourceId": "tbl-raw-orders",
        "targetId": "tbl-dim-orders",
        "lineageType": "DIRECT"
      },
      {
        "sourceId": "tbl-raw-customers",
        "targetId": "tbl-dim-orders",
        "lineageType": "DIRECT"
      },
      {
        "sourceId": "tbl-source-crm",
        "targetId": "tbl-raw-customers",
        "lineageType": "DIRECT"
      }
    ]
  },
  "metadata": {
    "entityId": "tbl-dim-orders",
    "direction": "UPSTREAM",
    "maxDepth": 5,
    "actualDepth": 2,
    "totalNodes": 3,
    "totalEdges": 3
  }
}
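Each node's depth is its hop distance from the requested entity: raw.customers feeds dim_orders directly (depth 1), and crm.contacts feeds raw.customers (depth 2). The following is a minimal sketch of a depth-limited, breadth-first walk over direct edges that yields the same depths and counts. It illustrates the idea only and is not the service's implementation; the record and class names are hypothetical (Java 16+).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// One lineage edge: data flows from sourceId into targetId.
record GraphEdge(String sourceId, String targetId) {}

// Sketch of a depth-limited upstream walk; not the service's implementation.
class UpstreamTraversal {
    // Returns upstream node id -> depth (hops from the root), for nodes within maxDepth.
    static Map<String, Integer> upstreamDepths(String rootId, List<GraphEdge> edges, int maxDepth) {
        // Index edges by target so each step can ask "which nodes feed this one?"
        Map<String, List<String>> sourcesByTarget = new HashMap<>();
        for (GraphEdge e : edges) {
            sourcesByTarget.computeIfAbsent(e.targetId(), k -> new ArrayList<>()).add(e.sourceId());
        }
        Set<String> visited = new HashSet<>();
        visited.add(rootId);
        Map<String, Integer> depths = new LinkedHashMap<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(rootId);
        // Level-by-level BFS: the first time a node is reached gives its shortest hop count.
        for (int level = 1; level <= maxDepth && !frontier.isEmpty(); level++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                for (String source : sourcesByTarget.getOrDefault(node, List.of())) {
                    if (visited.add(source)) { // also prevents cycles from looping forever
                        depths.put(source, level);
                        next.add(source);
                    }
                }
            }
            frontier = next;
        }
        return depths;
    }

    // Edges whose source was reached and whose target is the root or was reached.
    static long edgesWithin(String rootId, List<GraphEdge> edges, Map<String, Integer> depths) {
        return edges.stream()
                .filter(e -> depths.containsKey(e.sourceId())
                        && (e.targetId().equals(rootId) || depths.containsKey(e.targetId())))
                .count();
    }

    public static void main(String[] args) {
        List<GraphEdge> edges = List.of(
                new GraphEdge("tbl-raw-orders", "tbl-dim-orders"),
                new GraphEdge("tbl-raw-customers", "tbl-dim-orders"),
                new GraphEdge("tbl-source-crm", "tbl-raw-customers"));
        Map<String, Integer> depths = upstreamDepths("tbl-dim-orders", edges, 5);
        int actualDepth = depths.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        System.out.println(depths);
        System.out.println("actualDepth=" + actualDepth
                + " totalNodes=" + depths.size()
                + " totalEdges=" + edgesWithin("tbl-dim-orders", edges, depths));
    }
}
```

Run on the example edges, this prints the same depths as the response and actualDepth=2, totalNodes=3, totalEdges=3. Like the example, it excludes the root entity from the node count.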

Upstream Column Lineage

Trace the upstream origins of a single column, up to depth hops, via the ColumnLineageController:

GET /v1/catalog/lineage/column/upstream?tableFqn={fqn}&columnName={col}&depth=5
curl "http://localhost:8086/v1/catalog/lineage/column/upstream?tableFqn=warehouse.analytics.dim_orders&columnName=total_revenue&depth=5" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000"
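Because tableFqn and columnName are free-form values in the query string, a client should URL-encode them. The sketch below assumes Java 11+ and builds the request with the JDK's URLEncoder (form-style encoding, so a space becomes "+"). The class name is hypothetical; the path, parameters, and X-Tenant-ID header follow the curl example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the column-upstream request from the curl example.
class ColumnLineageRequests {
    // Base URL taken from the curl example; adjust for your deployment.
    static final String BASE_URL = "http://localhost:8086";

    // Encode a single query-parameter value (letters, digits, '.', '_', '-', '*' pass through).
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    static HttpRequest columnUpstream(String tableFqn, String columnName, int depth, String tenantId) {
        URI uri = URI.create(BASE_URL + "/v1/catalog/lineage/column/upstream"
                + "?tableFqn=" + enc(tableFqn)
                + "&columnName=" + enc(columnName)
                + "&depth=" + depth);
        // This endpoint takes the tenant from the X-Tenant-ID header.
        return HttpRequest.newBuilder(uri)
                .header("X-Tenant-ID", tenantId)
                .GET()
                .build();
    }
}
```

For the example values, encoding leaves the URL unchanged. It only matters once names contain spaces or reserved characters.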

Response

{
  "rootTable": "warehouse.analytics.dim_orders",
  "rootColumn": "total_revenue",
  "nodes": [
    {
      "tableFqn": "warehouse.raw.orders",
      "columnName": "amount",
      "depth": 1,
      "transformation": "SUM(amount)"
    },
    {
      "tableFqn": "warehouse.raw.orders",
      "columnName": "discount",
      "depth": 1,
      "transformation": "SUM(amount) - SUM(discount)"
    }
  ],
  "edges": [
    {
      "sourceTable": "warehouse.raw.orders",
      "sourceColumn": "amount",
      "targetTable": "warehouse.analytics.dim_orders",
      "targetColumn": "total_revenue",
      "transformation": "SUM"
    }
  ]
}
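A common next step is to summarize which source tables and columns feed the traced column. This is a small sketch, assuming the nodes array has already been parsed into a hypothetical Java 16+ record. It groups upstream column names by source table, with tables in alphabetical order.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical client-side model of one entry in the "nodes" array.
record ColumnNode(String tableFqn, String columnName, int depth, String transformation) {}

class ColumnLineageSummary {
    // Source table FQN -> upstream column names, tables sorted alphabetically.
    static Map<String, List<String>> columnsByTable(List<ColumnNode> nodes) {
        return nodes.stream().collect(Collectors.groupingBy(
                ColumnNode::tableFqn,
                TreeMap::new,
                Collectors.mapping(ColumnNode::columnName, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<ColumnNode> nodes = List.of(
                new ColumnNode("warehouse.raw.orders", "amount", 1, "SUM(amount)"),
                new ColumnNode("warehouse.raw.orders", "discount", 1, "SUM(amount) - SUM(discount)"));
        System.out.println(columnsByTable(nodes)); // {warehouse.raw.orders=[amount, discount]}
    }
}
```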

Use Cases

  • Data quality root cause analysis -- When a metric looks wrong, trace upstream to find the source tables and transformations that produced the incorrect value
  • Regulatory compliance -- Prove the provenance of data used in financial reports or regulatory filings
  • Impact assessment -- Before modifying a source table schema, understand which assets depend on it (this question is answered by downstream rather than upstream lineage)
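For root cause analysis, one simple heuristic is to intersect the upstream nodes returned by the visualization endpoint with a set of suspect entities and check the nearest ones first. The sketch below assumes that suspect set is supplied by the caller (for example, entities whose latest quality check failed); how it is produced is not covered on this page, and the names are hypothetical. The nearest suspect is a starting point for investigation, not proof of the cause.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// An upstream node as returned in the visualization graph (id and depth only).
record UpstreamNode(String id, int depth) {}

class RootCauseCandidates {
    // Upstream nodes that appear in the caller-supplied suspect set, nearest first,
    // ties broken by id. The suspect set itself is an assumed input.
    static List<String> nearestSuspects(List<UpstreamNode> upstream, Set<String> suspects) {
        return upstream.stream()
                .filter(n -> suspects.contains(n.id()))
                .sorted(Comparator.comparingInt(UpstreamNode::depth)
                        .thenComparing(UpstreamNode::id))
                .map(UpstreamNode::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<UpstreamNode> upstream = List.of(
                new UpstreamNode("tbl-raw-orders", 1),
                new UpstreamNode("tbl-raw-customers", 1),
                new UpstreamNode("tbl-source-crm", 2));
        // Hypothetical: suppose checks flagged these two tables.
        Set<String> suspects = Set.of("tbl-source-crm", "tbl-raw-customers");
        System.out.println(nearestSuspects(upstream, suspects)); // [tbl-raw-customers, tbl-source-crm]
    }
}
```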

Source Reference

| Component | File |
|---|---|
| Upstream lineage endpoint | LineageController.java -- getUpstreamLineage() |
| Upstream visualization | LineageVisualizationController.java -- getUpstreamLineage() |
| Column upstream trace | ColumnLineageController.java -- traceUpstream() |
| Lineage service | LineageService.java |