Column-Level Lineage
Column-level lineage tracks data flow at the individual column granularity, enabling precise impact analysis and data provenance. The ColumnLineageController provides SQL-based extraction, batch processing from query logs, and graph queries.
Extract Column Lineage from SQL
Parse a SQL query to automatically extract column-level lineage mappings:
POST /v1/catalog/lineage/column/extractcurl -X POST "http://localhost:8086/v1/catalog/lineage/column/extract" \
-H "Content-Type: application/json" \
-H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
-d '{
"sql": "INSERT INTO analytics.dim_orders SELECT o.order_id, o.amount * (1 - o.discount_pct) AS net_amount, c.name AS customer_name FROM raw.orders o JOIN raw.customers c ON o.customer_id = c.id",
"targetTable": "analytics.dim_orders",
"source": "SQL_PARSING",
"queryId": "q-001",
"executedBy": "550e8400-e29b-41d4-a716-446655440099"
}'Response (201 Created)
{
"success": true,
"lineagesCreated": 3,
"lineages": [
{
"sourceTable": "raw.orders",
"sourceColumn": "order_id",
"targetTable": "analytics.dim_orders",
"targetColumn": "order_id",
"transformation": "DIRECT_COPY",
"confidence": 1.0
},
{
"sourceTable": "raw.orders",
"sourceColumn": "amount",
"targetTable": "analytics.dim_orders",
"targetColumn": "net_amount",
"transformation": "amount * (1 - discount_pct)",
"confidence": 0.95
},
{
"sourceTable": "raw.customers",
"sourceColumn": "name",
"targetTable": "analytics.dim_orders",
"targetColumn": "customer_name",
"transformation": "RENAME",
"confidence": 1.0
}
],
"warnings": []
}Supported Lineage Sources
The source field accepts the following CatalogLineage.LineageSource values:
| Source | Description |
|---|---|
SQL_PARSING | Extracted by parsing SQL statements |
QUERY_LOG | Extracted from historical query logs |
MANUAL | Manually defined by a data steward |
PIPELINE | Extracted from pipeline metadata (Spark, dbt) |
OPENMETADATA | Synced from OpenMetadata lineage |
Batch Extract from Query Logs
Process multiple SQL queries from historical logs to build column lineage in bulk:
POST /v1/catalog/lineage/column/extract/batchcurl -X POST "http://localhost:8086/v1/catalog/lineage/column/extract/batch" \
-H "Content-Type: application/json" \
-H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
-d '{
"queryLogs": [
{
"sql": "CREATE TABLE analytics.monthly_revenue AS SELECT DATE_TRUNC(month, order_date) AS month, SUM(amount) AS total_revenue FROM raw.orders GROUP BY 1",
"targetTable": "analytics.monthly_revenue",
"executedAt": "2026-02-10T14:00:00Z",
"executedBy": "550e8400-e29b-41d4-a716-446655440099"
},
{
"sql": "INSERT INTO reporting.customer_summary SELECT customer_id, COUNT(*) AS order_count, AVG(amount) AS avg_order FROM raw.orders GROUP BY customer_id",
"targetTable": "reporting.customer_summary",
"executedAt": "2026-02-10T15:00:00Z"
}
]
}'Response
{
"totalQueries": 2,
"successfulExtractions": 2,
"failedExtractions": 0,
"totalLineagesCreated": 5,
"results": [
{
"queryIndex": 0,
"success": true,
"lineagesCreated": 2,
"targetTable": "analytics.monthly_revenue"
},
{
"queryIndex": 1,
"success": true,
"lineagesCreated": 3,
"targetTable": "reporting.customer_summary"
}
],
"errors": []
}Get Column Lineage Graph
Retrieve a full column lineage graph for a specific column, tracing both upstream and downstream:
GET /v1/catalog/lineage/column/graph?tableFqn={fqn}&columnName={col}&upstreamDepth=3&downstreamDepth=3curl "http://localhost:8086/v1/catalog/lineage/column/graph?tableFqn=warehouse.analytics.dim_orders&columnName=net_amount&upstreamDepth=3&downstreamDepth=3" \
-H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000"Response
{
"rootTable": "warehouse.analytics.dim_orders",
"rootColumn": "net_amount",
"upstreamDepth": 3,
"downstreamDepth": 3,
"nodes": [
{
"tableFqn": "warehouse.raw.orders",
"columnName": "amount",
"direction": "UPSTREAM",
"depth": 1
},
{
"tableFqn": "warehouse.raw.orders",
"columnName": "discount_pct",
"direction": "UPSTREAM",
"depth": 1
},
{
"tableFqn": "warehouse.reporting.monthly_sales",
"columnName": "total_net_revenue",
"direction": "DOWNSTREAM",
"depth": 1,
"transformation": "SUM(net_amount)"
}
],
"edges": [
{
"sourceTable": "warehouse.raw.orders",
"sourceColumn": "amount",
"targetTable": "warehouse.analytics.dim_orders",
"targetColumn": "net_amount",
"transformation": "amount * (1 - discount_pct)"
},
{
"sourceTable": "warehouse.raw.orders",
"sourceColumn": "discount_pct",
"targetTable": "warehouse.analytics.dim_orders",
"targetColumn": "net_amount",
"transformation": "amount * (1 - discount_pct)"
},
{
"sourceTable": "warehouse.analytics.dim_orders",
"sourceColumn": "net_amount",
"targetTable": "warehouse.reporting.monthly_sales",
"targetColumn": "total_net_revenue",
"transformation": "SUM"
}
]
}Get Table Column Lineage
Retrieve all column lineage relationships for a table:
GET /v1/catalog/lineage/column/table/{tableFqn}curl "http://localhost:8086/v1/catalog/lineage/column/table/warehouse.analytics.dim_orders" \
-H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000"Search Column Lineage
Search for column lineage records matching specific patterns:
POST /v1/catalog/lineage/column/search?page=0&size=20curl -X POST "http://localhost:8086/v1/catalog/lineage/column/search?page=0&size=20" \
-H "Content-Type: application/json" \
-H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
-d '{
"tableFqnPattern": "warehouse.analytics.%",
"columnNamePattern": "%revenue%",
"lineageSource": "SQL_PARSING"
}'Source Reference
| Component | File |
|---|---|
| Column lineage extraction | ColumnLineageController.java -- extractLineage() |
| Batch extraction | ColumnLineageController.java -- batchExtractLineage() |
| Column graph query | ColumnLineageController.java -- getColumnLineageGraph() |
| Table column lineage | ColumnLineageController.java -- getTableColumnLineage() |
| Search | ColumnLineageController.java -- searchColumnLineage() |
| Column lineage service | ColumnLineageService.java |