Data Federation

Trino's federated query capability allows the MATIH platform to execute queries that join data across multiple data sources in a single SQL statement. This enables unified analytics without requiring data movement or ETL pipelines.

How Federation Works

SELECT
  orders.order_id,
  customers.name,
  inventory.stock_level
FROM delta.public.orders AS orders
JOIN postgresql.crm.customers AS customers
  ON orders.customer_id = customers.id
JOIN iceberg.warehouse.inventory AS inventory
  ON orders.product_id = inventory.product_id
WHERE orders.date > DATE '2026-01-01'

Trino resolves each table reference to its respective catalog and connector, executes partial queries on each data source, and joins the results in memory.

Federation Topology

                    Trino Coordinator
                    /       |       \
                   /        |        \
           Delta Lake   PostgreSQL   Iceberg
           (S3/MinIO)   (External)   (S3/MinIO)

Supported Source Combinations

Source A	Source B	Join Support	Performance Notes
Delta Lake	Delta Lake	Full	Best performance; pushdown optimized
Delta Lake	Iceberg	Full	Good performance; similar storage layer
Delta Lake	PostgreSQL	Full	Network transfer for PostgreSQL data
Any	Memory	Full	In-memory tables are fastest for lookup joins

Query Optimization for Federation

Trino applies several optimizations for federated queries:

Optimization	Description
Predicate pushdown	WHERE clauses pushed to source connectors
Projection pushdown	Only required columns fetched from sources
Dynamic filtering	Runtime filters from one source applied to another
Join reordering	Smaller tables processed first for broadcast joins

Limitations

Limitation	Description
Write federation	Cross-catalog INSERT/CREATE is not supported
Transaction scope	Transactions do not span multiple catalogs
Data freshness	External sources may have stale data
Network overhead	Remote sources add network latency

Best Practices

Place the largest table in the most performant catalog (Delta Lake or Iceberg)
Use predicate filters to minimize data transfer from external sources
Consider materialized views for frequently federated queries
Monitor cross-catalog query performance through the analytics dashboard

Catalogs Connectors