Data Federation
Trino's federated query capability allows the MATIH platform to execute queries that join data across multiple data sources in a single SQL statement. This enables unified analytics without requiring data movement or ETL pipelines.
How Federation Works
SELECT
orders.order_id,
customers.name,
inventory.stock_level
FROM delta.public.orders AS orders
JOIN postgresql.crm.customers AS customers
ON orders.customer_id = customers.id
JOIN iceberg.warehouse.inventory AS inventory
ON orders.product_id = inventory.product_id
WHERE orders.date > DATE '2026-01-01'Trino resolves each table reference to its respective catalog and connector, executes partial queries on each data source, and joins the results in memory.
Federation Topology
Trino Coordinator
/ | \
/ | \
Delta Lake PostgreSQL Iceberg
(S3/MinIO) (External) (S3/MinIO)Supported Source Combinations
| Source A | Source B | Join Support | Performance Notes |
|---|---|---|---|
| Delta Lake | Delta Lake | Full | Best performance; pushdown optimized |
| Delta Lake | Iceberg | Full | Good performance; similar storage layer |
| Delta Lake | PostgreSQL | Full | Network transfer for PostgreSQL data |
| Any | Memory | Full | In-memory tables are fastest for lookup joins |
Query Optimization for Federation
Trino applies several optimizations for federated queries:
| Optimization | Description |
|---|---|
| Predicate pushdown | WHERE clauses pushed to source connectors |
| Projection pushdown | Only required columns fetched from sources |
| Dynamic filtering | Runtime filters from one source applied to another |
| Join reordering | Smaller tables processed first for broadcast joins |
Limitations
| Limitation | Description |
|---|---|
| Write federation | Cross-catalog INSERT/CREATE is not supported |
| Transaction scope | Transactions do not span multiple catalogs |
| Data freshness | External sources may have stale data |
| Network overhead | Remote sources add network latency |
Best Practices
- Place the largest table in the most performant catalog (Delta Lake or Iceberg)
- Use predicate filters to minimize data transfer from external sources
- Consider materialized views for frequently federated queries
- Monitor cross-catalog query performance through the analytics dashboard