Trino
Trino is the primary query execution engine for the MATIH Platform. It provides federated SQL across multiple data sources, enabling queries that join data from Iceberg lakehouse tables, ClickHouse OLAP stores, and PostgreSQL metadata databases in a single SQL statement.
Role in the Platform
| Aspect | Details |
|---|---|
| Deployment | Kubernetes (coordinator + worker pods) |
| Access | JDBC from Query Engine service |
| Services using it | Query Engine, Semantic Layer, Data Quality |
| Multi-tenancy | Per-tenant catalog configuration |
Connector Configuration
| Connector | Data Source | Use Case |
|---|---|---|
| Iceberg | MinIO / S3 lakehouse | Primary data lake access for analytical queries |
| ClickHouse | ClickHouse cluster | Pre-aggregated OLAP queries |
| PostgreSQL | PostgreSQL databases | Metadata lookups, small-table joins |
| Hive | Hive Metastore | Legacy data warehouse access |
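Each connector in the table corresponds to a catalog properties file on the Trino nodes (for example `etc/catalog/postgresql.properties`), and the file name becomes the catalog name used in queries. The sketch below renders two such files from Python dictionaries. The property keys are standard Trino connector settings, but every host name, database name, and user is a placeholder, and `render_catalog_properties` is a hypothetical helper, not part of the platform.

```python
def render_catalog_properties(props: dict[str, str]) -> str:
    """Render a mapping as the key=value lines of a Trino catalog file."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Placeholder values throughout; only the property keys are real Trino settings.
POSTGRESQL_CATALOG = {
    "connector.name": "postgresql",
    "connection-url": "jdbc:postgresql://postgres.example.internal:5432/platform",
    "connection-user": "trino_reader",
    # Trino substitutes ${ENV:...} from the environment when it loads the file.
    "connection-password": "${ENV:POSTGRES_PASSWORD}",
}

ICEBERG_CATALOG = {
    "connector.name": "iceberg",
    "iceberg.catalog.type": "hive_metastore",
    "hive.metastore.uri": "thrift://metastore.example.internal:9083",
}

if __name__ == "__main__":
    print(render_catalog_properties(POSTGRESQL_CATALOG))
    print(render_catalog_properties(ICEBERG_CATALOG))
```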
Query Federation
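Trino federates queries across connectors. Each table is addressed as catalog.schema.table, so a single statement can read from several sources: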
```sql
-- Join data from Iceberg and PostgreSQL in a single query
SELECT c.customer_name, SUM(o.amount) AS total_revenue
FROM iceberg.sales.orders o
JOIN postgresql.metadata.customers c
  ON o.customer_id = c.id
WHERE o.order_date >= DATE '2025-10-01'
GROUP BY c.customer_name
ORDER BY total_revenue DESC
LIMIT 10
```
Cluster Architecture
```text
Trino Coordinator
  |  - Query parsing and planning
  |  - Worker task assignment
  |  - Result aggregation
  |
  +-- Worker 1: Execute scan and filter tasks
  +-- Worker 2: Execute join and aggregate tasks
  +-- Worker N: Scale horizontally for throughput
```
| Component | Development | Production |
|---|---|---|
| Coordinator | 1 pod | 1 pod (HA optional) |
| Workers | 1 pod | 3-10 pods (auto-scaling) |
| Memory per worker | 2Gi | 8Gi |
| CPU per worker | 1 core | 4 cores |
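To make the sizing table concrete, the sketch below multiplies the per-worker figures by the worker count to get aggregate cluster resources. `WorkerSpec` and `cluster_capacity` are illustrative names. The totals are pod-level requests only: the memory actually available to queries is lower, since the JVM heap and Trino's per-node query memory limits are configured below the pod size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSpec:
    memory_gib: int
    cpu_cores: int


# Per-worker figures from the table above.
DEVELOPMENT = WorkerSpec(memory_gib=2, cpu_cores=1)
PRODUCTION = WorkerSpec(memory_gib=8, cpu_cores=4)


def cluster_capacity(workers: int, spec: WorkerSpec) -> tuple[int, int]:
    """Return (total memory in GiB, total CPU cores) across all workers."""
    if workers < 1:
        raise ValueError("a cluster needs at least one worker")
    return workers * spec.memory_gib, workers * spec.cpu_cores


if __name__ == "__main__":
    for count in (3, 10):  # production auto-scaling bounds
        memory, cores = cluster_capacity(count, PRODUCTION)
        print(f"{count} workers: {memory} GiB, {cores} cores")
```

At the production bounds this gives 24 GiB and 12 cores with 3 workers, and 80 GiB and 40 cores with 10.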
Performance Targets
| Query Type | p95 Target |
|---|---|
| Simple SELECT (single source) | Less than 500ms |
| Multi-table join | Less than 2 seconds |
| Cross-connector join | Less than 5 seconds |
| Complex analytics (window functions) | Less than 30 seconds |
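One way to check measured latencies against these targets is sketched below, using the nearest-rank definition of the 95th percentile. The query-type keys and function names are assumptions for illustration; how the platform actually collects and evaluates latency is not specified here.

```python
# p95 targets from the table above, in milliseconds (keys are illustrative).
P95_TARGETS_MS = {
    "simple_select": 500,
    "multi_table_join": 2_000,
    "cross_connector_join": 5_000,
    "complex_analytics": 30_000,
}


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def meets_target(query_type: str, latencies_ms: list[float]) -> bool:
    """True when the observed p95 is strictly below the target for this type."""
    return p95(latencies_ms) < P95_TARGETS_MS[query_type]


if __name__ == "__main__":
    samples = [120, 180, 210, 250, 300, 340, 390, 410, 450, 480]
    print(p95(samples), meets_target("simple_select", samples))
```

With small sample counts the nearest-rank p95 is simply the maximum, so a meaningful check needs at least a few dozen samples per query type.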
Related Pages
- Query Flow -- Query execution lifecycle
- OLAP Engines -- ClickHouse and StarRocks
- Compute Engines -- All compute engines