MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Trino

Trino is the primary query execution engine for the MATIH Platform. It provides federated SQL across multiple data sources, enabling queries that join data from Iceberg lakehouse tables, ClickHouse OLAP stores, and PostgreSQL metadata databases in a single SQL statement.


Role in the Platform

| Aspect | Details |
| --- | --- |
| Deployment | Kubernetes (coordinator + worker pods) |
| Access | JDBC from Query Engine service |
| Services using it | Query Engine, Semantic Layer, Data Quality |
| Multi-tenancy | Per-tenant catalog configuration |

Connector Configuration

| Connector | Data Source | Use Case |
| --- | --- | --- |
| Iceberg | MinIO / S3 lakehouse | Primary data lake access for analytical queries |
| ClickHouse | ClickHouse cluster | Pre-aggregated OLAP queries |
| PostgreSQL | PostgreSQL databases | Metadata lookups, small-table joins |
| Hive | Hive Metastore | Legacy data warehouse access |
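
Each connector above is exposed to Trino as a catalog, which Trino reads from a properties file under etc/catalog/. As a rough sketch of what those files might contain, the Python below renders one file per connector. The property keys (`connector.name`, `connection-url`, `hive.metastore.uri`, `iceberg.catalog.type`) are standard Trino settings, but every hostname, port, and database name is a placeholder, and the choice of a metastore-backed Iceberg catalog is an assumption rather than the platform's confirmed setup. Credentials are deliberately omitted.

```python
# Hypothetical catalog definitions: hosts, ports, and database names are placeholders.
CATALOGS = {
    "iceberg": {
        "connector.name": "iceberg",
        # Assumption: the Iceberg catalog is backed by a Hive Metastore.
        "iceberg.catalog.type": "hive_metastore",
        "hive.metastore.uri": "thrift://hive-metastore.example:9083",
    },
    "clickhouse": {
        "connector.name": "clickhouse",
        "connection-url": "jdbc:clickhouse://clickhouse.example:8123/",
    },
    "postgresql": {
        "connector.name": "postgresql",
        "connection-url": "jdbc:postgresql://postgres.example:5432/metadata",
    },
    "hive": {
        "connector.name": "hive",
        "hive.metastore.uri": "thrift://hive-metastore.example:9083",
    },
}


def render_catalog_properties(properties):
    """Render one catalog as key=value lines, the format of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in properties.items()) + "\n"


def render_all(catalogs):
    """Map each catalog name to '<name>.properties' and its rendered contents."""
    return {
        f"{name}.properties": render_catalog_properties(props)
        for name, props in catalogs.items()
    }
```

The file name determines the catalog name used in SQL, so `postgresql.properties` is what makes `postgresql.metadata.customers` addressable in a query.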

Query Federation

Trino federates queries across connectors:

-- Join data from Iceberg and PostgreSQL in a single query
SELECT c.customer_name, SUM(o.amount) AS total_revenue
FROM iceberg.sales.orders o
JOIN postgresql.metadata.customers c
  ON o.customer_id = c.id
WHERE o.order_date >= DATE '2025-10-01'
GROUP BY c.customer_name
ORDER BY total_revenue DESC
LIMIT 10
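
Trino addresses tables as catalog.schema.table, so the leading name in each table reference shows which connector serves it. The Python sketch below uses that convention to detect whether a statement spans more than one catalog. It is a regex heuristic, not a SQL parser: it only recognizes three-part names after FROM or JOIN, and the function names are hypothetical.

```python
import re

# Matches "FROM catalog.schema.table" or "JOIN catalog.schema.table".
# Only fully qualified three-part names are recognized.
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)\.[A-Za-z_]\w*\.[A-Za-z_]\w*",
    re.IGNORECASE,
)


def referenced_catalogs(sql):
    """Return the set of catalog names referenced by three-part table names."""
    return {match.group(1).lower() for match in _TABLE_REF.finditer(sql)}


def is_cross_connector(sql):
    """True when the statement reads from more than one catalog."""
    return len(referenced_catalogs(sql)) > 1


QUERY = """
SELECT c.customer_name, SUM(o.amount) AS total_revenue
FROM iceberg.sales.orders o
JOIN postgresql.metadata.customers c
  ON o.customer_id = c.id
WHERE o.order_date >= DATE '2025-10-01'
GROUP BY c.customer_name
ORDER BY total_revenue DESC
LIMIT 10
"""

catalogs = referenced_catalogs(QUERY)
```

For the query above, `catalogs` contains `iceberg` and `postgresql`, so `is_cross_connector(QUERY)` is true.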

Cluster Architecture

Trino Coordinator
  | - Query parsing and planning
  | - Worker task assignment
  | - Result aggregation
  |
  +-- Worker 1: Execute scan and filter tasks
  +-- Worker 2: Execute join and aggregate tasks
  +-- Worker N: Scale horizontally for throughput

| Component | Development | Production |
| --- | --- | --- |
| Coordinator | 1 pod | 1 pod (HA optional) |
| Workers | 1 pod | 3-10 pods (auto-scaling) |
| Memory per worker | 2Gi | 8Gi |
| CPU per worker | 1 core | 4 cores |
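
The per-worker figures imply an aggregate capacity range for each environment. The short Python sketch below works that arithmetic through using only the numbers in the table. The `WorkerProfile` class is a hypothetical helper for illustration, and coordinator resources are excluded because the table does not size them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerProfile:
    """Worker sizing for one environment (values taken from the table above)."""
    min_workers: int
    max_workers: int
    memory_gi: int
    cpu_cores: int

    def memory_range_gi(self):
        """Total worker memory in Gi at minimum and maximum scale."""
        return (self.min_workers * self.memory_gi,
                self.max_workers * self.memory_gi)

    def cpu_range(self):
        """Total worker CPU cores at minimum and maximum scale."""
        return (self.min_workers * self.cpu_cores,
                self.max_workers * self.cpu_cores)


DEVELOPMENT = WorkerProfile(min_workers=1, max_workers=1, memory_gi=2, cpu_cores=1)
PRODUCTION = WorkerProfile(min_workers=3, max_workers=10, memory_gi=8, cpu_cores=4)
```

Under these figures, production workers total 24 to 80 Gi of memory and 12 to 40 cores depending on how far auto-scaling has expanded the pool.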

Performance Targets

| Query Type | p95 Target |
| --- | --- |
| Simple SELECT (single source) | Less than 500 ms |
| Multi-table join | Less than 2 seconds |
| Cross-connector join | Less than 5 seconds |
| Complex analytics (window functions) | Less than 30 seconds |
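
A p95 target means that 95% of queries of a given type should finish within the stated time. One way to check measured latencies against the table is sketched below in Python, using the nearest-rank percentile method. The dictionary keys and function names are hypothetical, and the choice of nearest-rank (rather than an interpolating percentile) is an assumption about how the targets are evaluated.

```python
# Targets from the table above, in milliseconds. The keys are illustrative labels.
P95_TARGETS_MS = {
    "simple_select": 500,
    "multi_table_join": 2_000,
    "cross_connector_join": 5_000,
    "complex_analytics": 30_000,
}


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(query_type, latencies_ms):
    """True when the measured p95 is strictly below the target ("less than")."""
    return p95(latencies_ms) < P95_TARGETS_MS[query_type]
```

With 20 samples, the nearest-rank p95 is the 19th fastest, so a single slow outlier does not fail the target but two do.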
