MATIH Platform is in active MVP development. Documentation reflects current implementation status.
2. Architecture
Query Flow

Query Flow

The query flow traces the execution of a SQL query from the BI Workbench or any service through the Query Engine and Trino. This is the most common data access pattern in the platform, used by dashboards, AI-generated queries, and ad-hoc analysis.


Query Execution Path

BI Workbench / AI Service
  |
  v
Query Engine (Port 8080)
  | 1. Validate SQL
  | 2. Apply tenant context
  | 3. Check query cache (Redis)
  |
  +-- Cache HIT --> Return cached result
  |
  +-- Cache MISS
  |     |
  |     v
  |   Trino
  |     | 4. Parse and plan query
  |     | 5. Route to connector(s)
  |     |
  |     +-- Iceberg (lakehouse tables)
  |     +-- ClickHouse (OLAP data)
  |     +-- PostgreSQL (metadata)
  |     |
  |     | 6. Execute distributed query
  |     | 7. Return result set
  |     |
  |     v
  |   Query Engine
  |     | 8. Cache result in Redis
  |     | 9. Publish audit event (Kafka)
  |     | 10. Publish billing event (Kafka)
  |
  v
Response (result set)

Semantic Query Path

When queries originate from the BI Service using semantic models, the Semantic Layer translates business terms to SQL:

BI Service
  |
  v
Semantic Layer (Port 8086)
  | 1. Resolve metric definition
  | 2. Apply dimension filters
  | 3. Generate SQL from MDL
  |
  v
Query Engine (Port 8080)
  | 4. Execute via Trino
  |
  v
Result set returned to BI Service

Query Types

TypeOriginTypical LatencyCache TTL
Dashboard widgetBI Service50-500ms (cached), 500-5000ms (uncached)5 minutes
AI-generated SQLAI Service50-5000msNot cached (unique queries)
Ad-hoc queryQuery API100-30000msConfigurable
Semantic querySemantic Layer100-2000ms5 minutes
Data quality checkData Quality Service200-5000msNot cached

Trino Federation

Trino federates queries across multiple data sources via connectors:

ConnectorData SourceQuery Pattern
IcebergMinIO / S3 lakehouseLarge analytical scans
ClickHouseClickHouse clusterPre-aggregated metrics
PostgreSQLPostgreSQL databasesMetadata and small lookups

A single query can join data across multiple connectors:

SELECT c.customer_name, SUM(o.amount)
FROM iceberg.sales.orders o
JOIN postgresql.metadata.customers c ON o.customer_id = c.id
GROUP BY c.customer_name
ORDER BY SUM(o.amount) DESC
LIMIT 10

Caching Strategy

Query results are cached in Redis with tenant-scoped keys:

Key FormatTTLInvalidation
{tenant_id}:query:{hash}5 minutesSchema change, explicit purge
{tenant_id}:semantic:{metric}:{hash}5 minutesModel update, explicit purge

Performance Characteristics

Query Complexityp50p95p99
Simple SELECT (single table, indexed)50ms200ms500ms
Join across 2-3 tables200ms1000ms3000ms
Aggregation with GROUP BY100ms500ms2000ms
Complex analytics (window functions)500ms5000ms30000ms
Cross-connector join1000ms5000ms15000ms

Related Pages