MATIH Platform is in active MVP development. Documentation reflects current implementation status.

OLAP Engines

The MATIH Platform uses ClickHouse and StarRocks as OLAP (Online Analytical Processing) engines for fast analytical queries on large datasets. These engines complement Trino by providing pre-aggregated, columnar-optimized storage for dashboard queries and real-time analytics.


OLAP Engine Comparison

| Aspect | ClickHouse | StarRocks |
| --- | --- | --- |
| Storage format | Columnar (MergeTree engine family) | Columnar with intelligent indexing |
| Query protocol | HTTP, native TCP, JDBC | MySQL protocol, JDBC |
| Materialized views | Yes | Yes (auto-refresh) |
| Real-time ingestion | Kafka engine, direct INSERT | Routine Load from Kafka |
| Compression | LZ4, ZSTD | LZ4, ZSTD |
| Deployment | Kubernetes Operator or StatefulSet | Kubernetes StatefulSet |

ClickHouse

Role in the Platform

| Aspect | Details |
| --- | --- |
| Primary use | Pre-aggregated analytics, time-series data, event analytics |
| Access pattern | Trino ClickHouse connector, Flink sink |
| Multi-tenancy | tenant_id column in all tables, query-time filtering |
| Performance | Millions of rows per second on aggregation queries |

Table Design

All ClickHouse tables include a tenant_id column for multi-tenant isolation:

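CREATE TABLE events (
    tenant_id String,      -- tenant isolation key, present in every table
    event_type String,
    timestamp DateTime,
    user_id String,
    payload String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)   -- one partition per calendar month
ORDER BY (tenant_id, timestamp);   -- tenant-first sort key for per-tenant time-range scans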
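
Because isolation relies on query-time filtering, every query against these tables has to carry a tenant_id predicate. The following is a minimal sketch of one way to enforce that in client code. The helper name tenant_scoped_select is hypothetical and not part of the MATIH codebase; the {tenant_id:String} placeholder is ClickHouse's query parameter syntax, so the tenant value is passed separately from the SQL text.

```python
import re

# Hypothetical helper for illustration; not part of the MATIH codebase.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def tenant_scoped_select(table, columns, tenant_id):
    """Return (sql, params) for a SELECT that is always filtered by tenant_id."""
    for name in [table, *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not tenant_id:
        raise ValueError("tenant_id is required")
    # {tenant_id:String} is a ClickHouse query parameter placeholder;
    # the value is sent alongside the query, not spliced into it.
    sql = (
        f"SELECT {', '.join(columns)} FROM {table} "
        "WHERE tenant_id = {tenant_id:String}"
    )
    return sql, {"tenant_id": tenant_id}


sql, params = tenant_scoped_select("events", ["event_type", "timestamp"], "acme")
print(sql)
print(params)
```

Since tenant_id is the first column of the sort key, a predicate of this form lets ClickHouse skip data belonging to other tenants instead of scanning it.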

Data Ingestion

| Source | Method | Latency |
| --- | --- | --- |
| Flink jobs | ClickHouse JDBC sink | Near real-time |
| Kafka | Kafka engine table | Seconds |
| Batch ETL | INSERT from SELECT | Minutes |
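
Whichever path is used, MergeTree tables generally behave better with fewer, larger inserts, because each INSERT writes at least one new data part that must later be merged. The sketch below shows client-side batching under that assumption. The batch size of 1000 and the sample rows are illustrative values, not platform settings.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical rows shaped like the events table:
# (tenant_id, event_type, timestamp, user_id, payload)
events = [
    ("acme", "click", "2024-01-01 00:00:00", f"user-{i}", "{}")
    for i in range(2500)
]
batch_sizes = [len(batch) for batch in batched(events, 1000)]
print(batch_sizes)  # [1000, 1000, 500]
```

Each yielded batch would then be sent as a single INSERT by whichever client or sink is in use.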

StarRocks

Role in the Platform

| Aspect | Details |
| --- | --- |
| Primary use | Real-time dashboard queries, materialized aggregations |
| Access pattern | JDBC/MySQL protocol from Query Engine |
| Multi-tenancy | tenant_id column in all tables |
| Materialized views | Auto-refresh for pre-computed aggregations |

Materialized Views

StarRocks materialized views pre-compute common aggregations:

CREATE MATERIALIZED VIEW revenue_by_region AS
SELECT
    tenant_id,
    region,
    DATE_TRUNC('month', order_date) as month,
    SUM(amount) as total_revenue,
    COUNT(*) as order_count
FROM orders
GROUP BY tenant_id, region, DATE_TRUNC('month', order_date);

Dashboard queries that match materialized view patterns are automatically routed to the pre-computed data, reducing latency from seconds to milliseconds.
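
To make the pre-computation concrete, the sketch below reproduces in plain Python what the revenue_by_region view stores: one row per (tenant_id, region, month) with a running sum and count. The sample orders are invented for illustration, and the code only mirrors the SQL semantics; it says nothing about how StarRocks implements refresh or query rewrite internally.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (tenant_id, region, order_date, amount).
orders = [
    ("acme", "EMEA", date(2024, 1, 5), 100.0),
    ("acme", "EMEA", date(2024, 1, 20), 50.0),
    ("acme", "APAC", date(2024, 1, 7), 75.0),
    ("acme", "EMEA", date(2024, 2, 1), 20.0),
    ("globex", "EMEA", date(2024, 1, 9), 10.0),
]


def revenue_by_region(rows):
    """Group by (tenant_id, region, month) and compute SUM(amount), COUNT(*)."""
    totals = defaultdict(lambda: [0.0, 0])
    for tenant_id, region, order_date, amount in rows:
        # First day of the month, matching DATE_TRUNC('month', order_date).
        month = order_date.replace(day=1)
        entry = totals[(tenant_id, region, month)]
        entry[0] += amount
        entry[1] += 1
    return {key: tuple(value) for key, value in totals.items()}


precomputed = revenue_by_region(orders)
print(precomputed[("acme", "EMEA", date(2024, 1, 1))])  # (150.0, 2)
```

A dashboard query for one tenant, region, and month then becomes a single key lookup over a handful of aggregated rows rather than a scan of every order.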


When to Use Each Engine

| Workload | Recommended Engine | Rationale |
| --- | --- | --- |
| Time-series aggregations | ClickHouse | MergeTree partitioning optimized for time-range queries |
| Real-time dashboards | StarRocks | Materialized views for sub-second response |
| Event analytics | ClickHouse | High ingest rate, columnar compression |
| Ad-hoc OLAP queries | Either | Both support fast analytical query patterns |
| Cross-source federation | Trino with OLAP connector | Trino joins OLAP data with other sources |
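
The recommendations above can be expressed as a small lookup, shown here as a sketch. The workload labels and the choose_engine function are hypothetical names chosen for this example, and the fallback for ad-hoc queries (where either engine is suitable) is an arbitrary default rather than a platform rule.

```python
# Workload labels are hypothetical identifiers chosen for this sketch.
RECOMMENDED_ENGINE = {
    "time_series_aggregation": "clickhouse",
    "real_time_dashboard": "starrocks",
    "event_analytics": "clickhouse",
    "cross_source_federation": "trino",
}


def choose_engine(workload, default="clickhouse"):
    """Return the recommended engine, or the default when either engine fits."""
    return RECOMMENDED_ENGINE.get(workload, default)


print(choose_engine("real_time_dashboard"))  # starrocks
print(choose_engine("ad_hoc_olap"))          # clickhouse
```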

Integration with Trino

Trino connects to OLAP engines via dedicated connectors:

Trino Coordinator
  |
  +-- ClickHouse Connector --> ClickHouse cluster
  |     Query: SELECT region, SUM(amount) FROM clickhouse.sales.orders ...
  |
  +-- StarRocks Connector --> StarRocks cluster
        Query: SELECT * FROM starrocks.analytics.revenue_by_region ...
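
Trino addresses tables as catalog.schema.table, which is what allows one statement to span both engines. The sketch below builds such a federated query as a string, reusing the catalog and table names from the diagram. The join keys, the tenant_id column on both sides, and the 'acme' tenant value are assumptions made for illustration.

```python
def qualified_name(catalog, schema, table):
    """Build a Trino-style catalog.schema.table reference."""
    return f"{catalog}.{schema}.{table}"


orders = qualified_name("clickhouse", "sales", "orders")
revenue = qualified_name("starrocks", "analytics", "revenue_by_region")

# Hypothetical federated query; the join keys and tenant filter are assumptions.
federated_sql = f"""
SELECT o.region, o.amount, r.total_revenue
FROM {orders} AS o
JOIN {revenue} AS r
  ON o.tenant_id = r.tenant_id AND o.region = r.region
WHERE o.tenant_id = 'acme'
""".strip()

print(federated_sql)
```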
