OLAP Engines
The MATIH Platform uses ClickHouse and StarRocks as OLAP (Online Analytical Processing) engines for fast analytical queries on large datasets. These engines complement Trino by providing pre-aggregated, columnar-optimized storage for dashboard queries and real-time analytics.
OLAP Engine Comparison
| Aspect | ClickHouse | StarRocks |
|---|---|---|
| Storage format | Columnar (MergeTree engine family) | Columnar with intelligent indexing |
| Query protocol | HTTP, native TCP, JDBC | MySQL protocol, JDBC |
| Materialized views | Yes | Yes (auto-refresh) |
| Real-time ingestion | Kafka engine, direct INSERT | Routine Load from Kafka |
| Compression | LZ4, ZSTD | LZ4, ZSTD |
| Deployment | Kubernetes Operator or StatefulSet | Kubernetes StatefulSet |
ClickHouse
Role in the Platform
| Aspect | Details |
|---|---|
| Primary use | Pre-aggregated analytics, time-series data, event analytics |
| Access pattern | Trino ClickHouse connector, Flink sink |
| Multi-tenancy | tenant_id column in all tables, query-time filtering |
| Performance | Millions of rows per second on aggregation queries |
Table Design
All ClickHouse tables include a tenant_id column for multi-tenant isolation:
CREATE TABLE events (
tenant_id String,
event_type String,
timestamp DateTime,
user_id String,
payload String
) ENGINE = MergeTree()
ORDER BY (tenant_id, timestamp)
PARTITION BY toYYYYMM(timestamp)Data Ingestion
| Source | Method | Latency |
|---|---|---|
| Flink jobs | ClickHouse JDBC sink | Near real-time |
| Kafka | Kafka engine table | Seconds |
| Batch ETL | INSERT from SELECT | Minutes |
StarRocks
Role in the Platform
| Aspect | Details |
|---|---|
| Primary use | Real-time dashboard queries, materialized aggregations |
| Access pattern | JDBC/MySQL protocol from Query Engine |
| Multi-tenancy | tenant_id column in all tables |
| Materialized views | Auto-refresh for pre-computed aggregations |
Materialized Views
StarRocks materialized views pre-compute common aggregations:
CREATE MATERIALIZED VIEW revenue_by_region AS
SELECT
tenant_id,
region,
DATE_TRUNC('month', order_date) as month,
SUM(amount) as total_revenue,
COUNT(*) as order_count
FROM orders
GROUP BY tenant_id, region, DATE_TRUNC('month', order_date);Dashboard queries that match materialized view patterns are automatically routed to the pre-computed data, reducing latency from seconds to milliseconds.
When to Use Each Engine
| Workload | Recommended Engine | Rationale |
|---|---|---|
| Time-series aggregations | ClickHouse | MergeTree partitioning optimized for time-range queries |
| Real-time dashboards | StarRocks | Materialized views for sub-second response |
| Event analytics | ClickHouse | High ingest rate, columnar compression |
| Ad-hoc OLAP queries | Either | Both support fast analytical query patterns |
| Cross-source federation | Trino with OLAP connector | Trino joins OLAP data with other sources |
Integration with Trino
Trino connects to OLAP engines via dedicated connectors:
Trino Coordinator
|
+-- ClickHouse Connector --> ClickHouse cluster
| Query: SELECT region, SUM(amount) FROM clickhouse.sales.orders ...
|
+-- StarRocks Connector --> StarRocks cluster
Query: SELECT * FROM starrocks.analytics.revenue_by_region ...Related Pages
- Trino -- Federated SQL across OLAP engines
- Query Flow -- Query execution lifecycle
- Compute Engines -- All compute engines