Query Flow
The query flow traces the execution of a SQL query from the BI Workbench or any service through the Query Engine and Trino. This is the most common data access pattern in the platform, used by dashboards, AI-generated queries, and ad-hoc analysis.
Query Execution Path
BI Workbench / AI Service
|
v
Query Engine (Port 8080)
| 1. Validate SQL
| 2. Apply tenant context
| 3. Check query cache (Redis)
|
+-- Cache HIT --> Return cached result
|
+-- Cache MISS
| |
| v
| Trino
| | 4. Parse and plan query
| | 5. Route to connector(s)
| |
| +-- Iceberg (lakehouse tables)
| +-- ClickHouse (OLAP data)
| +-- PostgreSQL (metadata)
| |
| | 6. Execute distributed query
| | 7. Return result set
| |
| v
| Query Engine
| | 8. Cache result in Redis
| | 9. Publish audit event (Kafka)
| | 10. Publish billing event (Kafka)
|
v
Response (result set)
Semantic Query Path
When queries originate from the BI Service using semantic models, the Semantic Layer translates business terms to SQL:
BI Service
|
v
Semantic Layer (Port 8086)
| 1. Resolve metric definition
| 2. Apply dimension filters
| 3. Generate SQL from MDL
|
v
Query Engine (Port 8080)
| 4. Execute via Trino
|
v
Result set returned to BI Service
Query Types
| Type | Origin | Typical Latency | Cache TTL |
|---|---|---|---|
| Dashboard widget | BI Service | 50-500ms (cached), 500-5000ms (uncached) | 5 minutes |
| AI-generated SQL | AI Service | 50-5000ms | Not cached (unique queries) |
| Ad-hoc query | Query API | 100-30000ms | Configurable |
| Semantic query | Semantic Layer | 100-2000ms | 5 minutes |
| Data quality check | Data Quality Service | 200-5000ms | Not cached |
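The Cache TTL column above can be read as a per-type caching policy. The following is a minimal sketch of that reading, not the platform's actual configuration: the type identifiers (`dashboard_widget`, `ai_generated_sql`, and so on) and the `ttl_for` helper are hypothetical names chosen for illustration. `None` stands for "not cached", and the ad-hoc TTL is supplied by the caller because the table only states that it is configurable.

```python
from typing import Optional

FIVE_MINUTES = 5 * 60  # seconds

# Hypothetical identifiers for the query types in the table above.
# None means "do not cache"; "configurable" defers to the caller's setting.
CACHE_TTL_SECONDS = {
    "dashboard_widget": FIVE_MINUTES,
    "ai_generated_sql": None,
    "ad_hoc": "configurable",
    "semantic": FIVE_MINUTES,
    "data_quality_check": None,
}


def ttl_for(query_type: str, configured_ttl: Optional[int] = None) -> Optional[int]:
    """Return the cache TTL in seconds, or None if the result should not be cached."""
    policy = CACHE_TTL_SECONDS[query_type]
    if policy == "configurable":
        # Ad-hoc queries use whatever TTL the caller configured (possibly none).
        return configured_ttl
    return policy
```

For example, `ttl_for("ad_hoc", configured_ttl=60)` yields 60 seconds, while `ttl_for("ai_generated_sql")` yields `None` because each AI-generated query is treated as unique.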
Trino Federation
Trino federates queries across multiple data sources via connectors:
| Connector | Data Source | Query Pattern |
|---|---|---|
| Iceberg | MinIO / S3 lakehouse | Large analytical scans |
| ClickHouse | ClickHouse cluster | Pre-aggregated metrics |
| PostgreSQL | PostgreSQL databases | Metadata and small lookups |
A single query can join data across multiple connectors:
SELECT c.customer_name, SUM(o.amount)
FROM iceberg.sales.orders o
JOIN postgresql.metadata.customers c ON o.customer_id = c.id
GROUP BY c.customer_name
ORDER BY SUM(o.amount) DESC
LIMIT 10
Caching Strategy
Query results are cached in Redis with tenant-scoped keys:
| Key Format | TTL | Invalidation |
|---|---|---|
| {tenant_id}:query:{hash} | 5 minutes | Schema change, explicit purge |
| {tenant_id}:semantic:{metric}:{hash} | 5 minutes | Model update, explicit purge |
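The cache check and cache write (steps 3 and 8 of the execution path) follow a cache-aside pattern, which can be sketched as below. This is an illustration under stated assumptions rather than the Query Engine's implementation: the source does not say how `{hash}` is computed, so a SHA-256 digest of whitespace-normalized SQL is assumed here, and an in-memory dictionary stands in for Redis. The names `query_cache_key`, `InMemoryCache`, `purge_tenant`, and `run_query` are hypothetical.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple

TTL_SECONDS = 5 * 60  # 5-minute TTL from the table above


def query_cache_key(tenant_id: str, sql: str) -> str:
    """Build a key in the {tenant_id}:query:{hash} format.

    Assumption: {hash} is a SHA-256 digest of the whitespace-normalized SQL.
    """
    normalized = " ".join(sql.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{tenant_id}:query:{digest}"


class InMemoryCache:
    """Stand-in for Redis: stores (value, expiry time) pairs in a dict."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # an expired entry behaves like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def purge_tenant(self, tenant_id: str) -> int:
        """Explicit purge: drop every key scoped to one tenant."""
        doomed = [k for k in self._data if k.startswith(f"{tenant_id}:")]
        for k in doomed:
            del self._data[k]
        return len(doomed)


def run_query(cache: InMemoryCache, tenant_id: str, sql: str,
              execute: Callable[[str], Any]) -> Any:
    """Cache-aside: return the cached result on a hit, else execute and store."""
    key = query_cache_key(tenant_id, sql)
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache HIT
    result = execute(sql)  # cache MISS: run the query (Trino in the real flow)
    cache.set(key, result, TTL_SECONDS)
    return result
```

Because the tenant ID is the key prefix, two tenants issuing identical SQL never share a cached result, and an explicit purge for one tenant leaves other tenants' entries untouched.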
Performance Characteristics
| Query Complexity | p50 | p95 | p99 |
|---|---|---|---|
| Simple SELECT (single table, indexed) | 50ms | 200ms | 500ms |
| Join across 2-3 tables | 200ms | 1000ms | 3000ms |
| Aggregation with GROUP BY | 100ms | 500ms | 2000ms |
| Complex analytics (window functions) | 500ms | 5000ms | 30000ms |
| Cross-connector join | 1000ms | 5000ms | 15000ms |
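For readers less familiar with the p50/p95/p99 notation: each column is a latency percentile over a set of measured query durations. The source does not state how these figures were computed, so the sketch below uses the nearest-rank method as one common convention; the `percentile` helper is a hypothetical name, not part of the platform.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

Read this way, a p95 of 1000ms means that 95% of measured queries in that class completed in 1000ms or less, while the slowest 5% took longer.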
Related Pages
- Browser to Gateway -- Initial request path
- Agent Flow -- AI-generated query flow
- Data Stores: Trino -- Trino architecture