Query Flow

The query flow traces a SQL query from the BI Workbench, or any other calling service, through the Query Engine and Trino to the returned result set. This is the most common data access pattern in the platform and is used by dashboards, AI-generated queries, and ad-hoc analysis.

Query Execution Path

BI Workbench / AI Service
  |
  v
Query Engine (Port 8080)
  | 1. Validate SQL
  | 2. Apply tenant context
  | 3. Check query cache (Redis)
  |
  +-- Cache HIT --> Return cached result
  |
  +-- Cache MISS
  |     |
  |     v
  |   Trino
  |     | 4. Parse and plan query
  |     | 5. Route to connector(s)
  |     |
  |     +-- Iceberg (lakehouse tables)
  |     +-- ClickHouse (OLAP data)
  |     +-- PostgreSQL (metadata)
  |     |
  |     | 6. Execute distributed query
  |     | 7. Return result set
  |     |
  |     v
  |   Query Engine
  |     | 8. Cache result in Redis
  |     | 9. Publish audit event (Kafka)
  |     | 10. Publish billing event (Kafka)
  |
  v
Response (result set)
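
The numbered steps above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the Query Engine's actual code: the class name `QueryEngineSketch`, the in-memory dict standing in for Redis, the list standing in for Kafka, the topic names `audit` and `billing`, and the read-only validation rule are all hypothetical. As in the diagram, events are published only on the cache-miss path.

```python
import hashlib
import time
from typing import Callable

CACHE_TTL_SECONDS = 300  # assumed: the 5-minute TTL listed for dashboard queries


class QueryEngineSketch:
    """Hypothetical model of steps 1-3 and 8-10; not the real service."""

    def __init__(self, run_on_trino: Callable[[str], list]):
        self.run_on_trino = run_on_trino  # stand-in for steps 4-7
        self.cache = {}    # stand-in for Redis: key -> (expires_at, rows)
        self.events = []   # stand-in for Kafka: (topic, tenant_id, key)

    def execute(self, tenant_id: str, sql: str) -> list:
        # 1. Validate SQL (placeholder rule: read-only statements only)
        if not sql.strip().lower().startswith(("select", "with")):
            raise ValueError("only read-only queries are accepted")
        # 2. Apply tenant context by scoping the cache key to the tenant
        digest = hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()
        key = f"{tenant_id}:query:{digest}"
        # 3. Check the query cache
        entry = self.cache.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]  # cache HIT: Trino is never contacted
        # 4-7. Cache MISS: Trino plans, routes, and executes the query
        rows = self.run_on_trino(sql)
        # 8. Cache the result
        self.cache[key] = (time.monotonic() + CACHE_TTL_SECONDS, rows)
        # 9-10. Publish audit and billing events
        self.events.append(("audit", tenant_id, key))
        self.events.append(("billing", tenant_id, key))
        return rows
```

Passing the Trino call in as a function keeps the sketch runnable without a cluster; in a test it can be replaced by a function that returns fixed rows.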

Semantic Query Path

When a query originates from the BI Service and uses a semantic model, the Semantic Layer translates business terms into SQL before handing the query to the Query Engine:

BI Service
  |
  v
Semantic Layer (Port 8086)
  | 1. Resolve metric definition
  | 2. Apply dimension filters
  | 3. Generate SQL from MDL
  |
  v
Query Engine (Port 8080)
  | 4. Execute via Trino
  |
  v
Result set returned to BI Service
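
Steps 1-3 of the semantic path amount to turning a metric name plus dimensions and filters into a SQL string. The sketch below illustrates the idea only; the `Metric` dataclass, the `total_revenue` metric, the `region` filter column, and the `generate_sql` function are hypothetical and do not reflect the actual MDL format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical metric definition; the real MDL schema may differ."""
    name: str
    expression: str  # SQL aggregate, e.g. "SUM(amount)"
    table: str       # fully qualified source table


METRICS = {
    "total_revenue": Metric("total_revenue", "SUM(amount)", "iceberg.sales.orders"),
}


def quote_literal(value: str) -> str:
    # Escape single quotes for illustration; real code should bind parameters.
    return "'" + value.replace("'", "''") + "'"


def generate_sql(metric_name: str, dimensions: list[str], filters: dict[str, str]) -> str:
    # 1. Resolve the metric definition
    metric = METRICS[metric_name]
    select_list = ", ".join(list(dimensions) + [f"{metric.expression} AS {metric.name}"])
    sql = f"SELECT {select_list} FROM {metric.table}"
    # 2. Apply dimension filters (sorted so the output is deterministic)
    if filters:
        predicates = [f"{column} = {quote_literal(value)}"
                      for column, value in sorted(filters.items())]
        sql += " WHERE " + " AND ".join(predicates)
    # 3. Finish the generated SQL by grouping on the requested dimensions
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql


print(generate_sql("total_revenue", ["customer_id"], {"region": "EMEA"}))
```

Deterministic output matters here because the generated SQL, or its inputs, feeds the cache key: two equivalent requests should produce the same string.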

Query Types

Type	Origin	Typical Latency	Cache TTL
Dashboard widget	BI Service	50-500ms (cached), 500-5000ms (uncached)	5 minutes
AI-generated SQL	AI Service	50-5000ms	Not cached (unique queries)
Ad-hoc query	Query API	100-30000ms	Configurable
Semantic query	Semantic Layer	100-2000ms	5 minutes
Data quality check	Data Quality Service	200-5000ms	Not cached
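
The Cache TTL column can be read as a small policy function. The sketch below is an illustration, not the platform's configuration: the query type identifiers and the function name `cache_ttl_seconds` are hypothetical, and treating an ad-hoc query with no configured TTL as uncached is an assumption.

```python
from typing import Optional

DEFAULT_TTL_SECONDS = 300  # 5 minutes, as listed in the table above


def cache_ttl_seconds(query_type: str, adhoc_ttl: Optional[int] = None) -> Optional[int]:
    """Return the cache TTL in seconds, or None when the result is not cached."""
    if query_type in ("dashboard_widget", "semantic"):
        return DEFAULT_TTL_SECONDS
    if query_type == "adhoc":
        # "Configurable" in the table; assumed uncached when nothing is set.
        return adhoc_ttl
    if query_type in ("ai_generated", "data_quality"):
        return None
    raise ValueError(f"unknown query type: {query_type}")
```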

Trino Federation

Trino federates queries across multiple data sources via connectors:

Connector	Data Source	Query Pattern
Iceberg	MinIO / S3 lakehouse	Large analytical scans
ClickHouse	ClickHouse cluster	Pre-aggregated metrics
PostgreSQL	PostgreSQL databases	Metadata and small lookups

A single query can join data across multiple connectors:

SELECT c.customer_name, SUM(o.amount)
FROM iceberg.sales.orders o
JOIN postgresql.metadata.customers c ON o.customer_id = c.id
GROUP BY c.customer_name
ORDER BY SUM(o.amount) DESC
LIMIT 10
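
In Trino, a fully qualified table name has the form `catalog.schema.table`, so the catalog prefix shows which connector a table reference is routed to. A rough way to spot cross-connector queries, for example to anticipate the higher latencies they tend to have, is to collect those prefixes. The helper names below are hypothetical, and the regular expression is a deliberate simplification: it ignores quoted identifiers, comma joins, and unqualified names.

```python
import re

# Matches "FROM catalog.schema.table" or "JOIN catalog.schema.table".
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)\.[A-Za-z_]\w*\.[A-Za-z_]\w*",
    re.IGNORECASE,
)


def catalogs_in_query(sql: str) -> list[str]:
    """Return the sorted, de-duplicated catalog names referenced in the SQL."""
    return sorted({match.group(1).lower() for match in TABLE_REF.finditer(sql)})


def is_cross_connector(sql: str) -> bool:
    return len(catalogs_in_query(sql)) > 1


EXAMPLE_SQL = """
SELECT c.customer_name, SUM(o.amount)
FROM iceberg.sales.orders o
JOIN postgresql.metadata.customers c ON o.customer_id = c.id
GROUP BY c.customer_name
ORDER BY SUM(o.amount) DESC
LIMIT 10
"""

print(catalogs_in_query(EXAMPLE_SQL))   # ['iceberg', 'postgresql']
print(is_cross_connector(EXAMPLE_SQL))  # True
```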

Caching Strategy

Query results are cached in Redis with tenant-scoped keys:

Key Format	TTL	Invalidation
`{tenant_id}:query:{hash}`	5 minutes	Schema change, explicit purge
`{tenant_id}:semantic:{metric}:{hash}`	5 minutes	Model update, explicit purge
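
A minimal sketch of how keys in these two formats could be built. The source does not say what the `{hash}` component is computed from; hashing the SQL text (or the canonical JSON of the semantic query parameters) with SHA-256 is an assumption, as are the function names.

```python
import hashlib
import json


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def query_cache_key(tenant_id: str, sql: str) -> str:
    # Only surrounding whitespace is stripped; collapsing inner whitespace
    # could merge queries whose string literals differ.
    return f"{tenant_id}:query:{_digest(sql.strip())}"


def semantic_cache_key(tenant_id: str, metric: str, params: dict) -> str:
    # Sorted keys make the hash independent of parameter ordering.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return f"{tenant_id}:semantic:{metric}:{_digest(canonical)}"
```

Because the tenant ID is the key prefix, two tenants that issue byte-identical SQL still get separate cache entries, and an explicit purge for one tenant can target the `{tenant_id}:` prefix without touching the others.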

Performance Characteristics

Query Complexity	p50	p95	p99
Simple SELECT (single table, indexed)	50ms	200ms	500ms
Join across 2-3 tables	200ms	1000ms	3000ms
Aggregation with GROUP BY	100ms	500ms	2000ms
Complex analytics (window functions)	500ms	5000ms	30000ms
Cross-connector join	1000ms	5000ms	15000ms
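
For comparing measured latencies against the table above, percentiles can be computed from raw samples. The source does not state how its p50/p95/p99 figures were derived; the nearest-rank method below is one common choice, shown here as an assumption. It expects an integer percentile so the rank is computed without floating-point rounding surprises.

```python
import math


def percentile(samples_ms: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # synthetic samples: 1 ms .. 100 ms
print(percentile(latencies_ms, 50))  # 50
print(percentile(latencies_ms, 95))  # 95
print(percentile(latencies_ms, 99))  # 99
```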

Related Sections

Browser to Gateway -- Initial request path
Agent Flow -- AI-generated query flow
Data Stores: Trino -- Trino architecture
