Query Engine Architecture
The Query Engine is the platform's SQL execution layer. It receives SQL queries (either user-written or AI-generated), validates them against the tenant's schema, executes them through Trino's federated query engine, and returns structured results with caching and pagination support.
2.4.A.1 Query Processing Pipeline
Every query passes through a multi-stage pipeline:
SQL Input (from AI service, BI service, or direct API)
|
v
1. PARSE
- Parse SQL into AST (Abstract Syntax Tree)
- Validate syntax correctness
|
v
2. VALIDATE
- Resolve table and column references against catalog
- Check tenant has access to referenced tables
- Validate column types for expressions
|
v
3. OPTIMIZE
- Analyze query plan for optimization opportunities
- Push down predicates where possible
- Estimate query cost (rows, bytes, time)
|
v
4. AUTHORIZE
- Check row-level security policies (governance-service)
- Apply column masking rules for sensitive data
- Verify user has query:execute permission
|
v
5. EXECUTE
- Submit to Trino via JDBC connection
- Set search_path / catalog to tenant's data sources
- Stream results back with timeout enforcement
|
v
6. CACHE
- Hash query + parameters for cache key
- Store results in Redis with tenant-prefixed key
- TTL based on query type (60s for dashboards, 0 for ad-hoc)
|
v
7. RETURN
- Paginate results (default 1000 rows, max 10000)
- Include query metadata (execution time, rows scanned)
- Publish QUERY_COMPLETED event to Kafka

2.4.A.2 Trino Integration
The Query Engine delegates execution to Trino, which provides federated query capabilities across multiple data sources:
Query Engine --> Trino Coordinator --> Trino Workers
|
+---------------------+---------------------+
| | |
PostgreSQL S3/ADLS Iceberg
Connector Connector Connector
| | |
Tenant metadata Lakehouse data Lakehouse data
(schemas, config)    (Parquet, ORC)     (versioned tables)

Trino Catalog Configuration
Each tenant's data sources are mapped as Trino catalogs. The Query Engine sets the appropriate catalog before executing:
```java
// Set tenant catalog context before query execution.
// try-with-resources ensures the connection is returned to the pool.
try (Connection conn = trinoDataSource.getConnection()) {
    conn.setCatalog(tenantId + "_catalog");
    conn.setSchema(targetSchema);
    // ... execute the query on this connection ...
}
```

Trino Resource Groups
Queries are assigned to Trino resource groups based on tenant tier:
| Tier | Max Concurrent Queries | Max Queued | Memory Limit | CPU Time Limit |
|---|---|---|---|---|
| Free | 5 | 10 | 2GB | 60s |
| Professional | 20 | 50 | 8GB | 300s |
| Enterprise | 100 | 200 | Custom | Custom |
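The tier limits above can be modeled as a simple lookup used for pre-admission checks. This is an illustrative sketch: the enum, field names, and the `resourceGroup()` naming convention are assumptions, not the platform's actual code, and Enterprise's custom limits are represented here as `-1`.

```java
// Illustrative mapping of tenant tiers to Trino resource-group limits.
enum TenantTier {
    FREE(5, 10, 2L * 1024 * 1024 * 1024, 60_000),
    PROFESSIONAL(20, 50, 8L * 1024 * 1024 * 1024, 300_000),
    ENTERPRISE(100, 200, -1, -1); // -1 = custom, configured per tenant

    public final int maxConcurrent;
    public final int maxQueued;
    public final long memoryLimitBytes; // -1 means custom
    public final long cpuTimeLimitMs;   // -1 means custom

    TenantTier(int maxConcurrent, int maxQueued, long memoryLimitBytes, long cpuTimeLimitMs) {
        this.maxConcurrent = maxConcurrent;
        this.maxQueued = maxQueued;
        this.memoryLimitBytes = memoryLimitBytes;
        this.cpuTimeLimitMs = cpuTimeLimitMs;
    }

    /** Name of the Trino resource group this tier maps to, e.g. "tier-free". */
    public String resourceGroup() {
        return "tier-" + name().toLowerCase();
    }
}
```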
2.4.A.3 Result Caching
Query results are cached in Redis with tenant-scoped keys:
Key format: {tenant_id}:query-engine:result:{query_hash}
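A minimal sketch of composing this key, assuming the query hash is a SHA-256 over the SQL text plus its bound parameters (the class and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;

final class QueryCacheKey {
    /**
     * Builds a tenant-scoped cache key: {tenant_id}:query-engine:result:{query_hash}.
     * The hash covers the SQL text and bound parameters so the same query with
     * different parameters never shares a cache entry.
     */
    public static String build(String tenantId, String sql, List<String> params) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(sql.getBytes(StandardCharsets.UTF_8));
            for (String p : params) {
                digest.update((byte) 0); // separator so ["ab","c"] != ["a","bc"]
                digest.update(p.getBytes(StandardCharsets.UTF_8));
            }
            String hash = HexFormat.of().formatHex(digest.digest());
            return tenantId + ":query-engine:result:" + hash;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

The tenant prefix matters operationally: it keeps one tenant's entries out of another's keyspace and lets invalidation target a single tenant with a prefix scan.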
Cache policy:
- Dashboard queries: 60s TTL (refreshed on access)
- Scheduled reports: 300s TTL
- Ad-hoc queries: No caching (TTL = 0)
- AI-generated queries: 30s TTL

Cache invalidation occurs when:
- The TTL expires
- A catalog metadata change event is received (table schema modified)
- A governance policy change affects the query's data sources
- Manual invalidation via API
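The TTL policy above reduces to a small lookup. In this sketch the `QueryType` values are assumptions about how the service classifies queries; `Duration.ZERO` stands for "do not cache":

```java
import java.time.Duration;

final class CacheTtlPolicy {
    /** Query classes the caching layer distinguishes (names are illustrative). */
    enum QueryType { DASHBOARD, SCHEDULED_REPORT, AD_HOC, AI_GENERATED }

    /** Returns the result TTL for a query class; Duration.ZERO disables caching. */
    public static Duration ttlFor(QueryType type) {
        switch (type) {
            case DASHBOARD:        return Duration.ofSeconds(60);
            case SCHEDULED_REPORT: return Duration.ofSeconds(300);
            case AI_GENERATED:     return Duration.ofSeconds(30);
            case AD_HOC:
            default:               return Duration.ZERO;
        }
    }
}
```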
Query Cost Estimation
Before execution, the Query Engine estimates query cost:
```java
public QueryCostEstimate estimateCost(String sql, String tenantId) {
    // explainDistributed() wraps Trino's EXPLAIN (TYPE DISTRIBUTED) output,
    // which carries row-count, size, and CPU estimates for the plan.
    TrinoPlanEstimate trinoEstimate = explainDistributed(sql, tenantId);
    return QueryCostEstimate.builder()
            .estimatedRows(trinoEstimate.getOutputRowCount())
            .estimatedBytes(trinoEstimate.getOutputSizeInBytes())
            .estimatedTimeMs(trinoEstimate.getCpuCost())
            .dataSourcesAccessed(trinoEstimate.getConnectors())
            .build();
}
```

If the estimated cost exceeds the tenant's tier limit, the query is rejected with a 429 response and a suggestion to optimize.
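The rejection rule could look like the following sketch. The `CostLimits` fields and the exception type are hypothetical stand-ins for whatever the service actually uses to surface the 429:

```java
final class QueryCostGate {
    /** Per-tier ceilings; the values used by callers are illustrative. */
    public record CostLimits(long maxRows, long maxBytes) {}

    /** Raised when an estimate exceeds the tenant's limits; mapped to HTTP 429. */
    public static class CostLimitExceededException extends RuntimeException {
        public CostLimitExceededException(String message) { super(message); }
    }

    /** Rejects the query before execution if the estimate exceeds tier limits. */
    public static void check(long estimatedRows, long estimatedBytes, CostLimits limits) {
        if (estimatedRows > limits.maxRows() || estimatedBytes > limits.maxBytes()) {
            throw new CostLimitExceededException(
                "Estimated cost exceeds tier limit; consider adding filters or a LIMIT clause");
        }
    }
}
```

Gating on the estimate rather than killing the query mid-flight avoids wasting Trino worker capacity on queries that would exceed their limits anyway.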
2.4.A.4 Key APIs
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/query/execute | POST | Execute a SQL query |
| /api/v1/query/validate | POST | Validate SQL without executing |
| /api/v1/query/explain | POST | Return query execution plan |
| /api/v1/query/cancel/{id} | POST | Cancel a running query |
| /api/v1/query/history | GET | Query execution history for tenant |
| /api/v1/query/results/{id} | GET | Retrieve cached query results |
| /api/v1/query/cost | POST | Estimate query cost |
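As a usage sketch, calling the execute endpoint with Java's built-in HttpClient might look like this. The base URL, bearer-token auth, and the JSON body shape (`{"sql": ...}`) are assumptions, not documented contract:

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class QueryApiExample {
    /** Builds (but does not send) a POST to /api/v1/query/execute. */
    public static HttpRequest buildExecuteRequest(String baseUrl, String bearerToken, String sql) {
        String body = "{\"sql\": \"" + sql.replace("\"", "\\\"") + "\"}"; // assumed body shape
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/query/execute"))
                .header("Authorization", "Bearer " + bearerToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request would then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; a paginated result set plus query metadata comes back in the response body.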
Related Sections
- Trino Data Store -- Trino cluster configuration
- Query Flow -- End-to-end query lifecycle
- AI Architecture -- AI-generated queries