Query Engine Architecture
The Query Engine is the platform's SQL execution layer. It receives SQL queries (either user-written or AI-generated), validates them against the tenant's schema, executes them through Trino's federated query engine, and returns structured results with caching and pagination support.
2.4.A.1 Query Processing Pipeline
Every query passes through a multi-stage pipeline:
SQL Input (from AI service, BI service, or direct API)
|
v
1. PARSE
- Parse SQL into AST (Abstract Syntax Tree)
- Validate syntax correctness
|
v
2. VALIDATE
- Resolve table and column references against catalog
- Check tenant has access to referenced tables
- Validate column types for expressions
|
v
3. OPTIMIZE
- Analyze query plan for optimization opportunities
- Push down predicates where possible
- Estimate query cost (rows, bytes, time)
|
v
4. AUTHORIZE
- Check row-level security policies (governance-service)
- Apply column masking rules for sensitive data
- Verify user has query:execute permission
|
v
5. EXECUTE
- Submit to Trino via JDBC connection
- Set search_path / catalog to tenant's data sources
- Stream results back with timeout enforcement
|
v
6. CACHE
- Hash query + parameters for cache key
- Store results in Redis with tenant-prefixed key
- TTL based on query type (60s for dashboards, 0 for ad-hoc)
|
v
7. RETURN
- Paginate results (default 1000 rows, max 10000)
- Include query metadata (execution time, rows scanned)
- Publish QUERY_COMPLETED event to Kafka

2.4.A.2 Trino Integration
The Query Engine delegates execution to Trino, which provides federated query capabilities across multiple data sources:
Query Engine --> Trino Coordinator --> Trino Workers
|
+---------------------+---------------------+
| | |
PostgreSQL S3/ADLS Iceberg
Connector Connector Connector
| | |
Tenant metadata Lakehouse data Lakehouse data
(schemas, config)    (Parquet, ORC)     (versioned tables)

Trino Catalog Configuration
Each tenant's data sources are mapped as Trino catalogs. The Query Engine sets the appropriate catalog before executing:
```java
// Set tenant catalog context before query execution.
// try-with-resources ensures the connection is returned to the pool.
try (Connection conn = trinoDataSource.getConnection()) {
    conn.setCatalog(tenantId + "_catalog");
    conn.setSchema(targetSchema);
    // ... execute the query on this connection ...
}
```

Trino Resource Groups
Queries are assigned to Trino resource groups based on tenant tier:
| Tier | Max Concurrent Queries | Max Queued | Memory Limit | CPU Time Limit |
|---|---|---|---|---|
| Free | 5 | 10 | 2GB | 60s |
| Professional | 20 | 50 | 8GB | 300s |
| Enterprise | 100 | 200 | Custom | Custom |
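The tier limits above can be modeled as a simple lookup used for pre-admission checks. This is an illustrative sketch: the enum, field names, and the `resourceGroup()` naming convention are assumptions, not the platform's actual code, and Enterprise's custom limits are represented here as `-1`.

```java
// Illustrative mapping of tenant tiers to Trino resource-group limits.
enum TenantTier {
    FREE(5, 10, 2L * 1024 * 1024 * 1024, 60_000),
    PROFESSIONAL(20, 50, 8L * 1024 * 1024 * 1024, 300_000),
    ENTERPRISE(100, 200, -1, -1); // -1 = custom, configured per tenant

    public final int maxConcurrent;
    public final int maxQueued;
    public final long memoryLimitBytes; // -1 means custom
    public final long cpuTimeLimitMs;   // -1 means custom

    TenantTier(int maxConcurrent, int maxQueued, long memoryLimitBytes, long cpuTimeLimitMs) {
        this.maxConcurrent = maxConcurrent;
        this.maxQueued = maxQueued;
        this.memoryLimitBytes = memoryLimitBytes;
        this.cpuTimeLimitMs = cpuTimeLimitMs;
    }

    /** Name of the Trino resource group this tier maps to, e.g. "tier-free". */
    public String resourceGroup() {
        return "tier-" + name().toLowerCase();
    }
}
```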
2.4.A.3 Result Caching
Query results are cached in Redis with tenant-scoped keys:
Key format: {tenant_id}:query-engine:result:{query_hash}
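A minimal sketch of composing this key, assuming the query hash is a SHA-256 over the SQL text plus its bound parameters (the class and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;

final class QueryCacheKey {
    /**
     * Builds a tenant-scoped cache key: {tenant_id}:query-engine:result:{query_hash}.
     * The hash covers the SQL text and bound parameters so the same query with
     * different parameters never shares a cache entry.
     */
    public static String build(String tenantId, String sql, List<String> params) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(sql.getBytes(StandardCharsets.UTF_8));
            for (String p : params) {
                digest.update((byte) 0); // separator so ["ab","c"] != ["a","bc"]
                digest.update(p.getBytes(StandardCharsets.UTF_8));
            }
            String hash = HexFormat.of().formatHex(digest.digest());
            return tenantId + ":query-engine:result:" + hash;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

The tenant prefix matters operationally: it keeps one tenant's entries out of another's keyspace and lets invalidation target a single tenant with a prefix scan.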
Cache policy:
- Dashboard queries: 60s TTL (refreshed on access)
- Scheduled reports: 300s TTL
- Ad-hoc queries: No caching (TTL = 0)
- AI-generated queries: 30s TTL

Cache invalidation occurs when:
- The TTL expires
- A catalog metadata change event is received (table schema modified)
- A governance policy change affects the query's data sources
- Manual invalidation via API
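The TTL policy above reduces to a small lookup. In this sketch the `QueryType` values are assumptions about how the service classifies queries; `Duration.ZERO` stands for "do not cache":

```java
import java.time.Duration;

final class CacheTtlPolicy {
    /** Query classes the caching layer distinguishes (names are illustrative). */
    enum QueryType { DASHBOARD, SCHEDULED_REPORT, AD_HOC, AI_GENERATED }

    /** Returns the result TTL for a query class; Duration.ZERO disables caching. */
    public static Duration ttlFor(QueryType type) {
        switch (type) {
            case DASHBOARD:        return Duration.ofSeconds(60);
            case SCHEDULED_REPORT: return Duration.ofSeconds(300);
            case AI_GENERATED:     return Duration.ofSeconds(30);
            case AD_HOC:
            default:               return Duration.ZERO;
        }
    }
}
```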
Query Cost Estimation
Before execution, the Query Engine estimates query cost:
```java
public QueryCostEstimate estimateCost(String sql, String tenantId) {
    // explainDistributed() wraps Trino's EXPLAIN (TYPE DISTRIBUTED) output,
    // which carries row-count, size, and CPU estimates for the plan.
    TrinoPlanEstimate trinoEstimate = explainDistributed(sql, tenantId);
    return QueryCostEstimate.builder()
            .estimatedRows(trinoEstimate.getOutputRowCount())
            .estimatedBytes(trinoEstimate.getOutputSizeInBytes())
            .estimatedTimeMs(trinoEstimate.getCpuCost())
            .dataSourcesAccessed(trinoEstimate.getConnectors())
            .build();
}
```

If the estimated cost exceeds the tenant's tier limit, the query is rejected with a 429 response and a suggestion to optimize.
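The rejection rule could look like the following sketch. The `CostLimits` fields and the exception type are hypothetical stand-ins for whatever the service actually uses to surface the 429:

```java
final class QueryCostGate {
    /** Per-tier ceilings; the values used by callers are illustrative. */
    public record CostLimits(long maxRows, long maxBytes) {}

    /** Raised when an estimate exceeds the tenant's limits; mapped to HTTP 429. */
    public static class CostLimitExceededException extends RuntimeException {
        public CostLimitExceededException(String message) { super(message); }
    }

    /** Rejects the query before execution if the estimate exceeds tier limits. */
    public static void check(long estimatedRows, long estimatedBytes, CostLimits limits) {
        if (estimatedRows > limits.maxRows() || estimatedBytes > limits.maxBytes()) {
            throw new CostLimitExceededException(
                "Estimated cost exceeds tier limit; consider adding filters or a LIMIT clause");
        }
    }
}
```

Gating on the estimate rather than killing the query mid-flight avoids wasting Trino worker capacity on queries that would exceed their limits anyway.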
2.4.A.4 Key APIs
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/query/execute | POST | Execute a SQL query |
| /api/v1/query/validate | POST | Validate SQL without executing |
| /api/v1/query/explain | POST | Return query execution plan |
| /api/v1/query/cancel/{id} | POST | Cancel a running query |
| /api/v1/query/history | GET | Query execution history for tenant |
| /api/v1/query/results/{id} | GET | Retrieve cached query results |
| /api/v1/query/cost | POST | Estimate query cost |
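As a usage sketch, calling the execute endpoint with Java's built-in HttpClient might look like this. The base URL, bearer-token auth, and the JSON body shape (`{"sql": ...}`) are assumptions, not documented contract:

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class QueryApiExample {
    /** Builds (but does not send) a POST to /api/v1/query/execute. */
    public static HttpRequest buildExecuteRequest(String baseUrl, String bearerToken, String sql) {
        String body = "{\"sql\": \"" + sql.replace("\"", "\\\"") + "\"}"; // assumed body shape
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/query/execute"))
                .header("Authorization", "Bearer " + bearerToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request would then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; a paginated result set plus query metadata comes back in the response body.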
Related Sections
- Trino Data Store -- Trino cluster configuration
- Query Flow -- End-to-end query lifecycle
- AI Architecture -- AI-generated queries