Chapter 9: Query Engine

The Query Engine is the central data access layer of the MATIH Data Plane. It receives SQL queries from the AI Service, the BI Service, and direct API consumers, routes them to the appropriate execution backend, enforces security policies, and returns results with full observability. This chapter provides a comprehensive examination of the Query Engine's architecture, its integration with Trino, the SQL execution lifecycle, optimization strategies, and the security mechanisms that protect tenant data.

What You Will Learn

By the end of this chapter, you will understand:

The Query Engine architecture including its Java/Spring Boot 3.2 service design, strategy pattern for multi-engine support, and smart query routing
Trino integration with Polaris REST Catalog, catalog configuration, connector management, and resource group isolation
SQL execution flow from query submission through parsing, optimization, execution, and result delivery in both synchronous and asynchronous modes
Multi-level caching with L1 in-memory (Caffeine), L2 distributed (Redis with GZIP compression), semantic caching, and adaptive policies
Query analytics including performance trends, slow query detection, pattern recognition, and period comparison
Export capabilities for CSV, JSON, JSON Lines, and Parquet formats with async support for large datasets
Query optimization including cost estimation, materialized views, workload management, saved queries, and scheduling
Row-Level Security (RLS) with tenant-aware filter injection, table extraction, WHERE clause manipulation, and policy evaluation
Data masking for sensitive columns based on classification tags and user role hierarchies
The complete API surface for query execution, management, scheduling, caching, analytics, and export

Chapter Structure

Section	Description	Audience
Architecture	Service architecture, component layout, strategy pattern, smart query routing across Trino, ClickHouse, DuckDB, StarRocks, and Spark	Backend developers, architects
Query Execution	Synchronous and asynchronous execution, query lifecycle, cancellation, history, and statistics	Backend developers, data engineers
Caching	Multi-level cache architecture, semantic caching, adaptive policies, cache warming, analytics, and invalidation	Performance engineers, data engineers
Analytics	Query analytics, performance trends, slow query detection, pattern recognition, resource metrics, and period comparison	Data engineers, platform administrators
Export	Result export in CSV, JSON, JSON Lines, and Parquet formats with async support and progress tracking	Backend developers, data engineers
Optimization	Cost estimation, materialized views, workload management, saved queries, and query scheduling	Performance engineers, data engineers
Security	Row-level security, data masking, masking audit trails, and compliance	Security engineers, compliance officers
Trino Integration	Trino v458 configuration, Polaris REST Catalog, connector management, session properties, and resource groups	Data engineers, platform engineers
API Reference	Complete REST API documentation for all Query Engine endpoints	All developers

Query Engine at a Glance

The Query Engine is a Java/Spring Boot 3.2 service that serves as the unified SQL execution gateway for the entire platform.

                      +------------------+
                      |   API Consumers  |
                      | (AI, BI, Direct) |
                      +--------+---------+
                               |
                      +--------v---------+
                      |  Query Engine    |
                      |  (Port 8080)     |
                      +--------+---------+
                               |
          +--------+-----------+----------+---------+
          |        |           |          |         |
   +------v---+ +--v------+ +-v-------+ +v------+ +v----------+
   | Trino    | |ClickHouse| | DuckDB | |Star   | | Spark     |
   | v458     | |(Real-time)| | (Local)| |Rocks  | | (Async)   |
   +----------+ +----------+ +--------+ +-------+ +-----------+

Key Numbers

Metric	Value
Technology	Java 21, Spring Boot 3.2
Service port	8080
Primary SQL engine	Trino v458
Catalog integration	Polaris REST Catalog
Cache layers	3 (L1 in-memory Caffeine, L2 Redis, semantic query plan)
Execution modes	Synchronous, asynchronous
Default sync timeout	300 seconds
Default result limit	10,000 rows
Maximum result limit	100,000 rows
Max concurrent queries	100 per tenant
Supported engines	Trino, ClickHouse, DuckDB, StarRocks, Spark Async
Security layers	RLS filter injection, data masking, tenant isolation
Export formats	CSV, JSON, JSON Lines, Parquet

Key Source Files

The Query Engine implementation is organized under data-plane/query-engine/:

Component	Location
Application entry point	`src/main/java/com/matih/query/QueryEngineApplication.java`
Controllers
Query Controller	`src/main/java/com/matih/query/controller/QueryController.java`
Analytics Controller	`src/main/java/com/matih/query/controller/AnalyticsController.java`
Cache Controller	`src/main/java/com/matih/query/controller/CacheController.java`
Export Controller	`src/main/java/com/matih/query/controller/QueryExportController.java`
Cost Estimation Controller	`src/main/java/com/matih/query/controller/CostEstimationController.java`
Materialized View Controller	`src/main/java/com/matih/query/controller/MaterializedViewController.java`
Query Management Controller	`src/main/java/com/matih/query/controller/QueryManagementController.java`
Query Schedule Controller	`src/main/java/com/matih/query/controller/QueryScheduleController.java`
Saved Query Controller	`src/main/java/com/matih/query/controller/SavedQueryController.java`
Workload Controller	`src/main/java/com/matih/query/controller/WorkloadController.java`
Masking Audit Controller	`src/main/java/com/matih/query/masking/controller/MaskingAuditController.java`
Services
Query Execution Service	`src/main/java/com/matih/query/service/QueryExecutionService.java`
Query Cache Service	`src/main/java/com/matih/query/service/QueryCacheService.java`
Query Analytics Service	`src/main/java/com/matih/query/service/QueryAnalyticsService.java`
Query Export Service	`src/main/java/com/matih/query/export/QueryExportService.java`
Query History Service	`src/main/java/com/matih/query/service/QueryHistoryService.java`
Engine Strategies
Trino Strategy	`src/main/java/com/matih/query/strategy/TrinoQueryStrategy.java`
ClickHouse Strategy	`src/main/java/com/matih/query/strategy/ClickHouseQueryStrategy.java`
DuckDB Strategy	`src/main/java/com/matih/query/strategy/DuckDBQueryStrategy.java`
StarRocks Strategy	`src/main/java/com/matih/query/strategy/StarRocksQueryStrategy.java`
Spark Async Strategy	`src/main/java/com/matih/query/strategy/SparkAsyncQueryStrategy.java`
Routing
Smart Query Router	`src/main/java/com/matih/query/router/SmartQueryRouter.java`
Cost-Based Router	`src/main/java/com/matih/query/router/CostBasedRouter.java`
Cache
Multi-Level Cache	`src/main/java/com/matih/query/cache/MultiLevelCacheService.java`
Semantic Cache	`src/main/java/com/matih/query/cache/SemanticCacheService.java`
Adaptive Cache Policy	`src/main/java/com/matih/query/cache/AdaptiveCachePolicy.java`
Cache Warming Service	`src/main/java/com/matih/query/cache/CacheWarmingService.java`
Cache Analytics Service	`src/main/java/com/matih/query/cache/CacheAnalyticsService.java`
Cache Config	`src/main/java/com/matih/query/cache/CacheConfig.java`
Security
RLS Filter Service	`src/main/java/com/matih/query/rls/service/RlsFilterService.java`
Masking Service	`src/main/java/com/matih/query/masking/service/QueryResultMaskingService.java`
Masking Audit Service	`src/main/java/com/matih/query/masking/service/MaskingAuditService.java`

Design Principles

The Query Engine is built on several foundational principles:

Engine agnosticism. The strategy pattern abstracts execution backends. The same query interface works whether the query runs on Trino, ClickHouse, DuckDB, StarRocks, or Spark, allowing the smart router to select the optimal engine.
Security by default. Every query passes through RLS filter injection and data masking before reaching the execution engine. There is no bypass path for tenant-scoped queries.
Tenant isolation at every layer. Tenant context propagates from the HTTP header through the execution pipeline into Trino session properties, resource groups, cache partitions, and quota enforcement.
Cost awareness. Query cost estimation runs before execution to enforce budget policies, route expensive queries to asynchronous processing, and provide users with execution time estimates.
Multi-level caching. A three-tier cache (L1 Caffeine in-memory, L2 Redis with GZIP compression, semantic template matching) minimizes redundant computation while maintaining data freshness through dependency-based invalidation.
Observability as a first-class concern. Every query execution emits Micrometer metrics, structured log events, and OpenTelemetry trace spans covering the full lifecycle from submission to result delivery.

How This Chapter Connects

The Query Engine is consumed by several upstream services:

The AI Service (Chapter 12) sends generated SQL through the Query Engine for execution against tenant data sources
The BI Service (Chapter 16) routes dashboard queries and scheduled report queries through the Query Engine
The Data Catalog (Chapter 10) provides metadata that informs query optimization and schema browsing

The Query Engine depends on several downstream systems:

Trino v458 for distributed SQL execution across heterogeneous data sources
Redis for L2 result caching, semantic cache index, and query deduplication
PostgreSQL for query history, scheduled queries, saved queries, materialized view metadata, and execution metadata
The Governance Service for RLS policy evaluation and masking rule retrieval

Begin with the Architecture section to understand the service's internal structure and multi-engine routing strategy, or jump directly to Query Execution for the end-to-end execution flow.

API Reference Engine Architecture