Chapter 9: Query Engine
The Query Engine is the central data access layer of the MATIH Data Plane. It receives SQL queries from the AI Service, the BI Service, and direct API consumers, routes them to the appropriate execution backend, enforces security policies, and returns results with full observability. This chapter provides a comprehensive examination of the Query Engine's architecture, its integration with Trino, the SQL execution lifecycle, optimization strategies, and the security mechanisms that protect tenant data.
What You Will Learn
By the end of this chapter, you will understand:
- The Query Engine architecture including its Java/Spring Boot 3.2 service design, strategy pattern for multi-engine support, and smart query routing
- Trino integration with Polaris REST Catalog, catalog configuration, connector management, and resource group isolation
- SQL execution flow from query submission through parsing, optimization, execution, and result delivery in both synchronous and asynchronous modes
- Multi-level caching with L1 in-memory (Caffeine), L2 distributed (Redis with GZIP compression), semantic caching, and adaptive policies
- Query analytics including performance trends, slow query detection, pattern recognition, and period comparison
- Export capabilities for CSV, JSON, JSON Lines, and Parquet formats with async support for large datasets
- Query optimization including cost estimation, materialized views, workload management, saved queries, and scheduling
- Row-Level Security (RLS) with tenant-aware filter injection, table extraction, WHERE clause manipulation, and policy evaluation
- Data masking for sensitive columns based on classification tags and user role hierarchies
- The complete API surface for query execution, management, scheduling, caching, analytics, and export
Chapter Structure
| Section | Description | Audience |
|---|---|---|
| Architecture | Service architecture, component layout, strategy pattern, smart query routing across Trino, ClickHouse, DuckDB, StarRocks, and Spark | Backend developers, architects |
| Query Execution | Synchronous and asynchronous execution, query lifecycle, cancellation, history, and statistics | Backend developers, data engineers |
| Caching | Multi-level cache architecture, semantic caching, adaptive policies, cache warming, analytics, and invalidation | Performance engineers, data engineers |
| Analytics | Query analytics, performance trends, slow query detection, pattern recognition, resource metrics, and period comparison | Data engineers, platform administrators |
| Export | Result export in CSV, JSON, JSON Lines, and Parquet formats with async support and progress tracking | Backend developers, data engineers |
| Optimization | Cost estimation, materialized views, workload management, saved queries, and query scheduling | Performance engineers, data engineers |
| Security | Row-level security, data masking, masking audit trails, and compliance | Security engineers, compliance officers |
| Trino Integration | Trino v458 configuration, Polaris REST Catalog, connector management, session properties, and resource groups | Data engineers, platform engineers |
| API Reference | Complete REST API documentation for all Query Engine endpoints | All developers |
Query Engine at a Glance
The Query Engine is a Java/Spring Boot 3.2 service that serves as the unified SQL execution gateway for the entire platform.
+------------------+
| API Consumers |
| (AI, BI, Direct) |
+--------+---------+
|
+--------v---------+
| Query Engine |
| (Port 8080) |
+--------+---------+
|
+--------+-----------+----------+---------+
| | | | |
+------v---+ +--v------+ +-v-------+ +v------+ +v----------+
| Trino | |ClickHouse| | DuckDB | |Star | | Spark |
| v458 | |(Real-time)| | (Local)| |Rocks | | (Async) |
+----------+ +----------+ +--------+ +-------+ +-----------+Key Numbers
| Metric | Value |
|---|---|
| Technology | Java 21, Spring Boot 3.2 |
| Service port | 8080 |
| Primary SQL engine | Trino v458 |
| Catalog integration | Polaris REST Catalog |
| Cache layers | 3 (L1 in-memory Caffeine, L2 Redis, semantic query plan) |
| Execution modes | Synchronous, asynchronous |
| Default sync timeout | 300 seconds |
| Default result limit | 10,000 rows |
| Maximum result limit | 100,000 rows |
| Max concurrent queries | 100 per tenant |
| Supported engines | Trino, ClickHouse, DuckDB, StarRocks, Spark Async |
| Security layers | RLS filter injection, data masking, tenant isolation |
| Export formats | CSV, JSON, JSON Lines, Parquet |
Key Source Files
The Query Engine implementation is organized under data-plane/query-engine/:
| Component | Location |
|---|---|
| Application entry point | src/main/java/com/matih/query/QueryEngineApplication.java |
| Controllers | |
| Query Controller | src/main/java/com/matih/query/controller/QueryController.java |
| Analytics Controller | src/main/java/com/matih/query/controller/AnalyticsController.java |
| Cache Controller | src/main/java/com/matih/query/controller/CacheController.java |
| Export Controller | src/main/java/com/matih/query/controller/QueryExportController.java |
| Cost Estimation Controller | src/main/java/com/matih/query/controller/CostEstimationController.java |
| Materialized View Controller | src/main/java/com/matih/query/controller/MaterializedViewController.java |
| Query Management Controller | src/main/java/com/matih/query/controller/QueryManagementController.java |
| Query Schedule Controller | src/main/java/com/matih/query/controller/QueryScheduleController.java |
| Saved Query Controller | src/main/java/com/matih/query/controller/SavedQueryController.java |
| Workload Controller | src/main/java/com/matih/query/controller/WorkloadController.java |
| Masking Audit Controller | src/main/java/com/matih/query/masking/controller/MaskingAuditController.java |
| Services | |
| Query Execution Service | src/main/java/com/matih/query/service/QueryExecutionService.java |
| Query Cache Service | src/main/java/com/matih/query/service/QueryCacheService.java |
| Query Analytics Service | src/main/java/com/matih/query/service/QueryAnalyticsService.java |
| Query Export Service | src/main/java/com/matih/query/export/QueryExportService.java |
| Query History Service | src/main/java/com/matih/query/service/QueryHistoryService.java |
| Engine Strategies | |
| Trino Strategy | src/main/java/com/matih/query/strategy/TrinoQueryStrategy.java |
| ClickHouse Strategy | src/main/java/com/matih/query/strategy/ClickHouseQueryStrategy.java |
| DuckDB Strategy | src/main/java/com/matih/query/strategy/DuckDBQueryStrategy.java |
| StarRocks Strategy | src/main/java/com/matih/query/strategy/StarRocksQueryStrategy.java |
| Spark Async Strategy | src/main/java/com/matih/query/strategy/SparkAsyncQueryStrategy.java |
| Routing | |
| Smart Query Router | src/main/java/com/matih/query/router/SmartQueryRouter.java |
| Cost-Based Router | src/main/java/com/matih/query/router/CostBasedRouter.java |
| Cache | |
| Multi-Level Cache | src/main/java/com/matih/query/cache/MultiLevelCacheService.java |
| Semantic Cache | src/main/java/com/matih/query/cache/SemanticCacheService.java |
| Adaptive Cache Policy | src/main/java/com/matih/query/cache/AdaptiveCachePolicy.java |
| Cache Warming Service | src/main/java/com/matih/query/cache/CacheWarmingService.java |
| Cache Analytics Service | src/main/java/com/matih/query/cache/CacheAnalyticsService.java |
| Cache Config | src/main/java/com/matih/query/cache/CacheConfig.java |
| Security | |
| RLS Filter Service | src/main/java/com/matih/query/rls/service/RlsFilterService.java |
| Masking Service | src/main/java/com/matih/query/masking/service/QueryResultMaskingService.java |
| Masking Audit Service | src/main/java/com/matih/query/masking/service/MaskingAuditService.java |
Design Principles
The Query Engine is built on several foundational principles:
-
Engine agnosticism. The strategy pattern abstracts execution backends. The same query interface works whether the query runs on Trino, ClickHouse, DuckDB, StarRocks, or Spark, allowing the smart router to select the optimal engine.
-
Security by default. Every query passes through RLS filter injection and data masking before reaching the execution engine. There is no bypass path for tenant-scoped queries.
-
Tenant isolation at every layer. Tenant context propagates from the HTTP header through the execution pipeline into Trino session properties, resource groups, cache partitions, and quota enforcement.
-
Cost awareness. Query cost estimation runs before execution to enforce budget policies, route expensive queries to asynchronous processing, and provide users with execution time estimates.
-
Multi-level caching. A three-tier cache (L1 Caffeine in-memory, L2 Redis with GZIP compression, semantic template matching) minimizes redundant computation while maintaining data freshness through dependency-based invalidation.
-
Observability as a first-class concern. Every query execution emits Micrometer metrics, structured log events, and OpenTelemetry trace spans covering the full lifecycle from submission to result delivery.
How This Chapter Connects
The Query Engine is consumed by several upstream services:
- The AI Service (Chapter 12) sends generated SQL through the Query Engine for execution against tenant data sources
- The BI Service (Chapter 16) routes dashboard queries and scheduled report queries through the Query Engine
- The Data Catalog (Chapter 10) provides metadata that informs query optimization and schema browsing
The Query Engine depends on several downstream systems:
- Trino v458 for distributed SQL execution across heterogeneous data sources
- Redis for L2 result caching, semantic cache index, and query deduplication
- PostgreSQL for query history, scheduled queries, saved queries, materialized view metadata, and execution metadata
- The Governance Service for RLS policy evaluation and masking rule retrieval
Begin with the Architecture section to understand the service's internal structure and multi-engine routing strategy, or jump directly to Query Execution for the end-to-end execution flow.