MATIH Platform is in active MVP development. Documentation reflects current implementation status.
9. Query Engine & SQL
Overview

Chapter 9: Query Engine

The Query Engine is the central data access layer of the MATIH Data Plane. It receives SQL queries from the AI Service, the BI Service, and direct API consumers, routes them to the appropriate execution backend, enforces security policies, and returns results with full observability. This chapter provides a comprehensive examination of the Query Engine's architecture, its integration with Trino, the SQL execution lifecycle, optimization strategies, and the security mechanisms that protect tenant data.


What You Will Learn

By the end of this chapter, you will understand:

  • The Query Engine architecture including its Java/Spring Boot 3.2 service design, strategy pattern for multi-engine support, and smart query routing
  • Trino integration with Polaris REST Catalog, catalog configuration, connector management, and resource group isolation
  • SQL execution flow from query submission through parsing, optimization, execution, and result delivery in both synchronous and asynchronous modes
  • Multi-level caching with L1 in-memory (Caffeine), L2 distributed (Redis with GZIP compression), semantic caching, and adaptive policies
  • Query analytics including performance trends, slow query detection, pattern recognition, and period comparison
  • Export capabilities for CSV, JSON, JSON Lines, and Parquet formats with async support for large datasets
  • Query optimization including cost estimation, materialized views, workload management, saved queries, and scheduling
  • Row-Level Security (RLS) with tenant-aware filter injection, table extraction, WHERE clause manipulation, and policy evaluation
  • Data masking for sensitive columns based on classification tags and user role hierarchies
  • The complete API surface for query execution, management, scheduling, caching, analytics, and export

Chapter Structure

SectionDescriptionAudience
ArchitectureService architecture, component layout, strategy pattern, smart query routing across Trino, ClickHouse, DuckDB, StarRocks, and SparkBackend developers, architects
Query ExecutionSynchronous and asynchronous execution, query lifecycle, cancellation, history, and statisticsBackend developers, data engineers
CachingMulti-level cache architecture, semantic caching, adaptive policies, cache warming, analytics, and invalidationPerformance engineers, data engineers
AnalyticsQuery analytics, performance trends, slow query detection, pattern recognition, resource metrics, and period comparisonData engineers, platform administrators
ExportResult export in CSV, JSON, JSON Lines, and Parquet formats with async support and progress trackingBackend developers, data engineers
OptimizationCost estimation, materialized views, workload management, saved queries, and query schedulingPerformance engineers, data engineers
SecurityRow-level security, data masking, masking audit trails, and complianceSecurity engineers, compliance officers
Trino IntegrationTrino v458 configuration, Polaris REST Catalog, connector management, session properties, and resource groupsData engineers, platform engineers
API ReferenceComplete REST API documentation for all Query Engine endpointsAll developers

Query Engine at a Glance

The Query Engine is a Java/Spring Boot 3.2 service that serves as the unified SQL execution gateway for the entire platform.

                      +------------------+
                      |   API Consumers  |
                      | (AI, BI, Direct) |
                      +--------+---------+
                               |
                      +--------v---------+
                      |  Query Engine    |
                      |  (Port 8080)     |
                      +--------+---------+
                               |
          +--------+-----------+----------+---------+
          |        |           |          |         |
   +------v---+ +--v------+ +-v-------+ +v------+ +v----------+
   | Trino    | |ClickHouse| | DuckDB | |Star   | | Spark     |
   | v458     | |(Real-time)| | (Local)| |Rocks  | | (Async)   |
   +----------+ +----------+ +--------+ +-------+ +-----------+

Key Numbers

MetricValue
TechnologyJava 21, Spring Boot 3.2
Service port8080
Primary SQL engineTrino v458
Catalog integrationPolaris REST Catalog
Cache layers3 (L1 in-memory Caffeine, L2 Redis, semantic query plan)
Execution modesSynchronous, asynchronous
Default sync timeout300 seconds
Default result limit10,000 rows
Maximum result limit100,000 rows
Max concurrent queries100 per tenant
Supported enginesTrino, ClickHouse, DuckDB, StarRocks, Spark Async
Security layersRLS filter injection, data masking, tenant isolation
Export formatsCSV, JSON, JSON Lines, Parquet

Key Source Files

The Query Engine implementation is organized under data-plane/query-engine/:

ComponentLocation
Application entry pointsrc/main/java/com/matih/query/QueryEngineApplication.java
Controllers
Query Controllersrc/main/java/com/matih/query/controller/QueryController.java
Analytics Controllersrc/main/java/com/matih/query/controller/AnalyticsController.java
Cache Controllersrc/main/java/com/matih/query/controller/CacheController.java
Export Controllersrc/main/java/com/matih/query/controller/QueryExportController.java
Cost Estimation Controllersrc/main/java/com/matih/query/controller/CostEstimationController.java
Materialized View Controllersrc/main/java/com/matih/query/controller/MaterializedViewController.java
Query Management Controllersrc/main/java/com/matih/query/controller/QueryManagementController.java
Query Schedule Controllersrc/main/java/com/matih/query/controller/QueryScheduleController.java
Saved Query Controllersrc/main/java/com/matih/query/controller/SavedQueryController.java
Workload Controllersrc/main/java/com/matih/query/controller/WorkloadController.java
Masking Audit Controllersrc/main/java/com/matih/query/masking/controller/MaskingAuditController.java
Services
Query Execution Servicesrc/main/java/com/matih/query/service/QueryExecutionService.java
Query Cache Servicesrc/main/java/com/matih/query/service/QueryCacheService.java
Query Analytics Servicesrc/main/java/com/matih/query/service/QueryAnalyticsService.java
Query Export Servicesrc/main/java/com/matih/query/export/QueryExportService.java
Query History Servicesrc/main/java/com/matih/query/service/QueryHistoryService.java
Engine Strategies
Trino Strategysrc/main/java/com/matih/query/strategy/TrinoQueryStrategy.java
ClickHouse Strategysrc/main/java/com/matih/query/strategy/ClickHouseQueryStrategy.java
DuckDB Strategysrc/main/java/com/matih/query/strategy/DuckDBQueryStrategy.java
StarRocks Strategysrc/main/java/com/matih/query/strategy/StarRocksQueryStrategy.java
Spark Async Strategysrc/main/java/com/matih/query/strategy/SparkAsyncQueryStrategy.java
Routing
Smart Query Routersrc/main/java/com/matih/query/router/SmartQueryRouter.java
Cost-Based Routersrc/main/java/com/matih/query/router/CostBasedRouter.java
Cache
Multi-Level Cachesrc/main/java/com/matih/query/cache/MultiLevelCacheService.java
Semantic Cachesrc/main/java/com/matih/query/cache/SemanticCacheService.java
Adaptive Cache Policysrc/main/java/com/matih/query/cache/AdaptiveCachePolicy.java
Cache Warming Servicesrc/main/java/com/matih/query/cache/CacheWarmingService.java
Cache Analytics Servicesrc/main/java/com/matih/query/cache/CacheAnalyticsService.java
Cache Configsrc/main/java/com/matih/query/cache/CacheConfig.java
Security
RLS Filter Servicesrc/main/java/com/matih/query/rls/service/RlsFilterService.java
Masking Servicesrc/main/java/com/matih/query/masking/service/QueryResultMaskingService.java
Masking Audit Servicesrc/main/java/com/matih/query/masking/service/MaskingAuditService.java

Design Principles

The Query Engine is built on several foundational principles:

  1. Engine agnosticism. The strategy pattern abstracts execution backends. The same query interface works whether the query runs on Trino, ClickHouse, DuckDB, StarRocks, or Spark, allowing the smart router to select the optimal engine.

  2. Security by default. Every query passes through RLS filter injection and data masking before reaching the execution engine. There is no bypass path for tenant-scoped queries.

  3. Tenant isolation at every layer. Tenant context propagates from the HTTP header through the execution pipeline into Trino session properties, resource groups, cache partitions, and quota enforcement.

  4. Cost awareness. Query cost estimation runs before execution to enforce budget policies, route expensive queries to asynchronous processing, and provide users with execution time estimates.

  5. Multi-level caching. A three-tier cache (L1 Caffeine in-memory, L2 Redis with GZIP compression, semantic template matching) minimizes redundant computation while maintaining data freshness through dependency-based invalidation.

  6. Observability as a first-class concern. Every query execution emits Micrometer metrics, structured log events, and OpenTelemetry trace spans covering the full lifecycle from submission to result delivery.


How This Chapter Connects

The Query Engine is consumed by several upstream services:

  • The AI Service (Chapter 12) sends generated SQL through the Query Engine for execution against tenant data sources
  • The BI Service (Chapter 16) routes dashboard queries and scheduled report queries through the Query Engine
  • The Data Catalog (Chapter 10) provides metadata that informs query optimization and schema browsing

The Query Engine depends on several downstream systems:

  • Trino v458 for distributed SQL execution across heterogeneous data sources
  • Redis for L2 result caching, semantic cache index, and query deduplication
  • PostgreSQL for query history, scheduled queries, saved queries, materialized view metadata, and execution metadata
  • The Governance Service for RLS policy evaluation and masking rule retrieval

Begin with the Architecture section to understand the service's internal structure and multi-engine routing strategy, or jump directly to Query Execution for the end-to-end execution flow.