MATIH Platform is in active MVP development. Documentation reflects current implementation status.
2. Architecture
Query Architecture

Query Engine Architecture

Production - Java/Spring Boot - Port 8080 - Trino federation

The Query Engine is the platform's SQL execution layer. It receives SQL queries (either user-written or AI-generated), validates them against the tenant's schema, executes them through Trino's federated query engine, and returns structured results with caching and pagination support.


2.4.A.1Query Processing Pipeline

Every query passes through a multi-stage pipeline:

SQL Input (from AI service, BI service, or direct API)
  |
  v
1. PARSE
   - Parse SQL into AST (Abstract Syntax Tree)
   - Validate syntax correctness
   |
   v
2. VALIDATE
   - Resolve table and column references against catalog
   - Check tenant has access to referenced tables
   - Validate column types for expressions
   |
   v
3. OPTIMIZE
   - Analyze query plan for optimization opportunities
   - Push down predicates where possible
   - Estimate query cost (rows, bytes, time)
   |
   v
4. AUTHORIZE
   - Check row-level security policies (governance-service)
   - Apply column masking rules for sensitive data
   - Verify user has query:execute permission
   |
   v
5. EXECUTE
   - Submit to Trino via JDBC connection
   - Set search_path / catalog to tenant's data sources
   - Stream results back with timeout enforcement
   |
   v
6. CACHE
   - Hash query + parameters for cache key
   - Store results in Redis with tenant-prefixed key
   - TTL based on query type (60s for dashboards, 0 for ad-hoc)
   |
   v
7. RETURN
   - Paginate results (default 1000 rows, max 10000)
   - Include query metadata (execution time, rows scanned)
   - Publish QUERY_COMPLETED event to Kafka

2.4.A.2Trino Integration

The Query Engine delegates execution to Trino, which provides federated query capabilities across multiple data sources:

Query Engine --> Trino Coordinator --> Trino Workers
                                          |
                    +---------------------+---------------------+
                    |                     |                     |
              PostgreSQL             S3/ADLS              Iceberg
              Connector              Connector             Connector
                    |                     |                     |
              Tenant metadata       Lakehouse data        Lakehouse data
              (schemas, config)     (Parquet, ORC)        (versioned tables)

Trino Catalog Configuration

Each tenant's data sources are mapped as Trino catalogs. The Query Engine sets the appropriate catalog before executing:

// Set tenant catalog context before query execution
Connection conn = trinoDataSource.getConnection();
conn.setCatalog(tenantId + "_catalog");
conn.setSchema(targetSchema);

Trino Resource Groups

Queries are assigned to Trino resource groups based on tenant tier:

TierMax Concurrent QueriesMax QueuedMemory LimitCPU Time Limit
Free5102GB60s
Professional20508GB300s
Enterprise100200CustomCustom

2.4.A.3Result Caching

Query results are cached in Redis with tenant-scoped keys:

Key format: {tenant_id}:query-engine:result:{query_hash}

Cache policy:
  - Dashboard queries: 60s TTL (refreshed on access)
  - Scheduled reports: 300s TTL
  - Ad-hoc queries: No caching (TTL = 0)
  - AI-generated queries: 30s TTL

Cache invalidation occurs when:

  1. The TTL expires
  2. A catalog metadata change event is received (table schema modified)
  3. A governance policy change affects the query's data sources
  4. Manual invalidation via API

Query Cost Estimation

Before execution, the Query Engine estimates query cost:

public QueryCostEstimate estimateCost(String sql, String tenantId) {
    // Uses Trino's EXPLAIN (TYPE DISTRIBUTED) to get:
    return QueryCostEstimate.builder()
        .estimatedRows(trinoEstimate.getOutputRowCount())
        .estimatedBytes(trinoEstimate.getOutputSizeInBytes())
        .estimatedTimeMs(trinoEstimate.getCpuCost())
        .dataSourcesAccessed(trinoEstimate.getConnectors())
        .build();
}

If the estimated cost exceeds the tenant's tier limit, the query is rejected with a 429 response and a suggestion to optimize.


2.4.A.4Key APIs

EndpointMethodDescription
/api/v1/query/executePOSTExecute a SQL query
/api/v1/query/validatePOSTValidate SQL without executing
/api/v1/query/explainPOSTReturn query execution plan
/api/v1/query/cancel/{id}POSTCancel a running query
/api/v1/query/historyGETQuery execution history for tenant
/api/v1/query/results/{id}GETRetrieve cached query results
/api/v1/query/costPOSTEstimate query cost

Related Sections