MATIH Platform is in active MVP development. Documentation reflects current implementation status.
10. Data Catalog & Governance
Semantic Layer
Query Optimization

Query Optimization

The Query Optimization system analyzes and transforms SQL generated by the Semantic Layer to improve execution performance. It includes cost-based optimization, SQL rewrite rules, query plan caching, and table statistics collection. All optimization operations are served at the base path /v1/optimization.


Optimization Pipeline

Semantic Query
    |
    v
SQL Compilation
    |
    v
Query Rewriter (Rule-Based)
    |
    v
Cost-Based Optimizer (Statistics)
    |
    v
Plan Cache Lookup
    |
    v
Optimized SQL

Optimization Endpoints

MethodPathDescription
POST/v1/optimization/optimizeOptimize a metric query
POST/v1/optimization/optimize-sqlOptimize raw SQL
POST/v1/optimization/analyzeAnalyze a query without optimizing
POST/v1/optimization/compareCompare two query plans
POST/v1/optimization/rewriteApply rewrite rules to SQL
GET/v1/optimization/rewrite-rulesList available rewrite rules

Optimize Metric Query

Compiles a metric query request and applies all optimization stages.

Request:

{
  "modelId": "550e8400-e29b-41d4-a716-446655440000",
  "query": {
    "metrics": ["total_revenue"],
    "dimensions": ["region"],
    "timeDimension": "order_date",
    "timeGrain": "MONTH"
  }
}

Response: An OptimizedQueryResult containing the original SQL, optimized SQL, applied optimizations, and cost estimates.


Query Analysis

The analyze endpoint provides cost estimates and recommendations without modifying the query.

Request:

{
  "modelId": "550e8400-e29b-41d4-a716-446655440000",
  "sql": "SELECT region, SUM(order_total) FROM orders GROUP BY region"
}

Response: A QueryAnalysis containing estimated rows, estimated bytes, complexity rating, and optimization recommendations.


Plan Comparison

Compare two SQL statements to determine which has better execution characteristics.

Request:

{
  "modelId": "550e8400-e29b-41d4-a716-446655440000",
  "sql1": "SELECT region, SUM(order_total) FROM orders GROUP BY region",
  "sql2": "SELECT region, SUM(order_total) FROM orders WHERE order_date >= DATE '2026-01-01' GROUP BY region"
}

Response: A PlanComparisonResult indicating which query is recommended and why.


SQL Rewrite Rules

The query rewriter applies transformation rules to improve SQL performance.

CapabilityDescription
Rule-based transformsPredefined SQL rewrite patterns
Selective applicationApply a specific named rule
All-rules applicationApply all applicable rules
Rule listingQuery available rewrite rules

Rewrite Request:

{
  "sql": "SELECT * FROM orders WHERE status = 'active'",
  "ruleName": "projection_pushdown"
}

If ruleName is omitted, all applicable rules are applied.


Query Plan Cache

The optimizer caches compiled query plans to avoid recompilation of frequently executed queries.

EndpointDescription
GET /v1/optimization/cache/statsCache statistics (hit rate, size, evictions)
GET /v1/optimization/cache/patternsFrequently used query patterns
DELETE /v1/optimization/cache/:modelIdInvalidate cache for a model
POST /v1/optimization/cache/warm-upPre-compile queries into the cache

Cache Warm-Up

Pre-compile a list of common queries to populate the cache before users execute them.

{
  "queries": [
    {
      "modelId": "550e8400-e29b-41d4-a716-446655440000",
      "query": {
        "metrics": ["total_revenue"],
        "dimensions": ["region"],
        "timeGrain": "MONTH"
      }
    }
  ]
}

Table Statistics

The optimizer uses table statistics for cost-based decisions. Statistics include row counts, data sizes, and column-level distribution information.

EndpointDescription
GET /v1/optimization/statistics/:modelIdGet cached statistics for a model
POST /v1/optimization/statistics/:modelId/refreshRefresh statistics from the data source
POST /v1/optimization/statistics/:modelId/analyzePerform detailed table analysis

Table Analysis Options

ParameterTypeDefaultDescription
tableNameStringRequiredTable to analyze
exactRowCountBooleanfalseUse exact count vs estimate
includeColumnStatsBooleantrueInclude per-column statistics
samplePercentageInteger10Percentage of rows to sample

Optimization Hints

The hints endpoint provides general optimization recommendations for a model.

EndpointDescription
GET /v1/optimization/hints/:modelIdGet optimization hints

Example hints:

  • Add time range filters to reduce data scanned
  • Use specific dimensions in GROUP BY instead of SELECT *
  • Consider pre-aggregating frequently used metrics
  • Use index-friendly filter patterns (equality before range)
  • Limit window function scope with PARTITION BY

Required Roles

OperationRequired Role
Optimize / Analyze / CompareAuthenticated user
Cache invalidationADMIN or SEMANTIC_ADMIN
Cache warm-upADMIN or SEMANTIC_ADMIN
Statistics refreshADMIN or SEMANTIC_ADMIN
Table analysisADMIN or SEMANTIC_ADMIN