MATIH Platform is in active MVP development. Documentation reflects current implementation status.
9. Query Engine & SQL
Cost Estimation

Cost Estimation

The CostEstimationController provides pre-execution cost prediction, engine comparison, and threshold checking. Cost estimation helps users understand the resource impact of their queries before execution.


Estimate Query Cost

POST /v1/cost/estimate
curl -X POST http://query-engine:8080/v1/cost/estimate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -d '{
    "query": "SELECT o.*, c.name FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.year = 2025",
    "preferredEngine": "TRINO"
  }'
{
  "query": "SELECT o.*, c.name FROM orders o JOIN customers c ON ...",
  "costCategory": "MEDIUM",
  "estimatedCostUsd": 0.045,
  "resourceCosts": {
    "estimatedBytesScanned": 2147483648,
    "estimatedRowsProcessed": 5000000,
    "estimatedBytesOutput": 1048576,
    "estimatedCpuSeconds": 45,
    "estimatedMemoryMb": 512,
    "estimatedWorkers": 4,
    "estimatedNetworkBytes": 536870912
  },
  "executionTime": {
    "minTime": "PT2S",
    "expectedTime": "PT8S",
    "maxTime": "PT30S",
    "p50Time": "PT5S",
    "p95Time": "PT20S",
    "isEstimateReliable": true,
    "estimateSource": "historical"
  },
  "recommendedEngine": "TRINO",
  "optimizations": [
    {
      "type": "PARTITION_FILTER",
      "description": "Add partition filter to reduce scan range",
      "suggestion": "Add WHERE month = ... to leverage date partitioning",
      "potentialSavingsUsd": 0.030,
      "speedupFactor": 3.5,
      "effort": "TRIVIAL"
    }
  ],
  "confidence": 0.85,
  "tableCosts": [
    {
      "tableName": "orders",
      "estimatedRows": 5000000,
      "estimatedBytes": 2147483648,
      "isPartitioned": true,
      "partitionPruning": "partial",
      "partitionsScanned": 12,
      "totalPartitions": 36
    }
  ]
}

Cost Categories

CategoryCostDurationDescription
TRIVIAL< $0.001< 1 secondMetadata queries, small lookups
LOW< $0.01< 10 secondsSimple aggregations, filtered scans
MEDIUM< $0.10< 1 minuteJoin queries, moderate scans
HIGH< $1.00< 10 minutesComplex analytics, large scans
VERY_HIGH>= $1.00>= 10 minutesFull table scans, cross-catalog joins
BLOCKED----Query would be rejected by cost policies

Compare Engine Costs

POST /v1/cost/compare-engines
curl -X POST http://query-engine:8080/v1/cost/compare-engines \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -d '{"query": "SELECT region, SUM(revenue) FROM sales GROUP BY region"}'
{
  "TRINO": {
    "engine": "TRINO",
    "estimatedCostUsd": 0.045,
    "estimatedTime": "PT8S",
    "estimatedBytesScanned": 2147483648,
    "isPreferred": true,
    "reason": "Best for complex analytics with joins",
    "suitabilityScore": 85.0
  },
  "CLICKHOUSE": {
    "engine": "CLICKHOUSE",
    "estimatedCostUsd": 0.015,
    "estimatedTime": "PT2S",
    "estimatedBytesScanned": 1073741824,
    "isPreferred": false,
    "reason": "Good for simple aggregations but limited join support",
    "suitabilityScore": 72.0
  }
}

Check Cost Thresholds

POST /v1/cost/check

Verify whether a query exceeds cost or duration limits before execution:

curl -X POST http://query-engine:8080/v1/cost/check \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -d '{
    "query": "SELECT * FROM large_table",
    "maxCostUsd": 0.50,
    "maxDurationSeconds": 300
  }'
{
  "allowed": false,
  "estimatedCost": 1.25,
  "maxCost": 0.50,
  "costExceeded": true,
  "estimatedDuration": "PT45S",
  "maxDuration": "PT5M",
  "durationExceeded": false,
  "suggestions": [
    "Add a WHERE clause to reduce scan range",
    "Use LIMIT to cap result set size"
  ]
}