MATIH Platform is in active MVP development. Documentation reflects current implementation status.
9. Query Engine & SQL
Query Execution
Asynchronous Execution

Asynchronous Query Execution

Asynchronous execution is designed for long-running queries, large-scale scans, and batch processing workloads. The client submits a query and immediately receives an execution ID, which it can use to poll for status and retrieve results when ready.


Endpoint

POST /v1/queries/execute/async

Required Headers

HeaderTypeDescription
X-Tenant-IDUUIDThe tenant identifier
X-User-IDUUIDThe user performing the query
AuthorizationStringBearer JWT token

Submission Flow

The QueryExecutionService.submitAsync() method handles async query submission:

@Transactional
public AsyncQueryResponse submitAsync(UUID tenantId, UUID userId, QueryRequest request) {
    validateConcurrentQueryLimit(tenantId);
 
    String queryHash = calculateQueryHash(request.getSql());
    EngineType engineType = queryRouter.route(request);
 
    QueryExecution execution = createExecution(tenantId, userId, request, queryHash, engineType);
    execution.setStatus(QueryStatus.QUEUED);
    execution = executionRepository.save(execution);
 
    // Trigger async execution
    executeAsync(tenantId, userId, request);
 
    return AsyncQueryResponse.builder()
            .executionId(execution.getId())
            .status(QueryStatus.QUEUED)
            .engineType(engineType)
            .statusUrl("/api/v1/queries/" + execution.getId() + "/status")
            .resultUrl("/api/v1/queries/" + execution.getId() + "/results")
            .cancelUrl("/api/v1/queries/" + execution.getId() + "/cancel")
            .submittedAt(execution.getSubmittedAt())
            .message("Query submitted successfully")
            .build();
}

The method returns immediately with an AsyncQueryResponse containing URLs for status polling, result retrieval, and cancellation.


AsyncQueryResponse Structure

{
  "executionId": "d290f1ee-6c54-4b01-90e6-d701748f0851",
  "status": "QUEUED",
  "engineType": "SPARK_ASYNC",
  "statusUrl": "/api/v1/queries/d290f1ee-6c54-4b01-90e6-d701748f0851/status",
  "resultUrl": "/api/v1/queries/d290f1ee-6c54-4b01-90e6-d701748f0851/results",
  "cancelUrl": "/api/v1/queries/d290f1ee-6c54-4b01-90e6-d701748f0851/cancel",
  "submittedAt": "2026-02-12T10:30:00Z",
  "estimatedCompletionAt": "2026-02-12T10:35:00Z",
  "queuePosition": 3,
  "message": "Query submitted successfully"
}

Status Polling

After submission, clients poll the status endpoint to track execution progress:

GET /v1/queries/{executionId}/status

curl Example

# Submit async query
RESPONSE=$(curl -s -X POST http://query-engine:8080/v1/queries/execute/async \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
  -H "X-User-ID: 6ba7b810-9dad-11d1-80b4-00c04fd430c8" \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -d '{
    "sql": "SELECT region, product_category, SUM(revenue) as total_revenue FROM sales_facts WHERE year = 2025 GROUP BY region, product_category",
    "catalog": "iceberg",
    "schema": "analytics"
  }')
 
EXECUTION_ID=$(echo $RESPONSE | jq -r '.executionId')
 
# Poll for status
curl -s http://query-engine:8080/v1/queries/$EXECUTION_ID/status \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
  -H "Authorization: Bearer $JWT_TOKEN"

Status Response (In Progress)

{
  "executionId": "d290f1ee-6c54-4b01-90e6-d701748f0851",
  "status": "RUNNING",
  "engineType": "TRINO",
  "executionTimeMs": 12500,
  "cacheHit": false,
  "submittedAt": "2026-02-12T10:30:00Z"
}

Status Response (Complete)

{
  "executionId": "d290f1ee-6c54-4b01-90e6-d701748f0851",
  "status": "COMPLETED",
  "engineType": "TRINO",
  "rowCount": 450,
  "bytesScanned": 2147483648,
  "executionTimeMs": 34200,
  "cacheHit": false,
  "submittedAt": "2026-02-12T10:30:00Z",
  "completedAt": "2026-02-12T10:30:34.200Z"
}

Result Retrieval

Once the status shows COMPLETED, retrieve the results with pagination:

GET /v1/queries/{executionId}/results?page=0&size=1000

curl Example

# Retrieve results with pagination
curl -s "http://query-engine:8080/v1/queries/$EXECUTION_ID/results?page=0&size=100" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
  -H "Authorization: Bearer $JWT_TOKEN"

Pagination Parameters

ParameterTypeDefaultDescription
pageint0Zero-based page number
sizeint1000Number of rows per page

Async Execution Internals

The async execution runs on a dedicated thread pool configured via AsyncConfig:

@Async("queryExecutor")
@Transactional
public CompletableFuture<QueryResponse> executeAsync(UUID tenantId, UUID userId, QueryRequest request) {
    return CompletableFuture.supplyAsync(() -> {
        try {
            return executeSync(tenantId, userId, request);
        } catch (Exception e) {
            log.error("Async query execution failed", e);
            throw new QueryExecutionException("Async query failed: " + e.getMessage(), e);
        }
    });
}

The async method delegates to the same executeSync pipeline, meaning all cache checks, RLS injection, and engine routing apply identically. The only difference is that the execution happens on a background thread pool while the client receives an immediate response.


Recommended Polling Strategy

For clients polling async query status, use exponential backoff:

Initial delay: 500ms
Max delay: 10s
Backoff multiplier: 1.5
Max attempts: 120 (covers ~10 minutes)

Example Polling Loop (JavaScript)

async function pollQueryStatus(executionId, tenantId) {
  let delay = 500;
  const maxDelay = 10000;
 
  while (true) {
    const response = await fetch(
      `/v1/queries/${executionId}/status`,
      { headers: { 'X-Tenant-ID': tenantId } }
    );
    const status = await response.json();
 
    if (status.status === 'COMPLETED') return status;
    if (status.status === 'FAILED') throw new Error(status.errorMessage);
    if (status.status === 'CANCELLED') throw new Error('Query cancelled');
 
    await new Promise(resolve => setTimeout(resolve, delay));
    delay = Math.min(delay * 1.5, maxDelay);
  }
}

When to Use Async vs Sync

CriteriaSynchronousAsynchronous
Expected execution timeUnder 30 secondsOver 30 seconds
Data scan sizeUnder 10GBOver 10GB
Result set sizeUnder 10,000 rowsAny size
User interactionInteractive dashboards, exploratoryBatch reports, ETL queries
Engine routingTrino, ClickHouse, DuckDBSpark Async, Trino (large)