# Parquet Export
Apache Parquet is a columnar storage format optimized for analytical workloads. Parquet exports are ideal for loading results into data lakes, Spark jobs, or other analytical tools.
## Endpoint
```
POST /v1/queries/{executionId}/export
```

```bash
curl -X POST http://query-engine:8080/v1/queries/a1b2c3d4-e5f6-7890/export \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -d '{
    "format": "PARQUET",
    "compress": true
  }'
```

## Current Status
The Parquet export is currently in development. The service falls back to JSON Lines format when Parquet is requested:
```java
private long writeParquetExport(UUID executionId, Path path, ExportRequest request)
        throws IOException {
    log.warn("Parquet export falling back to JSON format - Parquet library not configured");
    return writeJsonExport(executionId, path, true);
}
```

The planned production implementation will use the Apache Parquet library with Snappy compression:
```java
// Planned implementation:
// Schema schema = buildParquetSchema(columns);
// try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
//         .<GenericRecord>builder(path)
//         .withSchema(schema)
//         .withCompressionCodec(CompressionCodecName.SNAPPY)
//         .build()) {
//     // Write records
// }
```

## Type Mapping
When Parquet support is fully implemented, SQL types will map to Parquet types:
| SQL Type | Parquet Type |
|---|---|
| VARCHAR | BINARY (UTF8) |
| INTEGER | INT32 |
| BIGINT | INT64 |
| DOUBLE | DOUBLE |
| DECIMAL | FIXED_LEN_BYTE_ARRAY |
| BOOLEAN | BOOLEAN |
| DATE | INT32 (DATE) |
| TIMESTAMP | INT96 or INT64 (TIMESTAMP_MILLIS) |
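As a rough illustration of how the table above could be encoded in code, here is a minimal sketch of a type-mapping helper. The class and method names (`ParquetTypeMapping`, `sqlToParquetType`) are hypothetical, not part of the service's actual API, and the sketch picks INT64 (TIMESTAMP_MILLIS) for timestamps rather than the INT96 alternative:

```java
import java.util.Map;

// Illustrative sketch only: mirrors the SQL-to-Parquet table above.
// ParquetTypeMapping and sqlToParquetType are hypothetical names,
// not part of the query engine's real implementation.
public class ParquetTypeMapping {

    private static final Map<String, String> SQL_TO_PARQUET = Map.of(
        "VARCHAR",   "BINARY (UTF8)",
        "INTEGER",   "INT32",
        "BIGINT",    "INT64",
        "DOUBLE",    "DOUBLE",
        "DECIMAL",   "FIXED_LEN_BYTE_ARRAY",
        "BOOLEAN",   "BOOLEAN",
        "DATE",      "INT32 (DATE)",
        // This sketch chooses INT64 over the legacy INT96 encoding.
        "TIMESTAMP", "INT64 (TIMESTAMP_MILLIS)"
    );

    /** Returns the Parquet type for a SQL type, defaulting to BINARY (UTF8). */
    public static String sqlToParquetType(String sqlType) {
        return SQL_TO_PARQUET.getOrDefault(sqlType.toUpperCase(), "BINARY (UTF8)");
    }

    public static void main(String[] args) {
        System.out.println(sqlToParquetType("bigint"));    // INT64
        System.out.println(sqlToParquetType("TIMESTAMP")); // INT64 (TIMESTAMP_MILLIS)
    }
}
```

Defaulting unknown SQL types to `BINARY (UTF8)` is one reasonable fallback, since any value can be serialized as a UTF-8 string; a production mapper might instead reject unknown types explicitly.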