# Parquet Export
Apache Parquet is a columnar storage format optimized for analytical workloads. Parquet exports are ideal for loading results into data lakes, Spark jobs, or other analytical tools.
## Endpoint
```
POST /v1/queries/{executionId}/export
```

```bash
curl -X POST http://query-engine:8080/v1/queries/a1b2c3d4-e5f6-7890/export \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000" \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -d '{
    "format": "PARQUET",
    "compress": true
  }'
```

## Current Status
The Parquet export is currently in development. The service falls back to JSON Lines format when Parquet is requested:
```java
private long writeParquetExport(UUID executionId, Path path, ExportRequest request)
        throws IOException {
    log.warn("Parquet export falling back to JSON format - Parquet library not configured");
    return writeJsonExport(executionId, path, true);
}
```

The planned production implementation will use the Apache Parquet library with Snappy compression:
```java
// Planned implementation:
// Schema schema = buildParquetSchema(columns);
// try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
//         .<GenericRecord>builder(path)
//         .withSchema(schema)
//         .withCompressionCodec(CompressionCodecName.SNAPPY)
//         .build()) {
//     // Write records
// }
```

## Type Mapping
When Parquet support is fully implemented, SQL types will map to Parquet types:
| SQL Type | Parquet Type |
|---|---|
| VARCHAR | BINARY (UTF8) |
| INTEGER | INT32 |
| BIGINT | INT64 |
| DOUBLE | DOUBLE |
| DECIMAL | FIXED_LEN_BYTE_ARRAY |
| BOOLEAN | BOOLEAN |
| DATE | INT32 (DATE) |
| TIMESTAMP | INT96 or INT64 (TIMESTAMP_MILLIS) |
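As a rough illustration of how the table above could be encoded in code, here is a minimal sketch of a type-mapping helper. The class and method names (`ParquetTypeMapping`, `sqlToParquetType`) are hypothetical, not part of the service's actual API, and the sketch picks INT64 (TIMESTAMP_MILLIS) for timestamps rather than the INT96 alternative:

```java
import java.util.Map;

// Illustrative sketch only: mirrors the SQL-to-Parquet table above.
// ParquetTypeMapping and sqlToParquetType are hypothetical names,
// not part of the query engine's real implementation.
public class ParquetTypeMapping {

    private static final Map<String, String> SQL_TO_PARQUET = Map.of(
        "VARCHAR",   "BINARY (UTF8)",
        "INTEGER",   "INT32",
        "BIGINT",    "INT64",
        "DOUBLE",    "DOUBLE",
        "DECIMAL",   "FIXED_LEN_BYTE_ARRAY",
        "BOOLEAN",   "BOOLEAN",
        "DATE",      "INT32 (DATE)",
        // This sketch chooses INT64 over the legacy INT96 encoding.
        "TIMESTAMP", "INT64 (TIMESTAMP_MILLIS)"
    );

    /** Returns the Parquet type for a SQL type, defaulting to BINARY (UTF8). */
    public static String sqlToParquetType(String sqlType) {
        return SQL_TO_PARQUET.getOrDefault(sqlType.toUpperCase(), "BINARY (UTF8)");
    }

    public static void main(String[] args) {
        System.out.println(sqlToParquetType("bigint"));    // INT64
        System.out.println(sqlToParquetType("TIMESTAMP")); // INT64 (TIMESTAMP_MILLIS)
    }
}
```

Defaulting unknown SQL types to `BINARY (UTF8)` is one reasonable fallback, since any value can be serialized as a UTF-8 string; a production mapper might instead reject unknown types explicitly.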