SQL Generation
Status: Production. LLM-powered natural language to SQL conversion with multi-dialect support.
The SQL Generation component converts parsed natural language questions into SQL queries using LLM prompting with schema context. It supports multiple SQL dialects including Trino, Spark, PostgreSQL, BigQuery, Snowflake, and DuckDB.
12.3.2.1 Generation Process
The LLM receives a carefully constructed prompt containing:
- System instructions: SQL generation rules, dialect-specific syntax
- Schema context: Relevant tables, columns, relationships
- Semantic mappings: Business term to column mappings
- Sample queries: Similar previously-successful queries
- User question: The natural language question with parsed intent and entities
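The pieces above can be sketched as a simple prompt template. The `build_prompt` helper and its section layout are illustrative assumptions, not the project's actual prompt builder:

```python
def build_prompt(question, tables, columns, semantic_mappings, sample_queries, dialect):
    """Assemble an LLM prompt from schema context (hypothetical sketch)."""
    sections = [
        f"You are a SQL generator. Emit a single {dialect} query.",
        "Tables: " + ", ".join(tables),
        "Columns: " + ", ".join(f"{t}.{c}" for t, c in columns),
        "Business terms: " + "; ".join(f"{k} -> {v}" for k, v in semantic_mappings.items()),
        "Examples:\n" + "\n".join(sample_queries),
        f"Question: {question}",
    ]
    return "\n\n".join(sections)

prompt = build_prompt(
    question="Average order value by region?",
    tables=["sales.orders", "sales.regions"],
    columns=[("sales.orders", "revenue"), ("sales.regions", "region_name")],
    semantic_mappings={"order value": "sales.orders.revenue"},
    sample_queries=["SELECT COUNT(*) FROM sales.orders"],
    dialect="trino",
)
print(prompt.splitlines()[0])  # -> You are a SQL generator. Emit a single trino query.
```

In practice each section would also carry dialect-specific syntax rules and the parsed intent and entities, as shown in the call below.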
```python
async def _generate_sql(self, question, parsed_question, context, dialect):
    return await self.llm_client.generate_sql(
        question=question,
        tables=context.tables,
        columns=context.columns,
        relationships=context.relationships,
        semantic_mappings=context.semantic_mappings,
        sample_queries=context.sample_queries,
        dialect=dialect,
        intent=parsed_question.intent.value,
        entities=parsed_question.entities,
        time_references=parsed_question.time_references,
        aggregations=parsed_question.aggregations,
    )
```

12.3.2.2 Supported Dialects
| Dialect | Key | Notes |
|---|---|---|
| Trino | trino | Default dialect, federated query engine |
| Spark SQL | spark | Distributed processing engine |
| PostgreSQL | postgresql | Relational database |
| BigQuery | bigquery | Google Cloud data warehouse |
| Snowflake | snowflake | Cloud data warehouse |
| DuckDB | duckdb | Embedded analytical database |
| MySQL | mysql | Relational database |
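The dialect keys accepted by the API do not always match the names a downstream library expects — for example, sqlglot registers its PostgreSQL dialect as `postgres` rather than `postgresql`. A small normalization map, sketched here with hypothetical contents, bridges the gap (this is presumably the role of the `DIALECT_MAP` used by the validator below):

```python
# Hypothetical normalization map: API-facing keys -> library dialect names.
DIALECT_MAP = {
    "postgresql": "postgres",  # sqlglot's PostgreSQL dialect is named "postgres"
    "spark": "spark",
    "trino": "trino",
}

def normalize_dialect(key: str) -> str:
    """Return the library-level dialect name, falling back to the key itself."""
    return DIALECT_MAP.get(key.lower(), key.lower())

print(normalize_dialect("PostgreSQL"))  # -> postgres
print(normalize_dialect("duckdb"))      # -> duckdb (passes through unchanged)
```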
Dialect Transpilation
The SQLValidator includes a transpilation feature using sqlglot:
```python
def transpile_sql(self, sql: str, source_dialect: str, target_dialect: str) -> str:
    # Normalize API dialect keys to sqlglot dialect names, falling back to the key itself.
    source = self.DIALECT_MAP.get(source_dialect, source_dialect)
    target = self.DIALECT_MAP.get(target_dialect, target_dialect)
    # sqlglot.transpile returns a list of statements; take the first.
    return sqlglot.transpile(sql, read=source, write=target)[0]
```

12.3.2.3 Auto-Correction Loop
When validation fails, the generator attempts auto-correction:
```python
for attempt in range(self.settings.max_sql_retries + 1):
    validation_result = self.validator.validate(
        sql, dialect, available_tables, available_columns
    )
    if validation_result.is_valid:
        break
    if not auto_correct or attempt >= self.settings.max_sql_retries:
        break
    # Send validation errors back to the LLM for correction.
    error_msg = "; ".join(e.get("message", "") for e in validation_result.errors)
    sql = await self.llm_client.correct_sql(
        sql=sql,
        error=error_msg,
        tables=context.tables,
        columns=context.columns,
    )
    retries += 1
```

12.3.2.4 API Usage
```shell
curl -X POST http://localhost:8000/api/v1/text-to-sql \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: acme-corp" \
  -d '{
    "question": "What is the average order value by region for Q4 2024?",
    "dialect": "trino",
    "validate": true,
    "auto_correct": true,
    "catalog": "lakehouse"
  }'
```

Response:
```json
{
  "success": true,
  "query": {
    "sql": "SELECT r.region_name, AVG(o.revenue) AS avg_order_value\nFROM lakehouse.sales.orders o\nJOIN lakehouse.sales.regions r ON o.region_id = r.region_id\nWHERE o.order_date >= DATE '2024-10-01' AND o.order_date < DATE '2025-01-01'\nGROUP BY r.region_name\nORDER BY avg_order_value DESC",
    "confidence": 0.87,
    "tables_used": ["lakehouse.sales.orders", "lakehouse.sales.regions"],
    "explanation": "This query calculates aggregate values from orders, regions using AVG for Q4 2024.",
    "status": "validated"
  },
  "validation": {
    "is_valid": true,
    "errors": [],
    "warnings": []
  },
  "retries": 0
}
```
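A client consuming this response should check both the `success` flag and the validation result before executing the returned SQL. The sketch below parses a trimmed-down payload with the same shape; the `extract_sql` helper name and its error handling are illustrative, not part of the API:

```python
import json

def extract_sql(payload: dict) -> str:
    """Return validated SQL from a text-to-sql response, or raise (hypothetical helper)."""
    if not payload.get("success"):
        raise ValueError("generation failed")
    validation = payload.get("validation", {})
    if not validation.get("is_valid", False):
        raise ValueError(f"invalid SQL: {validation.get('errors')}")
    return payload["query"]["sql"]

response = json.loads("""
{
  "success": true,
  "query": {"sql": "SELECT 1", "confidence": 0.87},
  "validation": {"is_valid": true, "errors": [], "warnings": []},
  "retries": 0
}
""")
sql = extract_sql(response)
print(sql)  # -> SELECT 1
```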