NLP Processing
Production component: entity extraction, intent detection, ambiguity resolution, and normalization.
The NLP Processing layer parses natural language questions to extract structured metadata that guides SQL generation. It identifies the query intent, extracts entities (tables, columns, values), detects time references, and resolves ambiguities.
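One way to picture the structured metadata is as a small record type. The sketch below makes some assumptions. The field names follow the keyword arguments passed to `ParsedQuestion` in `QuestionParser.parse`. The field types, the hypothetical `Entity` helper class, and its `kind`/`text`/`normalized` attributes are illustrative, not the documented definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One extracted mention (hypothetical shape)."""
    kind: str        # assumed values: "table", "column", or "value"
    text: str        # surface form as written in the question
    normalized: str  # canonical form, e.g. a schema name such as "order_date"


@dataclass
class ParsedQuestion:
    """Structured form of a question; field names follow QuestionParser.parse."""
    intent: str                      # one of the intent names, e.g. "aggregate"
    entities: list[Entity] = field(default_factory=list)
    confidence: float = 0.0          # assumed to lie in [0, 1]
    time_references: list[str] = field(default_factory=list)
    aggregations: list[str] = field(default_factory=list)
    filters: list[str] = field(default_factory=list)


# Roughly what "What is total revenue?" might parse into (values illustrative).
example = ParsedQuestion(
    intent="aggregate",
    entities=[Entity(kind="column", text="revenue", normalized="revenue")],
    confidence=0.9,
    aggregations=["total"],
)
```

List fields use `default_factory` so that each instance gets its own empty list instead of sharing one mutable default.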
12.3.5.1 QuestionParser
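```python
from __future__ import annotations  # lets the ParsedQuestion hint resolve lazily


class QuestionParser:
    def parse(self, question: str) -> ParsedQuestion:
        """Parse a natural language question into structured form."""
        # ParsedQuestion and the _-prefixed helpers are defined elsewhere;
        # each helper receives the raw question and fills one field.
        return ParsedQuestion(
            intent=self._detect_intent(question),
            entities=self._extract_entities(question),
            confidence=self._calculate_confidence(question),
            time_references=self._extract_time_refs(question),
            aggregations=self._extract_aggregations(question),
            filters=self._extract_filters(question),
        )
```

Intent Types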
| Intent | Description | Example |
|---|---|---|
| `select` | Simple data retrieval | "Show me all customers" |
| `aggregate` | Aggregation query | "What is total revenue?" |
| `compare` | Comparison query | "Compare Q3 vs Q4 sales" |
| `trend` | Time-series analysis | "Show revenue trend for 2024" |
| `filter` | Filtered query | "Customers in the EMEA region" |
| `join` | Multi-table query | "Orders with customer details" |
| `rank` | Ranking query | "Top 10 products by sales" |
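A rule-based keyword matcher is one simple way to map questions onto the intents in the table. The sketch below is an assumption and not the documented `_detect_intent`. The function name `detect_intent`, the keyword lists, and the rule order are all illustrative.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
# Specific intents (rank, compare, trend) come before broad ones (filter),
# because a question such as "Top 10 products in EMEA" also contains "in".
_INTENT_RULES = [
    ("rank", re.compile(r"\b(top|bottom)\s+\d+\b|\b(highest|lowest)\b")),
    ("compare", re.compile(r"\b(compare|versus|vs)\b")),
    ("trend", re.compile(r"\btrends?\b|\bover time\b")),
    ("aggregate", re.compile(r"\b(total|sum|average|avg|count|how many|maximum|minimum)\b")),
    ("join", re.compile(r"\b(with|including)\b")),
    ("filter", re.compile(r"\b(in|where|from)\b")),
]


def detect_intent(question: str) -> str:
    """Return the first intent whose keyword rule matches, else "select"."""
    text = question.lower()
    for intent, pattern in _INTENT_RULES:
        if pattern.search(text):
            return intent
    return "select"  # plain retrieval is the fallback
```

Every example in the table resolves to its listed intent under these rules. For instance, `detect_intent("Top 10 products by sales")` returns `"rank"`. First-match rules assign exactly one label, so a question that mixes intents (a ranked, filtered query) keeps only the highest-priority one.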
Entity Extraction
The parser identifies several entity types:
- Table references: "orders", "customers", "sales"
- Column references: "revenue", "order date", "customer name"
- Value literals: "EMEA", "2024", "$1M"
- Time references: "last quarter", "this year", "past 30 days"
- Aggregations: "total", "average", "count", "maximum"
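A lexicon-and-regex extractor is one minimal way to cover the entity types listed above. It also shows how relative time phrases can be normalized into concrete date ranges. Everything here is assumed for illustration. The `TABLES`, `COLUMNS`, and `AGGREGATIONS` dictionaries stand in for a schema catalog. `extract_entities` and `resolve_time_reference` are hypothetical names, not the parser's `_extract_entities` or `_extract_time_refs`. The current date is passed in as `today` so results are reproducible.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical schema lexicon: surface phrases mapped to schema names.
# A real system would derive these from the catalog; these are made up.
TABLES = {"orders": "orders", "customers": "customers", "sales": "sales"}
COLUMNS = {"revenue": "revenue", "order date": "order_date",
           "customer name": "customer_name"}
# Aggregation words mapped to SQL aggregate functions.
AGGREGATIONS = {"total": "SUM", "average": "AVG", "count": "COUNT",
                "maximum": "MAX"}

# Value literals: money such as "$1M", four-digit years, all-caps codes ("EMEA").
VALUE_PATTERN = re.compile(r"\$\d+(?:\.\d+)?[KMB]?|\b(?:19|20)\d{2}\b|\b[A-Z]{2,}\b")
# Relative time phrases this sketch knows how to resolve.
TIME_PATTERN = re.compile(r"\b(this year|last quarter|past \d+ days)\b")


def resolve_time_reference(phrase: str, today: date) -> Optional[tuple[date, date]]:
    """Normalize a relative time phrase to an inclusive (start, end) date range."""
    if phrase == "this year":
        return date(today.year, 1, 1), today
    if phrase == "last quarter":
        q = (today.month - 1) // 3  # 0-based index of the current quarter
        year, prev_q = (today.year, q - 1) if q > 0 else (today.year - 1, 3)
        start = date(year, prev_q * 3 + 1, 1)
        # The quarter ends the day before the following quarter starts.
        next_start = date(year + (prev_q == 3), (prev_q * 3 + 3) % 12 + 1, 1)
        return start, next_start - timedelta(days=1)
    m = re.fullmatch(r"past (\d+) days", phrase)
    if m:
        # Assumed convention: n calendar days ending today, inclusive.
        return today - timedelta(days=int(m.group(1)) - 1), today
    return None


def extract_entities(question: str, today: date) -> dict[str, list]:
    """Group lexicon and regex matches by entity type."""
    lowered = question.lower()

    def find(lexicon: dict[str, str]) -> list[str]:
        # Whole-phrase, case-insensitive lookup of each lexicon entry.
        return [name for phrase, name in lexicon.items()
                if re.search(rf"\b{re.escape(phrase)}\b", lowered)]

    return {
        "tables": find(TABLES),
        "columns": find(COLUMNS),
        # Case-sensitive on purpose so all-caps codes stay distinguishable.
        "values": VALUE_PATTERN.findall(question),
        "aggregations": find(AGGREGATIONS),
        "time_references": [(p, resolve_time_reference(p, today))
                            for p in TIME_PATTERN.findall(lowered)],
    }
```

Consider `extract_entities("Total revenue from orders in EMEA for the past 30 days", date(2024, 5, 15))`. It yields `orders` as the table, `revenue` as the column, `EMEA` as a value, and `SUM` as the aggregation. The time reference resolves to the range 2024-04-16 to 2024-05-15. A flat lexicon cannot tell whether "sales" means the `sales` table or a revenue measure, which is the kind of ambiguity the resolution step has to handle.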