Conversational Analytics
Conversational analytics is the flagship capability of the MATIH Platform. It transforms natural language questions into executed queries, statistical analyses, and interactive visualizations through a coordinated multi-agent workflow. This capability is what makes MATIH fundamentally different from traditional BI and analytics tools: the primary interface is a conversation, not a query editor or a dashboard builder.
1.1 How Conversational Analytics Works
When a user types a question into the Agentic Workbench, the following sequence executes in seconds:
User Question
|
v
Router Agent (Intent Classification)
|
+--> SQL Query Intent
| |
| v
| SQL Agent (Text-to-SQL Generation)
| |-- Schema Retrieval (Qdrant vector search)
| |-- SQL Generation (LLM with schema context)
| |-- SQL Validation (sqlglot + sqlparse)
| |-- Auto-Correction (up to 3 retries)
| |
| v
| Query Engine (Trino Execution)
| |-- Tenant-isolated credentials
| |-- Resource-quota enforcement
| |-- Result streaming
| |
| v
| Analysis Agent (Result Interpretation)
| |-- Statistical summary
| |-- Anomaly detection
| |-- Trend identification
| |
| v
| Visualization Agent (Chart Generation)
| |-- Chart type selection
| |-- Specification generation (ECharts/Vega-Lite)
| |-- Suggested follow-ups
| |
| v
| WebSocket Response (Streamed to client)
|
+--> Documentation Intent --> Documentation Agent
|
+--> Analysis Intent --> Analysis Agent (statistical)
|
+--> Visualization Intent --> Visualization Agent (chart modification)

The entire flow is implemented in the ai-service using LangGraph, a framework for building stateful, multi-agent applications as directed graphs. Each agent is a node in the graph, and edges define the transitions between agents based on the classified intent and intermediate results.
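The branching above can be sketched in plain Python. This is a dependency-free stand-in for the LangGraph conditional edges, not the actual orchestrator code; the intent labels and agent names are illustrative assumptions.

```python
def route(intent: str) -> list[str]:
    """Return the agent pipeline for a classified intent.

    Mirrors the router's four branches: only the SQL-query branch
    chains multiple agents; the others dispatch to a single agent.
    (Agent names here are placeholders, not the real node names.)
    """
    if intent == "sql_query":
        return ["sql_agent", "query_engine", "analysis_agent", "visualization_agent"]
    if intent == "documentation":
        return ["documentation_agent"]
    if intent == "analysis":
        return ["analysis_agent"]
    if intent == "visualization":
        return ["visualization_agent"]
    raise ValueError(f"unknown intent: {intent}")
```

In the real LangGraph implementation each of these names is a node, and `route` corresponds to the conditional-edge function attached to the Router Agent.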
1.2 The Multi-Agent Orchestrator
The AI Engine is built on a LangGraph state machine that coordinates five specialized agents:
| Agent | Responsibility | Implementation | Output |
|---|---|---|---|
| Router Agent | Classifies user intent and determines the execution path | data-plane/ai-service/src/agents/orchestrator.py | Intent classification (SQL query, documentation, analysis, visualization) |
| SQL Agent | Generates SQL from natural language using schema context from Qdrant | data-plane/ai-service/src/sql_generation/generator.py | Validated SQL query string |
| Analysis Agent | Interprets query results and performs statistical analysis | data-plane/ai-service/src/agents/orchestrator.py | Natural language summary with statistical insights |
| Visualization Agent | Selects chart types and generates visualization specifications | data-plane/ai-service/src/agents/orchestrator.py | Chart specification (ECharts or Vega-Lite format) |
| Documentation Agent | Answers platform and data documentation questions using RAG | data-plane/ai-service/src/agents/orchestrator.py | Natural language response with references |
Agent State Management
Each conversation maintains an AgentState that tracks:
- Conversation history -- All previous messages and responses in the session
- Schema context -- The tables and columns relevant to the current conversation
- Generated SQL -- The SQL queries generated in the current session (for modification and follow-ups)
- Query results -- Cached results from the most recent query execution
- Visualization state -- The current chart configuration for iterative refinement
- Tenant context -- The tenant ID, user roles, and access permissions
The state is persisted to Redis for session recovery, with a configurable TTL (default: 24 hours).
1.3 Text-to-SQL Generation
The SQL generation pipeline is the most technically sophisticated component of the conversational analytics system. It goes far beyond simple prompt engineering to deliver accurate, secure, and performant SQL.
Pipeline Stages
Stage 1: Question Parsing
The QuestionParser extracts structured information from the natural language input:
- Entities (table names, column names, metric names)
- Time ranges ("last quarter", "year over year", "since January")
- Aggregations ("total", "average", "maximum", "count")
- Filters ("where region is West", "excluding returns")
- Groupings ("by category", "broken down by region")
- Sort orders ("highest", "top 10", "ranked by")
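A toy version of this extraction step, using regular expressions for a few of the signals listed above. The real `QuestionParser` is far richer; the patterns and the output dict shape here are illustrative assumptions.

```python
import re

def parse_question(question: str) -> dict:
    """Extract a few structured signals from a natural language question.

    Covers three of the categories above: aggregations, sort/limit
    ("top 10"), and groupings ("by region"). Patterns are illustrative.
    """
    q = question.lower()
    parsed = {"aggregation": None, "limit": None, "group_by": None}

    # Aggregations: map trigger words to SQL aggregate functions.
    for word, agg in [("total", "SUM"), ("average", "AVG"),
                      ("maximum", "MAX"), ("count", "COUNT")]:
        if word in q:
            parsed["aggregation"] = agg
            break

    # Sort orders / limits: "top 10", "top 3", ...
    if m := re.search(r"top\s+(\d+)", q):
        parsed["limit"] = int(m.group(1))

    # Groupings: "by category", "by region", ...
    if m := re.search(r"by\s+(\w+)", q):
        parsed["group_by"] = m.group(1)

    return parsed
```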
Stage 2: Schema Retrieval
The QdrantSchemaStore retrieves relevant table and column schemas using vector similarity search. For each tenant, schema descriptions are embedded using an embedding model (OpenAI text-embedding-ada-002 or a self-hosted alternative) and stored in tenant-specific Qdrant collections.
The retrieval process:
- Embed the user's question using the same embedding model
- Search the tenant's Qdrant collection for the top-k (default: 10) most similar schema descriptions
- Return the matching table and column definitions with their metadata (types, descriptions, relationships)
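The ranking step can be illustrated without a running Qdrant instance. The cosine-similarity top-k below stands in for the `qdrant_client` search call against the tenant's collection; the entry format (`vector` plus `metadata`) is an assumption.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k_schemas(question_vec: list[float], schema_entries: list[dict],
                  k: int = 10) -> list[dict]:
    """Rank schema entries by similarity to the question embedding.

    Stands in for a Qdrant vector search scoped to the tenant's
    collection; k=10 matches the default top-k described above.
    """
    ranked = sorted(schema_entries,
                    key=lambda e: cosine(question_vec, e["vector"]),
                    reverse=True)
    return [e["metadata"] for e in ranked[:k]]
```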
Stage 3: SQL Generation
The LLM generates SQL using the parsed question and retrieved schema context. The prompt includes:
- The parsed question structure
- The relevant schema definitions (tables, columns, types, descriptions)
- The SQL dialect (Trino by default)
- Semantic layer metric definitions (if applicable)
- Examples from the tenant's query history (few-shot learning)
Stage 4: Validation
The SQLValidator verifies the generated SQL:
- Syntactic validation using sqlglot (parses the SQL into an AST)
- Injection prevention using sqlparse (detects dangerous patterns: DROP, DELETE, INSERT, UPDATE)
- Schema validation -- verifies that referenced tables and columns exist in the tenant's catalog
- Dialect compliance -- ensures the SQL is valid for the target query engine (Trino)
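Two of these checks can be sketched with the standard library alone. This is a stand-in for the sqlglot/sqlparse pipeline (a real AST walk replaces the regexes here), and the error strings are illustrative.

```python
import re

# Keywords the injection-prevention check rejects, per the list above.
DANGEROUS = re.compile(r"\b(DROP|DELETE|INSERT|UPDATE)\b", re.IGNORECASE)

def validate_sql(sql: str, known_tables: set[str]) -> list[str]:
    """Return a list of validation errors (empty means the SQL passed).

    Covers the injection-prevention and schema-validation checks;
    naive regexes stand in for the real sqlglot AST analysis.
    """
    errors = []
    if DANGEROUS.search(sql):
        errors.append("statement contains a write/DDL keyword")
    # Naive table extraction in place of walking the parsed AST.
    for m in re.finditer(r"\bFROM\s+(\w+)", sql, re.IGNORECASE):
        if m.group(1) not in known_tables:
            errors.append(f"unknown table: {m.group(1)}")
    return errors
```

Returning a list of errors (rather than raising) matters for the next stage: the auto-correction loop feeds the full error list back to the LLM.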
Stage 5: Auto-Correction
If validation fails, the system feeds the error messages back to the LLM for iterative correction:
- First retry: LLM receives the original question, schema context, generated SQL, and validation errors
- Second retry: LLM receives all previous context plus the first correction attempt
- Third retry: Same pattern. If the third attempt fails, the system returns the validation error to the user with a suggestion to rephrase the question
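The retry loop above reduces to a small function. `generate` and `validate` are injected callables standing in for the LLM call and the `SQLValidator`; the accumulated `feedback` list is how each retry sees all previous attempts and their errors.

```python
def generate_with_retries(question: str, generate, validate,
                          max_retries: int = 3) -> str:
    """Generate SQL, feeding validation errors back on each failure.

    generate(question, feedback) -> sql string; feedback is a list of
    {"sql": ..., "errors": ...} dicts from every prior attempt.
    validate(sql) -> list of error strings (empty means valid).
    """
    feedback: list[dict] = []
    errors: list[str] = []
    for _ in range(max_retries):
        sql = generate(question, feedback)   # sees all prior attempts
        errors = validate(sql)
        if not errors:
            return sql
        feedback.append({"sql": sql, "errors": errors})
    # All attempts failed: surface the errors so the caller can
    # suggest the user rephrase the question.
    raise ValueError(f"validation failed after {max_retries} attempts: {errors}")
```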
1.4 Real-Time Streaming
All conversational interactions are streamed to the client via WebSocket connections. The streaming protocol supports:
| Feature | Implementation | User Experience |
|---|---|---|
| Progressive SQL rendering | Tokens stream as the LLM generates SQL | User sees SQL appearing character by character |
| Result streaming | Query results stream as rows are returned from Trino | Tables populate progressively |
| Chart streaming | Chart specification is sent as soon as analysis completes | Visualization renders after data arrives |
| Cancellation | Client sends cancel message on WebSocket | Long-running queries are terminated mid-execution |
| Typing indicators | Agent state transitions emit status events | User sees "Generating SQL...", "Executing query...", "Analyzing results..." |
| Error streaming | Validation errors and suggestions stream immediately | User gets instant feedback on malformed requests |
The WebSocket endpoint is implemented at /ws/chat/{session_id} in the ai-service FastAPI application. Authentication is handled via JWT tokens passed as query parameters during the WebSocket handshake.
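The messages flowing over that connection can be illustrated as small JSON events. The wire format below (`type` plus a payload field) is an assumption about the protocol, chosen to match the feature table above; the real event schema may differ.

```python
import json

def status_event(stage: str) -> str:
    """A typing-indicator event, e.g. stage='Generating SQL...'."""
    return json.dumps({"type": "status", "stage": stage})

def sql_token_event(token: str) -> str:
    """One token of progressively rendered SQL."""
    return json.dumps({"type": "sql_token", "token": token})

def error_event(message: str, suggestion: str) -> str:
    """A streamed validation error with a rephrasing suggestion."""
    return json.dumps({"type": "error", "message": message,
                       "suggestion": suggestion})
```

On the server side each of these strings would be passed to the FastAPI WebSocket's send method as agents transition through the pipeline.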
1.5 RAG (Retrieval-Augmented Generation)
The platform uses Qdrant as its vector store for three distinct RAG use cases:
| Collection | Content | Purpose | Embedding Model |
|---|---|---|---|
| Schema embeddings | Table and column descriptions per tenant | Provide schema context for SQL generation | text-embedding-ada-002 |
| Documentation | Platform and data documentation | Power the Documentation Agent for help queries | text-embedding-ada-002 |
| Query history | Previously successful queries per tenant | Few-shot learning for SQL generation | text-embedding-ada-002 |
Tenant Isolation in RAG
Each tenant's embeddings are stored in separate Qdrant collections, ensuring that:
- Tenant A's schema information is never used to generate SQL for Tenant B
- Query history from one tenant does not influence another tenant's results
- Documentation embeddings can be shared (platform docs) or tenant-specific (custom data docs)
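One way to enforce that isolation is at collection-naming time, so every lookup is forced through a tenant-scoped name. The naming scheme below is an assumption for illustration, not the platform's actual convention.

```python
def collection_for(tenant_id: str, kind: str) -> str:
    """Derive the Qdrant collection name for a tenant and content kind.

    Platform documentation is the one shared collection; schema,
    query-history, and custom data-doc embeddings are always
    tenant-scoped, so Tenant A's context can never serve Tenant B.
    (Naming scheme is illustrative.)
    """
    if kind == "platform_docs":
        return "docs_platform"  # shared across all tenants
    if kind not in {"schema", "query_history", "data_docs"}:
        raise ValueError(f"unknown collection kind: {kind}")
    return f"{kind}_{tenant_id}"
```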
1.6 Conversational Context and Follow-Ups
One of the most powerful aspects of the conversational analytics system is its ability to maintain context across multiple turns. When a user asks a follow-up question, the system:
- Retrieves the conversation history from Redis (or PostgreSQL for long-term persistence)
- Identifies references to previous context ("break that down", "show me the same for", "exclude the outliers")
- Modifies the previous SQL rather than generating from scratch, preserving the user's analytical thread
- Maintains visualization state so chart type selections and formatting persist across turns
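The "modify rather than regenerate" step can be sketched for the simplest case, a "break that down by X" follow-up. A real implementation would edit the sqlglot AST; plain string surgery here keeps the sketch dependency-free, and only handles well-formed single-SELECT queries.

```python
def add_grouping(sql: str, column: str) -> str:
    """Rewrite a prior query to break results down by an extra column.

    Adds the column to the SELECT list and to GROUP BY (creating the
    clause if absent). Naive string handling; assumes one SELECT and
    uppercase keywords.
    """
    sql = sql.replace("SELECT ", f"SELECT {column}, ", 1)
    if "GROUP BY" in sql:
        return sql.replace("GROUP BY ", f"GROUP BY {column}, ", 1)
    return f"{sql} GROUP BY {column}"
```

Because the previous SQL is edited in place, the user's filters, time range, and metrics from earlier turns survive the follow-up unchanged.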
Example Multi-Turn Conversation
Turn 1: "What were our top 10 products by revenue last month?"
-> Generates SQL with ORDER BY revenue DESC LIMIT 10
-> Returns bar chart with 10 products
Turn 2: "Break that down by region"
-> Modifies SQL to add GROUP BY region
-> Returns stacked bar chart with products by region
Turn 3: "Show me the trend for the top 3 over the last 6 months"
-> Modifies SQL to add time dimension, LIMIT 3
-> Returns line chart with 3 products over 6 months
Turn 4: "Is there a statistical correlation between product rank and seasonality?"
-> Triggers Analysis Agent for statistical computation
-> Returns correlation analysis with narrative summary

Each turn builds on the previous context. The user never needs to re-specify "revenue", "products", or "last month" -- the system carries that context forward.
1.7 Performance Characteristics
| Metric | Target | Mechanism |
|---|---|---|
| Text-to-SQL latency (p95) | under 3 seconds | LLM streaming, schema caching, connection pooling |
| Simple query execution (p95) | under 500ms | Trino query optimization, result caching |
| Complex analytics (p95) | under 30 seconds | Trino distributed execution, resource groups |
| WebSocket latency | under 50ms | Direct connection, no HTTP overhead |
| Concurrent sessions per tenant | 1,000+ | Horizontal scaling of ai-service pods |
| Context window | Up to 20 turns | Redis-backed session state with configurable history depth |
Deep Dive References
For implementation details, see:
- AI Service Architecture -- Complete service documentation including agent implementation, configuration, and deployment
- Text-to-SQL Deep Dive -- Detailed SQL generation pipeline with code walkthroughs
- Agent Orchestration -- LangGraph state machine design and agent coordination patterns
- Query Engine -- Trino configuration, federated query execution, and caching