MATIH Platform is in active MVP development. Documentation reflects current implementation status.
1. Introduction
Conversational Analytics

Status: Production -- implemented in ai-service as a LangGraph multi-agent orchestrator with five specialized agents

Conversational analytics is the flagship capability of the MATIH Platform. It transforms natural language questions into executed queries, statistical analyses, and interactive visualizations through a coordinated multi-agent workflow. This capability is what makes MATIH fundamentally different from traditional BI and analytics tools: the primary interface is a conversation, not a query editor or a dashboard builder.


1.1 How Conversational Analytics Works

When a user types a question into the Agentic Workbench, the following sequence executes in seconds:

User Question
  |
  v
Router Agent (Intent Classification)
  |
  +--> SQL Query Intent
  |      |
  |      v
  |    SQL Agent (Text-to-SQL Generation)
  |      |-- Schema Retrieval (Qdrant vector search)
  |      |-- SQL Generation (LLM with schema context)
  |      |-- SQL Validation (sqlglot + sqlparse)
  |      |-- Auto-Correction (up to 3 retries)
  |      |
  |      v
  |    Query Engine (Trino Execution)
  |      |-- Tenant-isolated credentials
  |      |-- Resource-quota enforcement
  |      |-- Result streaming
  |      |
  |      v
  |    Analysis Agent (Result Interpretation)
  |      |-- Statistical summary
  |      |-- Anomaly detection
  |      |-- Trend identification
  |      |
  |      v
  |    Visualization Agent (Chart Generation)
  |      |-- Chart type selection
  |      |-- Specification generation (ECharts/Vega-Lite)
  |      |-- Suggested follow-ups
  |      |
  |      v
  |    WebSocket Response (Streamed to client)
  |
  +--> Documentation Intent --> Documentation Agent
  |
  +--> Analysis Intent --> Analysis Agent (statistical)
  |
  +--> Visualization Intent --> Visualization Agent (chart modification)

The entire flow is implemented in the ai-service using LangGraph, a framework for building stateful, multi-agent applications as directed graphs. Each agent is a node in the graph, and edges define the transitions between agents based on the classified intent and intermediate results.
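The routing described above can be sketched in plain Python. This is a simplified, dependency-free illustration of the orchestrator's branching logic; the actual ai-service builds it as a LangGraph state graph, and the keyword-based router and stub agents here are assumptions for illustration only:

```python
# Simplified sketch of the orchestrator's routing. The real router is an
# LLM-based intent classifier; this keyword check is a stand-in.

def router_agent(state):
    """Classify intent (assumption: a real router calls an LLM)."""
    q = state["question"].lower()
    if "chart" in q or "plot" in q:
        return "visualization"
    if "docs" in q or "how do i" in q:
        return "documentation"
    return "sql_query"

def sql_pipeline(state):
    # Stand-ins for SQL Agent -> Query Engine -> Analysis -> Visualization
    state["sql"] = "SELECT ..."
    state["analysis"] = "statistical summary ..."
    state["chart"] = {"type": "bar"}
    return state

def handle_question(question):
    state = {"question": question}
    intent = router_agent(state)
    if intent == "sql_query":
        state = sql_pipeline(state)
    # documentation / analysis / visualization branches omitted for brevity
    state["intent"] = intent
    return state

result = handle_question("Show total revenue by region")
print(result["intent"])  # sql_query
```

In LangGraph terms, each function above would be registered as a node and the router's return value would drive a conditional edge.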


1.2 The Multi-Agent Orchestrator

The AI Engine is built on a LangGraph state machine that coordinates five specialized agents:

Router Agent
  Responsibility: Classifies user intent and determines the execution path
  Implementation: data-plane/ai-service/src/agents/orchestrator.py
  Output: Intent classification (SQL query, documentation, analysis, visualization)

SQL Agent
  Responsibility: Generates SQL from natural language using schema context from Qdrant
  Implementation: data-plane/ai-service/src/sql_generation/generator.py
  Output: Validated SQL query string

Analysis Agent
  Responsibility: Interprets query results and performs statistical analysis
  Implementation: data-plane/ai-service/src/agents/orchestrator.py
  Output: Natural language summary with statistical insights

Visualization Agent
  Responsibility: Selects chart types and generates visualization specifications
  Implementation: data-plane/ai-service/src/agents/orchestrator.py
  Output: Chart specification (ECharts or Vega-Lite format)

Documentation Agent
  Responsibility: Answers platform and data documentation questions using RAG
  Implementation: data-plane/ai-service/src/agents/orchestrator.py
  Output: Natural language response with references

Agent State Management

Each conversation maintains an AgentState that tracks:

  • Conversation history -- All previous messages and responses in the session
  • Schema context -- The tables and columns relevant to the current conversation
  • Generated SQL -- The SQL queries generated in the current session (for modification and follow-ups)
  • Query results -- Cached results from the most recent query execution
  • Visualization state -- The current chart configuration for iterative refinement
  • Tenant context -- The tenant ID, user roles, and access permissions

The state is persisted to Redis for session recovery, with a configurable TTL (default: 24 hours).
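A minimal sketch of that state object and its TTL-based persistence is shown below. The field names and the in-memory store are assumptions for illustration; with a real Redis client the save would be a single `setex` call:

```python
import json
import time
from dataclasses import dataclass, field, asdict

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # configurable; default 24 hours

@dataclass
class AgentState:
    # Field names are illustrative; the real schema lives in ai-service.
    tenant_id: str
    history: list = field(default_factory=list)        # conversation history
    schema_context: list = field(default_factory=list) # relevant tables/columns
    generated_sql: list = field(default_factory=list)  # SQL from this session
    query_results: dict = field(default_factory=dict)  # cached last results
    visualization: dict = field(default_factory=dict)  # current chart config

def save_state(store, session_id, state, ttl=DEFAULT_TTL_SECONDS):
    """With a real client: redis.setex(f"session:{session_id}", ttl, payload)."""
    payload = json.dumps(asdict(state))
    store[f"session:{session_id}"] = (payload, time.time() + ttl)

def load_state(store, session_id):
    entry = store.get(f"session:{session_id}")
    if entry is None or entry[1] < time.time():
        return None  # expired or missing -> start a fresh session
    return AgentState(**json.loads(entry[0]))
```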


1.3 Text-to-SQL Generation

The SQL generation pipeline is the most technically sophisticated component of the conversational analytics system. It goes far beyond simple prompt engineering to deliver accurate, secure, and performant SQL.

Pipeline Stages

Stage 1: Question Parsing

The QuestionParser extracts structured information from the natural language input:

  • Entities (table names, column names, metric names)
  • Time ranges ("last quarter", "year over year", "since January")
  • Aggregations ("total", "average", "maximum", "count")
  • Filters ("where region is West", "excluding returns")
  • Groupings ("by category", "broken down by region")
  • Sort orders ("highest", "top 10", "ranked by")
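A toy version of this extraction step might look like the following. The patterns and output fields are assumptions; the real QuestionParser is considerably richer:

```python
import re

# Illustrative extraction of a few of the signals listed above.
AGGREGATIONS = {"total": "SUM", "average": "AVG", "maximum": "MAX", "count": "COUNT"}

def parse_question(question):
    q = question.lower()
    parsed = {
        "aggregations": [sql for word, sql in AGGREGATIONS.items() if word in q],
        "groupings": re.findall(r"by (\w+)", q),  # "by category", "by region"
        "limit": None,
    }
    top = re.search(r"top (\d+)", q)              # "top 10"
    if top:
        parsed["limit"] = int(top.group(1))
    return parsed

print(parse_question("total revenue by region, top 10"))
# {'aggregations': ['SUM'], 'groupings': ['region'], 'limit': 10}
```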

Stage 2: Schema Retrieval

The QdrantSchemaStore retrieves relevant table and column schemas using vector similarity search. For each tenant, schema descriptions are embedded using an embedding model (OpenAI text-embedding-ada-002 or a self-hosted alternative) and stored in tenant-specific Qdrant collections.

The retrieval process:

  1. Embed the user's question using the same embedding model
  2. Search the tenant's Qdrant collection for the top-k (default: 10) most similar schema descriptions
  3. Return the matching table and column definitions with their metadata (types, descriptions, relationships)
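The core of step 2 is a top-k similarity search. A dependency-free sketch follows; in the platform this is a Qdrant query over the tenant's collection, and the embedding vectors here are placeholders:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_schemas(question_vector, collection, k=10):
    """collection: list of (schema_description, embedding) pairs.
    Stand-in for a Qdrant similarity search over a tenant collection."""
    scored = [(cosine(question_vector, vec), desc) for desc, vec in collection]
    scored.sort(reverse=True)
    return [desc for _, desc in scored[:k]]
```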

Stage 3: SQL Generation

The LLM generates SQL using the parsed question and retrieved schema context. The prompt includes:

  • The parsed question structure
  • The relevant schema definitions (tables, columns, types, descriptions)
  • The SQL dialect (Trino by default)
  • Semantic layer metric definitions (if applicable)
  • Examples from the tenant's query history (few-shot learning)
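Assembling those inputs into a prompt might look like this. The template wording is an assumption, not the production prompt:

```python
# Illustrative prompt assembly from the inputs listed above.
def build_sql_prompt(parsed, schemas, dialect="trino", metrics=None, examples=None):
    parts = [
        f"Generate a single {dialect} SQL query.",
        f"Question structure: {parsed}",
        "Relevant schemas:",
        *[f"  {s}" for s in schemas],
    ]
    if metrics:   # semantic-layer metric definitions, when applicable
        parts += ["Semantic-layer metrics:", *[f"  {m}" for m in metrics]]
    if examples:  # few-shot examples from the tenant's query history
        parts += ["Similar past queries:", *[f"  {e}" for e in examples]]
    return "\n".join(parts)
```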

Stage 4: Validation

The SQLValidator verifies the generated SQL:

  • Syntactic validation using sqlglot (parses the SQL into an AST)
  • Injection prevention using sqlparse (detects dangerous patterns: DROP, DELETE, INSERT, UPDATE)
  • Schema validation -- verifies that referenced tables and columns exist in the tenant's catalog
  • Dialect compliance -- ensures the SQL is valid for the target query engine (Trino)
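A heavily simplified stand-in for the validator is sketched below. The real SQLValidator parses the query into an AST with sqlglot and inspects it with sqlparse; this regex-based check only illustrates the read-only policy and table check:

```python
import re

# Simplified stand-in for the SQLValidator (the real one uses sqlglot/sqlparse).
FORBIDDEN = re.compile(r"\b(DROP|DELETE|INSERT|UPDATE|ALTER|TRUNCATE)\b", re.I)

def validate_sql(sql, known_tables):
    errors = []
    if FORBIDDEN.search(sql):
        errors.append("statement must be read-only")
    for table in re.findall(r"\bFROM\s+(\w+)", sql, re.I):
        if table not in known_tables:
            errors.append(f"unknown table: {table}")
    return errors  # empty list means the query passed

print(validate_sql("SELECT * FROM orders", {"orders"}))  # []
print(validate_sql("DROP TABLE orders", {"orders"}))     # ['statement must be read-only']
```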

Stage 5: Auto-Correction

If validation fails, the system feeds the error messages back to the LLM for iterative correction:

  1. First retry: LLM receives the original question, schema context, generated SQL, and validation errors
  2. Second retry: LLM receives all previous context plus the first correction attempt
  3. Third retry: Same pattern. If the third attempt fails, the system returns the validation error to the user with a suggestion to rephrase the question
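The retry pattern above can be sketched as a loop where each attempt sees all prior attempts and their errors. The `generate` and `validate` callables here are stubs standing in for the LLM call and the SQLValidator:

```python
MAX_RETRIES = 3  # correction attempts after the initial generation

def generate_with_correction(question, schema, generate, validate):
    context = {"question": question, "schema": schema, "attempts": []}
    for _ in range(1 + MAX_RETRIES):
        sql = generate(context)   # each retry sees all prior attempts + errors
        errors = validate(sql)
        if not errors:
            return sql
        context["attempts"].append({"sql": sql, "errors": errors})
    # Mirrors the documented behavior: surface the error, suggest rephrasing
    raise ValueError(f"Could not produce valid SQL; please rephrase. Last errors: {errors}")
```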

1.4 Real-Time Streaming

All conversational interactions are streamed to the client via WebSocket connections. The streaming protocol supports:

Progressive SQL rendering
  Implementation: Tokens stream as the LLM generates SQL
  User experience: User sees SQL appearing character by character

Result streaming
  Implementation: Query results stream as rows are returned from Trino
  User experience: Tables populate progressively

Chart streaming
  Implementation: Chart specification is sent as soon as analysis completes
  User experience: Visualization renders after data arrives

Cancellation
  Implementation: Client sends a cancel message on the WebSocket
  User experience: Long-running queries are terminated mid-execution

Typing indicators
  Implementation: Agent state transitions emit status events
  User experience: User sees "Generating SQL...", "Executing query...", "Analyzing results..."

Error streaming
  Implementation: Validation errors and suggestions stream immediately
  User experience: User gets instant feedback on malformed requests

The WebSocket endpoint is implemented at /ws/chat/{session_id} in the ai-service FastAPI application. Authentication is handled via JWT tokens passed as query parameters during the WebSocket handshake.
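One plausible shape for the streamed events is sketched below. The exact message schema is an assumption; it simply mirrors the feature table above:

```python
import json

# Hypothetical event envelope for the WebSocket stream (schema assumed).
def event(kind, payload):
    return json.dumps({"type": kind, "payload": payload})

# Typical sequence for a SQL-query turn:
messages = [
    event("status", "Generating SQL..."),
    event("sql_token", "SELECT "),                          # progressive SQL rendering
    event("status", "Executing query..."),
    event("rows", [{"region": "West", "revenue": 1200}]),   # result streaming
    event("status", "Analyzing results..."),
    event("chart", {"library": "echarts", "type": "bar"}),  # chart streaming
]
print(json.loads(messages[0])["payload"])  # Generating SQL...
```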


1.5 RAG (Retrieval-Augmented Generation)

The platform uses Qdrant as its vector store for three distinct RAG use cases:

Schema embeddings
  Content: Table and column descriptions per tenant
  Purpose: Provide schema context for SQL generation
  Embedding model: text-embedding-ada-002

Documentation
  Content: Platform and data documentation
  Purpose: Power the Documentation Agent for help queries
  Embedding model: text-embedding-ada-002

Query history
  Content: Previously successful queries per tenant
  Purpose: Few-shot learning for SQL generation
  Embedding model: text-embedding-ada-002

Tenant Isolation in RAG

Each tenant's embeddings are stored in separate Qdrant collections, ensuring that:

  • Tenant A's schema information is never used to generate SQL for Tenant B
  • Query history from one tenant does not influence another tenant's results
  • Documentation embeddings can be shared (platform docs) or tenant-specific (custom data docs)
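A simple way to express that isolation is tenant-scoped collection naming. The convention below is an assumption (the platform's actual scheme is not documented here), but it captures the shared-vs-tenant-specific split:

```python
# Hypothetical tenant-scoped collection naming (convention assumed).
SHARED_COLLECTIONS = {"platform_docs"}

def collection_name(kind, tenant_id=None):
    if kind in SHARED_COLLECTIONS:
        return kind                       # shared across all tenants
    if tenant_id is None:
        raise ValueError(f"{kind} requires a tenant_id")
    return f"tenant_{tenant_id}_{kind}"   # never queried cross-tenant

print(collection_name("schema", "acme"))  # tenant_acme_schema
print(collection_name("platform_docs"))   # platform_docs
```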

1.6 Conversational Context and Follow-Ups

One of the most powerful aspects of the conversational analytics system is its ability to maintain context across multiple turns. When a user asks a follow-up question, the system:

  1. Retrieves the conversation history from Redis (or PostgreSQL for long-term persistence)
  2. Identifies references to previous context ("break that down", "show me the same for", "exclude the outliers")
  3. Modifies the previous SQL rather than generating from scratch, preserving the user's analytical thread
  4. Maintains visualization state so chart type selections and formatting persist across turns

Example Multi-Turn Conversation

Turn 1: "What were our top 10 products by revenue last month?"
  -> Generates SQL with ORDER BY revenue DESC LIMIT 10
  -> Returns bar chart with 10 products

Turn 2: "Break that down by region"
  -> Modifies SQL to add GROUP BY region
  -> Returns stacked bar chart with products by region

Turn 3: "Show me the trend for the top 3 over the last 6 months"
  -> Modifies SQL to add time dimension, LIMIT 3
  -> Returns line chart with 3 products over 6 months

Turn 4: "Is there a statistical correlation between product rank and seasonality?"
  -> Triggers Analysis Agent for statistical computation
  -> Returns correlation analysis with narrative summary

Each turn builds on the previous context. The user never needs to re-specify "revenue", "products", or "last month" -- the system carries that context forward.


1.7 Performance Characteristics

Text-to-SQL latency (p95)
  Target: under 3 seconds
  Mechanism: LLM streaming, schema caching, connection pooling

Simple query execution (p95)
  Target: under 500 ms
  Mechanism: Trino query optimization, result caching

Complex analytics (p95)
  Target: under 30 seconds
  Mechanism: Trino distributed execution, resource groups

WebSocket latency
  Target: under 50 ms
  Mechanism: Direct connection, no HTTP overhead

Concurrent sessions per tenant
  Target: 1,000+
  Mechanism: Horizontal scaling of ai-service pods

Context window
  Target: Up to 20 turns
  Mechanism: Redis-backed session state with configurable history depth
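The configurable history depth in the last row amounts to trimming session state on load. A minimal sketch, with the 20-turn default from the table and assumed data shapes:

```python
MAX_TURNS = 20  # default context window; configurable per deployment

def trim_history(history, max_turns=MAX_TURNS):
    """history: list of {'question': ..., 'response': ...} turns.
    Keep only the most recent turns when loading session state."""
    return history[-max_turns:]

turns = [{"question": f"q{i}", "response": f"r{i}"} for i in range(25)]
print(len(trim_history(turns)))  # 20
```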

Deep Dive References

For implementation details, see: