MATIH Platform is in active MVP development. Documentation reflects current implementation status.
1. Introduction
Conversational Analytics

Status: Production -- implemented in ai-service as a LangGraph multi-agent orchestrator with five specialized agents

Conversational analytics is the flagship capability of the MATIH Platform. It transforms natural language questions into executed queries, statistical analyses, and interactive visualizations through a coordinated multi-agent workflow. This capability is what makes MATIH fundamentally different from traditional BI and analytics tools: the primary interface is a conversation, not a query editor or a dashboard builder.


1.1 How Conversational Analytics Works

When a user types a question into the Agentic Workbench, the following sequence executes in seconds:

User Question
  |
  v
Router Agent (Intent Classification)
  |
  +--> SQL Query Intent
  |      |
  |      v
  |    SQL Agent (Text-to-SQL Generation)
  |      |-- Schema Retrieval (Qdrant vector search)
  |      |-- SQL Generation (LLM with schema context)
  |      |-- SQL Validation (sqlglot + sqlparse)
  |      |-- Auto-Correction (up to 3 retries)
  |      |
  |      v
  |    Query Engine (Trino Execution)
  |      |-- Tenant-isolated credentials
  |      |-- Resource-quota enforcement
  |      |-- Result streaming
  |      |
  |      v
  |    Analysis Agent (Result Interpretation)
  |      |-- Statistical summary
  |      |-- Anomaly detection
  |      |-- Trend identification
  |      |
  |      v
  |    Visualization Agent (Chart Generation)
  |      |-- Chart type selection
  |      |-- Specification generation (ECharts/Vega-Lite)
  |      |-- Suggested follow-ups
  |      |
  |      v
  |    WebSocket Response (Streamed to client)
  |
  +--> Documentation Intent --> Documentation Agent
  |
  +--> Analysis Intent --> Analysis Agent (statistical)
  |
  +--> Visualization Intent --> Visualization Agent (chart modification)

The entire flow is implemented in the ai-service using LangGraph, a framework for building stateful, multi-agent applications as directed graphs. Each agent is a node in the graph, and edges define the transitions between agents based on the classified intent and intermediate results.
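The routing described above can be sketched in plain Python. This is a simplified, dependency-free illustration of the orchestrator's branching logic; the actual ai-service builds it as a LangGraph state graph, and the keyword-based router and stub agents here are assumptions for illustration only:

```python
# Simplified sketch of the orchestrator's routing. The real router is an
# LLM-based intent classifier; this keyword check is a stand-in.

def router_agent(state):
    """Classify intent (assumption: a real router calls an LLM)."""
    q = state["question"].lower()
    if "chart" in q or "plot" in q:
        return "visualization"
    if "docs" in q or "how do i" in q:
        return "documentation"
    return "sql_query"

def sql_pipeline(state):
    # Stand-ins for SQL Agent -> Query Engine -> Analysis -> Visualization
    state["sql"] = "SELECT ..."
    state["analysis"] = "statistical summary ..."
    state["chart"] = {"type": "bar"}
    return state

def handle_question(question):
    state = {"question": question}
    intent = router_agent(state)
    if intent == "sql_query":
        state = sql_pipeline(state)
    # documentation / analysis / visualization branches omitted for brevity
    state["intent"] = intent
    return state

result = handle_question("Show total revenue by region")
print(result["intent"])  # sql_query
```

In LangGraph terms, each function above would be registered as a node and the router's return value would drive a conditional edge.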


1.2 The Multi-Agent Orchestrator

The AI Engine is built on a LangGraph state machine that coordinates five specialized agents:

Router Agent
  Responsibility: Classifies user intent and determines the execution path
  Implementation: data-plane/ai-service/src/agents/orchestrator.py
  Output: Intent classification (SQL query, documentation, analysis, visualization)

SQL Agent
  Responsibility: Generates SQL from natural language using schema context from Qdrant
  Implementation: data-plane/ai-service/src/sql_generation/generator.py
  Output: Validated SQL query string

Analysis Agent
  Responsibility: Interprets query results and performs statistical analysis
  Implementation: data-plane/ai-service/src/agents/orchestrator.py
  Output: Natural language summary with statistical insights

Visualization Agent
  Responsibility: Selects chart types and generates visualization specifications
  Implementation: data-plane/ai-service/src/agents/orchestrator.py
  Output: Chart specification (ECharts or Vega-Lite format)

Documentation Agent
  Responsibility: Answers platform and data documentation questions using RAG
  Implementation: data-plane/ai-service/src/agents/orchestrator.py
  Output: Natural language response with references

Agent State Management

Each conversation maintains an AgentState that tracks:

  • Conversation history -- All previous messages and responses in the session
  • Schema context -- The tables and columns relevant to the current conversation
  • Generated SQL -- The SQL queries generated in the current session (for modification and follow-ups)
  • Query results -- Cached results from the most recent query execution
  • Visualization state -- The current chart configuration for iterative refinement
  • Tenant context -- The tenant ID, user roles, and access permissions

The state is persisted to Redis for session recovery, with a configurable TTL (default: 24 hours).
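A minimal sketch of that state object and its TTL-based persistence is shown below. The field names and the in-memory store are assumptions for illustration; with a real Redis client the save would be a single `setex` call:

```python
import json
import time
from dataclasses import dataclass, field, asdict

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # configurable; default 24 hours

@dataclass
class AgentState:
    # Field names are illustrative; the real schema lives in ai-service.
    tenant_id: str
    history: list = field(default_factory=list)        # conversation history
    schema_context: list = field(default_factory=list) # relevant tables/columns
    generated_sql: list = field(default_factory=list)  # SQL from this session
    query_results: dict = field(default_factory=dict)  # cached last results
    visualization: dict = field(default_factory=dict)  # current chart config

def save_state(store, session_id, state, ttl=DEFAULT_TTL_SECONDS):
    """With a real client: redis.setex(f"session:{session_id}", ttl, payload)."""
    payload = json.dumps(asdict(state))
    store[f"session:{session_id}"] = (payload, time.time() + ttl)

def load_state(store, session_id):
    entry = store.get(f"session:{session_id}")
    if entry is None or entry[1] < time.time():
        return None  # expired or missing -> start a fresh session
    return AgentState(**json.loads(entry[0]))
```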


1.3 Text-to-SQL Generation

The SQL generation pipeline is the most technically sophisticated component of the conversational analytics system. It goes far beyond simple prompt engineering to deliver accurate, secure, and performant SQL.

Pipeline Stages

Stage 1: Question Parsing

The QuestionParser extracts structured information from the natural language input:

  • Entities (table names, column names, metric names)
  • Time ranges ("last quarter", "year over year", "since January")
  • Aggregations ("total", "average", "maximum", "count")
  • Filters ("where region is West", "excluding returns")
  • Groupings ("by category", "broken down by region")
  • Sort orders ("highest", "top 10", "ranked by")
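A toy version of this extraction step might look like the following. The patterns and output fields are assumptions; the real QuestionParser is considerably richer:

```python
import re

# Illustrative extraction of a few of the signals listed above.
AGGREGATIONS = {"total": "SUM", "average": "AVG", "maximum": "MAX", "count": "COUNT"}

def parse_question(question):
    q = question.lower()
    parsed = {
        "aggregations": [sql for word, sql in AGGREGATIONS.items() if word in q],
        "groupings": re.findall(r"by (\w+)", q),  # "by category", "by region"
        "limit": None,
    }
    top = re.search(r"top (\d+)", q)              # "top 10"
    if top:
        parsed["limit"] = int(top.group(1))
    return parsed

print(parse_question("total revenue by region, top 10"))
# {'aggregations': ['SUM'], 'groupings': ['region'], 'limit': 10}
```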

Stage 2: Schema Retrieval

The QdrantSchemaStore retrieves relevant table and column schemas using vector similarity search. For each tenant, schema descriptions are embedded using an embedding model (OpenAI text-embedding-ada-002 or a self-hosted alternative) and stored in tenant-specific Qdrant collections.

The retrieval process:

  1. Embed the user's question using the same embedding model
  2. Search the tenant's Qdrant collection for the top-k (default: 10) most similar schema descriptions
  3. Return the matching table and column definitions with their metadata (types, descriptions, relationships)
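The core of step 2 is a top-k similarity search. A dependency-free sketch follows; in the platform this is a Qdrant query over the tenant's collection, and the embedding vectors here are placeholders:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_schemas(question_vector, collection, k=10):
    """collection: list of (schema_description, embedding) pairs.
    Stand-in for a Qdrant similarity search over a tenant collection."""
    scored = [(cosine(question_vector, vec), desc) for desc, vec in collection]
    scored.sort(reverse=True)
    return [desc for _, desc in scored[:k]]
```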

Stage 3: SQL Generation

The LLM generates SQL using the parsed question and retrieved schema context. The prompt includes:

  • The parsed question structure
  • The relevant schema definitions (tables, columns, types, descriptions)
  • The SQL dialect (Trino by default)
  • Semantic layer metric definitions (if applicable)
  • Examples from the tenant's query history (few-shot learning)
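Assembling those inputs into a prompt might look like this. The template wording is an assumption, not the production prompt:

```python
# Illustrative prompt assembly from the inputs listed above.
def build_sql_prompt(parsed, schemas, dialect="trino", metrics=None, examples=None):
    parts = [
        f"Generate a single {dialect} SQL query.",
        f"Question structure: {parsed}",
        "Relevant schemas:",
        *[f"  {s}" for s in schemas],
    ]
    if metrics:   # semantic-layer metric definitions, when applicable
        parts += ["Semantic-layer metrics:", *[f"  {m}" for m in metrics]]
    if examples:  # few-shot examples from the tenant's query history
        parts += ["Similar past queries:", *[f"  {e}" for e in examples]]
    return "\n".join(parts)
```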

Stage 4: Validation

The SQLValidator verifies the generated SQL:

  • Syntactic validation using sqlglot (parses the SQL into an AST)
  • Injection prevention using sqlparse (detects dangerous patterns: DROP, DELETE, INSERT, UPDATE)
  • Schema validation -- verifies that referenced tables and columns exist in the tenant's catalog
  • Dialect compliance -- ensures the SQL is valid for the target query engine (Trino)
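A heavily simplified stand-in for the validator is sketched below. The real SQLValidator parses the query into an AST with sqlglot and inspects it with sqlparse; this regex-based check only illustrates the read-only policy and table check:

```python
import re

# Simplified stand-in for the SQLValidator (the real one uses sqlglot/sqlparse).
FORBIDDEN = re.compile(r"\b(DROP|DELETE|INSERT|UPDATE|ALTER|TRUNCATE)\b", re.I)

def validate_sql(sql, known_tables):
    errors = []
    if FORBIDDEN.search(sql):
        errors.append("statement must be read-only")
    for table in re.findall(r"\bFROM\s+(\w+)", sql, re.I):
        if table not in known_tables:
            errors.append(f"unknown table: {table}")
    return errors  # empty list means the query passed

print(validate_sql("SELECT * FROM orders", {"orders"}))  # []
print(validate_sql("DROP TABLE orders", {"orders"}))     # ['statement must be read-only']
```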

Stage 5: Auto-Correction

If validation fails, the system feeds the error messages back to the LLM for iterative correction:

  1. First retry: LLM receives the original question, schema context, generated SQL, and validation errors
  2. Second retry: LLM receives all previous context plus the first correction attempt
  3. Third retry: Same pattern. If the third attempt fails, the system returns the validation error to the user with a suggestion to rephrase the question
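The retry pattern above can be sketched as a loop where each attempt sees all prior attempts and their errors. The `generate` and `validate` callables here are stubs standing in for the LLM call and the SQLValidator:

```python
MAX_RETRIES = 3  # correction attempts after the initial generation

def generate_with_correction(question, schema, generate, validate):
    context = {"question": question, "schema": schema, "attempts": []}
    for _ in range(1 + MAX_RETRIES):
        sql = generate(context)   # each retry sees all prior attempts + errors
        errors = validate(sql)
        if not errors:
            return sql
        context["attempts"].append({"sql": sql, "errors": errors})
    # Mirrors the documented behavior: surface the error, suggest rephrasing
    raise ValueError(f"Could not produce valid SQL; please rephrase. Last errors: {errors}")
```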

1.4 Real-Time Streaming

All conversational interactions are streamed to the client via WebSocket connections. The streaming protocol supports:

Progressive SQL rendering
  Implementation: Tokens stream as the LLM generates SQL
  User experience: User sees SQL appearing character by character

Result streaming
  Implementation: Query results stream as rows are returned from Trino
  User experience: Tables populate progressively

Chart streaming
  Implementation: Chart specification is sent as soon as analysis completes
  User experience: Visualization renders after data arrives

Cancellation
  Implementation: Client sends a cancel message on the WebSocket
  User experience: Long-running queries are terminated mid-execution

Typing indicators
  Implementation: Agent state transitions emit status events
  User experience: User sees "Generating SQL...", "Executing query...", "Analyzing results..."

Error streaming
  Implementation: Validation errors and suggestions stream immediately
  User experience: User gets instant feedback on malformed requests

The WebSocket endpoint is implemented at /ws/chat/{session_id} in the ai-service FastAPI application. Authentication is handled via JWT tokens passed as query parameters during the WebSocket handshake.
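One plausible shape for the streamed events is sketched below. The exact message schema is an assumption; it simply mirrors the feature table above:

```python
import json

# Hypothetical event envelope for the WebSocket stream (schema assumed).
def event(kind, payload):
    return json.dumps({"type": kind, "payload": payload})

# Typical sequence for a SQL-query turn:
messages = [
    event("status", "Generating SQL..."),
    event("sql_token", "SELECT "),                          # progressive SQL rendering
    event("status", "Executing query..."),
    event("rows", [{"region": "West", "revenue": 1200}]),   # result streaming
    event("status", "Analyzing results..."),
    event("chart", {"library": "echarts", "type": "bar"}),  # chart streaming
]
print(json.loads(messages[0])["payload"])  # Generating SQL...
```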


1.5 RAG (Retrieval-Augmented Generation)

The platform uses Qdrant as its vector store for three distinct RAG use cases:

Schema embeddings
  Content: Table and column descriptions per tenant
  Purpose: Provide schema context for SQL generation
  Embedding model: text-embedding-ada-002

Documentation
  Content: Platform and data documentation
  Purpose: Power the Documentation Agent for help queries
  Embedding model: text-embedding-ada-002

Query history
  Content: Previously successful queries per tenant
  Purpose: Few-shot learning for SQL generation
  Embedding model: text-embedding-ada-002

Tenant Isolation in RAG

Each tenant's embeddings are stored in separate Qdrant collections, ensuring that:

  • Tenant A's schema information is never used to generate SQL for Tenant B
  • Query history from one tenant does not influence another tenant's results
  • Documentation embeddings can be shared (platform docs) or tenant-specific (custom data docs)
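A simple way to express that isolation is tenant-scoped collection naming. The convention below is an assumption (the platform's actual scheme is not documented here), but it captures the shared-vs-tenant-specific split:

```python
# Hypothetical tenant-scoped collection naming (convention assumed).
SHARED_COLLECTIONS = {"platform_docs"}

def collection_name(kind, tenant_id=None):
    if kind in SHARED_COLLECTIONS:
        return kind                       # shared across all tenants
    if tenant_id is None:
        raise ValueError(f"{kind} requires a tenant_id")
    return f"tenant_{tenant_id}_{kind}"   # never queried cross-tenant

print(collection_name("schema", "acme"))  # tenant_acme_schema
print(collection_name("platform_docs"))   # platform_docs
```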

1.6 Conversational Context and Follow-Ups

One of the most powerful aspects of the conversational analytics system is its ability to maintain context across multiple turns. When a user asks a follow-up question, the system:

  1. Retrieves the conversation history from Redis (or PostgreSQL for long-term persistence)
  2. Identifies references to previous context ("break that down", "show me the same for", "exclude the outliers")
  3. Modifies the previous SQL rather than generating from scratch, preserving the user's analytical thread
  4. Maintains visualization state so chart type selections and formatting persist across turns

Example Multi-Turn Conversation

Turn 1: "What were our top 10 products by revenue last month?"
  -> Generates SQL with ORDER BY revenue DESC LIMIT 10
  -> Returns bar chart with 10 products

Turn 2: "Break that down by region"
  -> Modifies SQL to add GROUP BY region
  -> Returns stacked bar chart with products by region

Turn 3: "Show me the trend for the top 3 over the last 6 months"
  -> Modifies SQL to add time dimension, LIMIT 3
  -> Returns line chart with 3 products over 6 months

Turn 4: "Is there a statistical correlation between product rank and seasonality?"
  -> Triggers Analysis Agent for statistical computation
  -> Returns correlation analysis with narrative summary

Each turn builds on the previous context. The user never needs to re-specify "revenue", "products", or "last month" -- the system carries that context forward.


1.7 Performance Characteristics

Text-to-SQL latency (p95)
  Target: under 3 seconds
  Mechanism: LLM streaming, schema caching, connection pooling

Simple query execution (p95)
  Target: under 500 ms
  Mechanism: Trino query optimization, result caching

Complex analytics (p95)
  Target: under 30 seconds
  Mechanism: Trino distributed execution, resource groups

WebSocket latency
  Target: under 50 ms
  Mechanism: Direct connection, no HTTP overhead

Concurrent sessions per tenant
  Target: 1,000+
  Mechanism: Horizontal scaling of ai-service pods

Context window
  Target: Up to 20 turns
  Mechanism: Redis-backed session state with configurable history depth
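The configurable history depth in the last row amounts to trimming session state on load. A minimal sketch, with the 20-turn default from the table and assumed data shapes:

```python
MAX_TURNS = 20  # default context window; configurable per deployment

def trim_history(history, max_turns=MAX_TURNS):
    """history: list of {'question': ..., 'response': ...} turns.
    Keep only the most recent turns when loading session state."""
    return history[-max_turns:]

turns = [{"question": f"q{i}", "response": f"r{i}"} for i in range(25)]
print(len(trim_history(turns)))  # 20
```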

Deep Dive References

For implementation details, see: