MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Vector Stores

Vector stores power the Retrieval-Augmented Generation (RAG) pipeline in the MATIH Platform. They store vector embeddings of schema metadata, SQL query examples, business terminology, and documentation, enabling the AI Service to retrieve relevant context when generating SQL and answering questions.


Vector Store Options

| Technology | Use Case | Deployment |
|---|---|---|
| Qdrant | Production vector search | Kubernetes (Helm chart) |
| LanceDB | Development and testing | Embedded (no server) |

Qdrant

Qdrant is the production vector database:

| Aspect | Details |
|---|---|
| Index type | HNSW (Hierarchical Navigable Small World) |
| Distance metric | Cosine similarity |
| Filtering | Payload-based filtering with tenant ID |
| API | REST and gRPC |
| Multi-tenancy | Tenant ID in payload metadata, filtered at query time |
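
To make the query-time tenant filter concrete, the sketch below builds the JSON body for Qdrant's REST search endpoint (`POST /collections/{collection_name}/points/search`). It is a minimal illustration, not the AI Service's actual client code: the helper name `build_search_request` is hypothetical, and the payload key `tenant_id` is assumed from the filter shown in the RAG query flow.

```python
import json


def build_search_request(query_vector, tenant_id, top_k=5):
    """Build a request body for Qdrant's REST points/search endpoint.

    Hypothetical helper: the payload key "tenant_id" is an assumption
    about how MATIH stores the tenant ID in each point's payload.
    """
    return {
        "vector": list(query_vector),
        "limit": top_k,
        "with_payload": True,
        # Only points whose payload matches this tenant are considered.
        "filter": {
            "must": [
                {"key": "tenant_id", "match": {"value": tenant_id}}
            ]
        },
    }


# Toy 3-dimensional vector for readability; real vectors have 1536 dimensions.
body = build_search_request([0.1, 0.2, 0.3], "acme-corp")
print(json.dumps(body, indent=2))
```

Because the filter is part of every search request, a query for one tenant never scores points that belong to another tenant, even though all tenants share the same collection.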

Embedding Sources

The RAG pipeline indexes the following content as vector embeddings:

| Source | Indexed Content | Update Frequency |
|---|---|---|
| Catalog metadata | Table names, column names, descriptions, data types | On schema change |
| Query examples | Successful SQL queries with their natural language questions | After each successful query |
| Business terms | Ontology definitions, term relationships | On ontology update |
| Semantic model | Metric definitions, dimension descriptions | On model publish |
| Documentation | Platform and data documentation | On documentation update |
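
Before catalog metadata can be embedded, it has to be flattened into text. The sketch below shows one possible way to do that for a single table. The `Table` and `Column` classes, the `table_to_document` function, and the output format are all assumptions made for illustration; the data types and descriptions in the example are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str
    description: str = ""


@dataclass
class Table:
    name: str
    description: str = ""
    columns: list = field(default_factory=list)


def table_to_document(table):
    """Flatten one table's metadata into a single text chunk for embedding."""
    lines = [f"Table: {table.name}"]
    if table.description:
        lines.append(f"Description: {table.description}")
    for col in table.columns:
        line = f"Column: {col.name} ({col.data_type})"
        if col.description:
            line += f" - {col.description}"
        lines.append(line)
    return "\n".join(lines)


# Illustrative metadata only; types and descriptions are made up.
orders = Table(
    name="orders",
    description="Customer orders",
    columns=[
        Column("amount", "decimal", "Order total"),
        Column("order_date", "date"),
        Column("customer_id", "bigint"),
    ],
)
doc = table_to_document(orders)
print(doc)
```

Re-running a function like this on each schema change and re-embedding the result would keep the indexed text in step with the catalog.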

RAG Query Flow

User Question: "What was revenue last quarter?"
  |
  v
Embedding Model: Convert question to vector
  |
  v
Qdrant: Search for similar vectors
  | Filter: tenant_id = "acme-corp"
  | Top-K: 5 most similar results
  |
  v
Retrieved Context:
  - Table: orders (columns: amount, order_date, customer_id)
  - Similar query: "SELECT SUM(amount) FROM orders WHERE ..."
  - Metric: revenue = SUM(orders.amount)
  |
  v
SQLAgent: Generate SQL using retrieved context
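
The flow above can be sketched end to end with an in-memory stand-in for the vector store. This is a toy model under stated assumptions: a brute-force cosine search replaces Qdrant's HNSW index, the 3-dimensional vectors are hand-picked rather than produced by an embedding model, and the function names, the `other-tenant` entry, and the `support_tickets` table are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, points, tenant_id, top_k=5):
    """Filter by tenant first, then return the top_k most similar payloads."""
    candidates = [p for p in points if p["payload"]["tenant_id"] == tenant_id]
    ranked = sorted(
        candidates,
        key=lambda p: cosine_similarity(query_vector, p["vector"]),
        reverse=True,
    )
    return [p["payload"] for p in ranked[:top_k]]


def build_prompt(question, context):
    """Assemble the text handed to the SQL-generating agent."""
    context_lines = "\n".join(f"- {c['text']}" for c in context)
    return f"Context:\n{context_lines}\n\nQuestion: {question}"


# Hand-picked toy vectors; a real embedding model would produce these.
points = [
    {"vector": [0.9, 0.1, 0.0],
     "payload": {"tenant_id": "acme-corp", "text": "Metric: revenue = SUM(orders.amount)"}},
    {"vector": [0.7, 0.3, 0.1],
     "payload": {"tenant_id": "acme-corp", "text": "Table: orders (columns: amount, order_date, customer_id)"}},
    {"vector": [0.0, 0.2, 0.9],
     "payload": {"tenant_id": "acme-corp", "text": "Table: support_tickets"}},
    {"vector": [0.9, 0.1, 0.0],
     "payload": {"tenant_id": "other-tenant", "text": "Metric: revenue = SUM(invoices.total)"}},
]

question = "What was revenue last quarter?"
query_vector = [1.0, 0.1, 0.0]  # stands in for the embedded question
results = retrieve(query_vector, points, tenant_id="acme-corp", top_k=2)
prompt = build_prompt(question, results)
print(prompt)
```

Note that the entry for `other-tenant` has the same vector as the best match but is never returned, because the tenant filter is applied before ranking.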

Collection Structure

| Collection | Content | Embedding Dimension |
|---|---|---|
| schema_metadata | Table and column descriptions | 1536 |
| query_examples | Question-SQL pairs | 1536 |
| business_terms | Ontology definitions | 1536 |
| semantic_models | Metric definitions | 1536 |

Each vector entry includes a payload with tenant ID, creation timestamp, and source metadata.
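
As a rough illustration of this structure, the sketch below builds the vector configuration that Qdrant's REST API accepts when creating a collection (`PUT /collections/{collection_name}`) and a single point with its payload. The payload keys `tenant_id`, `created_at`, `source`, and `text`, and the helper names, are assumptions; the source only states that the payload holds a tenant ID, a creation timestamp, and source metadata.

```python
import uuid
from datetime import datetime, timezone

EMBEDDING_DIM = 1536
COLLECTIONS = ["schema_metadata", "query_examples", "business_terms", "semantic_models"]


def collection_config():
    """Vector settings shared by all four collections."""
    return {"vectors": {"size": EMBEDDING_DIM, "distance": "Cosine"}}


def make_point(vector, tenant_id, source, text):
    """Build one point; payload key names are hypothetical."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return {
        "id": str(uuid.uuid4()),  # Qdrant accepts UUIDs or unsigned integers as IDs
        "vector": list(vector),
        "payload": {
            "tenant_id": tenant_id,
            "created_at": datetime.now(timezone.utc).isoformat(),
            "source": source,
            "text": text,
        },
    }


point = make_point([0.0] * EMBEDDING_DIM, "acme-corp", "catalog", "Table: orders")
print(collection_config())
print(sorted(point["payload"].keys()))
```

Validating the vector length before upload catches a mismatch between the embedding model and the collection early, since Qdrant rejects vectors whose size differs from the collection's configured size.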


LanceDB (Development)

LanceDB provides an embedded vector store for development:

| Aspect | Details |
|---|---|
| Deployment | Embedded in AI Service process |
| Storage | Local filesystem |
| Index | IVF-PQ for approximate search |
| Multi-tenancy | Separate tables per tenant |

LanceDB requires no additional infrastructure, making it suitable for local development and testing.
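
Since LanceDB isolates tenants by table rather than by payload filter, each lookup first has to resolve a table name. The sketch below shows one possible naming scheme. It is an assumption, not the platform's actual convention: the `tenant_table_name` helper and the `tenant__collection` format are hypothetical.

```python
import re


def tenant_table_name(tenant_id, collection):
    """Derive one table name per (tenant, collection) pair.

    Hypothetical scheme: lowercase the tenant ID and replace anything
    outside [a-z0-9_] with an underscore.
    """
    safe_tenant = re.sub(r"[^a-z0-9_]", "_", tenant_id.lower())
    if not safe_tenant.strip("_"):
        raise ValueError("tenant_id must contain at least one alphanumeric character")
    return f"{safe_tenant}__{collection}"


name = tenant_table_name("acme-corp", "schema_metadata")
print(name)  # acme_corp__schema_metadata

# With the lancedb package installed, the name would be passed to
# db.open_table(name) on a connection from lancedb.connect(<local path>).
```

A scheme like this can map distinct tenant IDs such as `acme-corp` and `acme_corp` to the same table, so a real implementation would need to guarantee unique, already-sanitized tenant IDs or use a reversible encoding.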

