Vector Stores
Vector stores power the Retrieval-Augmented Generation (RAG) pipeline in the MATIH Platform. They store vector embeddings of schema metadata, SQL query examples, business terminology, and documentation, enabling AI services to retrieve relevant context when generating SQL and answering questions.
All vector operations are centralized through the embeddings-service (Java Spring Boot, port 8213), which provides a REST API abstraction over Qdrant with per-user RBAC and tenant isolation. Python AI services connect via EmbeddingsClient (commons-python), not directly to Qdrant.
Architecture
```
Python AI Services                 Java Service                   Vector DB
─────────────────                  ─────────────                  ─────────
context-graph-service ──┐
ai-service ─────────────┤  httpx   ┌────────────────────┐  REST   ┌────────┐
ml-service ─────────────┼────────→ │ embeddings-service │ ──────→ │ Qdrant │
copilot-service ────────┤  (via    │ (Java, port 8213)  │         │ :6333  │
search-service ─────────┘ commons) │                    │         └────────┘
                                   │ • @PreAuthorize    │  JDBC   ┌────────┐
                                   │ • tenant isolation │ ──────→ │ PgSQL  │
                                   │ • audit logging    │         │metadata│
                                   │ • Kafka events     │         └────────┘
                                   └────────────────────┘
```
Vector Store Options
| Technology | Use Case | Deployment |
|---|---|---|
| Qdrant (via embeddings-service) | Production vector search | Kubernetes (Helm chart) |
| LanceDB | Development and testing | Embedded (no server) |
Qdrant
Qdrant is the production vector database, accessed exclusively through the embeddings-service:
| Aspect | Details |
|---|---|
| Index type | HNSW (Hierarchical Navigable Small World) |
| Distance metric | Cosine similarity (configurable per collection) |
| Filtering | Payload-based filtering with mandatory tenant ID |
| API | REST and gRPC (accessed via embeddings-service) |
| Multi-tenancy | Mandatory tenant_id filter injected by embeddings-service on every query |
| RBAC | embeddings:read, embeddings:write, embeddings:delete, embeddings:admin |
| Audit | All operations logged via AuditLogger + Kafka events |
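Conceptually, the mandatory tenant filter maps onto Qdrant's payload-filter syntax. Below is a minimal sketch of a `/points/search` request body as embeddings-service might compose it; the helper function and all values are illustrative, not the service's actual code (the body fields themselves follow Qdrant's REST search schema).

```python
# Sketch: a Qdrant REST /points/search body after the mandatory
# tenant filter has been injected. Values are illustrative.

def build_search_body(query_vector: list[float], tenant_id: str, top_k: int = 5) -> dict:
    """Compose a Qdrant search request body with a tenant payload filter."""
    return {
        "vector": query_vector,
        "limit": top_k,
        "with_payload": True,
        "filter": {
            "must": [
                {"key": "tenant_id", "match": {"value": tenant_id}}
            ]
        },
    }

body = build_search_body([0.1, 0.2, 0.3], tenant_id="acme-corp")
```

Because the filter is injected server-side on every query, a compromised or buggy Python service cannot read another tenant's vectors even if it tries to omit the filter.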
Embedding Sources
The RAG pipeline indexes the following content as vector embeddings:
| Source | Indexed Content | Update Frequency |
|---|---|---|
| Catalog metadata | Table names, column names, descriptions, data types | On schema change |
| Query examples | Successful SQL queries with their natural language questions | After each successful query |
| Business terms | Ontology definitions, term relationships | On ontology update |
| Semantic model | Metric definitions, dimension descriptions | On model publish |
| Documentation | Platform and data documentation | On documentation update |
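To illustrate how catalog metadata might be flattened into embeddable text, here is a hypothetical helper; the field names and output format are assumptions for the sketch, not the platform's actual indexing code.

```python
# Sketch: flatten a table's catalog metadata into one string suitable
# for embedding. Field names ("name", "type", "description") are assumed.

def table_to_text(table: dict) -> str:
    """Render table and column metadata as a single embeddable string."""
    cols = ", ".join(
        f"{c['name']} ({c['type']}): {c.get('description', '')}".strip().rstrip(":")
        for c in table["columns"]
    )
    return f"Table {table['name']}: {table.get('description', '')}. Columns: {cols}"

text = table_to_text({
    "name": "orders",
    "description": "Customer orders",
    "columns": [
        {"name": "amount", "type": "numeric", "description": "Order total"},
        {"name": "order_date", "type": "date"},
    ],
})
```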
RAG Query Flow
```
User Question: "What was revenue last quarter?"
        |
        v
Embedding Model: Convert question to vector
        |
        v
Qdrant: Search for similar vectors
        |  Filter: tenant_id = "acme-corp"
        |  Top-K: 5 most similar results
        |
        v
Retrieved Context:
  - Table: orders (columns: amount, order_date, customer_id)
  - Similar query: "SELECT SUM(amount) FROM orders WHERE ..."
  - Metric: revenue = SUM(orders.amount)
        |
        v
SQLAgent: Generate SQL using retrieved context
```
Collection Structure
| Collection | Content | Embedding Dimension |
|---|---|---|
| schema_metadata | Table and column descriptions | 1536 |
| query_examples | Question-SQL pairs | 1536 |
| business_terms | Ontology definitions | 1536 |
| semantic_models | Metric definitions | 1536 |
Each vector entry includes a payload with tenant ID, creation timestamp, and source metadata.
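As a sketch of such a payload: only the tenant ID, creation timestamp, and source metadata are documented, so the exact field names below are assumptions.

```python
from datetime import datetime, timezone

# Illustrative payload for a schema_metadata entry. Only tenant_id,
# a creation timestamp, and source metadata are documented; the
# specific field names here are assumptions.
payload = {
    "tenant_id": "acme-corp",
    "created_at": datetime.now(timezone.utc).isoformat(),
    "source": {
        "type": "catalog_metadata",
        "table": "orders",
    },
}
```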
LanceDB (Development)
LanceDB provides an embedded vector store for development:
| Aspect | Details |
|---|---|
| Deployment | Embedded in AI Service process |
| Storage | Local filesystem |
| Index | IVF-PQ for approximate search |
| Multi-tenancy | Separate tables per tenant |
LanceDB requires no additional infrastructure, making it suitable for local development and testing.
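To illustrate the embedded, per-tenant model (this is a toy stand-in, not LanceDB's actual API), here is an in-process store with separate tables per tenant and brute-force cosine search:

```python
import math

# Toy embedded vector store: one table per tenant, brute-force cosine
# similarity. Illustrates the development-mode model only -- this is
# not LanceDB's API, which uses IVF-PQ indexes under the hood.
class EmbeddedStore:
    def __init__(self) -> None:
        self._tables: dict[str, list[tuple[list[float], dict]]] = {}

    def upsert(self, tenant_id: str, vector: list[float], payload: dict) -> None:
        # Separate tables per tenant give isolation without a filter step.
        self._tables.setdefault(tenant_id, []).append((vector, payload))

    def search(self, tenant_id: str, query: list[float], top_k: int = 5):
        def cos(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))

        rows = self._tables.get(tenant_id, [])
        return sorted(rows, key=lambda r: cos(query, r[0]), reverse=True)[:top_k]

store = EmbeddedStore()
store.upsert("acme-corp", [1.0, 0.0], {"doc": "revenue metric"})
store.upsert("acme-corp", [0.0, 1.0], {"doc": "unrelated"})
best = store.search("acme-corp", [0.9, 0.1], top_k=1)[0][1]["doc"]  # "revenue metric"
```

A real LanceDB table persists to the local filesystem and answers approximate queries; the brute-force scan above is exact but only viable at development scale.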
Embeddings Service REST API
The embeddings-service exposes these endpoints (all require JWT authentication):
| Endpoint | Method | Permission | Description |
|---|---|---|---|
| /api/v1/collections | POST | embeddings:write | Create collection |
| /api/v1/collections | GET | embeddings:read | List tenant collections |
| /api/v1/collections/{name} | DELETE | embeddings:delete | Delete collection |
| /api/v1/vectors/upsert | POST | embeddings:write | Batch upsert vectors |
| /api/v1/vectors/search | POST | embeddings:read | Similarity search |
| /api/v1/vectors/fetch | POST | embeddings:read | Fetch by IDs |
| /api/v1/vectors | DELETE | embeddings:delete | Delete vectors |
| /api/v1/admin/stats | GET | embeddings:admin | Cluster statistics |
Python Client
```python
from matih_commons.clients.embeddings_client import EmbeddingsClient

client = EmbeddingsClient()
await client.create_collection(tenant_id, "my-coll", vector_size=768)
await client.upsert(tenant_id, "my-coll", vectors=[...])
results = await client.search(tenant_id, "my-coll", vector=[0.1, ...], top_k=10)
```
Related Pages
- Agent Flow -- RAG in the agent pipeline
- Graph Stores -- Knowledge graph storage
- ML Infrastructure -- AI technology stack