Vector Stores
Vector stores power the Retrieval-Augmented Generation (RAG) pipeline in the MATIH Platform. They store vector embeddings of schema metadata, SQL query examples, business terminology, and documentation, enabling AI services to retrieve relevant context when generating SQL and answering questions.
All vector operations are centralized through the embeddings-service (Java Spring Boot, port 8213), which provides a REST API abstraction over Qdrant with per-user RBAC and tenant isolation. Python AI services connect via EmbeddingsClient (commons-python), not directly to Qdrant.
Architecture
```
Python AI Services                 Java Service                   Vector DB
─────────────────                  ─────────────                  ─────────
context-graph-service ──┐
ai-service ─────────────┤  httpx   ┌────────────────────┐  REST   ┌────────┐
ml-service ─────────────┼────────→ │ embeddings-service │ ──────→ │ Qdrant │
copilot-service ────────┤  (via    │ (Java, port 8213)  │         │ :6333  │
search-service ─────────┘ commons) │                    │         └────────┘
                                   │ • @PreAuthorize    │  JDBC   ┌────────┐
                                   │ • tenant isolation │ ──────→ │ PgSQL  │
                                   │ • audit logging    │         │metadata│
                                   │ • Kafka events     │         └────────┘
                                   └────────────────────┘
```
Vector Store Options
| Technology | Use Case | Deployment |
|---|---|---|
| Qdrant (via embeddings-service) | Production vector search | Kubernetes (Helm chart) |
| LanceDB | Development and testing | Embedded (no server) |
Qdrant
Qdrant is the production vector database, accessed exclusively through the embeddings-service:
| Aspect | Details |
|---|---|
| Index type | HNSW (Hierarchical Navigable Small World) |
| Distance metric | Cosine similarity (configurable per collection) |
| Filtering | Payload-based filtering with mandatory tenant ID |
| API | REST and gRPC (accessed via embeddings-service) |
| Multi-tenancy | Mandatory tenant_id filter injected by embeddings-service on every query |
| RBAC | embeddings:read, embeddings:write, embeddings:delete, embeddings:admin |
| Audit | All operations logged via AuditLogger + Kafka events |
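Conceptually, the mandatory tenant filter maps onto Qdrant's payload-filter syntax. Below is a minimal sketch of a `/points/search` request body as embeddings-service might compose it; the helper function and all values are illustrative, not the service's actual code (the body fields themselves follow Qdrant's REST search schema).

```python
# Sketch: a Qdrant REST /points/search body after the mandatory
# tenant filter has been injected. Values are illustrative.

def build_search_body(query_vector: list[float], tenant_id: str, top_k: int = 5) -> dict:
    """Compose a Qdrant search request body with a tenant payload filter."""
    return {
        "vector": query_vector,
        "limit": top_k,
        "with_payload": True,
        "filter": {
            "must": [
                {"key": "tenant_id", "match": {"value": tenant_id}}
            ]
        },
    }

body = build_search_body([0.1, 0.2, 0.3], tenant_id="acme-corp")
```

Because the filter is injected server-side on every query, a compromised or buggy Python service cannot read another tenant's vectors even if it tries to omit the filter.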
Embedding Sources
The RAG pipeline indexes the following content as vector embeddings:
| Source | Indexed Content | Update Frequency |
|---|---|---|
| Catalog metadata | Table names, column names, descriptions, data types | On schema change |
| Query examples | Successful SQL queries with their natural language questions | After each successful query |
| Business terms | Ontology definitions, term relationships | On ontology update |
| Semantic model | Metric definitions, dimension descriptions | On model publish |
| Documentation | Platform and data documentation | On documentation update |
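To illustrate how catalog metadata might be flattened into embeddable text, here is a hypothetical helper; the field names and output format are assumptions for the sketch, not the platform's actual indexing code.

```python
# Sketch: flatten a table's catalog metadata into one string suitable
# for embedding. Field names ("name", "type", "description") are assumed.

def table_to_text(table: dict) -> str:
    """Render table and column metadata as a single embeddable string."""
    cols = ", ".join(
        f"{c['name']} ({c['type']}): {c.get('description', '')}".strip().rstrip(":")
        for c in table["columns"]
    )
    return f"Table {table['name']}: {table.get('description', '')}. Columns: {cols}"

text = table_to_text({
    "name": "orders",
    "description": "Customer orders",
    "columns": [
        {"name": "amount", "type": "numeric", "description": "Order total"},
        {"name": "order_date", "type": "date"},
    ],
})
```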
RAG Query Flow
```
User Question: "What was revenue last quarter?"
        |
        v
Embedding Model: Convert question to vector
        |
        v
Qdrant: Search for similar vectors
        |  Filter: tenant_id = "acme-corp"
        |  Top-K: 5 most similar results
        |
        v
Retrieved Context:
  - Table: orders (columns: amount, order_date, customer_id)
  - Similar query: "SELECT SUM(amount) FROM orders WHERE ..."
  - Metric: revenue = SUM(orders.amount)
        |
        v
SQLAgent: Generate SQL using retrieved context
```
Collection Structure
| Collection | Content | Embedding Dimension |
|---|---|---|
| schema_metadata | Table and column descriptions | 1536 |
| query_examples | Question-SQL pairs | 1536 |
| business_terms | Ontology definitions | 1536 |
| semantic_models | Metric definitions | 1536 |
Each vector entry includes a payload with tenant ID, creation timestamp, and source metadata.
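As a sketch of such a payload: only the tenant ID, creation timestamp, and source metadata are documented, so the exact field names below are assumptions.

```python
from datetime import datetime, timezone

# Illustrative payload for a schema_metadata entry. Only tenant_id,
# a creation timestamp, and source metadata are documented; the
# specific field names here are assumptions.
payload = {
    "tenant_id": "acme-corp",
    "created_at": datetime.now(timezone.utc).isoformat(),
    "source": {
        "type": "catalog_metadata",
        "table": "orders",
    },
}
```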
LanceDB (Development)
LanceDB provides an embedded vector store for development:
| Aspect | Details |
|---|---|
| Deployment | Embedded in AI Service process |
| Storage | Local filesystem |
| Index | IVF-PQ for approximate search |
| Multi-tenancy | Separate tables per tenant |
LanceDB requires no additional infrastructure, making it suitable for local development and testing.
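To illustrate the embedded, per-tenant model (this is a toy stand-in, not LanceDB's actual API), here is an in-process store with separate tables per tenant and brute-force cosine search:

```python
import math

# Toy embedded vector store: one table per tenant, brute-force cosine
# similarity. Illustrates the development-mode model only -- this is
# not LanceDB's API, which uses IVF-PQ indexes under the hood.
class EmbeddedStore:
    def __init__(self) -> None:
        self._tables: dict[str, list[tuple[list[float], dict]]] = {}

    def upsert(self, tenant_id: str, vector: list[float], payload: dict) -> None:
        # Separate tables per tenant give isolation without a filter step.
        self._tables.setdefault(tenant_id, []).append((vector, payload))

    def search(self, tenant_id: str, query: list[float], top_k: int = 5):
        def cos(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))

        rows = self._tables.get(tenant_id, [])
        return sorted(rows, key=lambda r: cos(query, r[0]), reverse=True)[:top_k]

store = EmbeddedStore()
store.upsert("acme-corp", [1.0, 0.0], {"doc": "revenue metric"})
store.upsert("acme-corp", [0.0, 1.0], {"doc": "unrelated"})
best = store.search("acme-corp", [0.9, 0.1], top_k=1)[0][1]["doc"]  # "revenue metric"
```

A real LanceDB table persists to the local filesystem and answers approximate queries; the brute-force scan above is exact but only viable at development scale.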
Embeddings Service REST API
The embeddings-service exposes these endpoints (all require JWT authentication):
| Endpoint | Method | Permission | Description |
|---|---|---|---|
| /api/v1/collections | POST | embeddings:write | Create collection |
| /api/v1/collections | GET | embeddings:read | List tenant collections |
| /api/v1/collections/{name} | DELETE | embeddings:delete | Delete collection |
| /api/v1/vectors/upsert | POST | embeddings:write | Batch upsert vectors |
| /api/v1/vectors/search | POST | embeddings:read | Similarity search |
| /api/v1/vectors/fetch | POST | embeddings:read | Fetch by IDs |
| /api/v1/vectors | DELETE | embeddings:delete | Delete vectors |
| /api/v1/admin/stats | GET | embeddings:admin | Cluster statistics |
Python Client
```python
from matih_commons.clients.embeddings_client import EmbeddingsClient

client = EmbeddingsClient()
await client.create_collection(tenant_id, "my-coll", vector_size=768)
await client.upsert(tenant_id, "my-coll", vectors=[...])
results = await client.search(tenant_id, "my-coll", vector=[0.1, ...], top_k=10)
```
Related Pages
- Agent Flow -- RAG in the agent pipeline
- Graph Stores -- Knowledge graph storage
- ML Infrastructure -- AI technology stack