ML and AI Infrastructure
The MATIH Platform provides a complete ML/AI infrastructure stack covering experiment tracking, distributed training, model serving, LLM inference, feature engineering, and vector search. This section documents each ML/AI technology, its role, and how it integrates with the platform.
ML/AI Stack Overview
| Technology | Category | Purpose |
|---|---|---|
| MLflow | Experiment tracking | Model versioning, metrics logging, artifact storage |
| Ray | Distributed compute | Model training, hyperparameter tuning, serving |
| vLLM | LLM serving | High-throughput large language model inference |
| Triton | Model serving | NVIDIA's GPU-optimized inference server |
| LangChain | LLM orchestration | LLM integration, chain composition |
| LangGraph | Agent orchestration | Multi-agent state machine for the AI Service |
| Qdrant | Vector database | RAG embeddings, semantic similarity search |
| LanceDB | Vector database | Lightweight vector search for development |
| Feast | Feature store | Feature engineering and serving |
| JupyterHub | Notebooks | Interactive development environment |
MLflow
MLflow provides the experiment tracking and model registry backbone:
| Feature | Details |
|---|---|
| Experiment tracking | Log parameters, metrics, and artifacts per training run |
| Model registry | Version, stage, and promote models (Staging, Production, Archived) |
| Artifact storage | Model files stored in MinIO (S3-compatible) |
| Run comparison | Side-by-side metric comparison across experiments |
| REST API | Programmatic access for automation |
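The tracking and registry features above can be pictured as a small data model: runs accumulate parameters, metrics, and artifact references, and registered model versions move through lifecycle stages. The sketch below is a minimal pure-Python illustration of that model, not the MLflow API; the run IDs and artifact path are hypothetical.

```python
from dataclasses import dataclass, field

# Minimal sketch of the data model behind experiment tracking and a
# model registry. This is NOT the MLflow API, only the concepts it manages.

@dataclass
class Run:
    run_id: str
    params: dict = field(default_factory=dict)     # hyperparameters, fixed per run
    metrics: dict = field(default_factory=dict)    # e.g. accuracy, loss
    artifacts: list = field(default_factory=list)  # model files (stored in MinIO)

class ModelRegistry:
    STAGES = ("None", "Staging", "Production", "Archived")

    def __init__(self):
        self.versions = {}  # version -> {"run": Run, "stage": str}

    def register(self, version, run):
        self.versions[version] = {"run": run, "stage": "None"}

    def promote(self, version, stage):
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.versions[version]["stage"] = stage

# Typical lifecycle: log a run, register the resulting model, promote it.
run = Run("run-001")
run.params["learning_rate"] = 0.01
run.metrics["accuracy"] = 0.94
run.artifacts.append("s3://mlflow/run-001/model.pkl")  # hypothetical path

registry = ModelRegistry()
registry.register("v1", run)
registry.promote("v1", "Staging")
registry.promote("v1", "Production")
```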
Ray
Ray provides the distributed compute layer for ML workloads:
| Component | Purpose |
|---|---|
| Ray Train | Distributed training across GPU/CPU nodes |
| Ray Tune | Hyperparameter optimization with scheduling algorithms |
| Ray Serve | Model serving with request batching and auto-scaling |
| Ray Data | Distributed data preprocessing pipelines |
Ray runs as a Kubernetes-native cluster using the KubeRay operator, scaling dynamically based on workload demand.
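To make the Ray Tune row concrete, the sketch below shows the shape of a hyperparameter search: evaluate many trial configurations in parallel and keep the best. It is a single-machine stand-in using a thread pool, not Ray's API; Ray Tune would schedule the same trials across the GPU/CPU nodes of the cluster, and the objective function here is a toy surrogate for a real training run.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def objective(config):
    # Hypothetical "validation loss": minimized near lr=0.01, batch_size=64.
    lr, bs = config["lr"], config["batch_size"]
    return (lr - 0.01) ** 2 + ((bs - 64) / 64) ** 2

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    trials = [
        {"lr": rng.uniform(0.001, 0.1), "batch_size": rng.choice([16, 32, 64, 128])}
        for _ in range(n_trials)
    ]
    # Ray Tune would distribute these trials across the cluster; here a
    # local thread pool stands in for that scheduling.
    with ThreadPoolExecutor(max_workers=4) as pool:
        losses = list(pool.map(objective, trials))
    best = min(range(n_trials), key=lambda i: losses[i])
    return trials[best], losses[best]

best_config, best_loss = random_search(50)
```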
vLLM
vLLM provides high-throughput LLM inference:
| Feature | Details |
|---|---|
| PagedAttention | Efficient memory management for long sequences |
| Continuous batching | Dynamic request batching for throughput optimization |
| Model support | Llama, Mistral, GPT-family, and other open models |
| API compatibility | OpenAI-compatible REST API |
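Because vLLM exposes an OpenAI-compatible REST API, a standard chat-completions payload works against it with no vendor SDK. The sketch below builds such a request using only the standard library; the service URL and model name are assumptions for illustration, and actually sending the request requires a running vLLM server.

```python
import json
from urllib import request

# Standard OpenAI-style chat-completions payload; vLLM accepts the same shape.
payload = {
    "model": "meta-llama/Llama-3-8B-Instruct",  # hypothetical deployed model
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the MATIH platform in one sentence."},
    ],
    "max_tokens": 128,
    "temperature": 0.2,
}

req = request.Request(
    "http://vllm-service:8000/v1/chat/completions",  # hypothetical in-cluster URL
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Sending it needs a live server, so the call is shown but not executed:
# with request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```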
LangChain and LangGraph
The AI Service uses LangChain and LangGraph for its multi-agent orchestrator:
| Technology | Purpose |
|---|---|
| LangChain | LLM provider abstraction, prompt templates, RAG pipelines |
| LangGraph | State machine orchestration for multi-agent workflows |
The agent orchestrator manages four specialized agents: RouterAgent, SQLAgent, AnalysisAgent, and VizAgent.
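The state-machine pattern LangGraph provides can be sketched in a few lines of plain Python: each node receives a shared state, mutates it, and names the next node (or ends the run). The agent behaviors below are placeholders for illustration, not the real AI Service agents, and this is not the LangGraph API itself.

```python
# Each node returns (updated_state, next_node_name); None ends the graph.

def router_agent(state):
    q = state["question"].lower()
    state["route"] = "sql" if ("how many" in q or "count" in q) else "analysis"
    return state, state["route"]

def sql_agent(state):
    state["sql"] = "SELECT COUNT(*) FROM orders"  # placeholder generated query
    return state, "viz"

def analysis_agent(state):
    state["analysis"] = "trend summary"  # placeholder analysis result
    return state, "viz"

def viz_agent(state):
    state["chart"] = "bar"  # placeholder chart spec
    return state, None

GRAPH = {"router": router_agent, "sql": sql_agent,
         "analysis": analysis_agent, "viz": viz_agent}

def run_graph(question, entry="router"):
    state, node = {"question": question}, entry
    while node is not None:
        state, node = GRAPH[node](state)
    return state

result = run_graph("How many orders shipped last week?")
```

The real orchestrator adds what this sketch omits: typed state, conditional edges, checkpointing, and LLM-backed node logic.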
Vector Databases
Two vector database options serve different deployment scenarios:
| Technology | Use Case | Features |
|---|---|---|
| Qdrant | Production vector search | Distributed, HNSW indexing, filtered search, REST/gRPC API |
| LanceDB | Development and testing | Embedded, no server required, lightweight |
Vector stores power the RAG (Retrieval-Augmented Generation) pipeline, storing embeddings of schema metadata, query examples, and business terminology to improve text-to-SQL accuracy.
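The core retrieval operation both stores perform is nearest-neighbor search over embeddings. Below is a minimal pure-Python sketch of that step, not the Qdrant or LanceDB API: rank stored items by cosine similarity to a query embedding and return the top k. Real embeddings come from a model and have hundreds of dimensions; the toy 3-d vectors here are stand-ins.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k(query_vec, store, k=2):
    scored = sorted(store, key=lambda item: cosine(query_vec, item["vector"]),
                    reverse=True)
    return [item["text"] for item in scored[:k]]

# Hypothetical RAG corpus: schema metadata and business terminology.
store = [
    {"text": "orders table: order_id, customer_id, total", "vector": [0.9, 0.1, 0.0]},
    {"text": "customers table: customer_id, region", "vector": [0.2, 0.8, 0.1]},
    {"text": "glossary: GMV means gross merchandise value", "vector": [0.1, 0.1, 0.9]},
]

# A query embedding close to the orders schema retrieves it first.
hits = top_k([0.85, 0.15, 0.05], store, k=2)
```

Production stores replace this linear scan with approximate indexes (e.g. HNSW in Qdrant) so search stays fast at millions of vectors.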
Feast
Feast manages the feature engineering and serving lifecycle:
| Feature | Details |
|---|---|
| Feature definitions | Declarative feature specifications in Python |
| Online store | Low-latency feature retrieval for real-time inference |
| Offline store | Historical feature access for training dataset creation |
| Point-in-time joins | Temporally correct feature values at training time |
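The point-in-time join row deserves a concrete illustration, since it is the property that prevents training-time data leakage: for each training label, use the latest feature value known at or before the label's timestamp, never a future one. The sketch below is a minimal pure-Python version of that logic, not the Feast API; the entity IDs and feature values are made up.

```python
from datetime import datetime

def point_in_time_join(labels, features):
    """labels: [(entity_id, label_ts, y)]; features: [(entity_id, feature_ts, value)].
    Returns rows (entity_id, label_ts, feature_value, y) with no future leakage."""
    rows = []
    for entity_id, label_ts, y in labels:
        # Only feature values observed at or before the label's timestamp qualify.
        candidates = [(ts, v) for (e, ts, v) in features
                      if e == entity_id and ts <= label_ts]
        value = max(candidates)[1] if candidates else None  # latest timestamp wins
        rows.append((entity_id, label_ts, value, y))
    return rows

# Hypothetical "spend_30d" feature, updated monthly.
features = [
    ("cust-1", datetime(2024, 1, 1), 10.0),
    ("cust-1", datetime(2024, 2, 1), 25.0),
]
labels = [
    ("cust-1", datetime(2024, 1, 15), 0),  # must see 10.0, not the Feb value
    ("cust-1", datetime(2024, 2, 10), 1),  # sees 25.0
]

training_rows = point_in_time_join(labels, features)
```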
Related Pages
- ML Engineer Persona -- ML Engineer workflow
- ML Flow -- Model training and serving lifecycle
- Compute Engines -- Ray, Spark, Flink
- Data Stores: Vector Stores -- Vector database architecture