ML and AI Infrastructure
The MATIH Platform provides a complete ML/AI infrastructure stack covering experiment tracking, distributed training, model serving, LLM inference, feature engineering, and vector search. This section documents each ML/AI technology, its role, and how it integrates with the platform.
ML/AI Stack Overview
| Technology | Category | Purpose |
|---|---|---|
| MLflow | Experiment tracking | Model versioning, metrics logging, artifact storage |
| Ray | Distributed compute | Model training, hyperparameter tuning, serving |
| vLLM | LLM serving | High-throughput large language model inference |
| Triton | Model serving | NVIDIA's GPU-optimized inference server |
| LangChain | LLM orchestration | LLM integration, chain composition |
| LangGraph | Agent orchestration | Multi-agent state machine for the AI Service |
| Qdrant | Vector database | RAG embeddings, semantic similarity search |
| LanceDB | Vector database | Lightweight vector search for development |
| Feast | Feature store | Feature engineering and serving |
| JupyterHub | Notebooks | Interactive development environment |
MLflow
MLflow provides the experiment tracking and model registry backbone:
| Feature | Details |
|---|---|
| Experiment tracking | Log parameters, metrics, and artifacts per training run |
| Model registry | Version, stage, and promote models (Staging, Production, Archived) |
| Artifact storage | Model files stored in MinIO (S3-compatible) |
| Run comparison | Side-by-side metric comparison across experiments |
| REST API | Programmatic access for automation |
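The tracking and registry features above can be pictured as a small data model: runs accumulate parameters, metrics, and artifact references, and registered model versions move through lifecycle stages. The sketch below is a minimal pure-Python illustration of that model, not the MLflow API; the run IDs and artifact path are hypothetical.

```python
from dataclasses import dataclass, field

# Minimal sketch of the data model behind experiment tracking and a
# model registry. This is NOT the MLflow API, only the concepts it manages.

@dataclass
class Run:
    run_id: str
    params: dict = field(default_factory=dict)     # hyperparameters, fixed per run
    metrics: dict = field(default_factory=dict)    # e.g. accuracy, loss
    artifacts: list = field(default_factory=list)  # model files (stored in MinIO)

class ModelRegistry:
    STAGES = ("None", "Staging", "Production", "Archived")

    def __init__(self):
        self.versions = {}  # version -> {"run": Run, "stage": str}

    def register(self, version, run):
        self.versions[version] = {"run": run, "stage": "None"}

    def promote(self, version, stage):
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.versions[version]["stage"] = stage

# Typical lifecycle: log a run, register the resulting model, promote it.
run = Run("run-001")
run.params["learning_rate"] = 0.01
run.metrics["accuracy"] = 0.94
run.artifacts.append("s3://mlflow/run-001/model.pkl")  # hypothetical path

registry = ModelRegistry()
registry.register("v1", run)
registry.promote("v1", "Staging")
registry.promote("v1", "Production")
```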
Ray
Ray provides the distributed compute layer for ML workloads:
| Component | Purpose |
|---|---|
| Ray Train | Distributed training across GPU/CPU nodes |
| Ray Tune | Hyperparameter optimization with scheduling algorithms |
| Ray Serve | Model serving with request batching and auto-scaling |
| Ray Data | Distributed data preprocessing pipelines |
Ray runs as a Kubernetes-native cluster using the KubeRay operator, scaling dynamically based on workload demand.
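To make the Ray Tune row concrete, the sketch below shows the shape of a hyperparameter search: evaluate many trial configurations in parallel and keep the best. It is a single-machine stand-in using a thread pool, not Ray's API; Ray Tune would schedule the same trials across the GPU/CPU nodes of the cluster, and the objective function here is a toy surrogate for a real training run.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def objective(config):
    # Hypothetical "validation loss": minimized near lr=0.01, batch_size=64.
    lr, bs = config["lr"], config["batch_size"]
    return (lr - 0.01) ** 2 + ((bs - 64) / 64) ** 2

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    trials = [
        {"lr": rng.uniform(0.001, 0.1), "batch_size": rng.choice([16, 32, 64, 128])}
        for _ in range(n_trials)
    ]
    # Ray Tune would distribute these trials across the cluster; here a
    # local thread pool stands in for that scheduling.
    with ThreadPoolExecutor(max_workers=4) as pool:
        losses = list(pool.map(objective, trials))
    best = min(range(n_trials), key=lambda i: losses[i])
    return trials[best], losses[best]

best_config, best_loss = random_search(50)
```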
vLLM
vLLM provides high-throughput LLM inference:
| Feature | Details |
|---|---|
| PagedAttention | Efficient memory management for long sequences |
| Continuous batching | Dynamic request batching for throughput optimization |
| Model support | Llama, Mistral, GPT-family, and other open models |
| API compatibility | OpenAI-compatible REST API |
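Because vLLM exposes an OpenAI-compatible REST API, a standard chat-completions payload works against it with no vendor SDK. The sketch below builds such a request using only the standard library; the service URL and model name are assumptions for illustration, and actually sending the request requires a running vLLM server.

```python
import json
from urllib import request

# Standard OpenAI-style chat-completions payload; vLLM accepts the same shape.
payload = {
    "model": "meta-llama/Llama-3-8B-Instruct",  # hypothetical deployed model
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the MATIH platform in one sentence."},
    ],
    "max_tokens": 128,
    "temperature": 0.2,
}

req = request.Request(
    "http://vllm-service:8000/v1/chat/completions",  # hypothetical in-cluster URL
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Sending it needs a live server, so the call is shown but not executed:
# with request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```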
LangChain and LangGraph
The AI Service uses LangChain and LangGraph for its multi-agent orchestrator:
| Technology | Purpose |
|---|---|
| LangChain | LLM provider abstraction, prompt templates, RAG pipelines |
| LangGraph | State machine orchestration for multi-agent workflows |
The agent orchestrator manages four specialized agents: RouterAgent, SQLAgent, AnalysisAgent, and VizAgent.
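The state-machine pattern LangGraph provides can be sketched in a few lines of plain Python: each node receives a shared state, mutates it, and names the next node (or ends the run). The agent behaviors below are placeholders for illustration, not the real AI Service agents, and this is not the LangGraph API itself.

```python
# Each node returns (updated_state, next_node_name); None ends the graph.

def router_agent(state):
    q = state["question"].lower()
    state["route"] = "sql" if ("how many" in q or "count" in q) else "analysis"
    return state, state["route"]

def sql_agent(state):
    state["sql"] = "SELECT COUNT(*) FROM orders"  # placeholder generated query
    return state, "viz"

def analysis_agent(state):
    state["analysis"] = "trend summary"  # placeholder analysis result
    return state, "viz"

def viz_agent(state):
    state["chart"] = "bar"  # placeholder chart spec
    return state, None

GRAPH = {"router": router_agent, "sql": sql_agent,
         "analysis": analysis_agent, "viz": viz_agent}

def run_graph(question, entry="router"):
    state, node = {"question": question}, entry
    while node is not None:
        state, node = GRAPH[node](state)
    return state

result = run_graph("How many orders shipped last week?")
```

The real orchestrator adds what this sketch omits: typed state, conditional edges, checkpointing, and LLM-backed node logic.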
Vector Databases
Two vector database options serve different deployment scenarios:
| Technology | Use Case | Features |
|---|---|---|
| Qdrant | Production vector search | Distributed, HNSW indexing, filtered search, REST/gRPC API |
| LanceDB | Development and testing | Embedded, no server required, lightweight |
Vector stores power the RAG (Retrieval-Augmented Generation) pipeline, storing embeddings of schema metadata, query examples, and business terminology to improve text-to-SQL accuracy.
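The core retrieval operation both stores perform is nearest-neighbor search over embeddings. Below is a minimal pure-Python sketch of that step, not the Qdrant or LanceDB API: rank stored items by cosine similarity to a query embedding and return the top k. Real embeddings come from a model and have hundreds of dimensions; the toy 3-d vectors here are stand-ins.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k(query_vec, store, k=2):
    scored = sorted(store, key=lambda item: cosine(query_vec, item["vector"]),
                    reverse=True)
    return [item["text"] for item in scored[:k]]

# Hypothetical RAG corpus: schema metadata and business terminology.
store = [
    {"text": "orders table: order_id, customer_id, total", "vector": [0.9, 0.1, 0.0]},
    {"text": "customers table: customer_id, region", "vector": [0.2, 0.8, 0.1]},
    {"text": "glossary: GMV means gross merchandise value", "vector": [0.1, 0.1, 0.9]},
]

# A query embedding close to the orders schema retrieves it first.
hits = top_k([0.85, 0.15, 0.05], store, k=2)
```

Production stores replace this linear scan with approximate indexes (e.g. HNSW in Qdrant) so search stays fast at millions of vectors.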
Feast
Feast manages the feature engineering and serving lifecycle:
| Feature | Details |
|---|---|
| Feature definitions | Declarative feature specifications in Python |
| Online store | Low-latency feature retrieval for real-time inference |
| Offline store | Historical feature access for training dataset creation |
| Point-in-time joins | Temporally correct feature values at training time |
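The point-in-time join row deserves a concrete illustration, since it is the property that prevents training-time data leakage: for each training label, use the latest feature value known at or before the label's timestamp, never a future one. The sketch below is a minimal pure-Python version of that logic, not the Feast API; the entity IDs and feature values are made up.

```python
from datetime import datetime

def point_in_time_join(labels, features):
    """labels: [(entity_id, label_ts, y)]; features: [(entity_id, feature_ts, value)].
    Returns rows (entity_id, label_ts, feature_value, y) with no future leakage."""
    rows = []
    for entity_id, label_ts, y in labels:
        # Only feature values observed at or before the label's timestamp qualify.
        candidates = [(ts, v) for (e, ts, v) in features
                      if e == entity_id and ts <= label_ts]
        value = max(candidates)[1] if candidates else None  # latest timestamp wins
        rows.append((entity_id, label_ts, value, y))
    return rows

# Hypothetical "spend_30d" feature, updated monthly.
features = [
    ("cust-1", datetime(2024, 1, 1), 10.0),
    ("cust-1", datetime(2024, 2, 1), 25.0),
]
labels = [
    ("cust-1", datetime(2024, 1, 15), 0),  # must see 10.0, not the Feb value
    ("cust-1", datetime(2024, 2, 10), 1),  # sees 25.0
]

training_rows = point_in_time_join(labels, features)
```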
Related Pages
- ML Engineer Persona -- ML Engineer workflow
- ML Flow -- Model training and serving lifecycle
- Compute Engines -- Ray, Spark, Flink
- Data Stores: Vector Stores -- Vector database architecture