Chapter 13: ML Service and MLOps
The ML Service provides a comprehensive machine learning operations (MLOps) platform within MATIH, enabling end-to-end model lifecycle management from training through deployment and monitoring. Built on Python and FastAPI with 50+ API routers, it integrates with MLflow for model registry, Ray AIR for distributed training, Feast for feature management, and ONNX Runtime for optimized inference. This chapter covers the ML Service architecture, training pipelines, serving infrastructure, and operational monitoring.
What You Will Learn
By the end of this chapter, you will understand:
- The ML Service architecture including its 50+ router module organization, dependency graph, and integration with external ML frameworks
- Training pipelines with distributed training via Ray AIR, hyperparameter tuning, GPU resource allocation, and checkpointing
- Model registry using MLflow for versioning, experiment tracking, model stage management, and artifact storage
- Model serving through ONNX Runtime optimization, Ray Serve for online inference, Triton Inference Server integration, and batch prediction
- Ensemble pipelines supporting voting, stacking, blending, bagging, and boosting methods with parallel inference
- A/B testing and canary deployments for safe model rollouts with automated traffic splitting and statistical analysis
- Feature store using Feast for online/offline feature serving, feature versioning, materialization, and streaming features
- Model monitoring with drift detection, performance tracking, retraining triggers, and automated alerting
- Distributed training with Ray AIR for data-parallel, model-parallel, and pipeline-parallel training strategies
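To make the ensemble methods above concrete, here is a minimal sketch of hard voting, the simplest of the five strategies, in plain Python. The `majority_vote` function and the toy model outputs are illustrative only and are not the ML Service's actual API.

```python
from collections import Counter

def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Hard-voting ensemble: each inner list holds one model's
    predictions; the most common label per sample wins."""
    per_sample = zip(*predictions)  # regroup predictions by sample
    return [Counter(labels).most_common(1)[0][0] for labels in per_sample]

# Three toy classifiers voting on four samples
model_a = ["cat", "dog", "dog", "cat"]
model_b = ["cat", "cat", "dog", "bird"]
model_c = ["dog", "dog", "dog", "cat"]
print(majority_vote([model_a, model_b, model_c]))  # ['cat', 'dog', 'dog', 'cat']
```

Stacking and blending replace the vote with a trained meta-learner over the base models' outputs; the parallel-inference machinery covered later in the chapter runs the base models concurrently before this combination step.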
Chapter Structure
| Section | Description | Audience |
|---|---|---|
| Architecture | Service architecture, module layout, framework integrations, and deployment topology | ML engineers, architects |
| Training Pipelines | Distributed training with Ray, hyperparameter tuning, GPU management, and checkpointing | ML engineers, data scientists |
| Model Registry | MLflow integration for experiment tracking, model versioning, and artifact management | ML engineers, MLOps engineers |
| Model Serving | ONNX Runtime, Ray Serve, Triton, batch prediction, and model loading | ML engineers, backend developers |
| Ensemble Pipelines | Ensemble methods, configuration, parallel inference, and meta-learning | ML engineers, data scientists |
| A/B Testing | Traffic splitting, canary deployments, statistical analysis, and safe rollout | ML engineers, product managers |
| Feature Store | Feast integration, online/offline stores, materialization, and streaming features | Data engineers, ML engineers |
| Model Monitoring | Drift detection, performance tracking, retraining triggers, and alerting | MLOps engineers |
| Distributed Training | Ray AIR orchestration, data-parallel and model-parallel strategies | ML engineers |
| API Reference | Complete REST API documentation for all ML Service endpoints | All developers |
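The A/B testing section's traffic splitting is commonly implemented by hashing a stable request key so each user sees a consistent variant. The sketch below shows the idea under that assumption; `assign_variant` and the variant names are hypothetical, not the service's actual interface.

```python
import hashlib

def assign_variant(user_id: str, splits: dict[str, float]) -> str:
    """Deterministically map a user to a model variant.

    `splits` maps variant name -> traffic fraction (fractions sum to 1.0).
    Hashing the user ID keeps assignments sticky across requests.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    cumulative = 0.0
    for variant, fraction in splits.items():
        cumulative += fraction
        if bucket < cumulative:
            return variant
    return variant  # guard against float rounding at the top of the range

# 90/10 canary: most traffic stays on the champion model
splits = {"champion_v3": 0.9, "canary_v4": 0.1}
assert assign_variant("user-42", splits) == assign_variant("user-42", splits)  # sticky
```

Deterministic hashing avoids storing per-user assignments while still giving each user a stable experience, which is what makes the later statistical comparison of variants valid.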
ML Service at a Glance
The ML Service is a Python/FastAPI application that provides the machine learning infrastructure layer for the MATIH platform.
+--------------------------+
| ML Workbench (UI) |
| API Consumers |
+-----------+--------------+
|
+-----------v--------------+
| ML Service |
| (Port 8000) |
+-----------+--------------+
|
+-----------+-----------+-----------+-----------+
| | | | |
+-----v-----+ +--v------+ +-v--------+ +v--------+ +v---------+
| Training | | Model | | Feature | | Model | | Active |
| Pipeline | | Registry| | Store | | Serving | | Learning |
+-----+-----+ +--+------+ +--+-------+ +-+-------+ +----------+
| | | |
+-----v-----+ +--v------+ +-v--------+ +v---------+
| Ray AIR | | MLflow | | Feast | | ONNX RT |
| Cluster | | Server | | Server | | / Triton |
+-----------+ +---------+ +----------+ +----------+

Key Numbers
| Metric | Value |
|---|---|
| Technology | Python 3.11+, FastAPI, uvicorn |
| Service port | 8000 |
| API routers | 50+ |
| Training framework | Ray AIR (Ray Train + Ray Tune) |
| Model registry | MLflow |
| Feature store | Feast |
| Model formats | scikit-learn, ONNX, PyTorch, TensorFlow |
| Inference runtime | ONNX Runtime, Ray Serve, Triton |
| Ensemble methods | Voting, stacking, blending, bagging, boosting |
| Monitoring | Drift detection, performance tracking, retraining triggers |
Key Source Files
The ML Service implementation is organized under data-plane/ml-service/src/:
| Path | Purpose |
|---|---|
| main.py | FastAPI application entry point |
| training/distributed_trainer.py | Ray Train distributed training |
| training/hyperparameter_tuner.py | Ray Tune hyperparameter search |
| training/gpu_manager.py | GPU resource management |
| training/checkpoint_manager.py | Training checkpoint persistence |
| training/job_manager.py | Training job lifecycle |
| training/job_scheduler.py | Job scheduling and queuing |
| training/validation_pipeline.py | Model validation |
| registry/model_registry.py | MLflow model registry integration |
| serving/prediction_service.py | Online prediction service |
| serving/model_loader.py | Model loading and caching |
| serving/ray_serve.py | Ray Serve deployment |
| serving/triton_inference_service.py | Triton Inference Server |
| models/ensemble.py | Ensemble configuration and methods |
| models/prediction.py | Prediction request/response models |
| features/feature_store.py | Feast feature store integration |
| features/feast_online_store.py | Online feature serving |
| features/feast_offline_store.py | Offline feature retrieval |
| features/streaming_feature_service.py | Streaming feature ingestion |
| monitoring/drift_detection_service.py | Data/concept drift detection |
| monitoring/model_monitoring_service.py | Model performance monitoring |
| monitoring/retraining_trigger_service.py | Automated retraining |
| testing/model_testing_service.py | A/B testing and canary |
| ray_air/orchestrator.py | Ray AIR orchestration |
| active_learning/ | Active learning for label selection |
| compliance/ | Model compliance and governance |
| labeling/ | Data labeling management |
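The drift detection listed under monitoring/drift_detection_service.py can be illustrated with the Population Stability Index (PSI), a common drift statistic; this is a self-contained sketch of the technique, not the service's actual implementation, and the thresholds cited are a widely used rule of thumb rather than MATIH defaults.

```python
import math

def population_stability_index(expected: list[float], actual: list[float],
                               bins: int = 10) -> float:
    """PSI between a training (expected) and live (actual) distribution.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 drifted."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Small smoothing term avoids log(0) when a bin is empty
        total = len(values) + bins * 1e-4
        return [(c + 1e-4) / total for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

baseline = [i / 100 for i in range(100)]        # uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]   # mass pushed right
print(population_stability_index(baseline, baseline) < 0.01)  # True: identical
print(population_stability_index(baseline, shifted) > 0.25)   # True: clear drift
```

A drift score above the chosen threshold is the kind of signal that monitoring/retraining_trigger_service.py can turn into an automated retraining job.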
Integration Points
| Integration | Protocol | Purpose |
|---|---|---|
| MLflow | HTTP REST | Experiment tracking, model registry |
| Ray Cluster | Ray Client | Distributed training, hyperparameter tuning |
| Feast | gRPC/HTTP | Feature serving and materialization |
| ONNX Runtime | In-process | Optimized model inference |
| Triton | gRPC | High-performance inference server |
| MinIO/S3 | S3 API | Model artifact storage |
| PostgreSQL | SQL | Metadata storage |
| Redis | Redis protocol | Feature caching, model caching |
| Kafka | Kafka protocol | Training events, monitoring alerts |
| Prometheus | HTTP | Metrics export |
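The Redis row above covers feature and model caching; the sketch below shows the core idea with an in-process TTL cache so the example stays self-contained. `TTLFeatureCache` and the key naming are illustrative assumptions, not the service's real caching layer, which talks to Redis over the network.

```python
import time

class TTLFeatureCache:
    """In-process stand-in for the Redis feature cache: entries
    expire after `ttl_seconds`, forcing a refresh from the store."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._data[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, key: str, value) -> None:
        self._data[key] = (time.monotonic(), value)

cache = TTLFeatureCache(ttl_seconds=60.0)
cache.put("user:42:avg_order_value", 87.5)
print(cache.get("user:42:avg_order_value"))  # 87.5 (fresh hit)
print(cache.get("user:42:clickthrough"))     # None (miss -> fetch from Feast)
```

On a miss, the serving path falls back to the Feast online store and re-populates the cache, trading a bounded staleness window (the TTL) for lower feature-lookup latency.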
Getting Started
```bash
# Navigate to service directory
cd data-plane/ml-service

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export DATABASE_URL=postgresql://matih:password@localhost:5432/matih_ml
export MLFLOW_TRACKING_URI=http://localhost:5000
export RAY_ADDRESS=ray://localhost:10001
export FEAST_REPO_PATH=/path/to/feast/repo
export S3_ENDPOINT=http://localhost:9000
export S3_ACCESS_KEY=minioadmin
export S3_SECRET_KEY=minioadmin

# Start the service
uvicorn src.main:app --host 0.0.0.0 --port 8000 --reload
```
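A missing variable from the list above typically surfaces only as a confusing failure at first use, so it can help to check the environment before launching. This is a minimal sketch; `check_environment` is a hypothetical helper, and the service itself may validate its configuration differently.

```python
import os

REQUIRED_VARS = [
    "DATABASE_URL", "MLFLOW_TRACKING_URI", "RAY_ADDRESS",
    "FEAST_REPO_PATH", "S3_ENDPOINT", "S3_ACCESS_KEY", "S3_SECRET_KEY",
]

def check_environment(required: list[str] = REQUIRED_VARS) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

# An empty list means the environment is ready for `uvicorn` to start
print("missing:", check_environment())
```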