Chapter 13: ML Service and MLOps
The ML Service provides a comprehensive machine learning operations (MLOps) platform within MATIH, enabling end-to-end model lifecycle management from training through deployment and monitoring. Built on Python and FastAPI with 50+ API routers, it integrates with MLflow for model registry, Ray AIR for distributed training, Feast for feature management, and ONNX Runtime for optimized inference. This chapter covers the ML Service architecture, training pipelines, serving infrastructure, and operational monitoring.
What You Will Learn
By the end of this chapter, you will understand:
- The ML Service architecture including its 50+ router module organization, dependency graph, and integration with external ML frameworks
- Training pipelines with distributed training via Ray AIR, hyperparameter tuning, GPU resource allocation, and checkpointing
- Model registry using MLflow for versioning, experiment tracking, model stage management, and artifact storage
- Model serving through ONNX Runtime optimization, Ray Serve for online inference, Triton Inference Server integration, and batch prediction
- Ensemble pipelines supporting voting, stacking, blending, bagging, and boosting methods with parallel inference
- A/B testing and canary deployments for safe model rollouts with automated traffic splitting and statistical analysis
- Feature store using Feast for online/offline feature serving, feature versioning, materialization, and streaming features
- Model monitoring with drift detection, performance tracking, retraining triggers, and automated alerting
- Distributed training with Ray AIR for data-parallel, model-parallel, and pipeline-parallel training strategies
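To make the ensemble methods above concrete, here is a minimal sketch of hard voting, the simplest of the five strategies, in plain Python. The `majority_vote` function and the toy model outputs are illustrative only and are not the ML Service's actual API.

```python
from collections import Counter

def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Hard-voting ensemble: each inner list holds one model's
    predictions; the most common label per sample wins."""
    per_sample = zip(*predictions)  # regroup predictions by sample
    return [Counter(labels).most_common(1)[0][0] for labels in per_sample]

# Three toy classifiers voting on four samples
model_a = ["cat", "dog", "dog", "cat"]
model_b = ["cat", "cat", "dog", "bird"]
model_c = ["dog", "dog", "dog", "cat"]
print(majority_vote([model_a, model_b, model_c]))  # ['cat', 'dog', 'dog', 'cat']
```

Stacking and blending replace the vote with a trained meta-learner over the base models' outputs; the parallel-inference machinery covered later in the chapter runs the base models concurrently before this combination step.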
Chapter Structure
| Section | Description | Audience |
|---|---|---|
| Architecture | Service architecture, module layout, framework integrations, and deployment topology | ML engineers, architects |
| Training Pipelines | Distributed training with Ray, hyperparameter tuning, GPU management, and checkpointing | ML engineers, data scientists |
| Model Registry | MLflow integration for experiment tracking, model versioning, and artifact management | ML engineers, MLOps engineers |
| Model Serving | ONNX Runtime, Ray Serve, Triton, batch prediction, and model loading | ML engineers, backend developers |
| Ensemble Pipelines | Ensemble methods, configuration, parallel inference, and meta-learning | ML engineers, data scientists |
| A/B Testing | Traffic splitting, canary deployments, statistical analysis, and safe rollout | ML engineers, product managers |
| Feature Store | Feast integration, online/offline stores, materialization, and streaming features | Data engineers, ML engineers |
| Model Monitoring | Drift detection, performance tracking, retraining triggers, and alerting | MLOps engineers |
| Distributed Training | Ray AIR orchestration, data-parallel and model-parallel strategies | ML engineers |
| API Reference | Complete REST API documentation for all ML Service endpoints | All developers |
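The A/B testing section's traffic splitting is commonly implemented by hashing a stable request key so each user sees a consistent variant. The sketch below shows the idea under that assumption; `assign_variant` and the variant names are hypothetical, not the service's actual interface.

```python
import hashlib

def assign_variant(user_id: str, splits: dict[str, float]) -> str:
    """Deterministically map a user to a model variant.

    `splits` maps variant name -> traffic fraction (fractions sum to 1.0).
    Hashing the user ID keeps assignments sticky across requests.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    cumulative = 0.0
    for variant, fraction in splits.items():
        cumulative += fraction
        if bucket < cumulative:
            return variant
    return variant  # guard against float rounding at the top of the range

# 90/10 canary: most traffic stays on the champion model
splits = {"champion_v3": 0.9, "canary_v4": 0.1}
assert assign_variant("user-42", splits) == assign_variant("user-42", splits)  # sticky
```

Deterministic hashing avoids storing per-user assignments while still giving each user a stable experience, which is what makes the later statistical comparison of variants valid.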
ML Service at a Glance
The ML Service is a Python/FastAPI application that provides the machine learning infrastructure layer for the MATIH platform.
+--------------------------+
| ML Workbench (UI) |
| API Consumers |
+-----------+--------------+
|
+-----------v--------------+
| ML Service |
| (Port 8000) |
+-----------+--------------+
|
+-----------+-----------+-----------+-----------+
| | | | |
+-----v-----+ +--v------+ +-v--------+ +v--------+ +v---------+
| Training | | Model | | Feature | | Model | | Active |
| Pipeline | | Registry| | Store | | Serving | | Learning |
+-----+-----+ +--+------+ +--+-------+ +-+-------+ +----------+
| | | |
+-----v-----+ +--v------+ +-v--------+ +v---------+
| Ray AIR | | MLflow | | Feast | | ONNX RT |
| Cluster | | Server | | Server | | / Triton |
+-----------+ +---------+ +----------+ +----------+

Key Numbers
| Metric | Value |
|---|---|
| Technology | Python 3.11+, FastAPI, uvicorn |
| Service port | 8000 |
| API routers | 50+ |
| Training framework | Ray AIR (Ray Train + Ray Tune) |
| Model registry | MLflow |
| Feature store | Feast |
| Model formats | scikit-learn, ONNX, PyTorch, TensorFlow |
| Inference runtime | ONNX Runtime, Ray Serve, Triton |
| Ensemble methods | Voting, stacking, blending, bagging, boosting |
| Monitoring | Drift detection, performance tracking, retraining triggers |
Key Source Files
The ML Service implementation is organized under data-plane/ml-service/src/:
| Path | Purpose |
|---|---|
| main.py | FastAPI application entry point |
| training/distributed_trainer.py | Ray Train distributed training |
| training/hyperparameter_tuner.py | Ray Tune hyperparameter search |
| training/gpu_manager.py | GPU resource management |
| training/checkpoint_manager.py | Training checkpoint persistence |
| training/job_manager.py | Training job lifecycle |
| training/job_scheduler.py | Job scheduling and queuing |
| training/validation_pipeline.py | Model validation |
| registry/model_registry.py | MLflow model registry integration |
| serving/prediction_service.py | Online prediction service |
| serving/model_loader.py | Model loading and caching |
| serving/ray_serve.py | Ray Serve deployment |
| serving/triton_inference_service.py | Triton Inference Server |
| models/ensemble.py | Ensemble configuration and methods |
| models/prediction.py | Prediction request/response models |
| features/feature_store.py | Feast feature store integration |
| features/feast_online_store.py | Online feature serving |
| features/feast_offline_store.py | Offline feature retrieval |
| features/streaming_feature_service.py | Streaming feature ingestion |
| monitoring/drift_detection_service.py | Data/concept drift detection |
| monitoring/model_monitoring_service.py | Model performance monitoring |
| monitoring/retraining_trigger_service.py | Automated retraining |
| testing/model_testing_service.py | A/B testing and canary |
| ray_air/orchestrator.py | Ray AIR orchestration |
| active_learning/ | Active learning for label selection |
| compliance/ | Model compliance and governance |
| labeling/ | Data labeling management |
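The drift detection listed under monitoring/drift_detection_service.py can be illustrated with the Population Stability Index (PSI), a common drift statistic; this is a self-contained sketch of the technique, not the service's actual implementation, and the thresholds cited are a widely used rule of thumb rather than MATIH defaults.

```python
import math

def population_stability_index(expected: list[float], actual: list[float],
                               bins: int = 10) -> float:
    """PSI between a training (expected) and live (actual) distribution.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 drifted."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Small smoothing term avoids log(0) when a bin is empty
        total = len(values) + bins * 1e-4
        return [(c + 1e-4) / total for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

baseline = [i / 100 for i in range(100)]        # uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]   # mass pushed right
print(population_stability_index(baseline, baseline) < 0.01)  # True: identical
print(population_stability_index(baseline, shifted) > 0.25)   # True: clear drift
```

A drift score above the chosen threshold is the kind of signal that monitoring/retraining_trigger_service.py can turn into an automated retraining job.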
Integration Points
| Integration | Protocol | Purpose |
|---|---|---|
| MLflow | HTTP REST | Experiment tracking, model registry |
| Ray Cluster | Ray Client | Distributed training, hyperparameter tuning |
| Feast | gRPC/HTTP | Feature serving and materialization |
| ONNX Runtime | In-process | Optimized model inference |
| Triton | gRPC | High-performance inference server |
| MinIO/S3 | S3 API | Model artifact storage |
| PostgreSQL | SQL | Metadata storage |
| Redis | Redis protocol | Feature caching, model caching |
| Kafka | Kafka protocol | Training events, monitoring alerts |
| Prometheus | HTTP | Metrics export |
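The Redis row above covers feature and model caching; the sketch below shows the core idea with an in-process TTL cache so the example stays self-contained. `TTLFeatureCache` and the key naming are illustrative assumptions, not the service's real caching layer, which talks to Redis over the network.

```python
import time

class TTLFeatureCache:
    """In-process stand-in for the Redis feature cache: entries
    expire after `ttl_seconds`, forcing a refresh from the store."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._data[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, key: str, value) -> None:
        self._data[key] = (time.monotonic(), value)

cache = TTLFeatureCache(ttl_seconds=60.0)
cache.put("user:42:avg_order_value", 87.5)
print(cache.get("user:42:avg_order_value"))  # 87.5 (fresh hit)
print(cache.get("user:42:clickthrough"))     # None (miss -> fetch from Feast)
```

On a miss, the serving path falls back to the Feast online store and re-populates the cache, trading a bounded staleness window (the TTL) for lower feature-lookup latency.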
Getting Started
```bash
# Navigate to service directory
cd data-plane/ml-service

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export DATABASE_URL=postgresql://matih:password@localhost:5432/matih_ml
export MLFLOW_TRACKING_URI=http://localhost:5000
export RAY_ADDRESS=ray://localhost:10001
export FEAST_REPO_PATH=/path/to/feast/repo
export S3_ENDPOINT=http://localhost:9000
export S3_ACCESS_KEY=minioadmin
export S3_SECRET_KEY=minioadmin

# Start the service
uvicorn src.main:app --host 0.0.0.0 --port 8000 --reload
```
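A missing variable from the list above typically surfaces only as a confusing failure at first use, so it can help to check the environment before launching. This is a minimal sketch; `check_environment` is a hypothetical helper, and the service itself may validate its configuration differently.

```python
import os

REQUIRED_VARS = [
    "DATABASE_URL", "MLFLOW_TRACKING_URI", "RAY_ADDRESS",
    "FEAST_REPO_PATH", "S3_ENDPOINT", "S3_ACCESS_KEY", "S3_SECRET_KEY",
]

def check_environment(required: list[str] = REQUIRED_VARS) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

# An empty list means the environment is ready for `uvicorn` to start
print("missing:", check_environment())
```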