ML Service Architecture
Production - Python/FastAPI - Port 8000 - Ray, MLflow, Feast integration
The ML Service provides machine learning operations (MLOps) capabilities including model training orchestration, experiment tracking, model versioning, feature store integration, and model serving. It bridges the gap between data scientists working in notebooks and production ML deployments.
2.4.C.1 MLOps Pipeline
Data Preparation Training Deployment
+------------------+ +-------------------+ +-------------------+
| Feature Store | | Ray Cluster | | Model Registry |
| (Feast) |--->| (distributed |--->| (MLflow) |
| | | training) | | |
| - Feature defs | | - Hyperparameter | | - Version control |
| - Point-in-time | | tuning | | - Stage promotion |
| - Online serving | | - Distributed | | - Artifact store |
+------------------+ | data parallel | +--------+----------+
+-------------------+ |
v
+-------------------+
| Model Serving |
| (Ray Serve / |
| Triton) |
| |
| - A/B testing |
| - Canary deploy |
| - Auto-scaling |
+-------------------+

2.4.C.2 Infrastructure Integration
| Component | Technology | Purpose |
|---|---|---|
| Distributed training | Ray Train | Scales training across multiple workers |
| Hyperparameter tuning | Ray Tune | Bayesian optimization, grid/random search |
| Experiment tracking | MLflow | Metrics, parameters, artifacts per run |
| Model registry | MLflow Registry | Model versioning with stage gates |
| Feature store | Feast | Feature engineering and point-in-time joins |
| Model serving | Ray Serve / Triton | Low-latency inference endpoints |
| Artifact storage | MinIO (S3-compatible) | Model files, datasets, checkpoints |
| GPU scheduling | Kubernetes device plugin | GPU allocation for training and inference |
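The data-prep, training, and registration stages can be sketched in plain Python. This is an illustrative outline only: the function names below are hypothetical stand-ins for the real Feast, Ray Train, and MLflow integrations.

```python
from dataclasses import dataclass

@dataclass
class TrainedModel:
    name: str
    metrics: dict

def fetch_features(feature_refs: list[str]) -> list[dict]:
    """Stand-in for a Feast point-in-time feature retrieval."""
    return [{"ref": ref, "value": 0.0} for ref in feature_refs]

def train_distributed(features: list[dict], num_workers: int) -> TrainedModel:
    """Stand-in for a Ray Train job spread across num_workers."""
    return TrainedModel(name="demo-model", metrics={"rmse": 0.12, "workers": num_workers})

def register_model(model: TrainedModel) -> str:
    """Stand-in for logging the model to the MLflow registry; returns a version id."""
    return f"{model.name}/1"

def run_pipeline(feature_refs: list[str], num_workers: int = 2) -> str:
    """Wire the three stages together: features -> training -> registry."""
    features = fetch_features(feature_refs)
    model = train_distributed(features, num_workers)
    return register_model(model)
```

Each stub returns immediately here; in the real pipeline each stage is an asynchronous job tracked per-run in MLflow.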
Ray Cluster Configuration
| Environment | Head Node | Worker Nodes | GPU Workers |
|---|---|---|---|
| Development | 1 (2 CPU, 4Gi) | 0 | 0 |
| Production | 1 (4 CPU, 8Gi) | 2-8 (auto-scale) | 1-4 (on demand) |
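The sizing table maps to a configuration like the following. The field names are illustrative, not the service's actual schema:

```python
# Illustrative Ray cluster sizing, mirroring the environments table.
RAY_CLUSTER_CONFIG = {
    "development": {
        "head": {"cpu": 2, "memory": "4Gi"},
        "workers": {"min": 0, "max": 0},
        "gpu_workers": {"min": 0, "max": 0},
    },
    "production": {
        "head": {"cpu": 4, "memory": "8Gi"},
        "workers": {"min": 2, "max": 8},      # auto-scaled
        "gpu_workers": {"min": 1, "max": 4},  # provisioned on demand
    },
}
```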
2.4.C.3 Model Lifecycle
Models progress through defined stages:
Development --> Staging --> Production --> Archived
| | |
| | +--> Monitoring (drift detection)
| |
| +--> Validation (automated testing)
|
+--> Experiment tracking (MLflow)

Stage Gates
| Transition | Requirements |
|---|---|
| Development --> Staging | All unit tests pass, metrics meet baseline |
| Staging --> Production | Integration tests pass, A/B test shows improvement, manual approval |
| Production --> Archived | New model version promoted, old version traffic drained |
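The stage gates can be modeled as an explicit transition table. This is a sketch of the logic, not the service's actual implementation (which records stages in the MLflow registry):

```python
# Allowed lifecycle transitions and the gates each one must pass.
ALLOWED_TRANSITIONS = {
    ("Development", "Staging"): ["unit_tests_pass", "metrics_meet_baseline"],
    ("Staging", "Production"): ["integration_tests_pass", "ab_test_improved", "manual_approval"],
    ("Production", "Archived"): ["successor_promoted", "traffic_drained"],
}

def promote(current: str, target: str, gates: dict[str, bool]) -> str:
    """Return the new stage if every required gate is satisfied, else raise."""
    required = ALLOWED_TRANSITIONS.get((current, target))
    if required is None:
        raise ValueError(f"illegal transition {current} -> {target}")
    missing = [g for g in required if not gates.get(g)]
    if missing:
        raise ValueError(f"gates not satisfied: {missing}")
    return target
```

For example, `promote("Development", "Staging", {"unit_tests_pass": True, "metrics_meet_baseline": True})` succeeds, while skipping a stage (Development straight to Production) raises.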
2.4.C.4 Key APIs
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/ml/experiments | GET/POST | Experiment management |
| /api/v1/ml/experiments/{id}/runs | GET/POST | Training run management |
| /api/v1/ml/models | GET/POST | Model registry |
| /api/v1/ml/models/{id}/versions | GET | Model version history |
| /api/v1/ml/models/{id}/deploy | POST | Deploy model for serving |
| /api/v1/ml/models/{id}/promote | POST | Promote model to next stage |
| /api/v1/ml/predict | POST | Run inference against deployed model |
| /api/v1/ml/features | GET | Feature store catalog |
| /api/v1/ml/features/serve | POST | Get features for inference |
| /api/v1/ml/training/submit | POST | Submit distributed training job |
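A minimal client for the inference endpoint might look like this (stdlib only; the payload shape is an assumption, not the service's documented schema):

```python
import json
import urllib.request

API_PREFIX = "/api/v1/ml"

def build_predict_request(base_url: str, model_name: str, inputs: list) -> urllib.request.Request:
    """Build (but do not send) a POST request against /api/v1/ml/predict."""
    payload = json.dumps({"model": model_name, "inputs": inputs}).encode()
    return urllib.request.Request(
        url=f"{base_url}{API_PREFIX}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending is then urllib.request.urlopen(req) -- omitted here because it
# requires a running service on port 8000.
```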
Related Sections
- ML Flow -- Training to serving pipeline
- Object Storage -- MinIO artifact storage
- ML Service Deep Dive -- Complete ML documentation