Experiment Tracking
The MATIH ML Service provides a comprehensive experiment tracking system built on MLflow, enabling data scientists and ML engineers to organize, track, and compare machine learning experiments across the platform. Every experiment is tenant-isolated and integrates with the broader MLOps pipeline.
What is Experiment Tracking?
Experiment tracking captures the full context of each machine learning training run: hyperparameters, metrics over time, model artifacts, code versions, and environment details. This provides reproducibility, comparability, and auditability for all ML work.
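Conceptually, a tracked run is just a structured record of these elements. The sketch below is a plain-Python illustration of that idea, independent of the actual MLflow-backed implementation; the `RunRecord` class and its method names are illustrative, not part of the service:

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Illustrative container for what a tracking system captures per run."""

    run_id: str
    params: dict[str, str] = field(default_factory=dict)    # hyperparameters
    # metric name -> list of (step, value) pairs, i.e. metrics over time
    metrics: dict[str, list[tuple[int, float]]] = field(default_factory=dict)
    artifacts: list[str] = field(default_factory=list)      # model files, plots, etc.
    tags: dict[str, str] = field(default_factory=dict)      # code version, environment

    def log_param(self, key: str, value: str) -> None:
        self.params[key] = value

    def log_metric(self, key: str, value: float, step: int = 0) -> None:
        self.metrics.setdefault(key, []).append((step, value))


run = RunRecord(run_id="run-001", tags={"git_commit": "abc123"})
run.log_param("learning_rate", "0.1")
run.log_metric("val_auc", 0.81, step=1)
run.log_metric("val_auc", 0.84, step=2)
```

Because every run carries the same structure, runs become directly comparable and reproducible from their recorded parameters and tags.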
Architecture
```
+-----------------------+
|    ML Workbench UI    |
+-----------+-----------+
            |
+-----------v-----------+
|    Experiments API    |
|  /api/v1/experiments  |
+-----------+-----------+
            |
+-------------------------+      +------------------+
|   ExperimentTracker     +------>   MLflow Server  |
| (experiment_tracker.py) |      |  (Tracking URI)  |
+-----------+-------------+      +------------------+
            |
+-----------v-----------+
|   Artifact Storage    |
|     (S3 / MinIO)      |
+-----------------------+
```
Key Components
| Component | File | Purpose |
|---|---|---|
| Experiments API | src/api/experiments.py | REST endpoints for CRUD, run management, metrics logging |
| ExperimentTracker | src/tracking/experiment_tracker.py | MLflow client wrapper with tenant isolation |
| EnhancedExperimentService | src/tracking/enhanced_experiment_service.py | Advanced tracking with system metrics |
| ArtifactManager | src/tracking/artifact_manager.py | Artifact upload, storage, retrieval |
Core Data Models
ExperimentCreate
```python
from typing import Optional

from pydantic import BaseModel, Field


class ExperimentCreate(BaseModel):
    """Request to create an experiment."""

    name: str = Field(..., min_length=1, max_length=255)
    description: str = Field(default="", max_length=1000)
    artifact_location: Optional[str] = None
    tags: dict[str, str] = Field(default_factory=dict)
```
ExperimentResponse
```python
class ExperimentResponse(BaseModel):
    """Experiment response."""

    id: str
    name: str
    description: str
    artifact_location: str
    lifecycle_stage: str  # "active" or "deleted"
    created_at: str
    last_updated: str
    tags: dict[str, str]
```
RunResponse
```python
class RunResponse(BaseModel):
    """Run response."""

    id: str
    experiment_id: str
    name: Optional[str]
    status: str  # "running", "finished", "failed", "killed"
    start_time: str
    end_time: Optional[str]
    duration_seconds: Optional[float]
    tags: dict[str, str]
    params: dict[str, str]
    metrics: dict[str, float]
    artifacts: list[str]
```
Experiment Lifecycle
Experiments follow a simple lifecycle:
- Create -- Define an experiment with a name, description, and tags
- Run -- Create runs within the experiment to track individual training iterations
- Log -- Record parameters, metrics (at each step), and artifacts to runs
- Compare -- Compare metrics across runs to identify best configurations
- Archive -- Soft-delete experiments that are no longer active
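The five steps can be walked through with a minimal in-memory sketch. This is a hypothetical illustration of the flow only; in practice each step is performed through the `/api/v1/experiments` REST endpoints, and the dictionaries here stand in for the service's storage:

```python
experiments: dict[str, dict] = {}
runs: dict[str, dict] = {}

# 1. Create -- an experiment with a name and lifecycle stage
experiments["exp-1"] = {"name": "churn-prediction-v2", "lifecycle_stage": "active"}

# 2. Run -- two training iterations within the experiment
runs["run-a"] = {"experiment_id": "exp-1", "params": {}, "metrics": {}}
runs["run-b"] = {"experiment_id": "exp-1", "params": {}, "metrics": {}}

# 3. Log -- parameters and metrics per run
runs["run-a"]["params"]["max_depth"] = "6"
runs["run-a"]["metrics"]["val_auc"] = 0.84
runs["run-b"]["params"]["max_depth"] = "8"
runs["run-b"]["metrics"]["val_auc"] = 0.87

# 4. Compare -- pick the run with the best metric
best = max(runs, key=lambda r: runs[r]["metrics"]["val_auc"])

# 5. Archive -- soft-delete by flipping the lifecycle stage
experiments["exp-1"]["lifecycle_stage"] = "deleted"
```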
Tenant Isolation
All experiments are scoped by tenant. The ExperimentTracker prefixes experiment names with the tenant ID:
```python
def get_or_create_experiment(self, name: str, tenant_id: Optional[str] = None) -> str:
    if tenant_id:
        full_name = f"{tenant_id}/{name}"
    else:
        full_name = name
    experiment = self._client.get_experiment_by_name(full_name)
    if experiment:
        return experiment.experiment_id
    return self.create_experiment(ExperimentConfig(name=full_name, tenant_id=tenant_id))
```
Tenant tags (matih.tenant_id) are automatically applied to all experiments and runs, enabling filtered queries across the MLflow backend.
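The naming scheme itself is easy to verify in isolation. The standalone `tenant_scoped_name` helper below is hypothetical (it is not part of the service); it simply mirrors the prefixing rule from `get_or_create_experiment` above:

```python
from typing import Optional


def tenant_scoped_name(name: str, tenant_id: Optional[str] = None) -> str:
    """Mirror ExperimentTracker's naming rule: prefix with the tenant ID when present."""
    return f"{tenant_id}/{name}" if tenant_id else name


# A tenant-scoped experiment and an unscoped one never collide in the MLflow backend.
scoped = tenant_scoped_name("churn-prediction-v2", "acme-corp")  # "acme-corp/churn-prediction-v2"
unscoped = tenant_scoped_name("churn-prediction-v2")             # "churn-prediction-v2"
```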
Quick Start
```bash
# Create an experiment
curl -X POST http://localhost:8000/api/v1/experiments \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: acme-corp" \
  -d '{
    "name": "churn-prediction-v2",
    "description": "Customer churn prediction with gradient boosting",
    "tags": {"team": "data-science", "domain": "retention"}
  }'

# Create a run
curl -X POST http://localhost:8000/api/v1/experiments/runs \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: acme-corp" \
  -d '{
    "experiment_id": "<experiment-id>",
    "name": "xgboost-baseline",
    "tags": {"model_type": "xgboost"}
  }'
```
Section Contents
| Page | Description |
|---|---|
| Creating Experiments | Experiment creation, naming conventions, metadata management |
| Managing Runs | Run lifecycle, metrics logging, parameter tracking |
| Comparing Runs | Multi-run comparison and visualization |
| Artifacts | Artifact upload, storage backends, retrieval |
| MLflow Integration | MLflow compatibility and migration guide |
| API Reference | Complete endpoint documentation |