MLflow Integration
The MATIH ML Service uses MLflow as the backend for experiment tracking and model registry. The ExperimentTracker class provides a high-level wrapper that adds tenant isolation, structured logging, and integration with the broader MATIH ecosystem.
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| MLFLOW_TRACKING_URI | http://localhost:5000 | MLflow tracking server endpoint |
| MLFLOW_ARTIFACT_ROOT | s3://matih-mlflow-artifacts | Default artifact storage location |
| MLFLOW_REGISTRY_URI | Same as tracking URI | Model registry endpoint |
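As a rough sketch of how these variables might be resolved at startup, the snippet below reads each one with the default from the table and falls back to the tracking URI when no registry URI is set. The `MlflowSettings` dataclass and the `load_mlflow_settings` helper are hypothetical names used for illustration only; they are not taken from the MATIH codebase.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MlflowSettings:
    """Hypothetical container for the three settings in the table above."""
    tracking_uri: str
    artifact_root: str
    registry_uri: str


def load_mlflow_settings(env: Optional[Mapping[str, str]] = None) -> MlflowSettings:
    """Resolve settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    tracking_uri = env.get("MLFLOW_TRACKING_URI", "http://localhost:5000")
    artifact_root = env.get("MLFLOW_ARTIFACT_ROOT", "s3://matih-mlflow-artifacts")
    # Assumption: an unset or empty registry URI falls back to the tracking URI.
    registry_uri = env.get("MLFLOW_REGISTRY_URI") or tracking_uri
    return MlflowSettings(tracking_uri, artifact_root, registry_uri)
```

Passing a plain dictionary instead of the process environment keeps the helper easy to exercise in tests, for example `load_mlflow_settings({"MLFLOW_TRACKING_URI": "http://mlflow:5000"})`.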
ExperimentTracker Initialization
from src.tracking.experiment_tracker import ExperimentTracker

tracker = ExperimentTracker(
    tracking_uri="http://mlflow:5000",
    default_artifact_root="s3://matih-mlflow-artifacts",
    registry_uri=None,  # Defaults to tracking_uri
)

Under the hood, the tracker configures the global MLflow client:
mlflow.set_tracking_uri(self.tracking_uri)
if self.registry_uri != self.tracking_uri:
    mlflow.set_registry_uri(self.registry_uri)
self._client = MlflowClient(tracking_uri=self.tracking_uri)

Context Manager Pattern
The recommended pattern uses start_run as a context manager, which handles the run lifecycle automatically:
with tracker.start_run(
    experiment_name="my-experiment",
    run_config=RunConfig(
        run_name="training-run-1",
        tenant_id="acme-corp",
        user_id="alice@acme.com",
        job_id="job-123",
    ),
) as run:
    run.log_params({"learning_rate": 0.01, "epochs": 100})
    for epoch in range(100):
        loss = train_epoch()  # placeholder for the caller's training step
        run.log_metrics({"loss": loss}, step=epoch)

The context manager creates the MLflow run with tags that include matih.tenant_id, matih.user_id, and matih.job_id.
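To illustrate the lifecycle handling and tagging described here, the following is a minimal, self-contained sketch that does not call MLflow at all. `SketchRun`, `build_matih_tags`, and this standalone `start_run` are hypothetical stand-ins, and the `RunConfig` defined here only mirrors the fields used in the example; the real classes may differ. Skipping unset fields when building tags is also an assumption.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class RunConfig:
    """Stand-in for RunConfig; fields mirror the usage example only."""
    run_name: str
    tenant_id: str
    user_id: Optional[str] = None
    job_id: Optional[str] = None


@dataclass
class SketchRun:
    """Hypothetical in-memory run record used only for this illustration."""
    name: str
    tags: Dict[str, str]
    status: str = "RUNNING"


def build_matih_tags(config: RunConfig) -> Dict[str, str]:
    """Map RunConfig fields to matih.* tags, skipping fields that are unset."""
    candidates = {
        "matih.tenant_id": config.tenant_id,
        "matih.user_id": config.user_id,
        "matih.job_id": config.job_id,
    }
    return {key: value for key, value in candidates.items() if value is not None}


@contextmanager
def start_run(config: RunConfig) -> Iterator[SketchRun]:
    """Open a run, yield it, and always leave it with a final status."""
    run = SketchRun(name=config.run_name, tags=build_matih_tags(config))
    try:
        yield run
    except Exception:
        run.status = "FAILED"  # the exception still propagates to the caller
        raise
    else:
        run.status = "FINISHED"
```

The point of the pattern is that the caller never closes the run explicitly: leaving the `with` block normally marks the run finished, and an exception inside the block marks it failed before the error is re-raised.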
ActiveRun Wrapper
The ActiveRun class wraps the native MLflow run with fluent methods:
class ActiveRun:
    @property
    def run_id(self) -> str: ...
    @property
    def experiment_id(self) -> str: ...
    @property
    def artifact_uri(self) -> str: ...
    def log_params(self, params) -> "ActiveRun": ...
    def log_metrics(self, metrics, step=None) -> "ActiveRun": ...
    def log_artifact(self, local_path, artifact_path=None) -> "ActiveRun": ...
    def log_dict(self, dictionary, artifact_file) -> "ActiveRun": ...
    def set_tag(self, key, value) -> "ActiveRun": ...
    def get_elapsed_time(self) -> float: ...

The logging and tagging methods return self for chaining:
run.log_params(params).log_metrics(metrics).set_tag("status", "complete")

Singleton Access
A global singleton is available for shared access:
from src.tracking.experiment_tracker import get_experiment_tracker

tracker = get_experiment_tracker()

MLflow Compatibility
The MATIH experiment tracking API is compatible with MLflow's standard interfaces, enabling migration from existing MLflow setups:
| MLflow Concept | MATIH Equivalent |
|---|---|
| Experiment | Experiment (with tenant prefix) |
| Run | Run (with tenant/user/job tags) |
| Parameters | Parameters (string-converted) |
| Metrics | Metrics (with step tracking) |
| Artifacts | Artifacts (S3 storage) |
| Model Registry | Model lifecycle manager |
| Tags | Tags (with matih.* namespace) |
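The conventions in the table can be pictured with a few small helpers. This is only a sketch: the function names are hypothetical, and the `/` separator used for the tenant prefix is an assumption, since the exact prefix format is not stated here. The string conversion reflects the fact that MLflow stores parameter values as strings.

```python
from typing import Any, Dict


def tenant_experiment_name(tenant_id: str, experiment_name: str) -> str:
    """Prefix an experiment name with its tenant (the separator is an assumption)."""
    return f"{tenant_id}/{experiment_name}"


def stringify_params(params: Dict[str, Any]) -> Dict[str, str]:
    """Convert parameter values to strings before they are logged."""
    return {key: str(value) for key, value in params.items()}


def namespaced_tag(key: str) -> str:
    """Place a tag key under the matih.* namespace unless it is already there."""
    return key if key.startswith("matih.") else f"matih.{key}"
```

For example, `stringify_params({"learning_rate": 0.01, "epochs": 100})` yields string values for both entries, which is why parameters read back from a run should be parsed before numeric use.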
Source Files
| File | Path |
|---|---|
| ExperimentTracker | data-plane/ml-service/src/tracking/experiment_tracker.py |
| MLflow Deployment Service | data-plane/ml-service/src/tracking/mlflow_deployment_service.py |
| Model Registry Client | data-plane/ml-service/src/tracking/model_registry_client.py |