Experiment Tracking
The Experiment Tracking integration provides a conversational interface for managing ML experiments, comparing runs, and analyzing training metrics. It wraps MLflow Tracking via the ML Service, letting users query experiment data and visualize results through natural language.
Tracking Architecture
Experiment tracking data flows through the ML Service to the MLflow backend:
AI Service (Tracking API) --> ML Service --> MLflow Tracking Server --> PostgreSQL (metrics store)
                                                                    --> MinIO/S3 (artifact store)

Core Concepts
| Concept | Description |
|---|---|
| Experiment | A named collection of related training runs |
| Run | A single training execution with parameters, metrics, and artifacts |
| Metric | A numeric value logged during training (loss, accuracy, F1) |
| Parameter | A hyperparameter value used in the run |
| Artifact | A file produced by the run (model, plots, data samples) |
| Tag | A key-value label for organizing and filtering runs |
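These concepts map naturally onto plain data structures. A minimal sketch (field names are illustrative, chosen to mirror the API payloads in this document):

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    """A single training execution within an experiment."""
    run_id: str
    parameters: dict = field(default_factory=dict)  # hyperparameters, e.g. learning_rate
    metrics: dict = field(default_factory=dict)     # logged values, e.g. f1_score
    tags: dict = field(default_factory=dict)        # key-value labels for filtering

def filter_by_tag(runs, key, value):
    """Return runs carrying the given tag, as used for organizing and filtering."""
    return [r for r in runs if r.tags.get(key) == value]

runs = [
    Run("run-001", tags={"team": "data-science"}),
    Run("run-002", tags={"team": "platform"}),
]
print([r.run_id for r in filter_by_tag(runs, "team", "data-science")])
```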
Create Experiment
POST /api/v1/ml/experiments

{
"name": "churn-prediction-q1",
"description": "Customer churn prediction experiments for Q1 2025",
"tags": {
"project": "customer-retention",
"team": "data-science"
}
}

List Experiments
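Client code consuming this endpoint typically parses the JSON body and surfaces the best-performing experiments. A hedged sketch over the sample payload shown in this section (the ranking metric is taken from that sample):

```python
import json

# Sample payload mirroring the List Experiments response in this section.
payload = json.loads("""
{
  "experiments": [
    {"id": "exp-001", "name": "churn-prediction-q1",
     "run_count": 24, "best_metric": {"f1_score": 0.912}, "status": "active"}
  ]
}
""")

# Rank active experiments by their best F1 score.
active = [e for e in payload["experiments"] if e["status"] == "active"]
ranked = sorted(active, key=lambda e: e["best_metric"].get("f1_score", 0.0), reverse=True)
print(ranked[0]["name"])
```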
GET /api/v1/ml/experiments?tenant_id=acme-corp

Response
{
"experiments": [
{
"id": "exp-001",
"name": "churn-prediction-q1",
"run_count": 24,
"best_metric": {"f1_score": 0.912},
"status": "active",
"created_at": "2025-01-15T08:00:00Z"
}
]
}

List Runs
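The run-listing endpoint takes its sorting options as query parameters. A sketch of assembling such a URL with the standard library (the base URL is an assumption; the parameter names come from the example request):

```python
from urllib.parse import urlencode

BASE_URL = "http://ai-service:8080"  # assumed service address, not from the docs

def list_runs_url(experiment_id, sort_by="metrics.f1_score", order="desc"):
    """Build the List Runs URL with the sort parameters from the example."""
    query = urlencode({"sort_by": sort_by, "order": order})
    return f"{BASE_URL}/api/v1/ml/experiments/{experiment_id}/runs?{query}"

print(list_runs_url("exp-001"))
```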
GET /api/v1/ml/experiments/:experiment_id/runs?sort_by=metrics.f1_score&order=desc

Response
{
"runs": [
{
"run_id": "run-017",
"status": "completed",
"parameters": {
"algorithm": "xgboost",
"n_estimators": 200,
"learning_rate": 0.05
},
"metrics": {
"f1_score": 0.912,
"accuracy": 0.95,
"training_loss": 0.082
},
"duration_seconds": 342,
"created_at": "2025-03-15T10:00:00Z"
}
]
}

Compare Runs
Compares multiple runs side by side for metric and parameter analysis:
POST /api/v1/ml/experiments/:experiment_id/compare

{
"run_ids": ["run-015", "run-016", "run-017"],
"metrics": ["f1_score", "accuracy", "auc_roc"],
"parameters": ["n_estimators", "learning_rate", "max_depth"]
}

Response
{
"comparison": {
"runs": [
{"run_id": "run-015", "f1_score": 0.88, "accuracy": 0.93},
{"run_id": "run-016", "f1_score": 0.90, "accuracy": 0.94},
{"run_id": "run-017", "f1_score": 0.912, "accuracy": 0.95}
],
"best_run": "run-017",
"parameter_impact": {
"learning_rate": {"correlation_with_f1": -0.72},
"n_estimators": {"correlation_with_f1": 0.85}
}
}
}

Metric History
Retrieves metric values logged over training steps for a specific run:
GET /api/v1/ml/experiments/:experiment_id/runs/:run_id/metrics/:metric_name

Response
{
"metric": "training_loss",
"steps": [
{"step": 0, "value": 0.693, "timestamp": "2025-03-15T10:00:00Z"},
{"step": 10, "value": 0.412, "timestamp": "2025-03-15T10:01:00Z"},
{"step": 20, "value": 0.185, "timestamp": "2025-03-15T10:02:00Z"},
{"step": 30, "value": 0.082, "timestamp": "2025-03-15T10:03:00Z"}
]
}

Conversational Queries
Users can query experiment data through natural language:
| User Query | Resolved Action |
|---|---|
| "Show me all experiments for churn prediction" | List experiments filtered by tag |
| "Which run had the highest F1 score?" | Sort runs by metric descending |
| "Compare the last 3 runs" | Side-by-side metric comparison |
| "Plot the training loss curve for run 17" | Metric history visualization |
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| MLFLOW_TRACKING_URI | http://mlflow:5000 | MLflow server URL |
| TRACKING_MAX_RUNS_PER_QUERY | 100 | Max runs returned per query |
| TRACKING_METRIC_HISTORY_LIMIT | 1000 | Max metric history steps |
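A sketch of resolving these settings with the documented defaults (environment handling only; the variable names and defaults are taken from the table above, the function name is illustrative):

```python
import os

def load_tracking_config(env=None):
    """Read tracking settings, falling back to the documented defaults."""
    if env is None:
        env = os.environ
    return {
        "tracking_uri": env.get("MLFLOW_TRACKING_URI", "http://mlflow:5000"),
        "max_runs_per_query": int(env.get("TRACKING_MAX_RUNS_PER_QUERY", "100")),
        "metric_history_limit": int(env.get("TRACKING_METRIC_HISTORY_LIMIT", "1000")),
    }

config = load_tracking_config(env={})  # no overrides -> documented defaults
print(config["tracking_uri"])
```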