Experiment Tracking
The Experiment Tracking integration provides a conversational interface for managing ML experiments, comparing runs, and analyzing training metrics. It wraps MLflow Tracking via the ML Service, letting users query experiment data and visualize results through natural language.
Tracking Architecture
Experiment tracking data flows through the ML Service to the MLflow backend:
AI Service (Tracking API) --> ML Service --> MLflow Tracking Server --> PostgreSQL (metrics store)
                                                                    --> MinIO/S3 (artifact store)

Core Concepts
| Concept | Description |
|---|---|
| Experiment | A named collection of related training runs |
| Run | A single training execution with parameters, metrics, and artifacts |
| Metric | A numeric value logged during training (loss, accuracy, F1) |
| Parameter | A hyperparameter value used in the run |
| Artifact | A file produced by the run (model, plots, data samples) |
| Tag | A key-value label for organizing and filtering runs |
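These concepts map naturally onto plain data structures. A minimal sketch (field names are illustrative, chosen to mirror the API payloads in this document):

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    """A single training execution within an experiment."""
    run_id: str
    parameters: dict = field(default_factory=dict)  # hyperparameters, e.g. learning_rate
    metrics: dict = field(default_factory=dict)     # logged values, e.g. f1_score
    tags: dict = field(default_factory=dict)        # key-value labels for filtering

def filter_by_tag(runs, key, value):
    """Return runs carrying the given tag, as used for organizing and filtering."""
    return [r for r in runs if r.tags.get(key) == value]

runs = [
    Run("run-001", tags={"team": "data-science"}),
    Run("run-002", tags={"team": "platform"}),
]
print([r.run_id for r in filter_by_tag(runs, "team", "data-science")])
```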
Create Experiment
POST /api/v1/ml/experiments

{
"name": "churn-prediction-q1",
"description": "Customer churn prediction experiments for Q1 2025",
"tags": {
"project": "customer-retention",
"team": "data-science"
}
}

List Experiments
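Client code consuming this endpoint typically parses the JSON body and surfaces the best-performing experiments. A hedged sketch over the sample payload shown in this section (the ranking metric is taken from that sample):

```python
import json

# Sample payload mirroring the List Experiments response in this section.
payload = json.loads("""
{
  "experiments": [
    {"id": "exp-001", "name": "churn-prediction-q1",
     "run_count": 24, "best_metric": {"f1_score": 0.912}, "status": "active"}
  ]
}
""")

# Rank active experiments by their best F1 score.
active = [e for e in payload["experiments"] if e["status"] == "active"]
ranked = sorted(active, key=lambda e: e["best_metric"].get("f1_score", 0.0), reverse=True)
print(ranked[0]["name"])
```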
GET /api/v1/ml/experiments?tenant_id=acme-corp

Response
{
"experiments": [
{
"id": "exp-001",
"name": "churn-prediction-q1",
"run_count": 24,
"best_metric": {"f1_score": 0.912},
"status": "active",
"created_at": "2025-01-15T08:00:00Z"
}
]
}

List Runs
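The run-listing endpoint takes its sorting options as query parameters. A sketch of assembling such a URL with the standard library (the base URL is an assumption; the parameter names come from the example request):

```python
from urllib.parse import urlencode

BASE_URL = "http://ai-service:8080"  # assumed service address, not from the docs

def list_runs_url(experiment_id, sort_by="metrics.f1_score", order="desc"):
    """Build the List Runs URL with the sort parameters from the example."""
    query = urlencode({"sort_by": sort_by, "order": order})
    return f"{BASE_URL}/api/v1/ml/experiments/{experiment_id}/runs?{query}"

print(list_runs_url("exp-001"))
```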
GET /api/v1/ml/experiments/:experiment_id/runs?sort_by=metrics.f1_score&order=desc

Response
{
"runs": [
{
"run_id": "run-017",
"status": "completed",
"parameters": {
"algorithm": "xgboost",
"n_estimators": 200,
"learning_rate": 0.05
},
"metrics": {
"f1_score": 0.912,
"accuracy": 0.95,
"training_loss": 0.082
},
"duration_seconds": 342,
"created_at": "2025-03-15T10:00:00Z"
}
]
}

Compare Runs
Compares multiple runs side by side for metric and parameter analysis:
POST /api/v1/ml/experiments/:experiment_id/compare

{
"run_ids": ["run-015", "run-016", "run-017"],
"metrics": ["f1_score", "accuracy", "auc_roc"],
"parameters": ["n_estimators", "learning_rate", "max_depth"]
}

Response
{
"comparison": {
"runs": [
{"run_id": "run-015", "f1_score": 0.88, "accuracy": 0.93},
{"run_id": "run-016", "f1_score": 0.90, "accuracy": 0.94},
{"run_id": "run-017", "f1_score": 0.912, "accuracy": 0.95}
],
"best_run": "run-017",
"parameter_impact": {
"learning_rate": {"correlation_with_f1": -0.72},
"n_estimators": {"correlation_with_f1": 0.85}
}
}
}

Metric History
Retrieves metric values logged over training steps for a specific run:
GET /api/v1/ml/experiments/:experiment_id/runs/:run_id/metrics/:metric_name

Response
{
"metric": "training_loss",
"steps": [
{"step": 0, "value": 0.693, "timestamp": "2025-03-15T10:00:00Z"},
{"step": 10, "value": 0.412, "timestamp": "2025-03-15T10:01:00Z"},
{"step": 20, "value": 0.185, "timestamp": "2025-03-15T10:02:00Z"},
{"step": 30, "value": 0.082, "timestamp": "2025-03-15T10:03:00Z"}
]
}

Conversational Queries
Users can query experiment data through natural language:
| User Query | Resolved Action |
|---|---|
| "Show me all experiments for churn prediction" | List experiments filtered by tag |
| "Which run had the highest F1 score?" | Sort runs by metric descending |
| "Compare the last 3 runs" | Side-by-side metric comparison |
| "Plot the training loss curve for run 17" | Metric history visualization |
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| MLFLOW_TRACKING_URI | http://mlflow:5000 | MLflow server URL |
| TRACKING_MAX_RUNS_PER_QUERY | 100 | Max runs returned per query |
| TRACKING_METRIC_HISTORY_LIMIT | 1000 | Max metric history steps |
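A sketch of resolving these settings with the documented defaults (environment handling only; the variable names and defaults are taken from the table above, the function name is illustrative):

```python
import os

def load_tracking_config(env=None):
    """Read tracking settings, falling back to the documented defaults."""
    if env is None:
        env = os.environ
    return {
        "tracking_uri": env.get("MLFLOW_TRACKING_URI", "http://mlflow:5000"),
        "max_runs_per_query": int(env.get("TRACKING_MAX_RUNS_PER_QUERY", "100")),
        "metric_history_limit": int(env.get("TRACKING_METRIC_HISTORY_LIMIT", "1000")),
    }

config = load_tracking_config(env={})  # no overrides -> documented defaults
print(config["tracking_uri"])
```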