MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Model Debugging

The Model Debugging module provides tools for identifying, diagnosing, and resolving model performance issues. It enables error analysis across data slices, counterfactual explanations, prediction confidence analysis, and feature sensitivity testing to help ML engineers understand and fix model failures.


Debugging Capabilities

Capability | Description | Use Case
Error Analysis | Identify data slices with high error rates | Find underperforming segments
Counterfactual Explanations | Find minimal feature changes that flip a prediction | Understand decision boundaries
Confidence Analysis | Summarize the distribution of prediction confidence scores | Detect uncertain predictions
Feature Sensitivity | Measure the impact of small perturbations on predictions | Test model robustness
Slice Performance | Compute metrics across user-defined data slices | Evaluate fairness and coverage
Adversarial Testing | Test resilience to adversarial inputs | Assess security and robustness

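Run Diagnostics

Run one or more diagnostics on a model against a labeled dataset in a single request. The diagnostics array selects which analyses to run, target_column names the label column, and slicing_columns lists the columns used to define data slices:
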
POST /api/v1/governance/debug
{
  "model_id": "model-xyz789",
  "dataset": {
    "source": "sql",
    "query": "SELECT * FROM ml_features.customer_churn"
  },
  "target_column": "churned",
  "diagnostics": ["error_analysis", "confidence_analysis", "slice_performance"],
  "slicing_columns": ["contract_type", "tenure_bucket", "region"]
}
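The endpoints in this section take a JSON body over HTTP POST. The sketch below builds such a request with only the Python standard library. The base URL and the helper names build_debug_request and send are hypothetical, and no authentication header is added because this page does not document one; add whatever your deployment requires.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical base URL: replace with the address of your MATIH deployment.
BASE_URL = "http://localhost:8080"


def build_debug_request(payload: Dict[str, Any],
                        path: str = "/api/v1/governance/debug",
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) a JSON POST request to a governance debug endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# The Run Diagnostics payload shown above.
payload = {
    "model_id": "model-xyz789",
    "dataset": {"source": "sql", "query": "SELECT * FROM ml_features.customer_churn"},
    "target_column": "churned",
    "diagnostics": ["error_analysis", "confidence_analysis", "slice_performance"],
    "slicing_columns": ["contract_type", "tenure_bucket", "region"],
}
request = build_debug_request(payload)
# result = send(request)  # requires a reachable MATIH ML service
```

The same helper works for the counterfactual, slices, and sensitivity endpoints by passing a different path.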

Response (slice_performance results omitted)

{
  "model_id": "model-xyz789",
  "diagnostics": {
    "error_analysis": {
      "overall_error_rate": 0.06,
      "worst_slices": [
        {
          "slice": "contract_type = 'two-year' AND tenure_bucket = '0-6mo'",
          "error_rate": 0.23,
          "sample_count": 45,
          "dominant_error": "false_negative"
        }
      ]
    },
    "confidence_analysis": {
      "mean_confidence": 0.84,
      "low_confidence_count": 320,
      "low_confidence_threshold": 0.6,
      "confidence_distribution": {
        "0.0-0.2": 15,
        "0.2-0.4": 48,
        "0.4-0.6": 257,
        "0.6-0.8": 1230,
        "0.8-1.0": 3450
      }
    }
  }
}
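The service's exact algorithms are not documented here, but both blocks in this response can be reproduced in spirit from labels, predictions, and confidence scores. The following is a minimal sketch, assuming binary labels (1 = churned), precomputed predictions, half-open confidence bins matching the keys above, and that a candidate slice is any combination of up to max_depth slicing-column values. The function names and the max_depth and top_k parameters are hypothetical; min_samples plays the role of DEBUG_SLICE_MIN_SAMPLES and threshold the role of low_confidence_threshold. Note that in the response above the low-confidence count is exactly the sum of the three bins below 0.6 (15 + 48 + 257 = 320).

```python
from bisect import bisect_right
from itertools import combinations
from typing import Any, Dict, List, Optional, Sequence

# Bin edges and labels mirroring the confidence_distribution keys above.
BIN_EDGES = [0.2, 0.4, 0.6, 0.8]
BIN_LABELS = ["0.0-0.2", "0.2-0.4", "0.4-0.6", "0.6-0.8", "0.8-1.0"]


def confidence_analysis(confidences: Sequence[float], threshold: float = 0.6) -> Dict[str, Any]:
    """Summarize per-prediction confidence scores into fixed-width bins."""
    counts = [0] * len(BIN_LABELS)
    for c in confidences:
        # bisect_right puts a value on an edge (e.g. 0.6) into the higher bin, so bins
        # are [lo, hi); the last bin also collects scores of exactly 1.0.
        counts[bisect_right(BIN_EDGES, c)] += 1
    return {
        "mean_confidence": round(sum(confidences) / len(confidences), 2),
        "low_confidence_count": sum(1 for c in confidences if c < threshold),
        "low_confidence_threshold": threshold,
        "confidence_distribution": dict(zip(BIN_LABELS, counts)),
    }


def worst_slices(rows: List[Dict[str, Any]], y_true: Sequence[int], y_pred: Sequence[int],
                 slicing_columns: Sequence[str], min_samples: int = 30,
                 max_depth: int = 2, top_k: int = 5) -> List[Dict[str, Any]]:
    """Rank slices over up to max_depth slicing columns by error rate (binary labels)."""
    results = []
    for depth in range(1, max_depth + 1):
        for cols in combinations(slicing_columns, depth):
            groups: Dict[tuple, List[int]] = {}
            for i, row in enumerate(rows):
                groups.setdefault(tuple(row[c] for c in cols), []).append(i)
            for key, idx in groups.items():
                if len(idx) < min_samples:
                    continue  # too few samples for a stable error rate
                errors = [i for i in idx if y_true[i] != y_pred[i]]
                fn = sum(1 for i in errors if y_true[i] == 1)  # missed positives
                fp = len(errors) - fn
                dominant: Optional[str] = None
                if errors:
                    dominant = "false_negative" if fn >= fp else "false_positive"
                results.append({
                    "slice": " AND ".join(f"{c} = '{v}'" for c, v in zip(cols, key)),
                    "error_rate": round(len(errors) / len(idx), 2),
                    "sample_count": len(idx),
                    "dominant_error": dominant,
                })
    # Highest error rate first; ties keep their discovery order (sort is stable).
    results.sort(key=lambda r: r["error_rate"], reverse=True)
    return results[:top_k]
```

Crossing columns (for example contract_type AND tenure_bucket) is what surfaces small, high-error segments like the two-year, 0-6 month slice above, which a single-column breakdown can average away.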

Counterfactual Explanations

Generate the minimal feature changes that would flip a prediction to desired_outcome, changing at most max_changes features per counterfactual:

POST /api/v1/governance/debug/counterfactual
{
  "model_id": "model-xyz789",
  "instance": {
    "tenure": 3,
    "monthly_charges": 89.50,
    "contract_type": "month-to-month"
  },
  "desired_outcome": 0,
  "max_changes": 2
}

Response

{
  "original_prediction": 1,
  "original_probability": 0.87,
  "counterfactuals": [
    {
      "changes": {"contract_type": "one-year"},
      "new_prediction": 0,
      "new_probability": 0.38,
      "explanation": "Switching to a one-year contract reduces churn probability from 87% to 38%"
    },
    {
      "changes": {"tenure": 12, "contract_type": "one-year"},
      "new_prediction": 0,
      "new_probability": 0.15,
      "explanation": "Increasing tenure to 12 months and switching to a one-year contract reduces churn probability from 87% to 15%"
    }
  ]
}
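This page does not say how the service searches for counterfactuals. One simple approach, shown here as a sketch rather than the service's method, is an exhaustive search over a small, user-supplied grid of candidate values, changing 1 to max_changes features at a time and keeping the edits whose prediction equals desired_outcome. The candidate grid, the 0.5 decision threshold, the toy model toy_churn_proba, and all function names are hypothetical; limit plays the role of DEBUG_COUNTERFACTUAL_LIMIT. Optimization-based searches scale better when the feature space is large.

```python
from itertools import combinations, product
from typing import Any, Callable, Dict, List, Sequence


def find_counterfactuals(predict_proba: Callable[[Dict[str, Any]], float],
                         instance: Dict[str, Any],
                         candidates: Dict[str, Sequence[Any]],
                         desired_outcome: int,
                         max_changes: int = 2,
                         limit: int = 5,
                         decision_threshold: float = 0.5) -> List[Dict[str, Any]]:
    """Brute-force search for feature edits that move the prediction to desired_outcome."""
    found = []
    features = [f for f in candidates if f in instance]
    for k in range(1, max_changes + 1):
        for feats in combinations(features, k):
            # Only try values that differ from the instance's current value.
            options = [[v for v in candidates[f] if v != instance[f]] for f in feats]
            for values in product(*options):
                changes = dict(zip(feats, values))
                prob = predict_proba({**instance, **changes})
                prediction = int(prob >= decision_threshold)
                if prediction == desired_outcome:
                    found.append({"changes": changes,
                                  "new_prediction": prediction,
                                  "new_probability": round(prob, 2)})
    # Fewest changes first, then the strongest move toward the desired outcome.
    sign = 1 if desired_outcome == 0 else -1
    found.sort(key=lambda r: (len(r["changes"]), sign * r["new_probability"]))
    return found[:limit]


def toy_churn_proba(x: Dict[str, Any]) -> float:
    """Hypothetical stand-in model: churn probability falls with contract length and tenure."""
    p = 0.9 if x["contract_type"] == "month-to-month" else 0.4
    p -= 0.02 * min(x["tenure"], 24)
    return max(0.0, min(1.0, p))


results = find_counterfactuals(
    toy_churn_proba,
    {"tenure": 3, "monthly_charges": 89.50, "contract_type": "month-to-month"},
    candidates={"contract_type": ["one-year", "two-year"], "tenure": [12, 24]},
    desired_outcome=0,
)
```

With the toy model the top result is the single change {"contract_type": "one-year"}. Like the response above, this sketch does not prune two-feature edits that contain a successful one-feature edit; filtering such supersets is a possible refinement if strictly minimal explanations are required.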

Slice Performance

Compute model metrics across data slices to identify underperforming segments:

POST /api/v1/governance/debug/slices
{
  "model_id": "model-xyz789",
  "dataset": {
    "source": "sql",
    "query": "SELECT * FROM ml_features.customer_churn"
  },
  "slicing_columns": ["contract_type", "region"],
  "metrics": ["accuracy", "f1_score", "precision", "recall"]
}

Response (precision and recall omitted)

{
  "slices": [
    {
      "slice": "contract_type = 'month-to-month'",
      "sample_count": 3000,
      "metrics": {"accuracy": 0.91, "f1_score": 0.88}
    },
    {
      "slice": "contract_type = 'two-year'",
      "sample_count": 1200,
      "metrics": {"accuracy": 0.97, "f1_score": 0.72}
    }
  ]
}
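In this response the two-year slice has the higher accuracy (0.97) but the lower F1 score (0.72). That pattern often appears when positives are rare within a slice: a model can be right on most rows by predicting the majority class while still missing many churners. The sketch below computes the requested metrics per single-column slice so the effect can be checked locally. It assumes binary 0/1 labels and precomputed predictions; the function names are hypothetical, and min_samples plays the role of DEBUG_SLICE_MIN_SAMPLES.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1_score": f1}


def slice_performance(rows: List[Dict[str, Any]], y_true: Sequence[int],
                      y_pred: Sequence[int], slicing_columns: Sequence[str],
                      metrics: Sequence[str] = ("accuracy", "f1_score", "precision", "recall"),
                      min_samples: int = 30) -> List[Dict[str, Any]]:
    """Compute the requested metrics for every value of every slicing column."""
    slices = []
    for col in slicing_columns:
        groups: Dict[Any, List[int]] = defaultdict(list)
        for i, row in enumerate(rows):
            groups[row[col]].append(i)
        for value, idx in groups.items():
            if len(idx) < min_samples:
                continue  # skip slices too small to report reliably
            scores = binary_metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
            slices.append({"slice": f"{col} = '{value}'",
                           "sample_count": len(idx),
                           "metrics": {m: round(scores[m], 2) for m in metrics}})
    return slices
```

For example, a 100-row slice with 10 churners in which the model catches 5 and raises no false alarms scores 0.95 accuracy but only 0.67 F1 (precision 1.0, recall 0.5).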

Feature Sensitivity

Test how small perturbations to each feature affect predictions:

POST /api/v1/governance/debug/sensitivity
{
  "model_id": "model-xyz789",
  "instance": {"tenure": 12, "monthly_charges": 65.00},
  "perturbation_range": 0.1
}
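No response format is shown for this endpoint, and the exact meaning of perturbation_range is not specified. The sketch below assumes it is a relative step, so 0.1 moves each numeric feature down and up by 10% of its value, and reports how much the predicted probability changes. The function names, the output fields, and the linear toy model are hypothetical. Because the step is relative, a feature whose value is 0 never moves; an absolute or standard-deviation-scaled step avoids that.

```python
from typing import Any, Callable, Dict


def feature_sensitivity(predict_proba: Callable[[Dict[str, Any]], float],
                        instance: Dict[str, Any],
                        perturbation_range: float = 0.1) -> Dict[str, Dict[str, float]]:
    """Probability change when each numeric feature moves by +/- perturbation_range (relative)."""
    base = predict_proba(instance)
    report = {}
    for name, value in instance.items():
        # Only numeric features are perturbed; booleans and categoricals are skipped.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        deltas = []
        for sign in (-1, 1):
            perturbed = dict(instance)
            perturbed[name] = value * (1 + sign * perturbation_range)  # ints become floats
            deltas.append(predict_proba(perturbed) - base)
        report[name] = {"down": round(deltas[0], 4), "up": round(deltas[1], 4),
                        "max_abs_change": round(max(abs(d) for d in deltas), 4)}
    # Most sensitive features first.
    return dict(sorted(report.items(), key=lambda kv: kv[1]["max_abs_change"], reverse=True))


def toy_proba(x: Dict[str, Any]) -> float:
    """Hypothetical linear stand-in model used only for illustration."""
    return 0.01 * x["tenure"] + 0.005 * x["monthly_charges"]


report = feature_sensitivity(toy_proba, {"tenure": 12, "monthly_charges": 65.00})
```

With the request's instance, monthly_charges ranks first: a 10% move (6.5) shifts the toy probability by about 0.0325, versus about 0.012 for a 10% move in tenure (1.2).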

Configuration

Environment Variable | Default | Description
DEBUG_MAX_SAMPLES | 10000 | Maximum number of samples used for analysis
DEBUG_COUNTERFACTUAL_LIMIT | 5 | Maximum number of counterfactuals to generate
DEBUG_CONFIDENCE_THRESHOLD | 0.6 | Boundary below which a prediction counts as low confidence
DEBUG_SLICE_MIN_SAMPLES | 30 | Minimum number of samples per slice
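A service consuming these settings might load them once at startup, falling back to the documented defaults. The sketch below does that with a frozen dataclass. The DebugConfig class, its field names, and the validation rules (threshold in [0, 1], positive limits) are hypothetical, not documented behavior of the MATIH ML service.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DebugConfig:
    """Debugging limits, with defaults matching the configuration table above."""
    max_samples: int = 10000
    counterfactual_limit: int = 5
    confidence_threshold: float = 0.6
    slice_min_samples: int = 30

    def __post_init__(self) -> None:
        # Assumed sanity checks; the real service may validate differently.
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("DEBUG_CONFIDENCE_THRESHOLD must be in [0, 1]")
        if min(self.max_samples, self.counterfactual_limit, self.slice_min_samples) < 1:
            raise ValueError("sample and counterfactual limits must be positive")

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "DebugConfig":
        """Read DEBUG_* variables from env (os.environ by default), using defaults when unset."""
        env = os.environ if env is None else env
        return cls(
            max_samples=int(env.get("DEBUG_MAX_SAMPLES", cls.max_samples)),
            counterfactual_limit=int(env.get("DEBUG_COUNTERFACTUAL_LIMIT", cls.counterfactual_limit)),
            confidence_threshold=float(env.get("DEBUG_CONFIDENCE_THRESHOLD", cls.confidence_threshold)),
            slice_min_samples=int(env.get("DEBUG_SLICE_MIN_SAMPLES", cls.slice_min_samples)),
        )
```

Passing an explicit mapping instead of os.environ keeps the loader easy to test, for example DebugConfig.from_env({"DEBUG_CONFIDENCE_THRESHOLD": "0.7"}).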