Model Debugging
The Model Debugging module provides tools for identifying, diagnosing, and resolving model performance issues. It supports error analysis across data slices, counterfactual explanations, prediction-confidence analysis, and feature-sensitivity testing, helping ML engineers understand and fix model failures.
Debugging Capabilities
| Capability | Description | Use Case |
|---|---|---|
| Error Analysis | Identify data slices with high error rates | Find underperforming segments |
| Counterfactual Explanations | Minimal feature changes to flip prediction | Understand decision boundaries |
| Confidence Analysis | Distribution of prediction confidence scores | Detect uncertain predictions |
| Feature Sensitivity | Impact of small perturbations on predictions | Test model robustness |
| Slice Performance | Metrics across user-defined data slices | Evaluate fairness and coverage |
| Adversarial Testing | Test resilience to adversarial inputs | Security and robustness |
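Each capability is exercised through the REST endpoints described in the sections below. As an illustration, the payload for the main diagnostics endpoint can be assembled client-side; the helper name here is hypothetical, not part of the API:

```python
import json

# Illustrative helper (the function name is hypothetical): builds the JSON
# payload for POST /api/v1/governance/debug as documented below.
def build_debug_payload(model_id, query, target_column, diagnostics, slicing_columns):
    return {
        "model_id": model_id,
        "dataset": {"source": "sql", "query": query},
        "target_column": target_column,
        "diagnostics": diagnostics,
        "slicing_columns": slicing_columns,
    }

payload = build_debug_payload(
    "model-xyz789",
    "SELECT * FROM ml_features.customer_churn",
    "churned",
    ["error_analysis", "confidence_analysis"],
    ["contract_type", "region"],
)
# Send as the POST body with Content-Type: application/json.
body = json.dumps(payload)
```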
Run Diagnostics
POST /api/v1/governance/debug

```json
{
  "model_id": "model-xyz789",
  "dataset": {
    "source": "sql",
    "query": "SELECT * FROM ml_features.customer_churn"
  },
  "target_column": "churned",
  "diagnostics": ["error_analysis", "confidence_analysis", "slice_performance"],
  "slicing_columns": ["contract_type", "tenure_bucket", "region"]
}
```

Response
```json
{
  "model_id": "model-xyz789",
  "diagnostics": {
    "error_analysis": {
      "overall_error_rate": 0.06,
      "worst_slices": [
        {
          "slice": "contract_type = 'two-year' AND tenure_bucket = '0-6mo'",
          "error_rate": 0.23,
          "sample_count": 45,
          "dominant_error": "false_negative"
        }
      ]
    },
    "confidence_analysis": {
      "mean_confidence": 0.84,
      "low_confidence_count": 320,
      "low_confidence_threshold": 0.6,
      "confidence_distribution": {
        "0.0-0.2": 15,
        "0.2-0.4": 48,
        "0.4-0.6": 257,
        "0.6-0.8": 1230,
        "0.8-1.0": 3450
      }
    }
  }
}
```

Counterfactual Explanations
Generate minimal changes that would flip a prediction:
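Programmatically, such a query can be issued with any HTTP client. A minimal sketch using Python's standard library (the base URL is a deployment-specific assumption; the payload fields follow the example request in this section):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical deployment address

payload = {
    "model_id": "model-xyz789",
    "instance": {
        "tenure": 3,
        "monthly_charges": 89.50,
        "contract_type": "month-to-month",
    },
    "desired_outcome": 0,  # the class the counterfactual should reach
    "max_changes": 2,      # cap on how many features may be altered
}
req = urllib.request.Request(
    f"{BASE_URL}/api/v1/governance/debug/counterfactual",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Uncomment to send the request against a live deployment:
# with urllib.request.urlopen(req) as resp:
#     counterfactuals = json.load(resp)["counterfactuals"]
```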
POST /api/v1/governance/debug/counterfactual

```json
{
  "model_id": "model-xyz789",
  "instance": {
    "tenure": 3,
    "monthly_charges": 89.50,
    "contract_type": "month-to-month"
  },
  "desired_outcome": 0,
  "max_changes": 2
}
```

Response
```json
{
  "original_prediction": 1,
  "original_probability": 0.87,
  "counterfactuals": [
    {
      "changes": {"contract_type": "one-year"},
      "new_prediction": 0,
      "new_probability": 0.38,
      "explanation": "Switching to a one-year contract reduces churn probability from 87% to 38%"
    },
    {
      "changes": {"tenure": 12, "contract_type": "one-year"},
      "new_prediction": 0,
      "new_probability": 0.15,
      "explanation": "Increasing tenure to 12 months and switching to yearly contract reduces churn to 15%"
    }
  ]
}
```

Slice Performance
Compute model metrics across data slices to identify underperforming segments:
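Once slice metrics come back, underperforming segments can be flagged by comparing each slice against the overall metric. A sketch that works over the response shape shown in this section (the 0.05 margin and 30-sample floor are illustrative choices):

```python
# Flag slices whose metric falls a margin below the overall value,
# skipping slices that are too small to measure reliably.
def underperforming_slices(slices, overall, metric="f1_score",
                           margin=0.05, min_samples=30):
    flagged = []
    for s in slices:
        value = s["metrics"].get(metric)
        if value is None or s["sample_count"] < min_samples:
            continue  # unmeasured or too-small slice
        if value < overall - margin:
            flagged.append((s["slice"], value))
    return sorted(flagged, key=lambda item: item[1])  # worst first

slices = [
    {"slice": "contract_type = 'month-to-month'", "sample_count": 3000,
     "metrics": {"accuracy": 0.91, "f1_score": 0.88}},
    {"slice": "contract_type = 'two-year'", "sample_count": 1200,
     "metrics": {"accuracy": 0.97, "f1_score": 0.72}},
]
print(underperforming_slices(slices, overall=0.85))
# one slice flagged: two-year contracts, whose F1 of 0.72 trails the overall 0.85
```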
POST /api/v1/governance/debug/slices

```json
{
  "model_id": "model-xyz789",
  "dataset": {
    "source": "sql",
    "query": "SELECT * FROM ml_features.customer_churn"
  },
  "slicing_columns": ["contract_type", "region"],
  "metrics": ["accuracy", "f1_score", "precision", "recall"]
}
```

Response
```json
{
  "slices": [
    {
      "slice": "contract_type = 'month-to-month'",
      "sample_count": 3000,
      "metrics": {"accuracy": 0.91, "f1_score": 0.88}
    },
    {
      "slice": "contract_type = 'two-year'",
      "sample_count": 1200,
      "metrics": {"accuracy": 0.97, "f1_score": 0.72}
    }
  ]
}
```

Feature Sensitivity
Test how small perturbations to each feature affect predictions:
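Conceptually, the sweep nudges each numeric feature up and down by the perturbation range and records the worst-case prediction shift. The service's exact algorithm is not documented here; this sketch illustrates the idea with a toy scoring function standing in for the deployed model:

```python
# Sketch of a perturbation sweep: vary each numeric feature by
# ±perturbation_range and report the largest resulting prediction shift.
def feature_sensitivity(predict, instance, perturbation_range=0.1):
    base = predict(instance)
    deltas = {}
    for name, value in instance.items():
        if not isinstance(value, (int, float)):
            continue  # only numeric features are perturbed in this sketch
        shifts = []
        for direction in (-1, 1):
            perturbed = dict(instance)
            perturbed[name] = value * (1 + direction * perturbation_range)
            shifts.append(abs(predict(perturbed) - base))
        deltas[name] = max(shifts)  # worst-case shift for this feature
    return deltas

# Toy scoring function (illustrative only): churn risk rises with
# monthly charges and falls with tenure, clamped to [0, 1].
def predict(x):
    score = 0.5 + 0.004 * x["monthly_charges"] - 0.01 * x["tenure"]
    return max(0.0, min(1.0, score))

print(feature_sensitivity(predict, {"tenure": 12, "monthly_charges": 65.00}))
```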
POST /api/v1/governance/debug/sensitivity

```json
{
  "model_id": "model-xyz789",
  "instance": {"tenure": 12, "monthly_charges": 65.00},
  "perturbation_range": 0.1
}
```

Configuration
| Environment Variable | Default | Description |
|---|---|---|
| DEBUG_MAX_SAMPLES | 10000 | Maximum number of samples used per analysis |
| DEBUG_COUNTERFACTUAL_LIMIT | 5 | Maximum counterfactuals generated per request |
| DEBUG_CONFIDENCE_THRESHOLD | 0.6 | Threshold below which a prediction is counted as low-confidence |
| DEBUG_SLICE_MIN_SAMPLES | 30 | Minimum samples required for a slice to be reported |
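These variables are presumably resolved at service start-up, with environment values overriding the documented defaults. A sketch of such a loader (illustrative only, not the service's actual configuration code):

```python
import os

# Documented defaults for the Model Debugging module.
DEFAULTS = {
    "DEBUG_MAX_SAMPLES": 10000,
    "DEBUG_COUNTERFACTUAL_LIMIT": 5,
    "DEBUG_CONFIDENCE_THRESHOLD": 0.6,
    "DEBUG_SLICE_MIN_SAMPLES": 30,
}

def load_debug_config(env=os.environ):
    """Resolve each setting from the environment, falling back to its default."""
    config = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key)
        # Coerce the override to the default's type (int or float).
        config[key] = type(default)(raw) if raw is not None else default
    return config

# Overrides come from the environment, e.g. DEBUG_MAX_SAMPLES=50000:
print(load_debug_config({"DEBUG_MAX_SAMPLES": "50000"}))
```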