Drift Detection
The DriftDetectionService monitors tenant infrastructure for configuration drift, both on a schedule and on demand. Drift occurs when the actual infrastructure state deviates from the desired state defined in the platform, whether due to manual changes, external tools, or infrastructure failures.
How Drift Detection Works
- Desired state -- Stored in the DesiredInfrastructureState table
- Actual state -- Read from the cloud provider APIs and the Kubernetes cluster
- Comparison -- Identifies every difference between the desired and actual state
- Report -- Generates a drift report with details of each discrepancy
- Remediation -- Optionally auto-remediates the drift or creates alerts for manual review
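The comparison and report steps above can be sketched in Python. This is a minimal illustration and not the DriftDetectionService implementation. It assumes both states have already been loaded into plain dictionaries keyed by `(resourceType, resourceName)`. The functions `compare` and `build_report` and the severity rules in `SEVERITY_BY_FIELD` are hypothetical. The output reuses the report field names documented in this section.

```python
from datetime import datetime, timezone

# Hypothetical severity rules for illustration only; the platform's actual
# severity classification is not documented in this section.
SEVERITY_BY_FIELD = {"replicas": "HIGH"}
DEFAULT_SEVERITY = "MEDIUM"


def compare(desired, actual):
    """Return one drift item per field whose actual value differs from the desired value.

    Both arguments map (resourceType, resourceName) -> {field: value}.
    """
    items = []
    for (resource_type, resource_name), desired_fields in desired.items():
        # A resource missing from the live state is treated as having no fields.
        actual_fields = actual.get((resource_type, resource_name), {})
        for field, desired_value in desired_fields.items():
            actual_value = actual_fields.get(field)
            if actual_value != desired_value:
                items.append({
                    "resourceType": resource_type,
                    "resourceName": resource_name,
                    "field": field,
                    "desiredValue": desired_value,
                    "actualValue": actual_value,  # None if the field or resource is absent
                    "severity": SEVERITY_BY_FIELD.get(field, DEFAULT_SEVERITY),
                })
    return items


def build_report(tenant_id, desired, actual):
    """Assemble a report shaped like the drift-report response."""
    items = compare(desired, actual)
    # A resource counts as drifted once, however many of its fields differ.
    drifted = {(i["resourceType"], i["resourceName"]) for i in items}
    total = len(desired)
    return {
        "tenantId": tenant_id,
        "scanTimestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "hasDrift": bool(items),
        "driftItems": items,
        "summary": {
            "totalResources": total,
            "driftedResources": len(drifted),
            "healthyResources": total - len(drifted),
        },
    }


# Example mirroring the two drift items in the sample report.
desired = {
    ("deployment", "ai-service"): {"replicas": "3"},
    ("configmap", "ai-service-config"): {"query.timeout": "60s"},
}
actual = {
    ("deployment", "ai-service"): {"replicas": "2"},
    ("configmap", "ai-service-config"): {"query.timeout": "30s"},
}
report = build_report("550e8400-e29b-41d4-a716-446655440000", desired, actual)
```

This sketch does not report resources that exist in the live state but not in the desired state. This section does not say whether the real service flags such extra resources.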
Drift Detection Endpoint
Endpoint: POST /api/v1/infrastructure/tenants/:tenantId/drift-check
Triggers an on-demand drift detection scan for a tenant.
curl -X POST http://localhost:8089/api/v1/infrastructure/tenants/550e8400/drift-check \
-H "Authorization: Bearer ${TOKEN}"
Drift Report
Endpoint: GET /api/v1/infrastructure/tenants/:tenantId/drift-report
Returns the most recent drift detection results.
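A client can call both endpoints using only Python's standard library. The sketch below builds the requests without sending them, so it runs without a live service. The base URL is copied from the curl example. The helper names `drift_check_request`, `drift_report_request`, `fetch_report` and `high_severity_items` are hypothetical.

```python
import json
import urllib.request

# Base URL copied from the curl example; adjust host and port for your environment.
BASE_URL = "http://localhost:8089/api/v1/infrastructure/tenants"


def _auth_headers(token):
    return {"Authorization": f"Bearer {token}"}


def drift_check_request(tenant_id, token):
    """Build (without sending) the POST request that triggers an on-demand scan."""
    return urllib.request.Request(
        f"{BASE_URL}/{tenant_id}/drift-check",
        method="POST",
        headers=_auth_headers(token),
    )


def drift_report_request(tenant_id, token):
    """Build the GET request for the most recent drift report."""
    return urllib.request.Request(
        f"{BASE_URL}/{tenant_id}/drift-report",
        method="GET",
        headers=_auth_headers(token),
    )


def fetch_report(tenant_id, token):
    """Send the GET request and decode the JSON body (needs a running service)."""
    with urllib.request.urlopen(drift_report_request(tenant_id, token)) as resp:
        return json.load(resp)


def high_severity_items(report):
    """Pick out the HIGH-severity entries from a decoded drift report."""
    return [item for item in report.get("driftItems", []) if item["severity"] == "HIGH"]
```

Against a running service, `urllib.request.urlopen(drift_check_request(tenant_id, token))` triggers the scan. This section does not describe the response body of the drift-check endpoint, so the sketch does not parse it.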
Report Structure
{
"tenantId": "550e8400-e29b-41d4-a716-446655440000",
"scanTimestamp": "2026-02-12T10:30:00Z",
"hasDrift": true,
"driftItems": [
{
"resourceType": "deployment",
"resourceName": "ai-service",
"field": "replicas",
"desiredValue": "3",
"actualValue": "2",
"severity": "HIGH"
},
{
"resourceType": "configmap",
"resourceName": "ai-service-config",
"field": "query.timeout",
"desiredValue": "60s",
"actualValue": "30s",
"severity": "MEDIUM"
}
],
"summary": {
"totalResources": 45,
"driftedResources": 2,
"healthyResources": 43
}
}
Monitored Resources
| Resource Type | What Is Checked |
|---|---|
| Kubernetes Deployments | Replicas, image versions, resource limits, environment variables |
| Kubernetes Services | Ports, selectors, type |
| ConfigMaps | Configuration values |
| Secrets | Existence (not values) |
| Ingress | Rules, TLS configuration, annotations |
| Database | Size, version, replication settings |
| Storage | Capacity, performance tier |
| Network Policies | Ingress/egress rules |
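One way to apply the table above is to reduce each resource to its checked fields before comparing. In this sketch the field names in `CHECKED_FIELDS` and the function `project` are hypothetical. The point it illustrates is that a Secret is reduced to an existence flag, so its values are never compared.

```python
# Hypothetical field names mirroring the "What Is Checked" column; the real
# service's internal field names are not documented here.
CHECKED_FIELDS = {
    "deployment": ("replicas", "image", "resourceLimits", "env"),
    "service": ("ports", "selector", "type"),
    "configmap": ("data",),
    "ingress": ("rules", "tls", "annotations"),
    "database": ("size", "version", "replication"),
    "storage": ("capacity", "performanceTier"),
    "networkpolicy": ("ingress", "egress"),
}


def project(resource_type, spec):
    """Reduce a resource spec (or None if the resource is absent) to the compared fields."""
    if resource_type == "secret":
        # Secrets are checked for existence only; their values are discarded here.
        return {"exists": spec is not None}
    if spec is None:
        return {}
    # Fields outside CHECKED_FIELDS (e.g. labels) are ignored, so they never count as drift.
    return {field: spec.get(field) for field in CHECKED_FIELDS.get(resource_type, ())}
```

Applying the same projection to both the desired and the live spec before diffing means that changes to unlisted fields, such as a Deployment label, are not reported as drift in this sketch.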
Scheduled Detection
Drift detection runs on a configurable schedule (default: every 30 minutes). The reconciler compares the DesiredInfrastructureState entries with live cluster state and publishes drift events.
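Here is a minimal sketch of the scheduling loop. It assumes a `scan` callable that returns a report shaped like the one above and a `publish` callable that emits drift events; both names, and `run_scheduled` itself, are hypothetical. The loop waits with `threading.Event.wait`, so a shutdown request ends the wait immediately.

```python
import threading

DEFAULT_INTERVAL_S = 30 * 60  # documented default: every 30 minutes


def run_scheduled(scan, publish, stop, interval_s=DEFAULT_INTERVAL_S):
    """Run scan() repeatedly until `stop` is set, publishing reports that contain drift."""
    while not stop.is_set():
        report = scan()
        if report["hasDrift"]:
            publish(report)
        # Event.wait returns as soon as `stop` is set, or after interval_s seconds.
        stop.wait(interval_s)


# Usage with a fake scanner that asks the loop to stop after three scans.
runs = []
published = []
stop = threading.Event()


def fake_scan():
    runs.append(1)
    if len(runs) == 3:
        stop.set()
    return {"hasDrift": len(runs) % 2 == 1}  # drift on the 1st and 3rd scans


run_scheduled(fake_scan, published.append, stop, interval_s=0.01)
```

This loop measures the interval from the end of one scan to the start of the next (fixed delay), so a slow scan never overlaps the following one. Use a fixed-rate schedule instead if scans must start on a strict clock.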
Auto-Remediation
When drift is detected, the system can:
- Alert only -- Send notifications to administrators (default)
- Auto-remediate -- Automatically apply the desired state to correct drift
- Queue for review -- Create a remediation ticket for manual approval
Auto-remediation is configured per tenant and per resource type, to balance automation with change control.
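Per-tenant, per-resource-type configuration can be modelled as a lookup with fallbacks. This sketch rests on several assumptions: the `Mode` values, the `policies` dictionary, the tenant-wide `"*"` entry, and the `handle` callbacks are all hypothetical. Only the three behaviours and the alert-only default come from this section.

```python
from enum import Enum


class Mode(Enum):
    ALERT_ONLY = "alert-only"
    AUTO_REMEDIATE = "auto-remediate"
    QUEUE_FOR_REVIEW = "queue-for-review"


DEFAULT_MODE = Mode.ALERT_ONLY  # documented default: alert only

# Hypothetical policy table: (tenantId, resourceType) -> Mode; "*" means any resource type.
policies = {
    ("tenant-a", "deployment"): Mode.AUTO_REMEDIATE,
    ("tenant-a", "*"): Mode.QUEUE_FOR_REVIEW,
}


def resolve_mode(policies, tenant_id, resource_type):
    """Most specific entry wins: exact resource type, then tenant-wide, then the default."""
    if (tenant_id, resource_type) in policies:
        return policies[(tenant_id, resource_type)]
    return policies.get((tenant_id, "*"), DEFAULT_MODE)


def handle(item, tenant_id, policies, alert, apply_desired, open_ticket):
    """Dispatch one drift item to the action its resolved mode calls for."""
    mode = resolve_mode(policies, tenant_id, item["resourceType"])
    if mode is Mode.AUTO_REMEDIATE:
        apply_desired(item)
    elif mode is Mode.QUEUE_FOR_REVIEW:
        open_ticket(item)
    else:
        alert(item)
    return mode
```

Falling back to `ALERT_ONLY` means a tenant gets auto-remediation only where it has been explicitly enabled. This fits the caution below about intentional drift.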
Auto-remediation should be carefully configured. Some drift (e.g., manual scaling during an incident) may be intentional. Review drift reports before enabling auto-remediation for production tenants.