Drift Detection

The DriftDetectionService continuously monitors tenant infrastructure for configuration drift. Drift occurs when the actual infrastructure state deviates from the desired state defined in the platform, whether due to manual changes, external tools, or infrastructure failures.

How Drift Detection Works

1. Desired state -- Stored in the DesiredInfrastructureState table.
2. Actual state -- Read from the cloud provider APIs and the Kubernetes cluster.
3. Comparison -- Identifies differences between the desired and actual state.
4. Report -- Generates a drift report with details of each discrepancy.
5. Remediation -- Optionally auto-remediates the drift or creates alerts for manual review.
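
The comparison step can be pictured as a field-by-field diff. The following is a minimal sketch, not the DriftDetectionService implementation: the `DriftItem` class, the `detect_drift` function, the dictionary layout, and the `image` value are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DriftItem:
    resource_type: str
    resource_name: str
    field: str
    desired_value: str
    actual_value: Optional[str]  # None when the field or resource is missing


def detect_drift(desired: dict, actual: dict) -> list:
    """Compare two states keyed by (resource_type, resource_name).

    Each value is a flat dict of field name -> value (an assumed layout).
    """
    items = []
    for (rtype, name), desired_fields in desired.items():
        actual_fields = actual.get((rtype, name), {})
        for field, want in desired_fields.items():
            have = actual_fields.get(field)
            if have != want:
                items.append(DriftItem(rtype, name, field, want, have))
    return items


# Illustrative data only: the deployment was scaled down by hand.
desired = {("deployment", "ai-service"): {"replicas": "3", "image": "ai-service:1.4"}}
actual = {("deployment", "ai-service"): {"replicas": "2", "image": "ai-service:1.4"}}
drift = detect_drift(desired, actual)
```

Only fields present in the desired state are compared in this sketch, so extra fields on the live resource are ignored.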

Drift Detection Endpoint

Endpoint: POST /api/v1/infrastructure/tenants/:tenantId/drift-check

Triggers an on-demand drift detection scan for a tenant.

curl -X POST http://localhost:8089/api/v1/infrastructure/tenants/550e8400/drift-check \
  -H "Authorization: Bearer ${TOKEN}"
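
The same call can be made from a script. This is a sketch using only the Python standard library; the function names are hypothetical, the base URL is copied from the curl example, and since the response body is not described here, only the HTTP status code is returned.

```python
import urllib.request

BASE_URL = "http://localhost:8089"  # host and port taken from the curl example


def build_drift_check_request(tenant_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for an on-demand scan."""
    url = f"{BASE_URL}/api/v1/infrastructure/tenants/{tenant_id}/drift-check"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def trigger_drift_check(tenant_id: str, token: str) -> int:
    """Send the request and return the HTTP status code."""
    request = build_drift_check_request(tenant_id, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Keeping request construction separate from sending makes the URL and headers easy to inspect without a running service.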

Drift Report

Endpoint: GET /api/v1/infrastructure/tenants/:tenantId/drift-report

Returns the most recent drift detection results.

Report Structure

{
  "tenantId": "550e8400-e29b-41d4-a716-446655440000",
  "scanTimestamp": "2026-02-12T10:30:00Z",
  "hasDrift": true,
  "driftItems": [
    {
      "resourceType": "deployment",
      "resourceName": "ai-service",
      "field": "replicas",
      "desiredValue": "3",
      "actualValue": "2",
      "severity": "HIGH"
    },
    {
      "resourceType": "configmap",
      "resourceName": "ai-service-config",
      "field": "query.timeout",
      "desiredValue": "60s",
      "actualValue": "30s",
      "severity": "MEDIUM"
    }
  ],
  "summary": {
    "totalResources": 45,
    "driftedResources": 2,
    "healthyResources": 43
  }
}
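
A client might filter the report by severity before deciding what to act on. The sketch below assumes the field names shown above; the helper names are hypothetical, the `LOW` severity level is an assumption (only `MEDIUM` and `HIGH` appear in the sample), and the summary check reflects the sample's arithmetic rather than a documented guarantee.

```python
import json

# LOW is assumed; only MEDIUM and HIGH appear in the sample report.
SEVERITY_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def items_at_or_above(report: dict, minimum: str) -> list:
    """Return drift items whose severity is at least `minimum`.

    Items with an unrecognized severity are kept rather than silently dropped.
    """
    threshold = SEVERITY_ORDER[minimum]
    return [
        item for item in report["driftItems"]
        if SEVERITY_ORDER.get(item["severity"], threshold) >= threshold
    ]


def summary_is_consistent(report: dict) -> bool:
    """Check that drifted + healthy equals total, as it does in the sample."""
    s = report["summary"]
    return s["driftedResources"] + s["healthyResources"] == s["totalResources"]


# Trimmed copy of the sample report above.
report = json.loads("""
{
  "hasDrift": true,
  "driftItems": [
    {"resourceName": "ai-service", "field": "replicas", "severity": "HIGH"},
    {"resourceName": "ai-service-config", "field": "query.timeout", "severity": "MEDIUM"}
  ],
  "summary": {"totalResources": 45, "driftedResources": 2, "healthyResources": 43}
}
""")
high_only = items_at_or_above(report, "HIGH")
```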

Monitored Resources

Resource Type	What Is Checked
Kubernetes Deployments	Replicas, image versions, resource limits, environment variables
Kubernetes Services	Ports, selectors, type
ConfigMaps	Configuration values
Secrets	Existence (not values)
Ingress	Rules, TLS configuration, annotations
Database	Size, version, replication settings
Storage	Capacity, performance tier
Network Policies	Ingress/egress rules

Scheduled Detection

Drift detection runs on a configurable schedule (default: every 30 minutes). The reconciler compares the DesiredInfrastructureState entries with live cluster state and publishes drift events.
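
As a rough illustration of the scheduling behavior, a reconciler loop could look like the sketch below. It is not the platform's scheduler: `run_scheduled_scans`, its parameters, and the event shape are hypothetical, and only the 30-minute default comes from the text above.

```python
import time
from typing import Callable, Iterable, Optional

DEFAULT_INTERVAL_SECONDS = 30 * 60  # documented default: every 30 minutes


def run_scheduled_scans(
    scan: Callable[[], Iterable[dict]],
    publish: Callable[[dict], None],
    interval_seconds: float = DEFAULT_INTERVAL_SECONDS,
    max_runs: Optional[int] = None,  # None means run until interrupted
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `scan` repeatedly, publishing each drift event it yields."""
    runs = 0
    while max_runs is None or runs < max_runs:
        for event in scan():
            publish(event)
        runs += 1
        # Wait between scans, but not after the final one.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

Passing `sleep` as a parameter lets a caller substitute a no-op when exercising the loop, so it can be checked without waiting for real intervals.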

Auto-Remediation

When drift is detected, the system can:

Alert only -- Send notifications to administrators (default)
Auto-remediate -- Automatically apply the desired state to correct drift
Queue for review -- Create a remediation ticket for manual approval

Auto-remediation is configured per tenant and per resource type to balance automation with change control.
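
One way to picture the per-tenant, per-resource-type configuration is a lookup that falls back to the alert-only default. This is a sketch under stated assumptions: the enum, the `POLICIES` mapping, and `resolve_mode` are hypothetical names, and the platform's actual configuration format is not shown here.

```python
from enum import Enum


class RemediationMode(Enum):
    ALERT_ONLY = "alert_only"
    AUTO_REMEDIATE = "auto_remediate"
    QUEUE_FOR_REVIEW = "queue_for_review"


DEFAULT_MODE = RemediationMode.ALERT_ONLY  # alert only is the documented default

TENANT = "550e8400-e29b-41d4-a716-446655440000"

# Hypothetical policy table: (tenant_id, resource_type) -> mode
POLICIES = {
    (TENANT, "configmap"): RemediationMode.AUTO_REMEDIATE,
    (TENANT, "deployment"): RemediationMode.QUEUE_FOR_REVIEW,
}


def resolve_mode(tenant_id: str, resource_type: str, policies: dict = POLICIES) -> RemediationMode:
    """Return the configured mode, or the alert-only default if none is set."""
    return policies.get((tenant_id, resource_type), DEFAULT_MODE)
```

In this example, deployments are queued for review rather than auto-remediated, so a replica count changed by hand during an incident is not reverted without approval.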

⚠️ Auto-remediation should be configured carefully. Some drift (e.g., manual scaling during an incident) may be intentional. Review drift reports before enabling auto-remediation for production tenants.
