MATIH Platform is in active MVP development. Documentation reflects current implementation status.

SLOs

The SLOController manages Service Level Objectives that define reliability targets for platform services. SLOs track error budgets, availability targets, and latency targets, providing a structured approach to reliability engineering.


ServiceLevelObjective Structure

| Field | Type | Description |
|---|---|---|
| id | String | SLO identifier |
| name | String | SLO name (e.g., "AI Service Availability") |
| description | String | Description of the objective |
| service | String | Target service name |
| sliType | String | One of availability, latency, error_rate, throughput |
| target | double | Target value (e.g., 99.9 for 99.9% availability) |
| window | String | Evaluation window (7d, 28d, 30d) |
| query | String | PromQL query for the SLI measurement |
| alertOnBudgetBurn | boolean | Alert when the error budget burn rate is high |
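
As a rough illustration of the structure above, the following sketch models an SLO on the client side and checks it before submission. The class, its snake_case attribute names, and the validation rules are assumptions made for illustration; the platform's own validation may differ. The sketch treats 7d, 28d, and 30d as the only valid windows and omits id on the assumption that the server assigns it.

```python
from dataclasses import dataclass

# Values taken from the field table; treating them as exhaustive is an assumption.
SLI_TYPES = {"availability", "latency", "error_rate", "throughput"}
WINDOWS = {"7d", "28d", "30d"}


@dataclass
class ServiceLevelObjective:
    """Hypothetical client-side model of an SLO (not the platform's own class)."""
    name: str
    service: str
    sli_type: str
    target: float
    window: str = "30d"
    description: str = ""
    query: str = ""
    alert_on_budget_burn: bool = False

    def validate(self) -> list:
        """Return a list of problems; an empty list means no issue was found."""
        problems = []
        if self.sli_type not in SLI_TYPES:
            problems.append(f"unknown sliType: {self.sli_type}")
        if self.window not in WINDOWS:
            problems.append(f"unsupported window: {self.window}")
        # Assumption: availability targets are percentages strictly between 0 and 100.
        if self.sli_type == "availability" and not 0 < self.target < 100:
            problems.append("availability target must be between 0 and 100")
        return problems

    def to_payload(self) -> dict:
        """Convert to the camelCase field names used in the table."""
        return {
            "name": self.name,
            "description": self.description,
            "service": self.service,
            "sliType": self.sli_type,
            "target": self.target,
            "window": self.window,
            "query": self.query,
            "alertOnBudgetBurn": self.alert_on_budget_burn,
        }
```

Calling validate() before sending the payload catches obvious mistakes, such as a misspelled sliType, without a round trip to the API.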

SLO Management

Create SLO

Endpoint: POST /api/v1/observability/slos

curl -X POST http://localhost:8088/api/v1/observability/slos \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-Tenant-ID: 550e8400" \
  -d '{
    "name": "AI Service Availability",
    "service": "ai-service",
    "sliType": "availability",
    "target": 99.9,
    "window": "30d",
    "alertOnBudgetBurn": true
  }'

List SLOs

Endpoint: GET /api/v1/observability/slos

Get SLO

Endpoint: GET /api/v1/observability/slos/:sloId

Update SLO

Endpoint: PUT /api/v1/observability/slos/:sloId

Delete SLO

Endpoint: DELETE /api/v1/observability/slos/:sloId
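
The management endpoints above share one base path and differ only in HTTP method and the optional SLO identifier. A minimal sketch of a request builder follows, using only the Python standard library. The helper name slo_request is hypothetical, and sending the same Authorization and X-Tenant-ID headers on every call is an assumption carried over from the create example.

```python
import json
import urllib.request

# Base URL taken from the curl example; adjust for your environment.
BASE_URL = "http://localhost:8088/api/v1/observability/slos"


def slo_request(method, token, tenant_id, slo_id=None, body=None):
    """Build (but do not send) a request for one of the SLO management endpoints."""
    # Collection endpoints use the base URL; item endpoints append the SLO id.
    url = BASE_URL if slo_id is None else f"{BASE_URL}/{slo_id}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {
        "Authorization": f"Bearer {token}",
        "X-Tenant-ID": tenant_id,
    }
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

Passing the returned object to urllib.request.urlopen sends the request. For example, slo_request("GET", token, tenant) lists SLOs, and slo_request("DELETE", token, tenant, slo_id="slo-1") deletes one, where "slo-1" is a placeholder identifier.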


SLO Status

Endpoint: GET /api/v1/observability/slos/:sloId/status

Returns the current status of an SLO as an SLOStatus object.

SLOStatus Structure

| Field | Type | Description |
|---|---|---|
| sloId | String | SLO identifier |
| currentValue | double | Current SLI value |
| target | double | SLO target |
| errorBudgetTotal | double | Total error budget for the window |
| errorBudgetRemaining | double | Remaining error budget |
| errorBudgetConsumed | double | Percentage of budget consumed |
| burnRate | double | Current error budget burn rate |
| status | String | One of healthy, warning, breaching |
| timeWindowStart | Instant | Start of evaluation window |
| timeWindowEnd | Instant | End of evaluation window |
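
To show how the budget fields relate to one another, here is a minimal sketch that derives them from currentValue and target for an availability-style SLI. It assumes both values are percentages and that the total and remaining budgets are expressed in percentage points. The units the API actually returns are not stated here, so treat the function as an illustration rather than the service's calculation.

```python
def budget_fields(current_value: float, target: float) -> dict:
    """Derive error-budget figures from an availability SLI (all inputs in percent)."""
    budget_total = 100.0 - target  # e.g. about 0.1 for a 99.9 target
    if budget_total <= 0:
        raise ValueError("a target of 100 or more leaves no error budget")
    # Unreliability actually observed in the window, floored at zero.
    observed_error = max(0.0, 100.0 - current_value)
    return {
        "errorBudgetTotal": budget_total,
        "errorBudgetRemaining": max(0.0, budget_total - observed_error),
        # Share of the budget already used, as a percentage of the total.
        "errorBudgetConsumed": observed_error / budget_total * 100.0,
    }
```

With a 99.9 target and a measured availability of 99.95, half of the budget is consumed and roughly 0.05 percentage points remain.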

Error Budget

The error budget represents the acceptable amount of unreliability within the SLO window:

Error Budget = 100% - SLO Target
Example: a 99.9% target leaves a 0.1% error budget, or about 43 minutes of unreliability per 30 days.
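
The arithmetic behind that example can be sketched as follows for the three evaluation windows listed earlier. The function name is illustrative only.

```python
def error_budget_minutes(target_pct: float, window_days: int) -> float:
    """Minutes of unreliability allowed by a percentage target over a window."""
    budget_fraction = 1.0 - target_pct / 100.0
    return budget_fraction * window_days * 24 * 60


for days in (7, 28, 30):
    print(f"{days}d: {error_budget_minutes(99.9, days):.1f} minutes")
# 7d: 10.1 minutes
# 28d: 40.3 minutes
# 30d: 43.2 minutes
```

The same target therefore allows less absolute downtime over a shorter window, which is worth keeping in mind when choosing between 7d and 30d.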

Budget Burn Rate Alerts

When alertOnBudgetBurn is enabled, alerts fire based on how fast the error budget is being consumed:

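Time to exhaustion assumes a 30-day SLO window.

| Burn Rate | Window | Severity | Description |
|---|---|---|---|
| 14.4x | 1 hour | Critical | Budget exhausted in about 2 days |
| 6x | 6 hours | Critical | Budget exhausted in 5 days |
| 3x | 1 day | Warning | Budget exhausted in 10 days |
| 1x | 3 days | Info | Budget on track to be exhausted by the end of the window |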
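
A burn rate is the observed error rate divided by the error rate the SLO allows, so a rate of 1x spends the budget exactly over the full window. The sketch below computes a burn rate, estimates time to exhaustion assuming a 30-day window, and maps the rate to a severity using the thresholds listed above. It is a simplification: it ignores the per-tier measurement window (1 hour, 6 hours, and so on), and the function names are hypothetical.

```python
# (burn-rate threshold, severity), highest first; thresholds from the alert tiers.
TIERS = [(14.4, "critical"), (6.0, "critical"), (3.0, "warning"), (1.0, "info")]


def burn_rate(error_rate: float, target_pct: float) -> float:
    """Observed error rate (as a fraction) divided by the rate the SLO allows."""
    allowed = 1.0 - target_pct / 100.0
    return error_rate / allowed


def days_to_exhaustion(rate: float, window_days: int = 30) -> float:
    """Days until a full budget is spent if the burn rate holds steady."""
    return window_days / rate


def severity(rate: float):
    """Return the severity of the highest tier reached, or None below 1x."""
    for threshold, level in TIERS:
        if rate >= threshold:
            return level
    return None
```

For a 99.9% target, a sustained error rate of 0.6% is a burn rate of about 6x, which would spend a 30-day budget in about 5 days.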

SLO Dashboard

Endpoint: GET /api/v1/observability/slos/dashboard

Returns all SLOs with their current status for a consolidated SLO dashboard view.