MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Service Health

Every MATIH service exposes health endpoints for Kubernetes lifecycle management and monitoring. Python services use FastAPI health routes, and Java services use Spring Boot Actuator. Liveness endpoints report only on the service process itself. Readiness endpoints also check the service's critical dependencies.


Health Endpoints

Python Services (FastAPI)

| Endpoint | Purpose | Response |
|---|---|---|
| /health | Basic liveness check | {"status": "ok"} |
| /health/ready | Readiness check with dependency validation | {"status": "ok", "dependencies": {...}} |
| /metrics | Prometheus metrics endpoint | Prometheus text format |

Java Services (Spring Boot)

| Endpoint | Purpose | Response |
|---|---|---|
| /actuator/health | Combined health check | {"status": "UP", "components": {...}} |
| /actuator/health/liveness | Liveness probe | {"status": "UP"} |
| /actuator/health/readiness | Readiness probe | {"status": "UP"} |
| /actuator/prometheus | Prometheus metrics | Prometheus text format |
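The two endpoint families report health differently: Python services use "ok" and Java services use "UP". An external script that polls both therefore has to accept either value. The sketch below is a minimal stdlib-only client. It assumes that any 2xx response whose JSON status field is "ok" or "UP" means healthy. The names is_healthy and check_endpoint are hypothetical and not part of MATIH.

```python
import json
import urllib.request

# Status values treated as healthy: "ok" from the FastAPI routes,
# "UP" from Spring Boot Actuator.
HEALTHY_STATUSES = {"ok", "UP"}


def is_healthy(body: object) -> bool:
    """Return True if a parsed health response reports a healthy status."""
    return isinstance(body, dict) and body.get("status") in HEALTHY_STATUSES


def check_endpoint(url: str, timeout: float = 5.0) -> bool:
    """Fetch a health endpoint and interpret its JSON body.

    Connection errors, timeouts, non-2xx responses (raised by urlopen as
    HTTPError, a subclass of OSError) and unparseable bodies all count
    as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if not 200 <= resp.status < 300:
                return False
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        return False
```

For example, check_endpoint("http://ai-service:8000/health/ready") would poll the AI Service's readiness route. The host name and port here are placeholders.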

Kubernetes Probes

Liveness Probe

Determines whether the container should be restarted. If the probe fails failureThreshold consecutive times, Kubernetes restarts the container:

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: http   # named containerPort in the pod spec
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 5
```

Readiness Probe

Determines whether the container should receive traffic. While the probe is failing, the pod is removed from the endpoints of its Services:

```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: http
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
  timeoutSeconds: 5
```

Startup Probe

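For services with slow initialization. Kubernetes runs no liveness or readiness checks until the startup probe succeeds. With the settings below, a container has roughly 10 + 30 × 5 = 160 seconds to start before it is restarted: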

```yaml
startupProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 30
  timeoutSeconds: 5
```
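The probe settings above determine how quickly Kubernetes reacts to a failure. The sketch below applies the rule of thumb used in the Kubernetes documentation, failureThreshold × periodSeconds, to the three probes. The Probe class and the function names are illustrative. The figures are approximate and ignore time spent waiting on timeoutSeconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Timing fields of a Kubernetes probe, all in seconds."""
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int
    timeout_seconds: int


def detection_window(probe: Probe) -> int:
    """Rough time Kubernetes needs to observe failure_threshold consecutive
    failures, one check every period_seconds (timeouts not included)."""
    return probe.failure_threshold * probe.period_seconds


def startup_budget(probe: Probe) -> int:
    """Rough time a container gets to start before a startup probe kills it."""
    return probe.initial_delay_seconds + detection_window(probe)


# Values from the liveness, readiness and startup probes above.
liveness = Probe(initial_delay_seconds=30, period_seconds=10, failure_threshold=3, timeout_seconds=5)
readiness = Probe(initial_delay_seconds=10, period_seconds=5, failure_threshold=3, timeout_seconds=5)
startup = Probe(initial_delay_seconds=10, period_seconds=5, failure_threshold=30, timeout_seconds=5)

print(detection_window(liveness))   # 30
print(detection_window(readiness))  # 15
print(startup_budget(startup))      # 160
```

With these values, a hung process is restarted after roughly 30 seconds of failed liveness checks. A pod whose critical dependencies fail is taken out of rotation after about 15 seconds.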

Dependency Health Checks

The readiness endpoint checks each service's dependencies. Dependencies marked (optional) can be unavailable without failing the readiness check:

| Service | Dependencies checked |
|---|---|
| AI Service | PostgreSQL, Redis, Kafka, Dgraph (optional), Pinecone (optional) |
| Query Engine | PostgreSQL, StarRocks |
| IAM Service | PostgreSQL, Redis |
| Tenant Service | PostgreSQL, Kafka |
| API Gateway | Redis, downstream services |

Dependency Check Response

```json
{
  "status": "ok",
  "dependencies": {
    "postgresql": {"status": "healthy", "latency_ms": 2.3},
    "redis": {"status": "healthy", "latency_ms": 0.8},
    "kafka": {"status": "healthy", "broker_count": 3},
    "dgraph": {"status": "degraded", "message": "Not reachable"}
  }
}
```

When an optional dependency such as Dgraph is unavailable, the service keeps running in degraded mode. That dependency is reported as "degraded", but the overall status remains "ok".
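The readiness logic described above can be sketched as a small aggregator. It runs all dependency checks concurrently, gives each one its own timeout, and reports optional failures as degraded rather than failing the whole check. This is a framework-agnostic sketch, not the MATIH implementation. The following are all assumptions: DependencyCheck and readiness, the "unhealthy" and "error" status strings, and the use of HTTP 503 for a service that is not ready. A FastAPI route could return the resulting body with that status code.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Tuple


@dataclass
class DependencyCheck:
    name: str
    check: Callable[[], Awaitable[None]]  # coroutine that raises on failure
    optional: bool = False                 # optional failures only degrade


async def _run_one(dep: DependencyCheck, timeout: float) -> Dict:
    """Run one check under a timeout and describe the outcome."""
    start = time.perf_counter()
    try:
        await asyncio.wait_for(dep.check(), timeout)
    except Exception as exc:  # includes the timeout raised by wait_for
        status = "degraded" if dep.optional else "unhealthy"
        return {"status": status, "message": str(exc) or type(exc).__name__}
    latency_ms = round((time.perf_counter() - start) * 1000, 1)
    return {"status": "healthy", "latency_ms": latency_ms}


async def readiness(deps: List[DependencyCheck], timeout: float = 2.0) -> Tuple[int, Dict]:
    """Run all checks concurrently; return (HTTP status code, response body).

    Only a failed critical (non-optional) dependency makes the service not ready.
    """
    results = await asyncio.gather(*(_run_one(d, timeout) for d in deps))
    dependencies = {d.name: r for d, r in zip(deps, results)}
    ready = all(r["status"] != "unhealthy" for r in dependencies.values())
    body = {"status": "ok" if ready else "error", "dependencies": dependencies}
    return (200 if ready else 503), body


# Usage with stub checks standing in for real PostgreSQL, Redis and Dgraph clients.
async def reachable() -> None:
    await asyncio.sleep(0)


async def unreachable() -> None:
    raise ConnectionError("Not reachable")


deps = [
    DependencyCheck("postgresql", reachable),
    DependencyCheck("redis", reachable),
    DependencyCheck("dgraph", unreachable, optional=True),
]
code, body = asyncio.run(readiness(deps))
print(code, body["status"], body["dependencies"]["dgraph"])
# 200 ok {'status': 'degraded', 'message': 'Not reachable'}
```

Running the checks concurrently keeps the endpoint's latency close to that of the slowest single check, not the sum of all of them. A per-check timeout of 2 seconds also keeps the whole response well inside the probe's 5-second timeoutSeconds.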


Health Check Best Practices

  • Liveness probes should only check the process itself, not dependencies
  • Readiness probes should check critical dependencies
  • Leave headroom in timeouts and failure thresholds so that slow responses under load are not mistaken for failures
  • Use startup probes for services with long initialization times
  • Never include expensive operations in health check endpoints
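A dependency check can be inherently expensive. One common way to keep the health endpoint cheap in that case is to cache the check's result for a few seconds, so that repeated probes reuse it. The sketch below illustrates that technique and is not documented MATIH behavior. CachedCheck is a hypothetical helper and is not thread-safe. If the wrapped check raises an exception, nothing is cached and the next call retries.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class CachedCheck(Generic[T]):
    """Wrap an expensive check so that it runs at most once per ttl_seconds.

    Calls made within the TTL return the previously computed result.
    """

    def __init__(self, fn: Callable[[], T], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fn = fn
        self._ttl = ttl_seconds
        self._clock = clock              # injectable for testing
        self._value: Optional[T] = None
        self._expires_at = float("-inf")  # forces a real check on first call

    def __call__(self) -> T:
        now = self._clock()
        if now >= self._expires_at:
            self._value = self._fn()      # an exception here leaves the cache untouched
            self._expires_at = now + self._ttl
        return self._value  # type: ignore[return-value]


# Usage with a fake clock and a counter showing how often the real check runs.
calls = []


def expensive_check() -> str:
    calls.append(1)
    return "healthy"


fake_now = [0.0]
check = CachedCheck(expensive_check, ttl_seconds=10, clock=lambda: fake_now[0])
check()
check()            # served from cache
fake_now[0] = 10.0
check()            # TTL elapsed, the real check runs again
print(len(calls))  # 2
```

The trade-off is staleness, because a cached result can lag the real dependency state by up to ttl_seconds. A TTL on the order of the readiness probe's periodSeconds is a reasonable starting point.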