Service Health

Every MATIH service exposes health endpoints for Kubernetes lifecycle management and monitoring. Python services use FastAPI health routes, and Java services use Spring Boot Actuator. Liveness endpoints report on the service process itself, while readiness endpoints also validate the service's critical dependencies.

Health Endpoints

Python Services (FastAPI)

Endpoint	Purpose	Response
`/health`	Basic liveness check	`{"status": "ok"}`
`/health/ready`	Readiness check with dependency validation	`{"status": "ok", "dependencies": {...}}`
`/metrics`	Prometheus metrics endpoint	Prometheus text format

Java Services (Spring Boot)

Endpoint	Purpose	Response
`/actuator/health`	Combined health check	`{"status": "UP", "components": {...}}`
`/actuator/health/liveness`	Liveness probe	`{"status": "UP"}`
`/actuator/health/readiness`	Readiness probe	`{"status": "UP"}`
`/actuator/prometheus`	Prometheus metrics	Prometheus text format
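
Because the two stacks report different status strings (`ok` for FastAPI, `UP` for Actuator), a monitoring script that polls both needs to normalize them. The following is a minimal sketch of such a normalizer; `is_healthy` and `HEALTHY_STATUSES` are hypothetical names, not part of MATIH, and treating malformed responses as unhealthy is an assumption.

```python
import json

# Status strings taken from the endpoint tables above:
# FastAPI services report "ok", Spring Boot Actuator reports "UP".
HEALTHY_STATUSES = {"ok", "UP"}


def is_healthy(body: str) -> bool:
    """Return True if a health response body reports a healthy status.

    Hypothetical helper: malformed JSON, a non-object payload, or a
    missing "status" field is treated as unhealthy (an assumption).
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    status = payload.get("status")
    return isinstance(status, str) and status in HEALTHY_STATUSES
```

The same function can then be applied to the body returned by `/health` or `/actuator/health` without the caller needing to know which stack served it.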

Kubernetes Probes

Liveness Probe

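Determines whether the container should be restarted. The example uses the Python path; Java services would point at `/actuator/health/liveness` instead: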

livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 5

Readiness Probe

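Determines whether the container should receive traffic. The example uses the Python path; Java services would point at `/actuator/health/readiness` instead: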

readinessProbe:
  httpGet:
    path: /health/ready
    port: http
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
  timeoutSeconds: 5

Startup Probe

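For services with slow initialization. Kubernetes holds off liveness and readiness probes until the startup probe succeeds; with the values below, the container has roughly 150 seconds (30 failures at a 5-second period) after the initial delay to come up: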

startupProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 30
  timeoutSeconds: 5
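
To make the timing of the three probe configurations above concrete, the following sketch computes how long each probe tolerates consecutive failures. The `Probe` class and function names are hypothetical, and the figures are approximations: they ignore `timeoutSeconds` and scheduling jitter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Timing fields of a Kubernetes probe (values copied from the YAML above)."""
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int


def failure_window_seconds(probe: Probe) -> int:
    """Approximate span of consecutive failures before Kubernetes acts."""
    return probe.period_seconds * probe.failure_threshold


def startup_budget_seconds(probe: Probe) -> int:
    """Approximate time a container has to start, counted from container start."""
    return probe.initial_delay_seconds + failure_window_seconds(probe)


liveness = Probe(initial_delay_seconds=30, period_seconds=10, failure_threshold=3)
readiness = Probe(initial_delay_seconds=10, period_seconds=5, failure_threshold=3)
startup = Probe(initial_delay_seconds=10, period_seconds=5, failure_threshold=30)

print(failure_window_seconds(liveness))   # 30: restart after ~30 s of failed checks
print(failure_window_seconds(readiness))  # 15: removed from traffic after ~15 s
print(startup_budget_seconds(startup))    # 160: ~160 s allowed for initialization
```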

Dependency Health Checks

The readiness endpoint checks critical dependencies:

Service	Dependencies Checked
AI Service	PostgreSQL, Redis, Kafka, Dgraph (optional), Pinecone (optional)
Query Engine	PostgreSQL, StarRocks
IAM Service	PostgreSQL, Redis
Tenant Service	PostgreSQL, Kafka
API Gateway	Redis, downstream services

Dependency Check Response

{
  "status": "ok",
  "dependencies": {
    "postgresql": {"status": "healthy", "latency_ms": 2.3},
    "redis": {"status": "healthy", "latency_ms": 0.8},
    "kafka": {"status": "healthy", "broker_count": 3},
    "dgraph": {"status": "degraded", "message": "Not reachable"}
  }
}

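Services operate in degraded mode when optional dependencies (like Dgraph) are unavailable. In the example above, the unreachable optional dependency is reported as `degraded` while the top-level status remains `ok`.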
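
One way to produce a response of this shape is to run each dependency check, time it, and separate critical failures from optional ones. This is a sketch under stated assumptions, not the MATIH implementation: `run_checks`, the `unhealthy` and `unavailable` status strings, and the 503 status code for a failed critical dependency are all hypothetical choices.

```python
import time
from typing import Callable, Dict, Tuple

# A check is any zero-argument callable that raises on failure.
Check = Callable[[], None]


def run_checks(critical: Dict[str, Check],
               optional: Dict[str, Check]) -> Tuple[int, dict]:
    """Run dependency checks and build a readiness response.

    Returns (http_status, body). A failing critical dependency makes the
    service not ready (503, an assumption); a failing optional dependency
    is reported as "degraded" and the service stays ready.
    """
    dependencies = {}
    ready = True
    for name, check in {**critical, **optional}.items():
        started = time.perf_counter()
        try:
            check()
        except Exception as exc:
            if name in critical:
                ready = False
                dependencies[name] = {"status": "unhealthy", "message": str(exc)}
            else:
                dependencies[name] = {"status": "degraded", "message": str(exc)}
        else:
            latency_ms = round((time.perf_counter() - started) * 1000, 1)
            dependencies[name] = {"status": "healthy", "latency_ms": latency_ms}
    body = {"status": "ok" if ready else "unavailable", "dependencies": dependencies}
    return (200 if ready else 503), body


# Usage with stand-in checks: critical dependencies succeed, Dgraph does not.
def ok() -> None:
    pass


def dgraph_down() -> None:
    raise ConnectionError("Not reachable")


status, body = run_checks({"postgresql": ok, "redis": ok}, {"dgraph": dgraph_down})
```

With these stand-ins the service remains ready and only the `dgraph` entry is marked `degraded`, which matches the degraded-mode behavior described above.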

Health Check Best Practices

Liveness probes should only check the process itself, not dependencies
Readiness probes should check critical dependencies
Set appropriate timeouts to avoid spurious probe failures under load
Use startup probes for services with long initialization times
Never include expensive operations in health check endpoints
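
The timeout and cost guidelines above can be combined by bounding each dependency check with a timeout and caching its result for a few seconds, so that frequent probes do not hammer the dependency. The sketch below is illustrative only: `CachedCheck` is a hypothetical class, and the default timeout and TTL values are assumptions that would need to stay below the probe's `timeoutSeconds` and `periodSeconds` in practice.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


class CachedCheck:
    """Wrap an async dependency check with a timeout and a short-lived cache."""

    def __init__(self, check: Callable[[], Awaitable[None]],
                 timeout_s: float = 1.0, ttl_s: float = 5.0) -> None:
        self._check = check
        self._timeout_s = timeout_s  # assumed default, not from the source
        self._ttl_s = ttl_s          # assumed default, not from the source
        self._cached: Optional[bool] = None
        self._cached_at = 0.0

    async def healthy(self) -> bool:
        now = time.monotonic()
        # Serve the cached result while it is still fresh.
        if self._cached is not None and now - self._cached_at < self._ttl_s:
            return self._cached
        try:
            # Bound the check so a hung dependency cannot stall the probe.
            await asyncio.wait_for(self._check(), timeout=self._timeout_s)
            result = True
        except Exception:
            # A timeout or any error raised by the check counts as unhealthy.
            result = False
        self._cached, self._cached_at = result, now
        return result
```

A readiness handler could hold one `CachedCheck` per dependency and await `healthy()` on each request; the underlying check then runs at most once per TTL window regardless of probe frequency.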
