Service Health
Every MATIH service exposes health endpoints for Kubernetes lifecycle management and monitoring. Python services use FastAPI health routes, and Java services use Spring Boot Actuator. Health responses include checks for the service itself and its critical dependencies.
Health Endpoints
Python Services (FastAPI)
| Endpoint | Purpose | Response |
|---|---|---|
| `/health` | Basic liveness check | `{"status": "ok"}` |
| `/health/ready` | Readiness check with dependency validation | `{"status": "ok", "dependencies": {...}}` |
| `/metrics` | Prometheus metrics endpoint | Prometheus text format |
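The `/metrics` endpoint returns the Prometheus text exposition format. To illustrate that format (not how MATIH services actually generate it), the sketch below renders one gauge by hand. The metric name `dependency_up` and its `dependency` label are hypothetical. In practice a client library such as `prometheus_client` usually produces this output.

```python
from typing import Dict


def escape_label_value(value: str) -> str:
    """Escape a label value as the text format requires (backslash, quote, newline)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, label: str, samples: Dict[str, float]) -> str:
    """Render one gauge with a single label in Prometheus text format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for label_value, value in samples.items():
        # Produces e.g. dependency_up{dependency="postgresql"} 1
        lines.append(f'{name}{{{label}="{escape_label_value(label_value)}"}} {value}')
    # The exposition format is line-oriented and ends with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical metric: 1 if the last dependency check passed, 0 otherwise.
print(render_gauge("dependency_up", "Result of the last dependency check.",
                   "dependency", {"postgresql": 1, "redis": 0}), end="")
```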
Java Services (Spring Boot)
| Endpoint | Purpose | Response |
|---|---|---|
| `/actuator/health` | Combined health check | `{"status": "UP", "components": {...}}` |
| `/actuator/health/liveness` | Liveness probe | `{"status": "UP"}` |
| `/actuator/health/readiness` | Readiness probe | `{"status": "UP"}` |
| `/actuator/prometheus` | Prometheus metrics | Prometheus text format |
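Because the two stacks report health differently (`"ok"` from FastAPI, `"UP"` from Actuator), an external monitoring script that polls both has to accept either value. The sketch below is one way to do that with the standard library. The function names are hypothetical, and any other status, such as Actuator's `DOWN` or `OUT_OF_SERVICE`, is treated as unhealthy.

```python
import json
import urllib.request

# Healthy status values from the tables above: FastAPI routes report "ok",
# Spring Boot Actuator reports "UP".
HEALTHY_STATUSES = {"ok", "UP"}


def is_healthy(raw_body: str) -> bool:
    """Return True if a health response body reports a healthy status."""
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        return False
    return isinstance(body, dict) and body.get("status") in HEALTHY_STATUSES


def check_endpoint(url: str, timeout_s: float = 5.0) -> bool:
    """Fetch a health URL and report whether it is healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except OSError:
        # Covers HTTPError (e.g. a 503 from a failing readiness check),
        # URLError (connection refused, DNS failure) and socket timeouts.
        return False
```

For example, `check_endpoint("http://localhost:8080/actuator/health/readiness")` would poll a Java service and `check_endpoint("http://localhost:8000/health/ready")` a Python one; the hosts and ports are placeholders.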
Kubernetes Probes
Liveness Probe
Determines if the container should be restarted:
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 5
```

Readiness Probe
Determines if the container should receive traffic:
```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: http
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
  timeoutSeconds: 5
```

Startup Probe
For services with slow initialization:
```yaml
startupProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 30
  timeoutSeconds: 5
```

Dependency Health Checks
The readiness endpoint checks critical dependencies:
| Service | Dependencies Checked |
|---|---|
| AI Service | PostgreSQL, Redis, Kafka, Dgraph (optional), Pinecone (optional) |
| Query Engine | PostgreSQL, StarRocks |
| IAM Service | PostgreSQL, Redis |
| Tenant Service | PostgreSQL, Kafka |
| API Gateway | Redis, downstream services |
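Since a readiness request may touch several dependencies, running the checks concurrently with a per-check timeout keeps the endpoint fast even when one dependency hangs. The sketch below assumes each dependency is wrapped in a hypothetical async callable that raises on failure (for example, a `SELECT 1` against PostgreSQL or a Redis `PING`), and records `latency_ms` in the same shape as the response example.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict

# Hypothetical probe type: an async callable that raises if the dependency is unusable.
Probe = Callable[[], Awaitable[None]]


async def timed_check(probe: Probe, timeout_s: float) -> dict:
    """Run one probe with a timeout and record its latency."""
    start = time.perf_counter()
    try:
        await asyncio.wait_for(probe(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"status": "unhealthy", "message": f"timed out after {timeout_s}s"}
    except Exception as exc:  # any probe error marks the dependency unhealthy
        return {"status": "unhealthy", "message": str(exc)}
    latency_ms = round((time.perf_counter() - start) * 1000, 1)
    return {"status": "healthy", "latency_ms": latency_ms}


async def check_all(probes: Dict[str, Probe], timeout_s: float = 1.0) -> Dict[str, dict]:
    """Run all probes concurrently; total time is bounded by roughly one timeout."""
    names = list(probes)
    results = await asyncio.gather(*(timed_check(probes[n], timeout_s) for n in names))
    return dict(zip(names, results))
```

The one-second default timeout is an arbitrary choice; it only needs to stay comfortably below the probe's `timeoutSeconds` so the endpoint can still answer when a dependency hangs.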
Dependency Check Response
```json
{
  "status": "ok",
  "dependencies": {
    "postgresql": {"status": "healthy", "latency_ms": 2.3},
    "redis": {"status": "healthy", "latency_ms": 0.8},
    "kafka": {"status": "healthy", "broker_count": 3},
    "dgraph": {"status": "degraded", "message": "Not reachable"}
  }
}
```

When an optional dependency such as Dgraph is unavailable, it is reported as `degraded` while the top-level `status` remains `ok`: the service stays ready and operates in degraded mode.
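The degraded-mode rule can be sketched as a small function that turns per-dependency results into the readiness response. This is an illustration under assumptions: the optional set (Dgraph and Pinecone for the AI Service, per the table above) is passed in explicitly, and the `"unavailable"` status and HTTP 503 for a failed critical dependency are not specified by this page. Kubernetes counts HTTP 200 to 399 as probe success, so "not ready" must use an error code.

```python
from typing import Dict, Iterable, Tuple


def readiness_response(dependencies: Dict[str, dict],
                       optional: Iterable[str]) -> Tuple[int, dict]:
    """Build (HTTP status, body) for a readiness endpoint."""
    optional_set = set(optional)
    report: Dict[str, dict] = {}
    ready = True
    for name, result in dependencies.items():
        if result.get("status") == "healthy":
            report[name] = result
        elif name in optional_set:
            # Optional dependency down: keep serving traffic in degraded mode.
            report[name] = {**result, "status": "degraded"}
        else:
            # Critical dependency down: take this replica out of rotation.
            report[name] = result
            ready = False
    body = {"status": "ok" if ready else "unavailable", "dependencies": report}
    # Assumed convention: 503 for "not ready" (any 4xx/5xx fails the probe).
    return (200 if ready else 503), body
```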
Health Check Best Practices
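- Liveness probes should check only the process itself, not its dependencies; otherwise a dependency outage makes Kubernetes restart containers that a restart cannot fix
- Readiness probes should check critical dependencies, so a replica that cannot reach them stops receiving traffic
- Set `timeoutSeconds` and `failureThreshold` generously enough that a slow response under load is not mistaken for a failure
- Use startup probes for services with long initialization times
- Keep health check endpoints cheap: probes call them every few seconds, so avoid expensive queries or computations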
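The timeout and startup-probe advice above comes down to simple arithmetic on the probe fields. The Kubernetes documentation describes a startup probe's window as roughly `failureThreshold * periodSeconds`; the sketch below applies that approximation to the probe values on this page. It ignores `timeoutSeconds` and the exact timing of the first probe, so treat the results as estimates. The `ProbeTiming` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeTiming:
    initial_delay_s: int
    period_s: int
    failure_threshold: int

    def failure_window_s(self) -> int:
        """Approximate time consecutive failures must last before the probe fails."""
        return self.period_s * self.failure_threshold


# Values copied from the probe definitions above.
liveness = ProbeTiming(initial_delay_s=30, period_s=10, failure_threshold=3)
readiness = ProbeTiming(initial_delay_s=10, period_s=5, failure_threshold=3)
startup = ProbeTiming(initial_delay_s=10, period_s=5, failure_threshold=30)

# Rough upper bound on startup time before the kubelet gives up on the
# startup probe and restarts the container.
max_startup_s = startup.initial_delay_s + startup.failure_window_s()
print(f"startup budget ~{max_startup_s}s, "
      f"liveness restart after ~{liveness.failure_window_s()}s of failures, "
      f"readiness removal after ~{readiness.failure_window_s()}s of failures")
```

With these values a service has about 160 seconds (10 + 30 x 5) to become healthy. A service that needs longer would need a higher startup `failureThreshold` rather than a longer liveness delay.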