Dependency Checks
Dependency checks verify the connectivity and health of external dependencies that MATIH services rely on. These checks run as part of readiness probes and the platform health check script, ensuring that services only receive traffic when their dependencies are available.
Dependencies by Service
| Service | Dependency | Type | Critical |
|---|---|---|---|
| AI Service | PostgreSQL | Database | Yes |
| AI Service | Redis | Cache | Yes |
| AI Service | Kafka | Message queue | Yes |
| AI Service | Dgraph | Graph database | No (degraded mode) |
| AI Service | Pinecone | Vector store | No (mock mode) |
| AI Service | OpenAI API | LLM provider | No (cached responses) |
| Query Engine | PostgreSQL | Database | Yes |
| Query Engine | StarRocks | OLAP engine | Yes |
| IAM Service | PostgreSQL | Database | Yes |
| IAM Service | Redis | Session store | Yes |
| Tenant Service | PostgreSQL | Database | Yes |
| Tenant Service | Kafka | Event bus | Yes |
| API Gateway | Redis | Rate limiting | Yes |
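One way to make the table above usable by a readiness probe is to encode it as a small registry. The sketch below is illustrative only: the `Dependency` class, the `DEPENDENCIES` mapping, the service keys such as `"ai-service"`, and the `critical_dependencies` helper are hypothetical names, and only two services are transcribed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    kind: str
    critical: bool


# Hypothetical registry transcribed from the table (two services shown).
DEPENDENCIES: dict[str, tuple[Dependency, ...]] = {
    "ai-service": (
        Dependency("postgresql", "database", True),
        Dependency("redis", "cache", True),
        Dependency("kafka", "message queue", True),
        Dependency("dgraph", "graph database", False),
        Dependency("pinecone", "vector store", False),
        Dependency("openai", "llm provider", False),
    ),
    "query-engine": (
        Dependency("postgresql", "database", True),
        Dependency("starrocks", "olap engine", True),
    ),
}


def critical_dependencies(service: str) -> list[str]:
    """Return the names of dependencies that gate readiness for a service."""
    return [d.name for d in DEPENDENCIES.get(service, ()) if d.critical]
```

Keeping criticality in data rather than in each check function means a dependency can be reclassified without touching the check code.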
Check Types
Database Check
Executes a minimal query to verify connectivity:
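import time

async def check_postgresql(pool) -> dict:
    start = time.perf_counter()
    try:
        # A trivial query confirms the pool can hand out a working connection.
        async with pool.acquire() as conn:
            await conn.fetchval("SELECT 1")
        latency_ms = (time.perf_counter() - start) * 1000
        return {"status": "healthy", "latency_ms": round(latency_ms, 1)}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}

Cache Check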
Pings Redis to verify connectivity:
async def check_redis(client) -> dict:
    try:
        await client.ping()
        return {"status": "healthy"}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}

Message Queue Check
Verifies Kafka broker availability:
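from aiokafka import AIOKafkaProducer

async def check_kafka(config) -> dict:
    try:
        # Starting a producer connects to the cluster and fetches metadata.
        producer = AIOKafkaProducer(bootstrap_servers=config.bootstrap_servers)
        await producer.start()
        await producer.stop()
        # broker_count reflects the configured bootstrap addresses
        # (assumes a list), not the number of live brokers.
        return {"status": "healthy", "broker_count": len(config.bootstrap_servers)}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}

Critical vs. Non-Critical Dependencies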
| Classification | Behavior on Failure | Example |
|---|---|---|
| Critical | Service reports not ready, stops receiving traffic | PostgreSQL, Redis |
| Non-critical | Service reports degraded, continues serving with reduced functionality | Dgraph, Pinecone |
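The classification above can be sketched as a small aggregation step over the per-dependency results. This is a minimal illustration, not the platform's actual logic: the function name `overall_status` and the choice of `"unhealthy"` as the service-level value for "not ready" are assumptions.

```python
def overall_status(results: dict[str, dict], critical: set[str]) -> str:
    """Combine per-dependency results into one service-level status."""
    # Any dependency that is not fully healthy counts as failing here.
    failing = {name for name, r in results.items() if r.get("status") != "healthy"}
    if failing & critical:
        # Assumption: a failed critical dependency maps to "unhealthy" (not ready).
        return "unhealthy"
    if failing:
        # Only non-critical dependencies are failing: keep serving, report degraded.
        return "degraded"
    return "healthy"
```

For example, with PostgreSQL, Redis, and Kafka healthy but Dgraph refusing connections, the function returns `"degraded"` as long as Dgraph is not in the `critical` set.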
Dependency Health Response Format
{
  "status": "degraded",
  "dependencies": {
    "postgresql": {
      "status": "healthy",
      "latency_ms": 2.1,
      "pool_size": 10,
      "pool_available": 8
    },
    "redis": {
      "status": "healthy",
      "latency_ms": 0.5
    },
    "kafka": {
      "status": "healthy",
      "broker_count": 3
    },
    "dgraph": {
      "status": "unhealthy",
      "error": "Connection refused"
    },
    "pinecone": {
      "status": "degraded",
      "mode": "mock",
      "reason": "API key not configured"
    }
  }
}

Timeout Configuration
| Dependency | Check Timeout | Description |
|---|---|---|
| PostgreSQL | 5 seconds | Database query timeout |
| Redis | 2 seconds | Ping timeout |
| Kafka | 10 seconds | Broker connection timeout |
| Dgraph | 5 seconds | GraphQL health endpoint |
| Pinecone | 5 seconds | API health check |
| External APIs | 10 seconds | HTTP request timeout |
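The timeouts in the table could be enforced by wrapping each check in `asyncio.wait_for`, so that a hung dependency is reported as unhealthy instead of stalling the probe. The sketch below is one possible approach; `CHECK_TIMEOUTS`, `DEFAULT_TIMEOUT`, and `run_check` are hypothetical names, and falling back to the 10-second "External APIs" value for unlisted dependencies is an assumption.

```python
import asyncio

# Timeouts in seconds, taken from the table above.
CHECK_TIMEOUTS = {
    "postgresql": 5.0,
    "redis": 2.0,
    "kafka": 10.0,
    "dgraph": 5.0,
    "pinecone": 5.0,
}
DEFAULT_TIMEOUT = 10.0  # assumption: unlisted dependencies use the "External APIs" value


async def run_check(name, check, timeout=None) -> dict:
    """Await a zero-argument check coroutine function with a per-dependency timeout."""
    limit = timeout if timeout is not None else CHECK_TIMEOUTS.get(name, DEFAULT_TIMEOUT)
    try:
        return await asyncio.wait_for(check(), timeout=limit)
    except asyncio.TimeoutError:
        # A check that exceeds its budget is treated like any other failure.
        return {"status": "unhealthy", "error": f"timed out after {limit}s"}
```

A check that takes arguments can be bound first, for example `run_check("redis", lambda: check_redis(client))`.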
Monitoring Dependency Health
Dependency health is tracked via Prometheus metrics:
| Metric | Type | Labels |
|---|---|---|
| `matih_dependency_up` | Gauge | service, dependency |
| `matih_dependency_latency_seconds` | Histogram | service, dependency |
| `matih_dependency_errors_total` | Counter | service, dependency, error_type |
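To illustrate how check results map onto the `matih_dependency_up` gauge and its labels, the sketch below renders samples in the Prometheus text exposition format using only the standard library. It is an illustration rather than the platform's exporter: a real service would more likely use a Prometheus client library, the function name `render_dependency_up` is hypothetical, and treating any status other than `"healthy"` (including `"degraded"`) as `0` is an assumption.

```python
def render_dependency_up(service: str, results: dict[str, dict]) -> str:
    """Render matih_dependency_up samples in Prometheus text exposition format."""
    lines = ["# TYPE matih_dependency_up gauge"]
    for dependency, result in sorted(results.items()):
        # Assumption: only "healthy" counts as up; "degraded" and "unhealthy" are 0.
        value = 1 if result.get("status") == "healthy" else 0
        lines.append(
            f'matih_dependency_up{{service="{service}",dependency="{dependency}"}} {value}'
        )
    return "\n".join(lines) + "\n"
```

With this gauge exposed, an alert can fire on any series whose value stays at `0`, using the `service` and `dependency` labels to identify the failing pair.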