Service Discovery
The ServiceDiscoveryController and ServiceDiscoveryService provide runtime service registration, instance discovery, health aggregation, and dependency graph management. Services register their instances at startup, and other services use the discovery API to locate healthy instances with weighted load balancing.
Register a Service Instance
Endpoint: POST /api/v1/services/register
curl -X POST http://localhost:8084/api/v1/services/register \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${TOKEN}" \
-d '{
"serviceName": "ai-service",
"instanceId": "ai-service-pod-abc123",
"host": "10.0.1.42",
"port": 8000,
"protocol": "http",
"version": "2.1.0",
"healthCheckUrl": "http://10.0.1.42:8000/health",
"metricsUrl": "http://10.0.1.42:8000/metrics",
"region": "us-east-1",
"zone": "us-east-1a",
"weight": 100,
"metadata": {
"gpu": "true",
"model-loaded": "gpt-4"
},
"tags": ["gpu", "production"]
}'
If an instance with the same serviceName and instanceId already exists, it is updated rather than duplicated. New instances start in STARTING status with zero consecutive failures.
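The update-or-create behavior described above can be sketched with a small in-memory store. This is an illustration only, not the ServiceDiscoveryService implementation: the `Registry` and `Instance` names are hypothetical, and the choice to leave `status` untouched when an existing instance is updated is an assumption the source does not confirm.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Instance:
    service_name: str
    instance_id: str
    host: str
    port: int
    weight: int = 100
    status: str = "STARTING"       # new registrations start in STARTING
    consecutive_failures: int = 0  # and with zero consecutive failures


class Registry:
    """Hypothetical in-memory store keyed by (serviceName, instanceId)."""

    def __init__(self) -> None:
        self._instances: Dict[Tuple[str, str], Instance] = {}

    def register(self, service_name: str, instance_id: str,
                 host: str, port: int, weight: int = 100) -> Instance:
        key = (service_name, instance_id)
        existing = self._instances.get(key)
        if existing is None:
            # First registration for this key: create a fresh record.
            existing = Instance(service_name, instance_id, host, port, weight)
            self._instances[key] = existing
        else:
            # Same key: update fields in place instead of adding a duplicate.
            # Assumption: status is not reset on re-registration.
            existing.host, existing.port, existing.weight = host, port, weight
        return existing

    def count(self) -> int:
        return len(self._instances)


registry = Registry()
first = registry.register("ai-service", "ai-service-pod-abc123", "10.0.1.42", 8000)
second = registry.register("ai-service", "ai-service-pod-abc123", "10.0.1.43", 8000)
```

Registering the same pod twice leaves one record whose host reflects the latest call.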
ServiceInstance Structure
| Field | Type | Description |
|---|---|---|
| id | UUID | Internal database identifier |
| serviceName | String | Logical service name |
| instanceId | String | Unique instance identifier (e.g., pod name) |
| host | String | Instance hostname or IP address |
| port | Integer | Service port |
| protocol | String | Protocol (http, https, grpc) |
| version | String | Running software version |
| status | ServiceStatus | Current instance status |
| healthCheckUrl | String | URL for health probes |
| metricsUrl | String | URL for Prometheus metrics |
| metadata | Map | Key-value metadata pairs |
| tags | Set | Tags for filtering |
| lastHealthCheck | Instant | Last health check timestamp |
| consecutiveFailures | Integer | Consecutive failed health checks |
| region | String | Cloud region |
| zone | String | Availability zone |
| weight | Integer | Load balancing weight |
| registeredAt | Instant | Registration timestamp |
| lastUpdated | Instant | Last update timestamp |
| deregisteredAt | Instant | Deregistration timestamp |
Service Status
| Status | Description |
|---|---|
| STARTING | Instance registered, not yet healthy |
| HEALTHY | Passing health checks |
| UNHEALTHY | Failing health checks |
| DRAINING | Draining connections before shutdown |
| DOWN | Instance is down |
| DEREGISTERED | Instance has been deregistered |
Deregister an Instance
Endpoint: DELETE /api/v1/services/:serviceName/instances/:instanceId
Marks the instance as DEREGISTERED and records the deregistration timestamp. Deregistered instances are excluded from discovery queries but retained in the database for audit purposes.
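Deregistration as described is a soft delete: the record stays, only its status and timestamp change. A minimal sketch follows, assuming instances are plain dictionaries shaped like the ServiceInstance fields; the function names `deregister` and `discoverable` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def deregister(instance: Dict, now: Optional[datetime] = None) -> Dict:
    """Soft delete: flip the status and stamp the time, but keep the record."""
    instance["status"] = "DEREGISTERED"
    instance["deregisteredAt"] = (now or datetime.now(timezone.utc)).isoformat()
    return instance


def discoverable(instances: List[Dict]) -> List[Dict]:
    """Instances that discovery queries may still return."""
    return [i for i in instances if i["status"] != "DEREGISTERED"]


instances = [
    {"instanceId": "pod-a", "status": "HEALTHY", "deregisteredAt": None},
    {"instanceId": "pod-b", "status": "HEALTHY", "deregisteredAt": None},
]
deregister(instances[0])
visible = discoverable(instances)
```

After the call, `instances` still holds both records (the audit trail), while `visible` contains only `pod-b`.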
Service Discovery Endpoints
List All Services
Endpoint: GET /api/v1/services
Returns a list of all unique service names that have at least one active (non-deregistered) instance.
Get All Instances
Endpoint: GET /api/v1/services/:serviceName
Returns all instances for a service, including unhealthy and starting instances.
Get Healthy Instances
Endpoint: GET /api/v1/services/:serviceName/healthy
Returns only instances with HEALTHY status, suitable for routing traffic.
Get Single Instance (Load Balanced)
Endpoint: GET /api/v1/services/:serviceName/instance
Returns a single healthy instance selected via weighted random load balancing. Instances with higher weight values are more likely to be selected. If no healthy instances exist, returns 404.
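Weighted random selection can be sketched as follows. This shows the general technique rather than the service's actual code; `pick_instance` is a hypothetical name, and skipping instances with a non-positive weight is an assumption made so the weights are always valid.

```python
import random
from typing import Dict, List, Optional


def pick_instance(instances: List[Dict],
                  rng: Optional[random.Random] = None) -> Optional[Dict]:
    """Pick one HEALTHY instance with probability proportional to its weight."""
    rng = rng or random.Random()
    # Assumption: instances with weight <= 0 are never eligible.
    healthy = [i for i in instances
               if i["status"] == "HEALTHY" and i["weight"] > 0]
    if not healthy:
        return None  # the endpoint answers 404 in this case
    weights = [i["weight"] for i in healthy]
    return rng.choices(healthy, weights=weights, k=1)[0]


pool = [
    {"instanceId": "a", "status": "HEALTHY", "weight": 100},
    {"instanceId": "b", "status": "HEALTHY", "weight": 300},
    {"instanceId": "c", "status": "UNHEALTHY", "weight": 500},
]
rng = random.Random(42)  # seeded only to make the demo repeatable
counts = {"a": 0, "b": 0, "c": 0}
for _ in range(10_000):
    counts[pick_instance(pool, rng)["instanceId"]] += 1
```

Over many draws, `b` is chosen roughly three times as often as `a`, and the unhealthy instance `c` is never returned regardless of its weight.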
Service Status Summary
Endpoint: GET /api/v1/services/status
Returns a status summary for every registered service.
{
"ai-service": {
"serviceName": "ai-service",
"totalInstances": 5,
"healthyInstances": 4,
"unhealthyInstances": 1,
"overallStatus": "UP"
},
"query-engine": {
"serviceName": "query-engine",
"totalInstances": 3,
"healthyInstances": 3,
"unhealthyInstances": 0,
"overallStatus": "UP"
}
}
The overallStatus is UP if at least one healthy instance exists, DOWN otherwise.
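The summary can be derived from the instance list alone. The sketch below makes two assumptions not stated in the source: deregistered instances are left out of the totals, and `unhealthyInstances` counts every instance that is not HEALTHY (which matches the example, where 5 = 4 + 1). The `summarize` function and the `reports` service are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def summarize(instances: List[Dict]) -> Dict[str, Dict]:
    """Build a per-service status summary keyed by service name."""
    grouped = defaultdict(list)
    for inst in instances:
        if inst["status"] != "DEREGISTERED":  # assumption: excluded from totals
            grouped[inst["serviceName"]].append(inst)

    summary = {}
    for name, group in grouped.items():
        healthy = sum(1 for i in group if i["status"] == "HEALTHY")
        summary[name] = {
            "serviceName": name,
            "totalInstances": len(group),
            "healthyInstances": healthy,
            # Assumption: anything that is not HEALTHY counts as unhealthy.
            "unhealthyInstances": len(group) - healthy,
            "overallStatus": "UP" if healthy > 0 else "DOWN",
        }
    return summary


sample = (
    [{"serviceName": "ai-service", "status": "HEALTHY"}] * 4
    + [{"serviceName": "ai-service", "status": "UNHEALTHY"}]
    + [{"serviceName": "reports", "status": "UNHEALTHY"}]
)
result = summarize(sample)
```

Here `ai-service` reports UP with four of five instances healthy, while `reports` has no healthy instance and reports DOWN.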
Health Endpoints
Aggregated Service Health
Endpoint: GET /api/v1/services/:serviceName/health
Returns the aggregated health status for a service, computed by the HealthAggregationService. This includes health check results across all instances.
Platform-Wide Health
Endpoint: GET /api/v1/services/health
Returns the overall platform health status, aggregating health across all registered services.
Trigger Health Check
Endpoint: POST /api/v1/services/:serviceName/instances/:instanceId/health-check
Manually triggers a health check for a specific instance. Returns 202 Accepted as the health check runs asynchronously.
Service Dependencies
Add a Dependency
Endpoint: POST /api/v1/services/:serviceName/dependencies
curl -X POST http://localhost:8084/api/v1/services/ai-service/dependencies \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${TOKEN}" \
-d '{
"dependsOn": "query-engine",
"type": "RUNTIME",
"required": true,
"minVersion": "2.0.0",
"healthCheckEndpoint": "/health",
"timeoutMs": 5000
}'
Duplicate dependencies (same serviceName and dependsOn) are rejected with an error.
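The duplicate check amounts to a uniqueness constraint on the (serviceName, dependsOn) pair. A minimal sketch, assuming an in-memory store: `DependencyStore` and `DuplicateDependencyError` are hypothetical names, and the source does not say which error or HTTP status the real endpoint returns.

```python
from typing import Dict, Tuple


class DuplicateDependencyError(ValueError):
    """Raised when a (serviceName, dependsOn) pair is declared twice."""


class DependencyStore:
    def __init__(self) -> None:
        self._deps: Dict[Tuple[str, str], Dict] = {}

    def add(self, service_name: str, depends_on: str,
            dep_type: str = "RUNTIME", required: bool = True) -> Dict:
        key = (service_name, depends_on)
        if key in self._deps:
            # Same pair already declared: reject instead of overwriting.
            raise DuplicateDependencyError(
                f"{service_name} already depends on {depends_on}")
        self._deps[key] = {
            "serviceName": service_name,
            "dependsOn": depends_on,
            "type": dep_type,
            "required": required,
        }
        return self._deps[key]


store = DependencyStore()
store.add("ai-service", "query-engine")
try:
    store.add("ai-service", "query-engine", dep_type="OPTIONAL", required=False)
    rejected = False
except DuplicateDependencyError:
    rejected = True
```

Note that the second call is rejected even though its type differs, because only the service pair is part of the key.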
ServiceDependency Structure
| Field | Type | Description |
|---|---|---|
| id | UUID | Dependency identifier |
| serviceName | String | Service that has the dependency |
| dependsOn | String | Service being depended on |
| type | DependencyType | Classification of the dependency |
| minVersion | String | Minimum required version |
| maxVersion | String | Maximum supported version |
| required | Boolean | Whether the dependency is required |
| healthCheckEndpoint | String | Health check endpoint on the dependency |
| timeoutMs | Integer | Health check timeout in milliseconds |
Dependency Types
| Type | Description |
|---|---|
| RUNTIME | Required at runtime for normal operation |
| BUILD | Required at build time only |
| OPTIONAL | Optional dependency for enhanced features |
| DEV | Development-only dependency |
Dependency Queries
Get Dependencies
Endpoint: GET /api/v1/services/:serviceName/dependencies
Returns all declared dependencies for a service.
Get Dependents
Endpoint: GET /api/v1/services/:serviceName/dependents
Returns a list of service names that depend on the specified service. Useful for impact analysis before taking a service offline.
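Finding dependents is a reverse lookup over the declared dependencies. The sketch below assumes the dependencies are available as dictionaries with `serviceName` and `dependsOn` keys; `dependents_of` is a hypothetical helper, and returning the names sorted is a choice made for readability, not documented behavior.

```python
from typing import Dict, List


def dependents_of(service: str, dependencies: List[Dict]) -> List[str]:
    """Names of services that declare a dependency on `service`."""
    return sorted({d["serviceName"] for d in dependencies
                   if d["dependsOn"] == service})


declared = [
    {"serviceName": "ai-service", "dependsOn": "query-engine"},
    {"serviceName": "ai-service", "dependsOn": "redis"},
    {"serviceName": "query-engine", "dependsOn": "postgresql"},
    {"serviceName": "query-engine", "dependsOn": "kafka"},
]
affected = dependents_of("query-engine", declared)
```

This returns only direct dependents; taking `postgresql` offline would affect `query-engine` directly and `ai-service` only transitively.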
Check Dependency Health
Endpoint: GET /api/v1/services/:serviceName/dependencies/health
Checks whether all required dependencies have at least one healthy instance.
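One way such a check might be computed is sketched here, ahead of the example response. It assumes that optional dependencies are still listed in the report but do not affect `allHealthy`; the source only states that required dependencies are checked. The `check_dependency_health` name and the optional `kafka` dependency on `ai-service` are illustrative.

```python
from typing import Dict, List


def check_dependency_health(service: str, dependencies: List[Dict],
                            healthy_counts: Dict[str, int]) -> Dict:
    """Report per-dependency health; only required ones gate allHealthy."""
    entries = []
    all_healthy = True
    for dep in dependencies:
        if dep["serviceName"] != service:
            continue
        count = healthy_counts.get(dep["dependsOn"], 0)
        healthy = count > 0  # at least one healthy instance
        if dep["required"] and not healthy:
            all_healthy = False
        entries.append({
            "serviceName": dep["dependsOn"],
            "healthy": healthy,
            "healthyInstances": count,
        })
    return {"serviceName": service, "allHealthy": all_healthy,
            "dependencies": entries}


deps = [
    {"serviceName": "ai-service", "dependsOn": "query-engine", "required": True},
    {"serviceName": "ai-service", "dependsOn": "redis", "required": True},
    {"serviceName": "ai-service", "dependsOn": "kafka", "required": False},
]
ok = check_dependency_health("ai-service", deps,
                             {"query-engine": 3, "redis": 1, "kafka": 0})
degraded = check_dependency_health("ai-service", deps,
                                   {"query-engine": 3, "redis": 0, "kafka": 2})
```

In `ok`, the unhealthy optional dependency does not change the verdict; in `degraded`, the missing required `redis` instance sets `allHealthy` to false.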
{
"serviceName": "ai-service",
"allHealthy": true,
"dependencies": [
{
"serviceName": "query-engine",
"healthy": true,
"healthyInstances": 3
},
{
"serviceName": "redis",
"healthy": true,
"healthyInstances": 1
}
]
}
Full Dependency Graph
Endpoint: GET /api/v1/services/dependencies/graph
Returns the complete dependency graph across all services as a set of nodes (service names) and directed edges (dependency relationships).
{
"nodes": ["ai-service", "query-engine", "redis", "kafka", "postgresql"],
"edges": [
{ "from": "ai-service", "to": "query-engine", "type": "RUNTIME", "required": true },
{ "from": "ai-service", "to": "redis", "type": "RUNTIME", "required": true },
{ "from": "query-engine", "to": "postgresql", "type": "RUNTIME", "required": true },
{ "from": "query-engine", "to": "kafka", "type": "OPTIONAL", "required": false }
]
}
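A client can feed this response into a topological sort, for example to derive an order in which dependencies come before the services that need them. This is a consumer-side sketch using Python's standard `graphlib` module (Python 3.9+), not something the API itself provides; `dependency_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

graph = {
    "nodes": ["ai-service", "query-engine", "redis", "kafka", "postgresql"],
    "edges": [
        {"from": "ai-service", "to": "query-engine", "type": "RUNTIME", "required": True},
        {"from": "ai-service", "to": "redis", "type": "RUNTIME", "required": True},
        {"from": "query-engine", "to": "postgresql", "type": "RUNTIME", "required": True},
        {"from": "query-engine", "to": "kafka", "type": "OPTIONAL", "required": False},
    ],
}


def dependency_order(graph: Dict) -> List[str]:
    """Return service names so that each appears after its dependencies."""
    # TopologicalSorter expects node -> set of predecessors; here the
    # predecessors of a service are the services it depends on.
    predecessors = {node: set() for node in graph["nodes"]}
    for edge in graph["edges"]:
        predecessors[edge["from"]].add(edge["to"])
    return list(TopologicalSorter(predecessors).static_order())


order = dependency_order(graph)
```

In the resulting list, `postgresql` precedes `query-engine`, which precedes `ai-service`. If the declared dependencies ever formed a cycle, `static_order()` would raise `graphlib.CycleError`, which makes the same helper usable as a cycle check.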