MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Service Discovery

The ServiceDiscoveryController and ServiceDiscoveryService provide runtime service registration, instance discovery, health aggregation, and dependency graph management. Services register their instances at startup, and other services use the discovery API to locate healthy instances with weighted load balancing.


Register a Service Instance

Endpoint: POST /api/v1/services/register

curl -X POST http://localhost:8084/api/v1/services/register \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{
    "serviceName": "ai-service",
    "instanceId": "ai-service-pod-abc123",
    "host": "10.0.1.42",
    "port": 8000,
    "protocol": "http",
    "version": "2.1.0",
    "healthCheckUrl": "http://10.0.1.42:8000/health",
    "metricsUrl": "http://10.0.1.42:8000/metrics",
    "region": "us-east-1",
    "zone": "us-east-1a",
    "weight": 100,
    "metadata": {
      "gpu": "true",
      "model-loaded": "gpt-4"
    },
    "tags": ["gpu", "production"]
  }'

If an instance with the same serviceName and instanceId already exists, it is updated rather than duplicated. New instances start in STARTING status with zero consecutive failures.
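The update-or-create behaviour described above can be sketched with an in-memory registry. This is only an illustration: `Registry` and `Instance` are hypothetical names, not the platform's classes, and the sketch assumes that an update leaves the existing status untouched, which the documentation does not specify.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Instance:
    service_name: str
    instance_id: str
    host: str
    port: int
    status: str = "STARTING"        # new instances start in STARTING
    consecutive_failures: int = 0   # and with zero consecutive failures


class Registry:
    """Hypothetical in-memory stand-in for the discovery database."""

    def __init__(self) -> None:
        self._instances: Dict[Tuple[str, str], Instance] = {}

    def register(self, service_name: str, instance_id: str,
                 host: str, port: int) -> Instance:
        key = (service_name, instance_id)
        existing = self._instances.get(key)
        if existing is not None:
            # Same serviceName + instanceId: update in place, no duplicate.
            # Assumption: status and failure count are left as they were.
            existing.host = host
            existing.port = port
            return existing
        created = Instance(service_name, instance_id, host, port)
        self._instances[key] = created
        return created

    def count(self) -> int:
        return len(self._instances)
```

Registering `ai-service-pod-abc123` twice with different hosts leaves one entry in this registry, carrying the most recent host.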


ServiceInstance Structure

| Field | Type | Description |
|---|---|---|
| id | UUID | Internal database identifier |
| serviceName | String | Logical service name |
| instanceId | String | Unique instance identifier (e.g., pod name) |
| host | String | Instance hostname or IP address |
| port | Integer | Service port |
| protocol | String | Protocol (http, https, grpc) |
| version | String | Running software version |
| status | ServiceStatus | Current instance status |
| healthCheckUrl | String | URL for health probes |
| metricsUrl | String | URL for Prometheus metrics |
| metadata | Map | Key-value metadata pairs |
| tags | Set | Tags for filtering |
| lastHealthCheck | Instant | Last health check timestamp |
| consecutiveFailures | Integer | Consecutive failed health checks |
| region | String | Cloud region |
| zone | String | Availability zone |
| weight | Integer | Load balancing weight |
| registeredAt | Instant | Registration timestamp |
| lastUpdated | Instant | Last update timestamp |
| deregisteredAt | Instant | Deregistration timestamp |

Service Status

| Status | Description |
|---|---|
| STARTING | Instance registered, not yet healthy |
| HEALTHY | Passing health checks |
| UNHEALTHY | Failing health checks |
| DRAINING | Draining connections before shutdown |
| DOWN | Instance is down |
| DEREGISTERED | Instance has been deregistered |

Deregister an Instance

Endpoint: DELETE /api/v1/services/:serviceName/instances/:instanceId

Marks the instance as DEREGISTERED and records the deregistration timestamp. Deregistered instances are excluded from discovery queries but retained in the database for audit purposes.
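As a rough sketch of this soft-delete behaviour, the functions below mark a record instead of removing it and then filter it out of discovery results. The dictionary keys mirror the ServiceInstance fields; the function names `deregister` and `discoverable` are hypothetical.

```python
from datetime import datetime, timezone
from typing import List, Optional


def deregister(instance: dict, now: Optional[datetime] = None) -> dict:
    """Mark an instance as deregistered without deleting the record."""
    instance["status"] = "DEREGISTERED"
    instance["deregisteredAt"] = (now or datetime.now(timezone.utc)).isoformat()
    return instance


def discoverable(instances: List[dict]) -> List[dict]:
    """Return the instances a discovery query would still consider."""
    return [i for i in instances if i["status"] != "DEREGISTERED"]
```

The record stays in the underlying list, so it remains available for auditing, while `discoverable` no longer returns it.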


Service Discovery Endpoints

List All Services

Endpoint: GET /api/v1/services

Returns a list of all unique service names that have at least one active (non-deregistered) instance.

Get All Instances

Endpoint: GET /api/v1/services/:serviceName

Returns all instances for a service, including unhealthy and starting instances.

Get Healthy Instances

Endpoint: GET /api/v1/services/:serviceName/healthy

Returns only instances with HEALTHY status, suitable for routing traffic.

Get Single Instance (Load Balanced)

Endpoint: GET /api/v1/services/:serviceName/instance

Returns a single healthy instance selected by weighted random load balancing: instances with higher weight values are more likely to be selected. If no healthy instance exists, the endpoint returns 404.
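Weighted random selection can be illustrated with Python's standard `random.choices`. This is a sketch of the general technique, not the service's actual implementation; `pick_instance` is a hypothetical helper, and skipping instances with a non-positive weight is an assumption made here to keep the weights valid.

```python
import random
from typing import Optional, Sequence


def pick_instance(instances: Sequence[dict],
                  rng: Optional[random.Random] = None) -> Optional[dict]:
    """Pick one HEALTHY instance with probability proportional to its weight."""
    chooser = rng or random
    # Assumption: instances with weight <= 0 are never selected.
    healthy = [i for i in instances
               if i["status"] == "HEALTHY" and i["weight"] > 0]
    if not healthy:
        return None  # the endpoint would answer 404 in this case
    weights = [i["weight"] for i in healthy]
    return chooser.choices(healthy, weights=weights, k=1)[0]
```

With two healthy instances weighted 300 and 100, the first is expected to receive roughly three quarters of the selections over many calls.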

Service Status Summary

Endpoint: GET /api/v1/services/status

Returns a status summary for every registered service.

{
  "ai-service": {
    "serviceName": "ai-service",
    "totalInstances": 5,
    "healthyInstances": 4,
    "unhealthyInstances": 1,
    "overallStatus": "UP"
  },
  "query-engine": {
    "serviceName": "query-engine",
    "totalInstances": 3,
    "healthyInstances": 3,
    "unhealthyInstances": 0,
    "overallStatus": "UP"
  }
}

The overallStatus is UP if at least one healthy instance exists, DOWN otherwise.
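The UP/DOWN rule can be sketched as a small aggregation over (serviceName, status) pairs. Two points are assumptions rather than documented behaviour: deregistered instances are left out of the counts, and every other non-HEALTHY status is counted under `unhealthyInstances`. The function name `summarize` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def summarize(instances: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Build a per-service status summary from (serviceName, status) pairs."""
    summary: Dict[str, dict] = {}
    for service_name, status in instances:
        if status == "DEREGISTERED":
            continue  # assumption: deregistered instances are not counted
        entry = summary.setdefault(service_name, {
            "serviceName": service_name,
            "totalInstances": 0,
            "healthyInstances": 0,
            "unhealthyInstances": 0,
        })
        entry["totalInstances"] += 1
        if status == "HEALTHY":
            entry["healthyInstances"] += 1
        else:
            # assumption: any other active status counts as unhealthy
            entry["unhealthyInstances"] += 1
    for entry in summary.values():
        # UP if at least one healthy instance exists, DOWN otherwise
        entry["overallStatus"] = "UP" if entry["healthyInstances"] > 0 else "DOWN"
    return summary
```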


Health Endpoints

Aggregated Service Health

Endpoint: GET /api/v1/services/:serviceName/health

Returns the aggregated health status for a service, computed by the HealthAggregationService. This includes health check results across all instances.

Platform-Wide Health

Endpoint: GET /api/v1/services/health

Returns the overall platform health status, aggregating health across all registered services.

Trigger Health Check

Endpoint: POST /api/v1/services/:serviceName/instances/:instanceId/health-check

Manually triggers a health check for a specific instance. Returns 202 Accepted because the health check runs asynchronously; the response does not contain the check result.
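A client call to this endpoint might look like the sketch below, which uses only the Python standard library. The base URL is taken from the curl examples on this page; sending a bearer token to this endpoint is an assumption, and both function names are hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8084"  # host and port from the curl examples


def build_health_check_request(service_name: str, instance_id: str,
                               token: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    path = "/api/v1/services/{}/instances/{}/health-check".format(
        urllib.parse.quote(service_name, safe=""),
        urllib.parse.quote(instance_id, safe=""),
    )
    return urllib.request.Request(
        BASE_URL + path,
        method="POST",
        # Assumption: this endpoint expects the same bearer token as the others.
        headers={"Authorization": f"Bearer {token}"},
    )


def trigger_health_check(service_name: str, instance_id: str, token: str) -> bool:
    """Send the request; True means the check was accepted (HTTP 202)."""
    request = build_health_check_request(service_name, instance_id, token)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status == 202
```

Because the check runs asynchronously, a caller that needs the outcome would have to read the instance's status afterwards, for example through the instance listing endpoints.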


Service Dependencies

Add a Dependency

Endpoint: POST /api/v1/services/:serviceName/dependencies

curl -X POST http://localhost:8084/api/v1/services/ai-service/dependencies \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{
    "dependsOn": "query-engine",
    "type": "RUNTIME",
    "required": true,
    "minVersion": "2.0.0",
    "healthCheckEndpoint": "/health",
    "timeoutMs": 5000
  }'

Duplicate dependencies (same serviceName and dependsOn) are rejected with an error.

ServiceDependency Structure

| Field | Type | Description |
|---|---|---|
| id | UUID | Dependency identifier |
| serviceName | String | Service that has the dependency |
| dependsOn | String | Service being depended on |
| type | DependencyType | Classification of the dependency |
| minVersion | String | Minimum required version |
| maxVersion | String | Maximum supported version |
| required | Boolean | Whether the dependency is required |
| healthCheckEndpoint | String | Health check endpoint on the dependency |
| timeoutMs | Integer | Health check timeout in milliseconds |

Dependency Types

| Type | Description |
|---|---|
| RUNTIME | Required at runtime for normal operation |
| BUILD | Required at build time only |
| OPTIONAL | Optional dependency for enhanced features |
| DEV | Development-only dependency |

Dependency Queries

Get Dependencies

Endpoint: GET /api/v1/services/:serviceName/dependencies

Returns all declared dependencies for a service.

Get Dependents

Endpoint: GET /api/v1/services/:serviceName/dependents

Returns a list of service names that depend on the specified service. Useful for impact analysis before taking a service offline.

Check Dependency Health

Endpoint: GET /api/v1/services/:serviceName/dependencies/health

Checks whether each required dependency of the service has at least one healthy instance.

{
  "serviceName": "ai-service",
  "allHealthy": true,
  "dependencies": [
    {
      "serviceName": "query-engine",
      "healthy": true,
      "healthyInstances": 3
    },
    {
      "serviceName": "redis",
      "healthy": true,
      "healthyInstances": 1
    }
  ]
}
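The check behind this response can be sketched as follows. The function name `dependency_health` and the `healthy_counts` input are hypothetical, and the sketch assumes that optional dependencies are listed in the report but do not affect `allHealthy`; the documentation only states the rule for required dependencies.

```python
from typing import Dict, List


def dependency_health(service_name: str,
                      dependencies: List[dict],
                      healthy_counts: Dict[str, int]) -> dict:
    """Report, per dependency, whether at least one healthy instance exists."""
    report = []
    all_healthy = True
    for dep in dependencies:
        count = healthy_counts.get(dep["dependsOn"], 0)
        healthy = count > 0
        report.append({
            "serviceName": dep["dependsOn"],
            "healthy": healthy,
            "healthyInstances": count,
        })
        # Assumption: only required dependencies influence allHealthy.
        if dep["required"] and not healthy:
            all_healthy = False
    return {
        "serviceName": service_name,
        "allHealthy": all_healthy,
        "dependencies": report,
    }
```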

Full Dependency Graph

Endpoint: GET /api/v1/services/dependencies/graph

Returns the complete dependency graph across all services as a set of nodes (service names) and directed edges (dependency relationships).

{
  "nodes": ["ai-service", "query-engine", "redis", "kafka", "postgresql"],
  "edges": [
    { "from": "ai-service", "to": "query-engine", "type": "RUNTIME", "required": true },
    { "from": "ai-service", "to": "redis", "type": "RUNTIME", "required": true },
    { "from": "query-engine", "to": "postgresql", "type": "RUNTIME", "required": true },
    { "from": "query-engine", "to": "kafka", "type": "OPTIONAL", "required": false }
  ]
}
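One use of this graph is impact analysis: given a service, find every service that depends on it directly or transitively. The sketch below walks the edges in reverse with a breadth-first search. `impacted_services` and its `required_only` flag are hypothetical and not part of the API.

```python
from collections import deque
from typing import Dict, List, Set


def impacted_services(graph: dict, target: str,
                      required_only: bool = True) -> Set[str]:
    """Return all services that directly or transitively depend on target."""
    # Reverse adjacency: dependency -> services that depend on it.
    reverse: Dict[str, List[str]] = {}
    for edge in graph["edges"]:
        if required_only and not edge["required"]:
            continue
        reverse.setdefault(edge["to"], []).append(edge["from"])

    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, []):
            if dependent != target and dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```

Applied to the sample graph above, taking `postgresql` offline would affect `query-engine` directly and `ai-service` through it, while `kafka` affects no service unless optional edges are included.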