Platform Registry
The Platform Registry provides centralized management of all platform components, their versions, upgrade paths, and service discovery. It tracks the lifecycle of every component in the MATIH platform -- from Trino and Spark to Kafka and MLflow -- enabling controlled upgrades, dependency resolution, and service health aggregation across multi-tenant deployments.
Architecture
The Platform Registry runs as a Spring Boot 3.2 service on port 8084 with two primary controllers:
- RegistryController -- Manages component registration, version lifecycle, upgrade path calculation, and upgrade execution orchestration
- ServiceDiscoveryController -- Handles runtime service instance registration, discovery, health aggregation, and dependency graph management
The service persists data in PostgreSQL and publishes events to Kafka topics (registry.version-events and registry.upgrade-orchestration-events) for downstream consumers.
Core Concepts
| Concept | Description |
|---|---|
| PlatformComponent | A registered platform technology (e.g., Trino, Kafka, MLflow) with category and status |
| ComponentVersion | A specific version of a component with semantic versioning, LTS/recommended flags, and dependency tracking |
| UpgradePath | A defined route from one version to another with risk assessment, duration estimates, and rollback procedures |
| UpgradeExecution | A tracked execution of an upgrade for a specific tenant, supporting rolling, canary, blue-green, and staged strategies |
| ServiceInstance | A runtime instance of a service registered for discovery with health monitoring |
| ServiceDependency | A declared dependency between services with type classification and health check configuration |
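One way to picture upgrade path calculation is as a search over a graph in which each ComponentVersion is a node and each UpgradePath is a directed edge. The sketch below is only an illustration under that assumption, not the registry's actual algorithm. The helper `find_upgrade_route` and the version numbers are hypothetical.

```python
from collections import deque


def find_upgrade_route(paths, start, target):
    """Return the shortest list of versions from start to target, or None.

    `paths` maps a version to the versions it can be upgraded to directly
    (one entry per defined UpgradePath). Hypothetical helper for illustration.
    """
    if start == target:
        return [start]
    queue = deque([[start]])
    visited = {start}
    while queue:
        route = queue.popleft()
        for next_version in paths.get(route[-1], []):
            if next_version in visited:
                continue
            if next_version == target:
                return route + [next_version]
            visited.add(next_version)
            queue.append(route + [next_version])
    return None  # no chain of upgrade paths connects the two versions


# Made-up version numbers, not real release data.
paths = {"1.0.0": ["1.1.0"], "1.1.0": ["1.2.0", "2.0.0"], "1.2.0": ["2.0.0"]}
route = find_upgrade_route(paths, "1.0.0", "2.0.0")
```

Breadth-first search returns the route with the fewest hops. A real calculation would presumably also weigh the risk assessment and duration estimates attached to each path.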
Component Categories
Components are organized into categories that reflect platform architecture layers:
| Category | Examples | Description |
|---|---|---|
| QUERY_ENGINE | Trino, Spark SQL | SQL query execution engines |
| STREAMING | Kafka, Flink | Real-time data streaming |
| STORAGE | MinIO, Delta Lake, Iceberg | Data storage and table formats |
| COMPUTE | Spark, Ray | Distributed compute engines |
| ML | MLflow, Ray Serve | Machine learning platforms |
| ORCHESTRATION | Temporal, Airflow | Workflow orchestration |
| CATALOG | OpenMetadata, Polaris | Data catalog and metadata |
| OBSERVABILITY | Prometheus, Grafana | Monitoring and observability |
| SECURITY | Vault, Keycloak | Security and identity |
| INFRASTRUCTURE | PostgreSQL, Redis | Core infrastructure services |
API Structure
The registry exposes two base paths:
| Base Path | Controller | Purpose |
|---|---|---|
| /api/v1/registry | RegistryController | Component and version management, upgrade paths, upgrade execution |
| /api/v1/services | ServiceDiscoveryController | Service instance registration, discovery, health, dependencies |
Authentication
All endpoints require JWT authentication, with the token passed as a Bearer credential in the Authorization header. Component management operations (registration, version updates, upgrade initiation) additionally require the platform administrator or tenant administrator role.
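As a rough illustration of the header format, the following Python sketch builds a request against the registry base path without sending it. The host name, the `<jwt>` placeholder, and the helper `build_registry_request` are assumptions. The concrete resource paths beneath `/api/v1/registry` are not listed on this page, so only the base path is used.

```python
import urllib.request

# Assumed host; port 8084 is the port stated in the Architecture section.
REGISTRY_BASE_URL = "http://localhost:8084"


def build_registry_request(path, token):
    """Build (but do not send) a GET request carrying the JWT as a Bearer token."""
    return urllib.request.Request(
        REGISTRY_BASE_URL + path,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# "<jwt>" stands in for a token issued by the platform's identity provider.
request = build_registry_request("/api/v1/registry", "<jwt>")
```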
Event-Driven Integration
The registry publishes events to Kafka for integration with other platform services:
| Topic | Events |
|---|---|
| registry.version-events | VERSION_REGISTERED, VERSION_STATUS_CHANGED, VERSION_DEPRECATED, VERSION_END_OF_LIFE, VERSION_END_OF_SUPPORT, RECOMMENDED_VERSION_CHANGED |
| registry.upgrade-orchestration-events | UPGRADE_INITIATED, UPGRADE_STARTED, UPGRADE_VALIDATION_FAILED, CANARY_DEPLOYMENT_STARTED, CANARY_PROMOTED, BATCH_COMPLETED, UPGRADE_COMPLETED, UPGRADE_FAILED, UPGRADE_PAUSED, UPGRADE_RESUMED, UPGRADE_CANCELLED, ROLLBACK_INITIATED, ROLLBACK_COMPLETED, ROLLBACK_FAILED, UPGRADE_TIMEOUT |
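A downstream consumer might route these events by type. The sketch below shows one possible shape for that routing, independent of any particular Kafka client. The message format (JSON with an `eventType` field) and the `route_event` helper are assumptions, since the payload schema is not described here. Only the topic and event names come from the table above.

```python
import json

# Event names per topic, copied from the table above.
TOPIC_EVENTS = {
    "registry.version-events": {
        "VERSION_REGISTERED", "VERSION_STATUS_CHANGED", "VERSION_DEPRECATED",
        "VERSION_END_OF_LIFE", "VERSION_END_OF_SUPPORT",
        "RECOMMENDED_VERSION_CHANGED",
    },
    "registry.upgrade-orchestration-events": {
        "UPGRADE_INITIATED", "UPGRADE_STARTED", "UPGRADE_VALIDATION_FAILED",
        "CANARY_DEPLOYMENT_STARTED", "CANARY_PROMOTED", "BATCH_COMPLETED",
        "UPGRADE_COMPLETED", "UPGRADE_FAILED", "UPGRADE_PAUSED",
        "UPGRADE_RESUMED", "UPGRADE_CANCELLED", "ROLLBACK_INITIATED",
        "ROLLBACK_COMPLETED", "ROLLBACK_FAILED", "UPGRADE_TIMEOUT",
    },
}


def route_event(topic, raw_value, handlers):
    """Decode one message and call the handler registered for its event type.

    Returns True if a handler ran, False if the event was unknown for the
    topic or no handler was registered for it.
    """
    event = json.loads(raw_value)
    event_type = event.get("eventType")  # assumed field name
    if event_type not in TOPIC_EVENTS.get(topic, set()):
        return False
    handler = handlers.get(event_type)
    if handler is None:
        return False
    handler(event)
    return True


# Usage: collect rollback notifications in a list.
rollbacks = []
handlers = {"ROLLBACK_INITIATED": rollbacks.append}
handled = route_event(
    "registry.upgrade-orchestration-events",
    '{"eventType": "ROLLBACK_INITIATED"}',
    handlers,
)
```

Checking the event type against the topic's known set guards against a message that lands on the wrong topic or carries a type the consumer predates.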
Caching
The registry uses Spring Cache for frequently accessed data:
| Cache Name | Key | Content |
|---|---|---|
| componentVersions | componentId | All versions for a component |
| recommendedVersions | componentId | The recommended version for a component |
| upgradePaths | fromVersionId-toVersionId | Calculated upgrade routes between versions |
Cache eviction occurs automatically when versions are registered, statuses are updated, or upgrade paths are modified.
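The read-through and evict-on-write behaviour can be sketched in a few lines. The service itself relies on Spring Cache, so the Python class below is a language-neutral illustration rather than the actual implementation. `UpgradePathCache` and its loader callback are hypothetical. Only the `fromVersionId-toVersionId` key format comes from the table above.

```python
class UpgradePathCache:
    """Minimal read-through cache keyed by 'fromVersionId-toVersionId'."""

    def __init__(self, loader):
        self._loader = loader  # called on a miss to compute the path
        self._entries = {}

    @staticmethod
    def key(from_version_id, to_version_id):
        return f"{from_version_id}-{to_version_id}"

    def get(self, from_version_id, to_version_id):
        cache_key = self.key(from_version_id, to_version_id)
        if cache_key not in self._entries:
            self._entries[cache_key] = self._loader(from_version_id, to_version_id)
        return self._entries[cache_key]

    def evict_all(self):
        """Drop every entry, e.g. after an upgrade path is modified."""
        self._entries.clear()


# Usage: count how often the (stand-in) loader actually runs.
calls = []

def load_path(from_id, to_id):
    calls.append((from_id, to_id))
    return [from_id, to_id]

cache = UpgradePathCache(load_path)
cache.get("v1", "v2")   # miss: loader runs
cache.get("v1", "v2")   # hit: served from the cache
cache.evict_all()       # simulates a write that invalidates cached paths
cache.get("v1", "v2")   # miss again after eviction
```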
Scheduled Tasks
| Task | Schedule | Description |
|---|---|---|
| Version lifecycle processing | Daily at 6:00 AM | Marks versions reaching end-of-life, publishes end-of-support notifications |
| Canary monitoring | Every 30 seconds | Checks health of active canary deployments, triggers auto-rollback on failure |
| Stale execution cleanup | Every 5 minutes | Marks upgrade executions older than 24 hours as failed |
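The stale execution cleanup rule can be sketched as follows. This is an illustration under stated assumptions, not the service's code. The status names, the dictionary fields, and the choice to skip executions that have already finished are all hypothetical. Only the 24-hour threshold and the failed outcome come from the table above.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=24)            # threshold from the table above
ACTIVE_STATUSES = {"IN_PROGRESS", "PAUSED"}  # assumed status names


def mark_stale_executions(executions, now):
    """Mark active executions older than 24 hours as FAILED.

    Mutates the given dicts in place and returns the ids that were changed.
    """
    changed = []
    for execution in executions:
        is_active = execution["status"] in ACTIVE_STATUSES
        is_stale = now - execution["started_at"] > STALE_AFTER
        if is_active and is_stale:
            execution["status"] = "FAILED"
            changed.append(execution["id"])
    return changed


# Usage with a fixed clock so the result is deterministic.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
executions = [
    {"id": "a", "status": "IN_PROGRESS", "started_at": now - timedelta(hours=30)},
    {"id": "b", "status": "IN_PROGRESS", "started_at": now - timedelta(hours=2)},
    {"id": "c", "status": "COMPLETED", "started_at": now - timedelta(hours=48)},
]
changed = mark_stale_executions(executions, now)
```

Passing `now` in as a parameter keeps the rule easy to test, whatever scheduler ends up invoking it every 5 minutes.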