Upgrade Execution
The UpgradeOrchestrator manages the end-to-end execution of component upgrades across tenant deployments. It supports multiple deployment strategies (rolling, canary, blue-green, recreate, staged), pre- and post-upgrade validation checks, batch processing with configurable delays, automatic rollback on failure, and real-time progress tracking via Kafka events.
Initiate an Upgrade
Endpoint: POST /api/v1/registry/tenants/:tenantId/upgrades
curl -X POST http://localhost:8084/api/v1/registry/tenants/550e8400/upgrades \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${TOKEN}" \
-d '{
"componentId": "aaa-111",
"fromVersionId": "bbb-222",
"toVersionId": "ccc-333",
"strategy": "CANARY",
"canaryPercentage": 10,
"batchSize": 5,
"batchDelaySeconds": 60,
"healthCheckIntervalSeconds": 30,
"healthCheckTimeoutSeconds": 300,
"maxUnavailablePercentage": 25,
"totalInstances": 20,
"autoRollbackEnabled": true,
"failureThreshold": 3,
"initiatedBy": "admin@matih.ai",
"installedComponents": {
"kafka": "3.6.0",
"postgresql": "16.1"
}
}'

Before creating the execution, the orchestrator validates three conditions: no active upgrade exists for the same component and tenant, both versions belong to the same component, and the upgrade path validation passes (no blockers).
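The three pre-creation checks can be sketched as a single function that collects error messages. This is a minimal illustration, not the orchestrator's implementation: the function name, the in-memory lookups (`active_upgrades`, `version_component`, `path_blockers`), and the error strings are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def pre_creation_errors(
    tenant_id: str,
    component_id: str,
    from_version_id: str,
    to_version_id: str,
    active_upgrades: Set[Tuple[str, str]],   # hypothetical: (tenantId, componentId) pairs
    version_component: Dict[str, str],       # hypothetical: versionId -> owning componentId
    path_blockers: List[str],                # hypothetical: blockers from path validation
) -> List[str]:
    """Return a list of reasons the upgrade cannot be created (empty means OK)."""
    errors = []
    # 1. No active upgrade for the same component and tenant.
    if (tenant_id, component_id) in active_upgrades:
        errors.append("active upgrade already exists for this component and tenant")
    # 2. Both versions must belong to the requested component.
    for version_id in (from_version_id, to_version_id):
        if version_component.get(version_id) != component_id:
            errors.append(f"version {version_id} does not belong to component {component_id}")
    # 3. Upgrade path validation must report no blockers.
    if path_blockers:
        errors.append("upgrade path has blockers: " + ", ".join(path_blockers))
    return errors


versions = {"bbb-222": "aaa-111", "ccc-333": "aaa-111"}
ok = pre_creation_errors("550e8400", "aaa-111", "bbb-222", "ccc-333", set(), versions, [])
busy = pre_creation_errors(
    "550e8400", "aaa-111", "bbb-222", "ccc-333", {("550e8400", "aaa-111")}, versions, []
)
```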
UpgradeRequest Structure
| Field | Type | Default | Description |
|---|---|---|---|
| componentId | UUID | required | Target component |
| fromVersionId | UUID | required | Current version |
| toVersionId | UUID | required | Target version |
| strategy | UpgradeStrategy | ROLLING | Deployment strategy |
| canaryPercentage | Integer | 10 | Percentage of instances for canary |
| batchSize | Integer | 5 | Instances per batch |
| batchDelaySeconds | Integer | 60 | Delay between batches in seconds |
| healthCheckIntervalSeconds | Integer | 30 | Health check polling interval in seconds |
| healthCheckTimeoutSeconds | Integer | 300 | Health check timeout in seconds |
| maxUnavailablePercentage | Integer | 25 | Maximum percentage of unavailable instances |
| totalInstances | Integer | 10 | Total instances to upgrade |
| autoRollbackEnabled | boolean | false | Enable automatic rollback on failure |
| failureThreshold | Integer | 3 | Number of failures before auto-rollback |
| initiatedBy | String | -- | User who initiated the upgrade |
| installedComponents | Map | -- | Currently installed component versions |
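For client code, the request can be modeled as a small data class that carries the defaults from the table. The sketch below is an assumption about how a caller might build the JSON body; it is not part of the service. UUIDs are carried as plain strings, and `to_json` is a hypothetical helper.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class UpgradeRequest:
    # Required fields (UUIDs carried as strings in this sketch).
    componentId: str
    fromVersionId: str
    toVersionId: str
    # Optional fields, using the defaults listed in the table.
    strategy: str = "ROLLING"
    canaryPercentage: int = 10
    batchSize: int = 5
    batchDelaySeconds: int = 60
    healthCheckIntervalSeconds: int = 30
    healthCheckTimeoutSeconds: int = 300
    maxUnavailablePercentage: int = 25
    totalInstances: int = 10
    autoRollbackEnabled: bool = False
    failureThreshold: int = 3
    initiatedBy: Optional[str] = None
    installedComponents: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # Omit fields left as None (only initiatedBy can be None here).
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(body)


request = UpgradeRequest(
    componentId="aaa-111",
    fromVersionId="bbb-222",
    toVersionId="ccc-333",
    strategy="CANARY",
    totalInstances=20,
    autoRollbackEnabled=True,
)
payload = json.loads(request.to_json())
```

The resulting string can be passed as the `-d` body of the POST request.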
Upgrade Strategies
| Strategy | Description | Use Case |
|---|---|---|
| ROLLING | Gradual replacement in batches | Standard production upgrades |
| CANARY | Small percentage first, then full rollout after monitoring | Risk-sensitive upgrades |
| BLUE_GREEN | Full parallel deployment with traffic switch | Zero-downtime critical upgrades |
| RECREATE | All instances replaced at once | Development and testing environments |
| STAGED | Manual approval required between stages | Regulated environments |
Execution Status Lifecycle
| Status | Description | Valid Transitions |
|---|---|---|
| PENDING | Created but not started | VALIDATING, CANCELLED |
| VALIDATING | Running pre-upgrade checks | IN_PROGRESS, CANARY_RUNNING, FAILED |
| IN_PROGRESS | Actively upgrading instances in batches | PAUSED, COMPLETED, FAILED, ROLLING_BACK, CANCELLED |
| CANARY_RUNNING | Canary instances being deployed | CANARY_MONITORING, PAUSED, FAILED, CANCELLED |
| CANARY_MONITORING | Monitoring canary health metrics | IN_PROGRESS (via promote), ROLLING_BACK, CANCELLED |
| PAUSED | Manually paused by operator | IN_PROGRESS (via resume), CANCELLED |
| ROLLING_BACK | Rollback in progress | ROLLED_BACK, FAILED |
| ROLLED_BACK | Successfully rolled back | Terminal state |
| COMPLETED | All instances upgraded successfully | Terminal state |
| FAILED | Upgrade failed with errors | Terminal state |
| CANCELLED | Manually cancelled by operator | Terminal state |
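The lifecycle table maps directly onto a lookup structure. The following sketch encodes the listed transitions as a dictionary; the names `VALID_TRANSITIONS`, `can_transition`, and `is_terminal` are illustrative and are not taken from the service code.

```python
# Transitions copied from the lifecycle table; terminal states map to an empty set.
VALID_TRANSITIONS = {
    "PENDING": {"VALIDATING", "CANCELLED"},
    "VALIDATING": {"IN_PROGRESS", "CANARY_RUNNING", "FAILED"},
    "IN_PROGRESS": {"PAUSED", "COMPLETED", "FAILED", "ROLLING_BACK", "CANCELLED"},
    "CANARY_RUNNING": {"CANARY_MONITORING", "PAUSED", "FAILED", "CANCELLED"},
    "CANARY_MONITORING": {"IN_PROGRESS", "ROLLING_BACK", "CANCELLED"},
    "PAUSED": {"IN_PROGRESS", "CANCELLED"},
    "ROLLING_BACK": {"ROLLED_BACK", "FAILED"},
    "ROLLED_BACK": set(),
    "COMPLETED": set(),
    "FAILED": set(),
    "CANCELLED": set(),
}


def is_terminal(status: str) -> bool:
    """A status is terminal when the table lists no outgoing transitions."""
    return not VALID_TRANSITIONS[status]


def can_transition(current: str, target: str) -> bool:
    """Check whether the table allows moving from current to target."""
    return target in VALID_TRANSITIONS[current]
```

A client could use a check like this to disable actions in a UI before calling the API; the server remains the authority on which transitions are accepted.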
Start an Upgrade
Endpoint: POST /api/v1/registry/upgrades/:executionId/start
Transitions the execution from PENDING to VALIDATING and begins asynchronous pre-upgrade checks. The five validation checks are:
- Version compatibility -- Confirms the upgrade path is valid
- Resource availability -- Verifies sufficient resources exist for the upgrade
- Backup verification -- Confirms a backup exists for rollback
- Health endpoints -- Validates health check endpoints are accessible
- Dependency check -- Verifies dependency versions are compatible
If all checks pass, a canary deployment begins when the strategy is CANARY; otherwise, a rolling upgrade starts immediately.
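The validation step can be sketched as running the five checks in order and deriving the next status from the result. The check identifiers and function names below are hypothetical. Mapping a failed check to FAILED is an assumption based on the VALIDATING row of the lifecycle table.

```python
from typing import Callable, Dict, List

# Hypothetical identifiers for the five documented checks.
CHECK_NAMES = [
    "version_compatibility",
    "resource_availability",
    "backup_verification",
    "health_endpoints",
    "dependency_check",
]


def run_pre_upgrade_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check in order and return the names of those that failed."""
    return [name for name in CHECK_NAMES if not checks[name]()]


def status_after_validation(failed_checks: List[str], strategy: str) -> str:
    """Pick the next execution status once validation has finished."""
    if failed_checks:
        return "FAILED"  # assumption: any failed check fails the execution
    return "CANARY_RUNNING" if strategy == "CANARY" else "IN_PROGRESS"


all_pass = {name: (lambda: True) for name in CHECK_NAMES}
no_backup = dict(all_pass, backup_verification=lambda: False)

canary_next = status_after_validation(run_pre_upgrade_checks(all_pass), "CANARY")
rolling_next = status_after_validation(run_pre_upgrade_checks(all_pass), "ROLLING")
failed_next = status_after_validation(run_pre_upgrade_checks(no_backup), "CANARY")
```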
Instance Upgrade Lifecycle
Each instance goes through its own lifecycle during the upgrade:
| Status | Description |
|---|---|
| PENDING | Waiting to be upgraded |
| DRAINING | Draining active connections |
| UPGRADING | Applying the version update |
| HEALTH_CHECKING | Running post-upgrade health validation |
| COMPLETED | Successfully upgraded |
| FAILED | Upgrade failed for this instance |
| ROLLED_BACK | Instance reverted to previous version |
Canary Workflow
For canary deployments, the orchestrator follows this sequence:
- Calculate the canary instance count from canaryPercentage (e.g., 10% of 20 = 2 instances)
- Deploy the new version to canary instances
- Transition to CANARY_MONITORING status
- A scheduled monitor checks canary health every 30 seconds
- If health checks fail and auto-rollback is enabled, rollback is triggered automatically
- On manual promotion (POST /api/v1/registry/upgrades/:executionId/promote-canary), the remaining instances are upgraded via rolling strategy
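The first step of the sequence above is simple arithmetic. The source only gives the example of 10% of 20 = 2 instances. Rounding up and deploying at least one canary are assumptions made for this sketch, as are the function names.

```python
def canary_instance_count(total_instances: int, canary_percentage: int) -> int:
    """Number of instances that receive the new version first."""
    if total_instances <= 0:
        return 0
    # Integer ceiling of total * percentage / 100 (assumption: round up).
    count = (total_instances * canary_percentage + 99) // 100
    # Assumption: deploy at least one canary, never more than the total.
    return max(1, min(total_instances, count))


def remaining_after_canary(total_instances: int, canary_percentage: int) -> int:
    """Instances left for the rolling phase after promotion."""
    return total_instances - canary_instance_count(total_instances, canary_percentage)


canaries = canary_instance_count(20, 10)
remaining = remaining_after_canary(20, 10)
```

With the values from the request example (`totalInstances` 20, `canaryPercentage` 10), two instances form the canary group and 18 remain for the rolling phase.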
Pause, Resume, and Cancel
| Operation | Endpoint | Valid From |
|---|---|---|
| Pause | POST /api/v1/registry/upgrades/:executionId/pause | IN_PROGRESS, CANARY_RUNNING |
| Resume | POST /api/v1/registry/upgrades/:executionId/resume | PAUSED |
| Cancel | POST /api/v1/registry/upgrades/:executionId/cancel | Any non-terminal status |
Pausing an upgrade stops batch processing at the current position. Resuming continues from where the upgrade left off.
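The pause and resume behaviour can be pictured as a cursor over the list of batches. The class below is a simplified sketch with a hypothetical name and methods. It processes batches synchronously and leaves out delays, health checks, and failure handling.

```python
from typing import List


class BatchRunner:
    """Walks through instance batches and remembers its position across pauses."""

    def __init__(self, instance_ids: List[str], batch_size: int) -> None:
        self.batches = [
            instance_ids[i:i + batch_size]
            for i in range(0, len(instance_ids), batch_size)
        ]
        self.position = 0          # index of the next batch to process
        self.paused = False
        self.upgraded: List[str] = []

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def run_next_batch(self) -> bool:
        """Process one batch; return False when paused or when nothing is left."""
        if self.paused or self.position >= len(self.batches):
            return False
        self.upgraded.extend(self.batches[self.position])
        self.position += 1
        return True


instances = [f"instance-{n}" for n in range(12)]
runner = BatchRunner(instances, batch_size=5)
runner.run_next_batch()            # first batch of 5
runner.pause()
stalled = runner.run_next_batch()  # no progress while paused
runner.resume()
while runner.run_next_batch():     # continues with the second batch
    pass
```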
Rollback
Endpoint: POST /api/v1/registry/upgrades/:executionId/rollback?reason=Manual+rollback
Initiates a rollback that reverts all upgraded and health-checking instances to the previous version. The rollback runs asynchronously and updates the RollbackInfo structure with details.
RollbackInfo Structure
| Field | Type | Description |
|---|---|---|
| autoRollbackEnabled | boolean | Whether auto-rollback is configured |
| failureThreshold | int | Number of failures that trigger auto-rollback |
| currentFailures | int | Current failure count |
| rollbackReason | String | Reason for the rollback |
| rollbackInitiatedAt | LocalDateTime | When the rollback started |
| rollbackInitiatedBy | String | Who or what triggered the rollback |
| rolledBackInstances | List | Instance IDs that were rolled back |
Auto-rollback triggers when currentFailures reaches or exceeds failureThreshold and autoRollbackEnabled is true.
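The trigger condition can be expressed as a predicate over the RollbackInfo fields. The data class below mirrors the table. The helper names, the reason string, and the "auto-rollback" initiator value are placeholders chosen for this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class RollbackInfo:
    autoRollbackEnabled: bool = False
    failureThreshold: int = 3
    currentFailures: int = 0
    rollbackReason: Optional[str] = None
    rollbackInitiatedAt: Optional[datetime] = None
    rollbackInitiatedBy: Optional[str] = None
    rolledBackInstances: List[str] = field(default_factory=list)


def should_auto_rollback(info: RollbackInfo) -> bool:
    """True when auto-rollback is enabled and the threshold is reached or exceeded."""
    return info.autoRollbackEnabled and info.currentFailures >= info.failureThreshold


def record_failure(info: RollbackInfo) -> bool:
    """Count one failure; return True only when this failure starts a rollback."""
    info.currentFailures += 1
    if should_auto_rollback(info) and info.rollbackInitiatedAt is None:
        info.rollbackReason = "failure threshold reached"   # placeholder text
        info.rollbackInitiatedAt = datetime.now()
        info.rollbackInitiatedBy = "auto-rollback"           # placeholder value
        return True
    return False


enabled = RollbackInfo(autoRollbackEnabled=True, failureThreshold=3)
triggered = [record_failure(enabled) for _ in range(4)]

disabled = RollbackInfo(autoRollbackEnabled=False, failureThreshold=3)
never = [record_failure(disabled) for _ in range(4)]
```

With a threshold of 3, the third failure starts the rollback and later failures do not start a second one.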
Upgrade Statistics
Endpoint: GET /api/v1/registry/components/:componentId/upgrade-statistics
ExecutionStatistics Structure
| Field | Type | Description |
|---|---|---|
| componentId | UUID | Component identifier |
| totalExecutions | long | Total upgrade executions |
| successfulExecutions | long | Completed successfully |
| failedExecutions | long | Failed or rolled back |
| successRate | double | Success percentage |
| averageDurationSeconds | double | Average upgrade duration in seconds |
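The sketch below shows one way these fields could be derived from a list of executions. The source does not give the formulas, so several points are assumptions: every execution counts toward the total, `successRate` is successful divided by total on a 0 to 100 scale, and the average covers all executions. The function name and the input shape are hypothetical.

```python
from typing import Dict, List, Tuple


def execution_statistics(
    component_id: str,
    executions: List[Tuple[str, float]],  # hypothetical input: (status, duration in seconds)
) -> Dict[str, object]:
    total = len(executions)
    successful = sum(1 for status, _ in executions if status == "COMPLETED")
    # "Failed or rolled back" per the table.
    failed = sum(1 for status, _ in executions if status in ("FAILED", "ROLLED_BACK"))
    durations = [seconds for _, seconds in executions]
    return {
        "componentId": component_id,
        "totalExecutions": total,
        "successfulExecutions": successful,
        "failedExecutions": failed,
        # Assumption: percentage on a 0-100 scale; 0.0 when there is no history.
        "successRate": (successful / total * 100.0) if total else 0.0,
        "averageDurationSeconds": (sum(durations) / total) if total else 0.0,
    }


stats = execution_statistics(
    "aaa-111",
    [("COMPLETED", 120.0), ("COMPLETED", 180.0), ("FAILED", 60.0), ("ROLLED_BACK", 40.0)],
)
```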
Query Executions
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/registry/upgrades/:executionId | Get execution by ID |
| GET | /api/v1/registry/tenants/:tenantId/upgrades | List all executions for a tenant |
| GET | /api/v1/registry/tenants/:tenantId/components/:componentId/upgrades/active | Get the active execution for a component |