MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Redis Backup

Redis is used for caching, session storage, and rate limiting in the MATIH platform. While Redis data is generally ephemeral and can be rebuilt, session data and rate limit state benefit from periodic backups to minimize disruption during recovery.


Backup Strategy

Method            Frequency       RPO        Use Case
RDB Snapshots     Every 6 hours   6 hours    Full point-in-time snapshot
AOF Persistence   Continuous      Seconds    Append-only file for durability

RDB Snapshots

Redis saves RDB snapshots to disk based on configured save points:

save 3600 1     # Save if at least 1 key changed in 3600 seconds
save 300 100    # Save if at least 100 keys changed in 300 seconds
save 60 10000   # Save if at least 10000 keys changed in 60 seconds
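
To make the interaction between the three save points concrete, here is a minimal Python sketch of the decision they encode. It is a simplified model, not Redis's actual implementation: the function name `should_snapshot` and the `SAVE_POINTS` constant are illustrative, and the exact boundary comparison Redis uses internally may differ slightly.

```python
from typing import Iterable, Tuple

# Save points copied from the configuration above: (seconds, changed_keys).
SAVE_POINTS = [(3600, 1), (300, 100), (60, 10000)]


def should_snapshot(
    elapsed_seconds: int,
    changed_keys: int,
    save_points: Iterable[Tuple[int, int]] = SAVE_POINTS,
) -> bool:
    """Return True if at least one save point is satisfied.

    elapsed_seconds: time since the last successful save.
    changed_keys: number of key changes since that save.
    """
    return any(
        elapsed_seconds >= seconds and changed_keys >= changes
        for seconds, changes in save_points
    )


if __name__ == "__main__":
    print(should_snapshot(120, 50))    # no save point satisfied yet
    print(should_snapshot(400, 150))   # satisfies "save 300 100"
    print(should_snapshot(4000, 1))    # satisfies "save 3600 1"
```

The save points are alternatives, so a busy instance snapshots often while a nearly idle one still snapshots about once an hour after any change.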

Backup to Object Storage

RDB files are periodically copied to object storage by the backup automation.
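
The backup automation itself is not described here, so the following is only a sketch of what such a copy step might look like. It uses a local directory as a stand-in for the object storage bucket; the naming scheme, `backup_name`, `sha256_of`, and `copy_snapshot` are all hypothetical and not part of the MATIH tooling.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def backup_name(now: datetime, prefix: str = "redis") -> str:
    """Build a sortable, timestamped object name (hypothetical scheme)."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/dump-{stamp}.rdb"


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large snapshots are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_snapshot(rdb_path: Path, bucket_dir: Path, now: datetime) -> Path:
    """Copy the RDB file into bucket_dir and verify the copy by checksum."""
    target = bucket_dir / backup_name(now)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(rdb_path, target)
    if sha256_of(rdb_path) != sha256_of(target):
        raise OSError(f"checksum mismatch after copying {rdb_path} to {target}")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "dump.rdb"
        source.write_bytes(b"REDIS0011 placeholder payload")
        copied = copy_snapshot(source, Path(tmp) / "bucket", datetime.now(timezone.utc))
        print(copied.name)
```

A real implementation would replace the `shutil.copy2` call with an upload through the storage provider's client. Redis writes each snapshot to a temporary file and renames it into place, so copying dump.rdb while the server is running is generally safe.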


AOF Persistence

For deployments requiring lower RPO, AOF (Append Only File) persistence can be enabled:

appendonly yes
appendfsync everysec

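With appendfsync everysec, Redis flushes the AOF to disk about once per second, so a crash typically loses no more than the last second or so of writes. The trade-off is higher disk I/O than RDB snapshots alone.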


Restore Procedures

From RDB Snapshot

  1. Stop the Redis instance
  2. Replace the dump.rdb file with the backup
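  3. Start Redis -- it automatically loads the RDB on startup (if AOF is enabled, Redis loads the AOF instead, so restore with appendonly no or follow the AOF procedure)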
  4. Verify key counts and application connectivity
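
Before replacing dump.rdb in step 2, it can help to confirm that the backup is at least plausibly an RDB file. The sketch below checks the file header (RDB files begin with the ASCII bytes `REDIS` followed by a four-digit format version) and keeps the previous file as a `.bak` copy. The helper names `looks_like_rdb` and `stage_restore` are illustrative, and the code assumes Redis is already stopped.

```python
import shutil
import tempfile
from pathlib import Path

RDB_MAGIC = b"REDIS"


def looks_like_rdb(path: Path) -> bool:
    """Check for the 'REDIS' magic string followed by a 4-digit version."""
    with path.open("rb") as handle:
        header = handle.read(9)
    return len(header) == 9 and header[:5] == RDB_MAGIC and header[5:].isdigit()


def stage_restore(backup: Path, data_dir: Path, filename: str = "dump.rdb") -> Path:
    """Put the backup in place of the current RDB file, keeping the old one.

    Assumes the Redis instance using data_dir is stopped.
    """
    if not looks_like_rdb(backup):
        raise ValueError(f"{backup} does not start with an RDB header")
    target = data_dir / filename
    if target.exists():
        # Keep the previous file so the restore can be rolled back.
        shutil.move(str(target), str(target.with_name(filename + ".bak")))
    shutil.copy2(backup, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "data"
        data_dir.mkdir()
        (data_dir / "dump.rdb").write_bytes(b"REDIS0009 old")
        backup = Path(tmp) / "backup.rdb"
        backup.write_bytes(b"REDIS0011 new")
        print(stage_restore(backup, data_dir).name)
```

A header check only catches gross mistakes such as a truncated or wrong file; the `redis-check-rdb` utility shipped with Redis performs a fuller validation.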

From AOF

  1. Stop the Redis instance
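  2. Replace the appendonly.aof file with the backup (on Redis 7.0 and later the AOF is a set of files under the appendonlydir directory, so restore the whole directory)
  3. Start Redis with appendonly yes so that it replays the AOF on startup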
  4. Verify data integrity

What Is Stored in Redis

Data                  TTL            Impact of Loss
Session tokens        24 hours       Users must re-authenticate
API response cache    5-60 minutes   Temporary performance degradation
Rate limit counters   1-60 minutes   Rate limits temporarily reset
Feature flag cache    60 seconds     Brief re-computation of flag values
Permission cache      300 seconds    Brief re-evaluation of permissions
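
The TTLs above also bound how useful a backup of a given age can be. As a rough sketch, the function below lists which categories could still hold unexpired keys in a backup that is `backup_age_seconds` old. It uses the upper end of each TTL range from the table; `MAX_TTL_SECONDS` and `possibly_live` are illustrative names.

```python
from typing import List

# Upper bound of each TTL from the table above, in seconds.
MAX_TTL_SECONDS = {
    "Session tokens": 24 * 3600,
    "API response cache": 60 * 60,
    "Rate limit counters": 60 * 60,
    "Feature flag cache": 60,
    "Permission cache": 300,
}


def possibly_live(backup_age_seconds: int) -> List[str]:
    """Return the data categories whose keys may still be unexpired."""
    return [
        name
        for name, ttl in MAX_TTL_SECONDS.items()
        if ttl > backup_age_seconds
    ]


if __name__ == "__main__":
    # A backup at the 6-hour RPO of the RDB strategy.
    print(possibly_live(6 * 3600))  # ['Session tokens']
```

Under these assumptions, a six-hour-old RDB backup can only bring back session tokens; every other category will have expired, which is why restoring data is a lower priority than restoring availability.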

Recovery Priority

Redis data loss is generally low-impact since all data has a TTL and can be regenerated. The priority is to restore Redis availability rather than data:

  1. Restart the Redis pod
  2. Verify connectivity from application services
  3. Monitor cache hit rates to confirm cache warming
  4. If session data was lost, expect a brief spike in authentication requests
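
For step 3, the cache hit rate can be derived from the `keyspace_hits` and `keyspace_misses` counters that Redis reports in the Stats section of `INFO`. The sketch below parses raw `INFO` text and computes the ratio; `parse_info`, `hit_rate`, and the sample text are illustrative, and how the text is fetched (for example via `redis-cli INFO stats`) is left out.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse 'key:value' lines from INFO output, skipping section headers."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def hit_rate(info: Dict[str, str]) -> Optional[float]:
    """Return hits / (hits + misses), or None if there were no lookups yet."""
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None


if __name__ == "__main__":
    # Made-up sample in the format of the INFO Stats section.
    sample = "# Stats\r\nkeyspace_hits:750\r\nkeyspace_misses:250\r\n"
    print(hit_rate(parse_info(sample)))  # 0.75
```

These counters are cumulative and start from zero when the server restarts, so sampling them periodically after the restart and watching the ratio climb gives a simple signal that the cache is warming.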