CDC Patterns
Change Data Capture (CDC) enables the MATIH Platform to detect and propagate database changes as events. CDC is used for data synchronization, real-time analytics, and maintaining materialized views in OLAP engines.
CDC Architecture
PostgreSQL (source of truth)
|
| WAL (Write-Ahead Log)
|
v
Debezium Connector (Kafka Connect)
|
| CDC events
|
v
Kafka Topic (cdc.{database}.{table})
|
+-- Flink: Transform and aggregate
| |
| v
| ClickHouse / StarRocks (OLAP)
|
+-- Other consumers: Elasticsearch indexing, cache invalidationDebezium Configuration
| Parameter | Value |
|---|---|
| Connector | io.debezium.connector.postgresql.PostgresConnector |
| Replication slot | Logical replication via pgoutput |
| Snapshot mode | initial (full snapshot on first start) |
| Topic routing | cdc.{database}.{schema}.{table} |
| Key | Primary key of the source table |
CDC Event Format
Each CDC event contains the before and after state of the row:
{
"op": "u",
"before": {
"id": 42,
"tenant_id": "acme-corp",
"status": "draft",
"updated_at": "2026-02-12T10:00:00Z"
},
"after": {
"id": 42,
"tenant_id": "acme-corp",
"status": "published",
"updated_at": "2026-02-12T10:30:00Z"
},
"source": {
"db": "matih_bi",
"schema": "acme_corp",
"table": "dashboards"
},
"ts_ms": 1707735000000
}| Operation | op Value | Description |
|---|---|---|
| Create | c | New row inserted |
| Update | u | Existing row updated |
| Delete | d | Row deleted |
| Read | r | Initial snapshot read |
CDC Use Cases
| Use Case | Source | Sink | Purpose |
|---|---|---|---|
| OLAP sync | PostgreSQL tables | ClickHouse via Flink | Keep OLAP in sync with transactional data |
| Search indexing | PostgreSQL tables | Elasticsearch | Keep search index up to date |
| Cache invalidation | PostgreSQL tables | Redis (via consumer) | Invalidate stale cache entries |
| Audit enrichment | State changes | Audit Service | Capture before/after state for compliance |
| Data lineage | Schema changes | Graph store | Track schema evolution |
Flink CDC Jobs
| Job | Source Topic | Sink | Transformation |
|---|---|---|---|
| State transition CDC | cdc.matih_*.*.state_changes | Kafka (processed events) | Flatten and enrich state transitions |
| Agent performance aggregation | ai.agent.events | ClickHouse | Aggregate agent metrics per time window |
| Session analytics | cdc.matih_ai.*.sessions | ClickHouse | Session duration and activity metrics |
Multi-Tenancy in CDC
CDC events include the tenant_id from the source table:
| Layer | Isolation Method |
|---|---|
| Kafka topic | Tenant ID as message key |
| Flink processing | Filter by tenant_id in event payload |
| OLAP sink | Tenant ID column in destination table |
Related Pages
- Kafka Topology -- Kafka topic design for CDC
- Event Schemas -- Event structure conventions
- OLAP Engines -- Destination for CDC data