MATIH Platform is in active MVP development. Documentation reflects current implementation status.

CDC Patterns

Change Data Capture (CDC) enables the MATIH Platform to detect and propagate database changes as events. CDC is used for data synchronization, real-time analytics, and maintaining materialized views in OLAP engines.


CDC Architecture

PostgreSQL (source of truth)
  |
  | WAL (Write-Ahead Log)
  |
  v
Debezium Connector (Kafka Connect)
  |
  | CDC events
  |
  v
Kafka Topic (cdc.{database}.{schema}.{table})
  |
  +-- Flink: Transform and aggregate
  |     |
  |     v
  |   ClickHouse / StarRocks (OLAP)
  |
  +-- Other consumers: Elasticsearch indexing, cache invalidation

Debezium Configuration

Parameter         Value
----------------  --------------------------------------------------
Connector         io.debezium.connector.postgresql.PostgresConnector
Replication slot  Logical replication slot using the pgoutput plugin
Snapshot mode     initial (full snapshot on first start)
Topic routing     cdc.{database}.{schema}.{table}
Key               Primary key of the source table
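
As an illustration of how the parameters above could be expressed, the following Python sketch builds a Kafka Connect registration payload. The property names are standard Debezium PostgreSQL connector options (Debezium 2.x naming). The connector name, slot name, host, user, and password placeholder are hypothetical. Setting topic.prefix to cdc.{database} is an assumption about how the cdc.{database}.{schema}.{table} layout might be produced, since Debezium names topics {topic.prefix}.{schema}.{table} by default.

```python
import json


def debezium_connector_config(database: str, host: str = "postgres", port: int = 5432) -> dict:
    """Build a Kafka Connect registration payload for one PostgreSQL database."""
    return {
        "name": f"cdc-{database}",  # hypothetical connector name
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",            # logical decoding plugin
            "snapshot.mode": "initial",           # full snapshot on first start
            "slot.name": f"debezium_{database}",  # hypothetical slot name
            "database.hostname": host,            # assumed host
            "database.port": str(port),
            "database.user": "debezium",          # assumed user
            "database.password": "CHANGE_ME",     # placeholder, supply via a secret
            "database.dbname": database,
            # Assumption: the prefix carries the database name, so topics become
            # cdc.{database}.{schema}.{table}.
            "topic.prefix": f"cdc.{database}",
        },
    }


if __name__ == "__main__":
    print(json.dumps(debezium_connector_config("matih_bi"), indent=2))
```

A payload of this shape would typically be submitted to the Kafka Connect REST API with a POST to /connectors.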

CDC Event Format

Each CDC event contains the before and after state of the row:

{
  "op": "u",
  "before": {
    "id": 42,
    "tenant_id": "acme-corp",
    "status": "draft",
    "updated_at": "2026-02-12T10:00:00Z"
  },
  "after": {
    "id": 42,
    "tenant_id": "acme-corp",
    "status": "published",
    "updated_at": "2026-02-12T10:30:00Z"
  },
  "source": {
    "db": "matih_bi",
    "schema": "acme_corp",
    "table": "dashboards"
  },
  "ts_ms": 1770892200000
}

The op field identifies the operation type:

Operation  op Value  Description
---------  --------  ---------------------
Create     c         New row inserted
Update     u         Existing row updated
Delete     d         Row deleted
Read       r         Initial snapshot read
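
To make the event shape concrete, here is a minimal Python sketch that reads an event of the form shown above, names the operation, and lists the columns that changed. The helper names (changed_fields, describe) are hypothetical. The sketch assumes before is null for creates and snapshot reads and after is null for deletes.

```python
OP_NAMES = {"c": "create", "u": "update", "d": "delete", "r": "read"}


def changed_fields(event: dict) -> dict:
    """Return {column: (old, new)} for every column whose value differs."""
    before = event.get("before") or {}  # null for creates and snapshot reads
    after = event.get("after") or {}    # null for deletes
    return {
        col: (before.get(col), after.get(col))
        for col in sorted(before.keys() | after.keys())
        if before.get(col) != after.get(col)
    }


def describe(event: dict) -> str:
    """Summarize an event as '<operation> on <db>.<schema>.<table>'."""
    op = OP_NAMES.get(event["op"], "unknown")
    src = event["source"]
    return f'{op} on {src["db"]}.{src["schema"]}.{src["table"]}'


if __name__ == "__main__":
    event = {
        "op": "u",
        "before": {"id": 42, "tenant_id": "acme-corp", "status": "draft"},
        "after": {"id": 42, "tenant_id": "acme-corp", "status": "published"},
        "source": {"db": "matih_bi", "schema": "acme_corp", "table": "dashboards"},
    }
    print(describe(event))
    print(changed_fields(event))
```

How much of the before state is populated depends on the source table's REPLICA IDENTITY setting, so a consumer should not assume every column is present there.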

CDC Use Cases

Use Case            Source             Sink                  Purpose
------------------  -----------------  --------------------  -----------------------------------------
OLAP sync           PostgreSQL tables  ClickHouse via Flink  Keep OLAP in sync with transactional data
Search indexing     PostgreSQL tables  Elasticsearch         Keep search index up to date
Cache invalidation  PostgreSQL tables  Redis (via consumer)  Invalidate stale cache entries
Audit enrichment    State changes      Audit Service         Capture before/after state for compliance
Data lineage        Schema changes     Graph store           Track schema evolution
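
The cache invalidation use case can be sketched as a small consumer-side function. This is an illustrative sketch only: the key scheme {tenant_id}:{table}:{id}, the function names, and the assumption that the primary key column is called id are all hypothetical, and a plain dict stands in for Redis.

```python
from typing import Optional


def cache_key_for(event: dict) -> Optional[str]:
    """Derive the cache key of the row an event refers to (hypothetical key scheme)."""
    # Deletes carry the row in "before"; creates and updates carry it in "after".
    row = event.get("after") or event.get("before")
    if row is None:
        return None
    table = event["source"]["table"]
    return f'{row["tenant_id"]}:{table}:{row["id"]}'


def invalidate(cache: dict, event: dict) -> bool:
    """Remove the stale entry for updates and deletes; return True if one was removed."""
    if event["op"] not in ("u", "d"):
        return False  # creates and snapshot reads cannot leave stale entries
    key = cache_key_for(event)
    if key is not None and key in cache:
        del cache[key]
        return True
    return False


if __name__ == "__main__":
    cache = {"acme-corp:dashboards:42": {"status": "draft"}}
    event = {
        "op": "u",
        "before": {"id": 42, "tenant_id": "acme-corp", "status": "draft"},
        "after": {"id": 42, "tenant_id": "acme-corp", "status": "published"},
        "source": {"db": "matih_bi", "schema": "acme_corp", "table": "dashboards"},
    }
    print(invalidate(cache, event), cache)
```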

Flink CDC Jobs

Job                            Source Topic                 Sink                      Transformation
-----------------------------  ---------------------------  ------------------------  ---------------------------------------
State transition CDC           cdc.matih_*.*.state_changes  Kafka (processed events)  Flatten and enrich state transitions
Agent performance aggregation  ai.agent.events              ClickHouse                Aggregate agent metrics per time window
Session analytics              cdc.matih_ai.*.sessions      ClickHouse                Session duration and activity metrics
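
The wildcard source topics above need to be turned into a regular expression before they can drive a pattern-based topic subscription. The Python sketch below shows one way to do that. It assumes that * matches any characters within a single dot-separated segment; the page does not define the wildcard semantics, and the function name is hypothetical.

```python
import re


def topic_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a dotted topic pattern into a regex; '*' stays within one segment."""
    segments = [
        # Escape literal characters, then let '*' match anything except a dot.
        re.escape(segment).replace(r"\*", "[^.]*")
        for segment in pattern.split(".")
    ]
    return re.compile(r"\.".join(segments))


if __name__ == "__main__":
    state_changes = topic_pattern_to_regex("cdc.matih_*.*.state_changes")
    for topic in (
        "cdc.matih_bi.acme_corp.state_changes",
        "cdc.matih_ai.acme_corp.sessions",
        "cdc.other_db.acme_corp.state_changes",
    ):
        print(topic, bool(state_changes.fullmatch(topic)))
```

Using fullmatch rather than search keeps a pattern from matching topics that merely contain it as a substring.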

Multi-Tenancy in CDC

Every CDC event carries the tenant_id of the source row, and each layer of the pipeline uses it to isolate tenants:

Layer             Isolation Method
----------------  -------------------------------------
Kafka topic       Tenant ID as message key
Flink processing  Filter by tenant_id in event payload
OLAP sink         Tenant ID column in destination table
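
The payload-level filter in the processing layer can be illustrated in plain Python (this is a sketch of the logic, not a Flink job). The function names are hypothetical. The sketch assumes the tenant is read from after when present and from before otherwise, so that delete events are still attributed to a tenant.

```python
from typing import Iterable, Iterator, Optional


def tenant_of(event: dict) -> Optional[str]:
    """Read tenant_id from the row image; deletes only have a 'before' image."""
    row = event.get("after") or event.get("before") or {}
    return row.get("tenant_id")


def for_tenant(events: Iterable[dict], tenant_id: str) -> Iterator[dict]:
    """Yield only the events that belong to the given tenant."""
    return (event for event in events if tenant_of(event) == tenant_id)


if __name__ == "__main__":
    events = [
        {"op": "c", "before": None, "after": {"id": 1, "tenant_id": "acme-corp"}},
        {"op": "u", "before": {"id": 2, "tenant_id": "globex"},
         "after": {"id": 2, "tenant_id": "globex"}},
        {"op": "d", "before": {"id": 1, "tenant_id": "acme-corp"}, "after": None},
    ]
    for event in for_tenant(events, "acme-corp"):
        print(event["op"], tenant_of(event))
```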
