Flink
Apache Flink provides real-time stream processing for event aggregation, CDC processing, and streaming analytics.
Architecture
+----------------+ +------------------+
| Job Manager | | Task Managers |
| (1 replica) |---->| (2-8 replicas) |
+----------------+ +------------------+
| |
v v
+----------------+ +------------------+
| Kafka | | State Backend |
| (Source/Sink) | | (RocksDB + S3) |
+----------------+ +------------------+Flink SQL Jobs
MATIH defines streaming SQL jobs deployed via K8s manifests:
| Job | Source | Purpose |
|---|---|---|
| agent-performance-agg | matih.ai.agent-traces | Agent performance metrics aggregation |
| llm-operations-agg | matih.ai.llm-ops | LLM cost and latency aggregation |
| session-analytics | matih.ai.state-changes | Session analytics and funnel tracking |
| state-transition-cdc | matih.ai.state-changes | CDC for state transition history |
Checkpointing
checkpointing:
enabled: true
interval: "60s"
minPause: "30s"
timeout: "600s"
backend: "rocksdb"
s3:
endpoint: "http://minio.matih-data-plane.svc.cluster.local:9000"
bucket: "flink-checkpoints"