Kafka
Apache Kafka provides durable event streaming for asynchronous communication between services. Deployed via the Strimzi Kafka Operator, Kafka handles event notification, cross-service coordination, audit event ingestion, and billing usage tracking. Ten services produce or consume Kafka events.
Cluster Configuration
| Parameter | Development | Production |
|---|---|---|
| Deployment | Strimzi Kafka Operator | Strimzi Kafka Operator |
| Brokers | 1 | 3 |
| Replication factor | 1 | 3 |
| Min in-sync replicas | 1 | 2 |
| Retention period | 7 days | 30 days |
| Compression | Snappy | Snappy |
| Max message size | 1 MB | 1 MB |
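As a rough illustration of what the replication settings imply when producers write with acks = all, the sketch below computes how many replicas each environment can lose before writes are rejected. The function name and the `ENVIRONMENTS` map are hypothetical, and the calculation assumes every partition starts with a full in-sync replica set.

```python
def tolerable_write_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas that can be lost while acks=all writes are still accepted."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min in-sync replicas must be between 1 and the replication factor")
    return replication_factor - min_insync_replicas


# Values copied from the cluster configuration table.
ENVIRONMENTS = {
    "development": {"replication_factor": 1, "min_insync_replicas": 1},
    "production": {"replication_factor": 3, "min_insync_replicas": 2},
}

for name, settings in ENVIRONMENTS.items():
    print(name, tolerable_write_failures(**settings))
# development 0
# production 1
```

Under these assumptions, production keeps accepting writes with one broker down, while the single development broker has no headroom.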
Topic Design
Topics follow the naming convention {domain}.{entity}.{action}:
| Topic | Publisher | Consumers |
|---|---|---|
| tenant.lifecycle.events | Tenant Service | Audit, Billing, Notification |
| query.execution.events | Query Engine | Audit, Billing, Data Quality |
| ai.agent.events | AI Service | Audit, Billing |
| ml.model.events | ML Service | Audit, Catalog |
| pipeline.job.events | Pipeline Service | Audit, Notification |
| data.quality.events | Data Quality Service | Notification, Governance |
| billing.usage.events | Billing Service | Notification |
| config.change.events | Config Service | All services |
| security.audit.events | IAM Service | Audit |
| governance.policy.events | Governance Service | Catalog, Query Engine |
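The naming convention can be checked mechanically. The following is a minimal sketch, assuming each segment is lowercase alphanumeric and starts with a letter; that character set is an assumption, and the names `TOPIC_PATTERN`, `is_valid_topic`, and `parse_topic` are hypothetical helpers rather than part of the platform.

```python
import re

# Three dot-separated segments: {domain}.{entity}.{action}
# Assumption: segments are lowercase alphanumeric and start with a letter.
TOPIC_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:\.[a-z][a-z0-9]*){2}")


def is_valid_topic(name: str) -> bool:
    """Return True when the name has exactly three well-formed segments."""
    return TOPIC_PATTERN.fullmatch(name) is not None


def parse_topic(name: str) -> dict:
    """Split a valid topic name into its domain, entity, and action parts."""
    if not is_valid_topic(name):
        raise ValueError(f"topic does not follow the naming convention: {name!r}")
    domain, entity, action = name.split(".")
    return {"domain": domain, "entity": entity, "action": action}


print(is_valid_topic("tenant.lifecycle.events"))  # True
print(is_valid_topic("tenant.events"))            # False
print(parse_topic("query.execution.events"))
```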
Partitioning Strategy
All topics use tenant_id as the partition key:
| Property | Guarantee |
|---|---|
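| Ordering | Within a topic, all events for a single tenant land on one partition and are consumed in order |
| Affinity | Within a consumer group, a tenant's partition is read by one consumer at a time; the assignment can change after a rebalance |
| Scalability | Tenants are spread across partitions, so consumers can scale out up to the partition count; adding partitions changes the key-to-partition mapping for new events |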
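Key-based partitioning can be sketched as a hash of the key modulo the partition count. The example below uses CRC32 purely as a stand-in (Kafka's default partitioner in the Java client uses murmur2), and `partition_for` and `count_remapped` are hypothetical names. It shows that a tenant always maps to the same partition for a fixed partition count, and that changing the count can move tenants to a different partition.

```python
import zlib


def partition_for(tenant_id: str, num_partitions: int) -> int:
    """Map a tenant to a partition; the same key always gives the same result."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(tenant_id.encode("utf-8")) % num_partitions


def count_remapped(tenant_ids, old_count: int, new_count: int) -> int:
    """Count tenants whose partition changes when the partition count changes."""
    return sum(
        1
        for tenant_id in tenant_ids
        if partition_for(tenant_id, old_count) != partition_for(tenant_id, new_count)
    )


tenants = [f"tenant-{i}" for i in range(1000)]

# Stable for a fixed partition count.
print(partition_for("tenant-42", 6) == partition_for("tenant-42", 6))  # True

# Growing from 6 to 12 partitions moves some tenants, so per-tenant ordering
# across the change is only safe once older events have been drained.
print(count_remapped(tenants, 6, 12))
```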
Producer Configuration
| Parameter | Value | Purpose |
|---|---|---|
| acks | all | Wait for all in-sync replicas |
| retries | 3 | Retry on failure |
| enable.idempotence | true | Prevent duplicate messages on retry |
| max.in.flight.requests.per.connection | 1 | Preserve ordering on retry |
| compression.type | snappy | Reduce message size |
Consumer Configuration
| Parameter | Value |
|---|---|
| Auto offset reset | earliest |
| Max poll records | 100 |
| Poll interval | 100ms |
| Offset commit | Synchronous after each batch |
| Error handling | Log and skip (increment failure counter) |
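The producer and consumer settings can be gathered into plain client configuration maps, and the commit and error-handling rows can be sketched as a batch loop. This is an illustration under stated assumptions, not the services' actual code: property names follow the Java client, `enable.auto.commit = false` is inferred from the manual commit, the 100 ms value is read as a poll timeout, and `idempotence_prerequisites_met`, `process_batch`, `handle`, and `commit` are hypothetical names. No Kafka client library is imported; the `commit` callback stands in for a synchronous offset commit.

```python
import logging

logger = logging.getLogger("kafka.consumer.sketch")

# Values copied from the Producer Configuration section.
PRODUCER_CONFIG = {
    "acks": "all",
    "retries": 3,
    "enable.idempotence": True,
    "max.in.flight.requests.per.connection": 1,
    "compression.type": "snappy",
}

# Values copied from the Consumer Configuration table.
CONSUMER_CONFIG = {
    "auto.offset.reset": "earliest",
    "max.poll.records": 100,
    "enable.auto.commit": False,  # assumption: implied by committing manually
}
POLL_TIMEOUT_SECONDS = 0.1  # assumption: "Poll interval 100ms" read as the poll timeout


def idempotence_prerequisites_met(config: dict) -> bool:
    """Idempotent producers need acks=all, retries > 0 and at most 5 in-flight requests."""
    return (
        config.get("enable.idempotence") is True
        and config.get("acks") == "all"
        and config.get("retries", 0) > 0
        and config.get("max.in.flight.requests.per.connection", 5) <= 5
    )


def process_batch(records, handle, commit) -> int:
    """Log and skip failing records, then commit once for the whole batch."""
    failures = 0
    for record in records:
        try:
            handle(record)
        except Exception:
            failures += 1  # the failure counter from the table
            logger.exception("Skipping record that failed processing: %r", record)
    commit()  # single synchronous commit after the batch
    return failures


def handle(record):
    """Hypothetical handler that rejects one record to exercise the skip path."""
    if record == "bad":
        raise ValueError("cannot process record")


commits = []
failed = process_batch(["a", "bad", "c"], handle, lambda: commits.append("committed"))
print(idempotence_prerequisites_met(PRODUCER_CONFIG), failed, len(commits))  # True 1 1
```

Because the offset is committed even when a record fails, a skipped record is not redelivered; the failure counter and log entry are the only trace of it.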
Related Pages
- Event-Driven Architecture -- Full event system documentation
- Redis -- Complementary Pub/Sub system
- Service Topology -- Event flows between services