Kafka Topology
Apache Kafka is deployed via the Strimzi Kafka Operator and serves as the primary durable event streaming infrastructure. This section documents the cluster configuration, topic naming conventions, partitioning strategy, and producer/consumer patterns.
Cluster Configuration
| Parameter | Development | Production |
|---|---|---|
| Brokers | 1 | 3 |
| Replication factor | 1 | 3 |
| Min in-sync replicas | 1 | 2 |
| Default partitions | 3 | 6 |
| Retention period | 7 days | 30 days |
| Compression | Snappy | Snappy |
| Max message size | 1 MB | 1 MB |
Topic Naming Convention
Topics follow the pattern {domain}.{entity}.{action}:
| Topic | Partitions | Publisher | Key Consumers |
|---|---|---|---|
tenant.lifecycle.events | 6 | Tenant Service | Audit, Billing, Notification |
query.execution.events | 6 | Query Engine | Audit, Billing, Data Quality |
ai.agent.events | 6 | AI Service | Audit, Billing |
ml.model.events | 3 | ML Service | Audit, Catalog |
pipeline.job.events | 3 | Pipeline Service | Audit, Notification |
data.quality.events | 3 | Data Quality Service | Notification, Governance |
config.change.events | 3 | Config Service | All services |
security.audit.events | 6 | IAM Service | Audit |
governance.policy.events | 3 | Governance Service | Catalog, Query Engine |
billing.usage.events | 3 | Billing Service | Notification |
Partitioning Strategy
All topics use tenant_id as the Kafka message key:
| Property | Benefit |
|---|---|
| Ordering guarantee | Events for a single tenant are ordered within a partition |
| Partition affinity | Same tenant always maps to the same partition |
| Consumer locality | A consumer processes all events for its assigned tenants |
Producer Configuration
The KafkaEventStreamingService configures producers for exactly-once semantics:
| Setting | Value | Purpose |
|---|---|---|
acks | all | Wait for all replica acknowledgment |
retries | 3 | Retry on transient failures |
enable.idempotence | true | Prevent duplicate messages |
max.in.flight.requests | 1 | Preserve ordering during retries |
compression.type | snappy | Reduce network and storage overhead |
Consumer Groups
| Consumer Group | Services | Topics |
|---|---|---|
audit-consumers | Audit Service | All event topics |
billing-consumers | Billing Service | query, ai, ml, pipeline, tenant topics |
notification-consumers | Notification Service | tenant, pipeline, quality, billing topics |
data-quality-consumers | Data Quality Service | query topics |
governance-consumers | Governance Service | quality, catalog topics |
Monitoring
| Metric | Alert Threshold |
|---|---|
| Consumer lag | More than 1000 messages |
| Failed events | More than 1% of consumed events |
| Publish latency (p95) | More than 500ms |
| Under-replicated partitions | Any |
Related Pages
- Event Schemas -- Event structure and categories
- Redis Pub/Sub -- Complementary real-time messaging
- Data Stores: Kafka -- Kafka infrastructure details