Namespace Topology
The MATIH platform organizes its Kubernetes resources across seven dedicated namespaces, each serving a distinct operational purpose. This isolation strategy provides security boundaries, resource quota enforcement, network segmentation, and clear ownership of workloads. This section details each namespace, its purpose, the resources it contains, RBAC policies, resource quotas, and inter-namespace communication patterns.
Namespace Overview
```
+---------------------------------------------------------------+
|                   MATIH Kubernetes Cluster                    |
|                                                               |
|   +-- matih-system -------------------------+                 |
|   |  Cluster-wide infrastructure:           |                 |
|   |  cert-manager, external-dns, ESO,       |                 |
|   |  Strimzi operator, KEDA                 |                 |
|   +-----------------------------------------+                 |
|                                                               |
|   +-- matih-control-plane ------------------+                 |
|   |  IAM, Tenant, Config, Audit,            |                 |
|   |  Notification, Billing, API Gateway,    |                 |
|   |  Platform Registry, Infrastructure Svc  |                 |
|   |  PostgreSQL (shared), Redis, Kafka      |                 |
|   +-----------------------------------------+                 |
|                                                               |
|   +-- matih-data-plane ---------------------+                 |
|   |  AI Service, BI Service, ML Service,    |                 |
|   |  Query Engine, Catalog, Pipeline,       |                 |
|   |  Semantic Layer, Data Plane Agent,      |                 |
|   |  Data Quality, Render, Ops Agent,       |                 |
|   |  Ontology, Governance                   |                 |
|   |  + Trino, Kafka, PostgreSQL, Redis,     |                 |
|   |    Qdrant, Neo4j, Dgraph, StarRocks,    |                 |
|   |    Elasticsearch, MongoDB, ChromaDB     |                 |
|   +-----------------------------------------+                 |
|                                                               |
|   +-- matih-observability ------------------+                 |
|   |  Prometheus, Grafana, Loki, Promtail,   |                 |
|   |  Tempo, Observability API               |                 |
|   +-----------------------------------------+                 |
|                                                               |
|   +-- matih-monitoring-control-plane -------+                 |
|   |  ServiceMonitors for control plane      |                 |
|   |  services, alert rules                  |                 |
|   +-----------------------------------------+                 |
|                                                               |
|   +-- matih-monitoring-data-plane ----------+                 |
|   |  ServiceMonitors for data plane         |                 |
|   |  services, alert rules                  |                 |
|   +-----------------------------------------+                 |
|                                                               |
|   +-- matih-frontend -----------------------+                 |
|   |  BI Workbench, ML Workbench,            |                 |
|   |  Data Workbench, Agentic Workbench,     |                 |
|   |  Control Plane UI, Data Plane UI        |                 |
|   +-----------------------------------------+                 |
+---------------------------------------------------------------+
```
Namespace Labels
Every MATIH namespace carries standard labels that are referenced by NetworkPolicies, RBAC policies, and monitoring configurations:
```yaml
# Standard namespace labels
apiVersion: v1
kind: Namespace
metadata:
  name: matih-data-plane
  labels:
    name: matih-data-plane
    app.kubernetes.io/part-of: matih-platform
    matih.ai/tier: data-plane
    matih.ai/environment: production
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```
The name label is particularly important because it is referenced in NetworkPolicy namespaceSelector rules to control inter-namespace traffic.
| Label | Purpose |
|---|---|
| name | Namespace identification for NetworkPolicy selectors |
| app.kubernetes.io/part-of | Platform grouping |
| matih.ai/tier | Architectural tier (system, control-plane, data-plane, observability, frontend) |
| matih.ai/environment | Deployment environment (dev, staging, production) |
| pod-security.kubernetes.io/* | Pod Security Standards enforcement level |
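To illustrate how the name label is consumed, the following NetworkPolicy is a minimal sketch rather than the platform's actual policy. It admits traffic to iam-service on port 8081 only from namespaces labeled name: matih-data-plane. The policy name and the app.kubernetes.io/name pod label are assumptions made for illustration.

```yaml
# Hypothetical policy: the policy name and pod selector label are illustrative assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-data-plane-to-iam
  namespace: matih-control-plane
spec:
  # Pods this policy protects (assumed label on iam-service pods)
  podSelector:
    matchLabels:
      app.kubernetes.io/name: iam-service
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Matches namespaces by label, not by namespace name
        - namespaceSelector:
            matchLabels:
              name: matih-data-plane
      ports:
        - protocol: TCP
          port: 8081
```

Because a namespaceSelector matches labels rather than namespace names, a namespace that is missing the name label would silently fall outside a rule like this one.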
Detailed Namespace Descriptions
1. matih-system
Purpose: Cluster-wide infrastructure components and operators that serve multiple namespaces.
Resources:
| Resource | Type | Description |
|---|---|---|
| cert-manager | Deployment | TLS certificate provisioning via Let's Encrypt |
| external-dns | Deployment | Automatic DNS record management |
| External Secrets Operator | Deployment | Secret synchronization from cloud vault |
| Strimzi Kafka Operator | Deployment | Kafka cluster lifecycle management |
| KEDA | Deployment | Event-driven pod autoscaling |
| matih-operator | Deployment | Custom platform operator for tenant lifecycle |
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: matih-system
  labels:
    name: matih-system
    app.kubernetes.io/part-of: matih-platform
    matih.ai/tier: system
```
2. matih-control-plane
Purpose: All control plane microservices that manage the multi-tenant platform: identity, tenant lifecycle, configuration, auditing, and notifications.
Services Deployed:
| Service | Port | Technology | Replicas |
|---|---|---|---|
| iam-service | 8081 | Java/Spring Boot | 2 |
| tenant-service | 8082 | Java/Spring Boot | 2 |
| config-service | 8888 | Java/Spring Boot | 2 |
| notification-service | 8085 | Java/Spring Boot | 2 |
| audit-service | 8086 | Java/Spring Boot | 2 |
| billing-service | 8087 | Java/Spring Boot | 2 |
| observability-api | 8088 | Java/Spring Boot | 2 |
| infrastructure-service | 8089 | Java/Spring Boot | 2 |
| api-gateway | 8080 | Java/Spring Boot | 2 |
| platform-registry | 8084 | Java/Spring Boot | 2 |
Shared Infrastructure (within namespace):
| Component | Purpose | Notes |
|---|---|---|
| PostgreSQL | Relational storage | Bitnami chart, shared by CP services |
| Redis | Caching, sessions | Bitnami chart, shared by CP services |
| Kafka | Event messaging | Bitnami chart for CP-internal events |
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: matih-control-plane
  labels:
    name: matih-control-plane
    app.kubernetes.io/part-of: matih-platform
    matih.ai/tier: control-plane
```
3. matih-data-plane
Purpose: All data plane microservices and data infrastructure components that power analytics, AI, ML, and data processing.
Application Services:
| Service | Port | Technology | Replicas |
|---|---|---|---|
| ai-service | 8000 | Python/FastAPI | 2 |
| bi-service | 8084 | Java/Spring Boot | 2 |
| ml-service | 8000 | Python/FastAPI | 2 |
| query-engine | 8080 | Java/Spring Boot | 2 |
| catalog-service | 8086 | Java/Spring Boot | 2 |
| pipeline-service | 8092 | Java/Spring Boot | 2 |
| semantic-layer | 8086 | Java/Spring Boot | 2 |
| data-plane-agent | 8085 | Java/Spring Boot | 2 |
| data-quality-service | 8000 | Python/FastAPI | 2 |
| render-service | 8098 | Node.js | 2 |
| ontology-service | 8101 | Python/FastAPI | 2 |
| governance-service | 8080 | Python/FastAPI | 2 |
| ops-agent-service | 8080 | Python/FastAPI | 2 |
Data Infrastructure:
| Component | Ports | Technology | Deployment Type |
|---|---|---|---|
| Trino | 8080 | Distributed SQL | StatefulSet (coordinator + workers) |
| Kafka (Strimzi) | 9092, 9093 | Event streaming | Strimzi custom resource (kind: Kafka) |
| PostgreSQL | 5432 | Relational DB | StatefulSet (Bitnami) |
| Redis | 6379 | Cache/messaging | StatefulSet (Bitnami) |
| Qdrant | 6333, 6334 | Vector search | StatefulSet |
| Neo4j | 7474, 7687 | Graph database | StatefulSet |
| Dgraph | 8080, 9080 | Graph database | StatefulSet |
| StarRocks | 9030, 8030 | OLAP database | StatefulSet |
| Elasticsearch | 9200, 9300 | Full-text search | StatefulSet |
| MongoDB | 27017 | Document store | StatefulSet |
| ChromaDB | 8000 | Vector embeddings | Deployment |
4. matih-observability
Purpose: Centralized observability stack providing metrics, logging, tracing, and alerting.
| Component | Port | Purpose |
|---|---|---|
| Prometheus Server | 9090 | Metrics collection and storage |
| Alertmanager | 9093 | Alert routing and deduplication |
| Grafana | 3000 | Dashboard visualization |
| Loki | 3100 | Log aggregation |
| Promtail | N/A | Log collection (DaemonSet) |
| Tempo | 3200 | Distributed trace storage |
| Observability API | 8086 | Unified observability query API |
5. matih-monitoring-control-plane
Purpose: Dedicated monitoring resources for control plane services. This namespace hosts ServiceMonitor custom resources that instruct Prometheus to scrape metrics from control plane pods.
| Resource Type | Count | Target |
|---|---|---|
| ServiceMonitor | 10 | One per control plane service |
| PrometheusRule | 5+ | Alert rules for CP services |
Separating monitoring resources from the service namespace allows platform engineers to manage monitoring without granting access to application resources.
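A ServiceMonitor that lives in the monitoring namespace but targets a Service in matih-control-plane might look like the sketch below. The Service label, the port name http, the scrape interval, and the /actuator/prometheus path (the usual Spring Boot Actuator endpoint) are assumptions, not values taken from the MATIH charts.

```yaml
# Hypothetical ServiceMonitor: selector label, port name, path, and interval are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: iam-service
  namespace: matih-monitoring-control-plane
spec:
  # Look for matching Services in the application namespace, not this one
  namespaceSelector:
    matchNames:
      - matih-control-plane
  selector:
    matchLabels:
      app.kubernetes.io/name: iam-service
  endpoints:
    - port: http                  # name of the Service port, assumed
      path: /actuator/prometheus  # assumed Spring Boot Actuator metrics path
      interval: 30s
```

For this to take effect, the Prometheus instance also has to be configured to select ServiceMonitors from the monitoring namespace and needs permission to discover endpoints in matih-control-plane.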
6. matih-monitoring-data-plane
Purpose: Dedicated monitoring resources for data plane services, following the same pattern as the control plane monitoring namespace.
| Resource Type | Count | Target |
|---|---|---|
| ServiceMonitor | 14+ | One per data plane service |
| PrometheusRule | 10+ | Alert rules for DP services |
| PodMonitor | 3+ | Data infrastructure pods |
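As an illustration of the kind of alert rule this namespace could hold, the following PrometheusRule is a hedged sketch. The rule name, group name, severity label, and five-minute window are all hypothetical, and the expression assumes scraped targets carry a namespace label, which is typical for targets discovered through ServiceMonitors.

```yaml
# Hypothetical alert rule: names, threshold, and labels are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: data-plane-availability
  namespace: matih-monitoring-data-plane
spec:
  groups:
    - name: data-plane.availability
      rules:
        - alert: DataPlaneTargetDown
          # up == 0 means the most recent scrape of the target failed
          expr: up{namespace="matih-data-plane"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Scrape target {{ $labels.job }} in matih-data-plane is down"
```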
7. matih-frontend
Purpose: Frontend applications served as static files behind NGINX reverse proxies.
| Application | Port | Technology |
|---|---|---|
| bi-workbench | 3000 | React/Vite |
| ml-workbench | 3001 | React/Vite |
| data-workbench | 3002 | React/Vite |
| agentic-workbench | 3003 | React/Vite |
| control-plane-ui | 3004 | React/Vite |
| data-plane-ui | 3005 | React/Vite |
RBAC Configuration
MATIH implements fine-grained RBAC using Kubernetes Roles, ClusterRoles, RoleBindings, and ClusterRoleBindings.
RBAC Hierarchy
```
ClusterRoles (platform-wide)
  |
  +-- matih-platform-admin      Full access to all namespaces
  +-- matih-platform-viewer     Read-only access to all namespaces
  +-- matih-namespace-admin     Admin within a specific namespace
  |
Roles (namespace-scoped)
  |
  +-- matih-service-deployer    Deploy/update services in namespace
  +-- matih-service-viewer      Read pods, services, configmaps
  +-- matih-secret-reader       Read secrets (for operators only)
```
ClusterRole Definitions
```yaml
# Platform administrator - full access
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: matih-platform-admin
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
# Platform viewer - read-only everywhere
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: matih-platform-viewer
rules:
  - apiGroups: ["", "apps", "batch", "networking.k8s.io"]
    resources: ["pods", "services", "deployments", "statefulsets",
                "jobs", "configmaps", "ingresses", "networkpolicies"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["monitoring.coreos.com"]
    resources: ["servicemonitors", "prometheusrules"]
    verbs: ["get", "list", "watch"]
```
Namespace-Scoped Roles
```yaml
# Service deployer - can deploy and update services in a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: matih-service-deployer
  namespace: matih-data-plane
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "configmaps", "serviceaccounts"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["networkpolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
```
Service Account Bindings
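A Role grants nothing until a RoleBinding attaches it to a subject. As a hedged sketch, the matih-service-deployer Role defined above could be bound to a deployment pipeline's service account as follows. The binding name and the matih-cd-deployer service account are hypothetical and do not appear elsewhere in this document.

```yaml
# Hypothetical binding: the binding name and the matih-cd-deployer subject are assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: matih-service-deployer-binding
  namespace: matih-data-plane   # must match the namespace of the Role
subjects:
  - kind: ServiceAccount
    name: matih-cd-deployer     # assumed pipeline identity
    namespace: matih-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: matih-service-deployer
```

The subject may live in a different namespace than the Role, but the permissions it receives apply only inside the RoleBinding's namespace.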
Each service has a dedicated Kubernetes service account:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ai-service
  namespace: matih-data-plane
  labels:
    app.kubernetes.io/name: ai-service
    app.kubernetes.io/part-of: matih-platform
  annotations:
    # Azure Workload Identity
    azure.workload.identity/client-id: "<managed-identity-client-id>"
```
Resource Quotas
Each namespace has resource quotas to prevent runaway resource consumption:
Control Plane Quotas
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: matih-control-plane-quota
  namespace: matih-control-plane
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
    services: "20"
    persistentvolumeclaims: "20"
    secrets: "50"
    configmaps: "50"
```
Data Plane Quotas
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: matih-data-plane-quota
  namespace: matih-data-plane
spec:
  hard:
    requests.cpu: "80"
    requests.memory: 160Gi
    limits.cpu: "160"
    limits.memory: 320Gi
    pods: "300"
    services: "40"
    persistentvolumeclaims: "50"
    secrets: "100"
    configmaps: "100"
    # GPU quotas
    requests.nvidia.com/gpu: "4"
```
Quota Summary by Namespace
| Namespace | CPU Requests | CPU Limits | Memory Requests | Memory Limits | Max Pods |
|---|---|---|---|---|---|
| matih-system | 10 | 20 | 20Gi | 40Gi | 50 |
| matih-control-plane | 20 | 40 | 40Gi | 80Gi | 100 |
| matih-data-plane | 80 | 160 | 160Gi | 320Gi | 300 |
| matih-observability | 20 | 40 | 40Gi | 80Gi | 50 |
| matih-monitoring-* | 5 | 10 | 10Gi | 20Gi | 30 |
| matih-frontend | 10 | 20 | 10Gi | 20Gi | 50 |
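Each row of the summary maps directly onto a ResourceQuota manifest. As a sketch, the matih-frontend row could be expressed as shown below. The object name is an assumption that follows the naming pattern of the quotas above, and object-count limits (services, secrets, configmaps) are left out because the table does not list them.

```yaml
# Sketch derived from the matih-frontend row; the object name is an assumption.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: matih-frontend-quota
  namespace: matih-frontend
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 10Gi
    limits.cpu: "20"
    limits.memory: 20Gi
    pods: "50"
```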
LimitRange Defaults
LimitRanges ensure every container has sensible default resource settings:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: matih-default-limits
  namespace: matih-data-plane
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      max:
        cpu: "8"
        memory: 32Gi
      min:
        cpu: 50m
        memory: 64Mi
    - type: PersistentVolumeClaim
      max:
        storage: 500Gi
      min:
        storage: 1Gi
```
Inter-Namespace Communication
Services communicate across namespace boundaries using Kubernetes DNS FQDN syntax:
```
<service-name>.<namespace>.svc.cluster.local:<port>
```
Communication Matrix
| Source Namespace | Target Namespace | Services | Ports | Purpose |
|---|---|---|---|---|
| matih-data-plane | matih-control-plane | iam-service | 8081 | JWT validation, user lookup |
| matih-data-plane | matih-data-plane | query-engine, catalog-service, semantic-layer | 8080, 8086 | Query execution, metadata |
| matih-control-plane | matih-data-plane | data-plane-agent | 8085 | Tenant provisioning |
| matih-frontend | matih-control-plane | api-gateway | 8080 | API requests |
| matih-observability | matih-control-plane | All services | Various | Prometheus scraping |
| matih-observability | matih-data-plane | All services | Various | Prometheus scraping |
| matih-data-plane | External | LLM APIs | 443 | OpenAI, Anthropic, Azure |
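The matrix can be turned into egress rules per workload. The sketch below covers the ai-service rows only and is not the platform's actual policy. The policy name and pod label are assumptions, as is the location of cluster DNS in kube-system. The kubernetes.io/metadata.name label is set automatically on every namespace by Kubernetes.

```yaml
# Hypothetical egress policy for ai-service; names and selectors are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-service-egress
  namespace: matih-data-plane
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: ai-service
  policyTypes:
    - Egress
  egress:
    # DNS lookups (assumes cluster DNS runs in kube-system)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Intra-namespace traffic: query-engine, catalog-service, semantic-layer, data stores
    - to:
        - podSelector: {}
    # JWT validation and user lookup against iam-service
    - to:
        - namespaceSelector:
            matchLabels:
              name: matih-control-plane
      ports:
        - protocol: TCP
          port: 8081
    # External LLM APIs over HTTPS
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
```

Once a pod is selected by any policy with the Egress type, all egress not listed is dropped, which is why the DNS rule is included explicitly.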
Cross-Namespace Service Discovery Example
In the AI service values.yaml, cross-namespace references use FQDNs:
```yaml
config:
  services:
    # Same namespace (matih-data-plane)
    queryEngineUrl: "http://query-engine.matih-data-plane.svc.cluster.local:8080"
    semanticLayerUrl: "http://semantic-layer.matih-data-plane.svc.cluster.local:8086"
    catalogServiceUrl: "http://catalog-service.matih-data-plane.svc.cluster.local:8086"
    # Cross-namespace (matih-control-plane)
    iamServiceUrl: "http://iam-service.matih-control-plane.svc.cluster.local:8081"
```
Deep Dive: Why FQDNs instead of short names? While services within the same namespace can use short names (e.g., query-engine:8080), MATIH always uses the fully qualified domain name for clarity and to avoid DNS resolution ambiguity. This is especially important when services reference resources across namespaces, as short names would resolve within the source namespace and fail.
Pod Security Standards
MATIH enforces the Kubernetes Pod Security Standards at the namespace level:
| Namespace | Enforce Level | Audit Level | Warn Level |
|---|---|---|---|
| matih-system | baseline | restricted | restricted |
| matih-control-plane | restricted | restricted | restricted |
| matih-data-plane | restricted | restricted | restricted |
| matih-observability | baseline | restricted | restricted |
| matih-monitoring-* | restricted | restricted | restricted |
| matih-frontend | restricted | restricted | restricted |
The restricted profile enforces:
- Containers must run as non-root
- Containers must not use privilege escalation
- Containers must drop all capabilities
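- Containers are expected to use a read-only root filesystem, with writable paths provided via emptyDir (Pod Security admission does not check this setting, so it has to be enforced by other means)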
- Seccomp profile must be RuntimeDefault or Localhost
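The requirements above map onto securityContext fields. The Pod below is a minimal sketch of a spec that sets each of them. The pod name, image reference, and user ID are placeholders rather than values from the MATIH charts.

```yaml
# Illustrative Pod: name, image, and UID are placeholder assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: restricted-example
  namespace: matih-data-plane
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/matih/example:1.0.0
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      volumeMounts:
        # Writable scratch space, since the root filesystem is read-only
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}
```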
The matih-system and matih-observability namespaces use baseline enforcement because some infrastructure components (cert-manager, Promtail DaemonSet) require capabilities that restricted does not permit.
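Combining the matih-system manifest shown earlier with its row in the table gives a namespace definition along these lines. This is a sketch assembled from values in this document, not a copy of the deployed manifest.

```yaml
# Sketch: matih-system labels from earlier in this section plus the Pod Security levels from the table.
apiVersion: v1
kind: Namespace
metadata:
  name: matih-system
  labels:
    name: matih-system
    app.kubernetes.io/part-of: matih-platform
    matih.ai/tier: system
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

With this combination, pods that violate only the restricted profile are still admitted, but each violation produces an audit annotation and a client-side warning, which makes it possible to track how far the namespace is from full restricted compliance.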
Namespace Creation Script
Namespaces are created as part of the CD pipeline (stage 01-namespaces):
```bash
#!/usr/bin/env bash
# scripts/cd/stages/01-namespaces.sh
# Abort on the first failed command, unset variable, or failed pipeline stage.
set -euo pipefail

NAMESPACES=(
  "matih-system"
  "matih-control-plane"
  "matih-data-plane"
  "matih-observability"
  "matih-monitoring-control-plane"
  "matih-monitoring-data-plane"
  "matih-frontend"
)

for ns in "${NAMESPACES[@]}"; do
  # Apply namespace with labels
  kubectl apply -f "infrastructure/k8s/namespaces/${ns}.yaml"
  # Apply resource quotas
  kubectl apply -f "infrastructure/k8s/quotas/${ns}-quota.yaml"
  # Apply limit ranges
  kubectl apply -f "infrastructure/k8s/limits/${ns}-limits.yaml"
done
```
Troubleshooting
Common Namespace Issues
| Issue | Symptom | Resolution |
|---|---|---|
| Namespace stuck in Terminating | kubectl get ns shows Terminating | Check for finalizers; remove if resources are cleaned up |
| Quota exceeded | Pods are not created; ReplicaSet or StatefulSet events show FailedCreate with "exceeded quota" | Increase quota or reduce resource requests |
| Cross-namespace DNS failure | Service calls fail with name-resolution errors or time out | Verify FQDN format; check NetworkPolicy allows egress to target namespace and to cluster DNS |
| RBAC permission denied | 403 errors in pod logs | Check RoleBinding; verify service account has correct Role |
| Pod Security violation | Pod rejected with "violates PodSecurity" | Adjust securityContext to meet restricted profile |
Next Steps
- Next: Helm Chart Structure
- Previous: Cluster Architecture