# Verifying the Deployment
After installation and initial configuration, run a comprehensive verification to confirm that the MATIH Platform is fully operational. This page covers the built-in verification tools, manual checks, and troubleshooting procedures for common issues.
## Automated Verification Tools
MATIH provides two primary scripts for deployment verification:
### Platform Status
The platform-status.sh script checks the status of all platform components:
```sh
./scripts/tools/platform-status.sh
```

This script verifies:
| Check | Description |
|---|---|
| Kubernetes connectivity | Can reach the cluster API server |
| Namespace existence | matih-system, matih-shared, and tenant namespaces exist |
| Pod health | All pods are in Running or Completed state |
| Service endpoints | All Kubernetes services have endpoints |
| Persistent volumes | All PVCs are Bound |
| Ingress status | Ingress resources have assigned addresses |
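The persistent-volume check, for example, reduces to listing any claim that is not Bound. A minimal sketch of that check (assumes `jq` is installed; not the script's actual code):

```shell
# unbound_pvcs: read `kubectl get pvc -A -o json` on stdin and print every
# claim that is not in the Bound phase; no output means the check passes.
unbound_pvcs() {
  jq -r '.items[]
    | select(.status.phase != "Bound")
    | "\(.metadata.namespace)/\(.metadata.name): \(.status.phase)"'
}

# Usage:
# kubectl get pvc -A -o json | unbound_pvcs
```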
Sample output:

```text
MATIH Platform Status
=====================

Cluster: matih-aks-cluster (Azure AKS)
Kubernetes: v1.29.1

Namespaces:
  matih-system ............ OK
  matih-shared ............ OK
  tenant-acme-corp ........ OK

Control Plane Services:
  iam-service            2/2 pods ready .... OK
  tenant-service         2/2 pods ready .... OK
  config-service         1/1 pods ready .... OK
  api-gateway            2/2 pods ready .... OK
  notification-service   1/1 pods ready .... OK
  audit-service          1/1 pods ready .... OK

Data Plane Services (tenant-acme-corp):
  ai-service             2/2 pods ready .... OK
  query-engine           2/2 pods ready .... OK
  bi-service             1/1 pods ready .... OK
  ml-service             1/1 pods ready .... OK

Infrastructure:
  postgresql             1/1 pods ready .... OK
  redis                  1/1 pods ready .... OK
  kafka                  3/3 pods ready .... OK

Overall Status: HEALTHY
```

### Health Check
The health-check.sh script performs deeper health validation:
```sh
./scripts/disaster-recovery/health-check.sh
```

This script performs:
| Check | Description |
|---|---|
| Service health endpoints | HTTP health check on each service |
| Database connectivity | Test connection to PostgreSQL |
| Message broker | Verify Kafka broker availability |
| Cache | Test Redis connectivity |
| DNS resolution | Verify internal DNS resolution |
| Certificate validity | Check TLS certificate expiration |
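The per-service HTTP check can be sketched as: fetch the health payload and require a top-level status of `UP` (the Spring Boot actuator convention). This is an illustrative helper, not the script's actual implementation, and it assumes `jq` is installed:

```shell
# check_health: succeed only if the JSON health payload on stdin reports
# a top-level status of UP.
check_health() {
  jq -e '.status == "UP"' >/dev/null
}

# Example against a live service (with a port-forward in place):
# curl -s http://localhost:8081/actuator/health | check_health \
#   && echo "iam-service OK" || echo "iam-service UNHEALTHY"
```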
## Manual Verification Steps

### 1. Verify Kubernetes Cluster Health
```sh
# Check node status
kubectl get nodes -o wide

# Expected: All nodes showing STATUS "Ready"
# NAME          STATUS   ROLES    AGE   VERSION
# node-pool-0   Ready    <none>   2d    v1.29.1
# node-pool-1   Ready    <none>   2d    v1.29.1
# node-pool-2   Ready    <none>   2d    v1.29.1
```

### 2. Verify Pod Status
```sh
# Check all pods across namespaces
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded

# This should return no results (no failing pods)
# If pods are failing, investigate with:
#   kubectl describe pod <pod-name> -n <namespace>
#   kubectl logs <pod-name> -n <namespace>
```

### 3. Verify Control Plane Services
Check each control plane service's health endpoint:
```sh
# Using port-forward (if not exposed via ingress)
kubectl port-forward svc/iam-service 8081:8081 -n matih-system &

# Check health
curl -s http://localhost:8081/actuator/health | jq .
```

Expected response:
```json
{
  "status": "UP",
  "components": {
    "db": { "status": "UP" },
    "redis": { "status": "UP" },
    "kafka": { "status": "UP" },
    "diskSpace": { "status": "UP" }
  }
}
```

### 4. Verify Data Plane Services
```sh
# AI Service health
curl -s http://localhost:8000/health | jq .

# Expected:
# {
#   "status": "healthy",
#   "version": "1.0.0",
#   "dependencies": {
#     "database": "connected",
#     "redis": "connected",
#     "kafka": "connected"
#   }
# }
```

### 5. Verify Database Connectivity
```sh
# Check database pods
kubectl get pods -l app=postgresql -A

# Check database service
kubectl get svc -l app=postgresql -A
```

### 6. Verify Kafka
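Beyond the pod checks in this step, you can assert that specific topics are present in the broker's topic list. A minimal sketch (the topic names in the usage comment are hypothetical examples, not confirmed MATIH topic names):

```shell
# check_topics: read a newline-separated topic list on stdin and report any
# expected topic (passed as arguments) that is missing from it.
check_topics() {
  list=$(cat)
  for t in "$@"; do
    echo "$list" | grep -qx -- "$t" || echo "missing topic: $t"
  done
}

# Usage (topic names are hypothetical):
# kubectl exec kafka-0 -n matih-shared -- \
#   kafka-topics.sh --list --bootstrap-server localhost:9092 \
#   | check_topics matih.audit.events matih.notifications
```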
```sh
# Check Kafka pods
kubectl get pods -l app=kafka -A

# Verify topic creation (topics should exist after service deployment)
kubectl exec -it kafka-0 -n matih-shared -- \
  kafka-topics.sh --list --bootstrap-server localhost:9092
```

### 7. Verify Ingress and TLS
```sh
# Check ingress resources
kubectl get ingress -A

# Check certificate status
kubectl get certificates -A

# Validate tenant ingress
./scripts/tools/validate-tenant-ingress.sh --tenant acme-corp
```

## Port Validation
Verify that no port conflicts exist:
```sh
./scripts/tools/validate-ports.sh
```

The source of truth for port assignments is `scripts/config/components.yaml`. This script checks that actual service ports match the defined configuration.
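The same comparison can be scripted by hand: diff expected `service port` pairs against what the cluster reports. A sketch under stated assumptions (single-port Services; the jsonpath and the expected-ports file are illustrative, not part of the platform tooling):

```shell
# diff_ports: compare expected "service port" pairs (file given as $1) with
# actual pairs read from stdin; prints any line present on only one side.
diff_ports() {
  expected=$(mktemp); actual=$(mktemp)
  sort "$1" > "$expected"
  sort > "$actual"
  comm -3 "$expected" "$actual"
  rm -f "$expected" "$actual"
}

# Usage (hypothetical expected-ports.txt, single-port Services assumed):
# kubectl get svc -n matih-system \
#   -o jsonpath='{range .items[*]}{.metadata.name} {.spec.ports[0].port}{"\n"}{end}' \
#   | diff_ports expected-ports.txt
```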
### Expected Port Assignments
Control Plane:
| Service | Port |
|---|---|
| api-gateway | 8080 |
| iam-service | 8081 |
| tenant-service | 8082 |
| platform-registry | 8084 |
| notification-service | 8085 |
| audit-service | 8086 |
| billing-service | 8087 |
| observability-api | 8088 |
| infrastructure-service | 8089 |
| config-service | 8888 |
Data Plane:
| Service | Port |
|---|---|
| ai-service | 8000 |
| ml-service | 8000 |
| data-quality-service | 8000 |
| query-engine | 8080 |
| bi-service | 8084 |
| catalog-service | 8086 |
| pipeline-service | 8092 |
## Smoke Tests
After verifying infrastructure health, run smoke tests to confirm end-to-end functionality:
### Authentication Smoke Test
```sh
# Register a test user
curl -s -X POST http://localhost:8081/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{
    "email": "test@acme.com",
    "password": "Test@12345",
    "firstName": "Test",
    "lastName": "User"
  }' | jq .status

# Login
curl -s -X POST http://localhost:8081/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "email": "test@acme.com",
    "password": "Test@12345"
  }' | jq .tokenType

# Expected: "Bearer"
```

### AI Service Smoke Test
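The query below sends an authenticated request, so it needs a bearer token from the login step. A minimal helper for pulling it out of the login response (the `accessToken` field name is an assumption; adjust it to the actual IAM response shape):

```shell
# extract_token: read a login response on stdin and print the bearer token.
# NOTE: the accessToken field name is assumed, not confirmed by the API.
extract_token() {
  jq -r '.accessToken // empty'
}

# Usage (hypothetical):
# ACCESS_TOKEN=$(curl -s -X POST http://localhost:8081/api/v1/auth/login \
#   -H "Content-Type: application/json" \
#   -d '{"email": "test@acme.com", "password": "Test@12345"}' | extract_token)
```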
```sh
# Test a simple query (requires configured data source)
curl -s -X POST http://localhost:8000/api/v1/query \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How many tables are in the database?",
    "tenantId": "acme-corp"
  }' | jq .status

# Expected: "success" or "completed"
```

## Troubleshooting Common Issues
### Pods in CrashLoopBackOff
Diagnosis:
```sh
# Check pod events
kubectl describe pod <pod-name> -n <namespace>

# Check container logs
kubectl logs <pod-name> -n <namespace>

# Check previous container logs (if restarting)
kubectl logs <pod-name> -n <namespace> --previous
```

Common causes:
| Symptom | Cause | Resolution |
|---|---|---|
| "Connection refused" to database | Database not ready or wrong credentials | Check database pod status and secret values |
| "Invalid JWT secret" | JWT secret not configured | Run dev-secrets.sh or check ESO sync |
| OOMKilled | Insufficient memory | Increase memory limits in Helm values |
| "Unrecognized option" | Wrong CLI flags for the image version | Verify image tag and flag compatibility |
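To surface restart-prone pods cluster-wide in one pass, you can filter `kubectl get pods -o json` for nonzero restart counts. A sketch (assumes `jq` is installed):

```shell
# restarting_pods: read `kubectl get pods -o json` output on stdin and print
# namespace/name for every pod whose containers have restarted at least once.
restarting_pods() {
  jq -r '.items[]
    | select((([.status.containerStatuses[]?.restartCount] | add) // 0) > 0)
    | "\(.metadata.namespace)/\(.metadata.name)"'
}

# Usage:
# kubectl get pods -A -o json | restarting_pods
```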
### Pods in CreateContainerConfigError
Diagnosis:
```sh
kubectl describe pod <pod-name> -n <namespace>
# Look at the Events section for the specific error
```

Common causes:
| Error Message | Cause | Resolution |
|---|---|---|
| `secret "X" not found` | Missing Kubernetes secret | Run dev-secrets.sh or check ESO ExternalSecret |
| `configmap "X" not found` | Missing ConfigMap | Check Helm chart templates |
| `container has runAsNonRoot and image will run as root` | Security context mismatch | Adjust podSecurityContext in values |
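When chasing down a missing secret, it helps to list every secret a pod actually references before checking each one. A sketch that scans env vars and volumes in the pod spec (assumes `jq` is installed):

```shell
# referenced_secrets: read a pod's JSON on stdin and print each distinct
# secret name referenced via volumes, env secretKeyRef, or envFrom.
referenced_secrets() {
  jq -r '[.spec.volumes[]?.secret.secretName,
          .spec.containers[].env[]?.valueFrom.secretKeyRef.name,
          .spec.containers[].envFrom[]?.secretRef.name]
         | map(select(. != null)) | unique | .[]'
}

# Usage:
# kubectl get pod <pod-name> -n <namespace> -o json | referenced_secrets
# then verify each: kubectl get secret <name> -n <namespace>
```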
### Services Cannot Communicate
Diagnosis:
```sh
# Check service endpoints
kubectl get endpoints <service-name> -n <namespace>

# Check network policies
kubectl get networkpolicies -n <namespace>

# Test connectivity from within a pod
kubectl exec -it <pod-name> -n <namespace> -- \
  curl -s http://<target-service>:<port>/health
```

Common causes:
| Symptom | Cause | Resolution |
|---|---|---|
| Empty endpoints | No matching pods for service selector | Check pod labels match service selector |
| Connection timeout | NetworkPolicy blocking traffic | Review network policy rules |
| DNS resolution failure | CoreDNS not running | Check kube-dns pods in kube-system |
### Database Connection Failures
```sh
# Test database connectivity
kubectl exec -it <pod-name> -n <namespace> -- \
  pg_isready -h <db-host> -p 5432 -U matih

# Check database secrets
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.password}' | base64 -d
```

### Certificate Issues
```sh
# Check certificate status
kubectl get certificates -A
kubectl describe certificate <cert-name> -n <namespace>

# Check cert-manager logs
kubectl logs -l app=cert-manager -n cert-manager
```

## Monitoring After Verification
Once verification passes, set up ongoing monitoring:
| What to Monitor | Tool | Alert Condition |
|---|---|---|
| Pod restarts | Prometheus + Grafana | Any restart in 5 minutes |
| Service latency | Prometheus + Grafana | P95 above 2 seconds |
| Error rates | Prometheus + Grafana | Above 1% for 5 minutes |
| Disk usage | Prometheus + Grafana | Above 80% capacity |
| Certificate expiry | cert-manager | Within 30 days |
| Database connections | PostgreSQL metrics | Pool exhaustion |
## Verification Checklist
| Category | Check | Status |
|---|---|---|
| Cluster | All nodes Ready | |
| Cluster | No failing pods | |
| Control Plane | IAM service healthy | |
| Control Plane | Tenant service healthy | |
| Control Plane | Config service healthy | |
| Control Plane | API gateway healthy | |
| Data Plane | AI service healthy | |
| Data Plane | Query engine healthy | |
| Infrastructure | PostgreSQL connected | |
| Infrastructure | Redis connected | |
| Infrastructure | Kafka brokers available | |
| Networking | Ingress configured | |
| Networking | TLS certificates valid | |
| Smoke Test | Authentication works | |
| Smoke Test | Query execution works | |
## Next Steps
With the platform verified and operational, proceed to Chapter 5: Quickstart Tutorials for hands-on guides to your first natural language query, dashboard, and ML model.