MATIH Platform is in active MVP development. Documentation reflects current implementation status.
4. Installation & Setup
Verifying the Deployment

After installation and initial configuration, run a comprehensive verification to confirm that the MATIH Platform is fully operational. This page covers the built-in verification tools, manual checks, and troubleshooting procedures for common issues.


Automated Verification Tools

MATIH provides two primary scripts for deployment verification:

Platform Status

The platform-status.sh script checks the status of all platform components:

./scripts/tools/platform-status.sh

This script verifies:

| Check | Description |
|---|---|
| Kubernetes connectivity | Can reach the cluster API server |
| Namespace existence | matih-system, matih-shared, and tenant namespaces exist |
| Pod health | All pods are in Running or Completed state |
| Service endpoints | All Kubernetes services have endpoints |
| Persistent volumes | All PVCs are Bound |
| Ingress status | Ingress resources have assigned addresses |

Sample output:

MATIH Platform Status
=====================

Cluster: matih-aks-cluster (Azure AKS)
Kubernetes: v1.29.1

Namespaces:
  matih-system     ............ OK
  matih-shared     ............ OK
  tenant-acme-corp ............ OK

Control Plane Services:
  iam-service          2/2 pods ready .... OK
  tenant-service       2/2 pods ready .... OK
  config-service       1/1 pods ready .... OK
  api-gateway          2/2 pods ready .... OK
  notification-service 1/1 pods ready .... OK
  audit-service        1/1 pods ready .... OK

Data Plane Services (tenant-acme-corp):
  ai-service           2/2 pods ready .... OK
  query-engine         2/2 pods ready .... OK
  bi-service           1/1 pods ready .... OK
  ml-service           1/1 pods ready .... OK

Infrastructure:
  postgresql           1/1 pods ready .... OK
  redis                1/1 pods ready .... OK
  kafka                3/3 pods ready .... OK

Overall Status: HEALTHY

Health Check

The health-check.sh script performs deeper health validation:

./scripts/disaster-recovery/health-check.sh

This script performs:

| Check | Description |
|---|---|
| Service health endpoints | HTTP health check on each service |
| Database connectivity | Test connection to PostgreSQL |
| Message broker | Verify Kafka broker availability |
| Cache | Test Redis connectivity |
| DNS resolution | Verify internal DNS resolution |
| Certificate validity | Check TLS certificate expiration |
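Conceptually, the service-endpoint portion of these checks boils down to a loop over health URLs. The sketch below illustrates the idea only (it is not the script's actual contents); `probe` is stubbed so the sketch runs without a cluster.

```shell
# Sketch of a per-service HTTP health probe loop. probe() is a stand-in for:
#   curl -s -o /dev/null -w '%{http_code}' "http://localhost:$port/actuator/health"
probe() { echo 200; }

for svc in iam-service:8081 tenant-service:8082 config-service:8888; do
  code=$(probe "$svc")
  if [ "$code" = "200" ]; then
    printf '%-22s OK\n' "$svc"
  else
    printf '%-22s FAIL (HTTP %s)\n' "$svc" "$code"
  fi
done
```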

Manual Verification Steps

1. Verify Kubernetes Cluster Health

# Check node status
kubectl get nodes -o wide
 
# Expected: All nodes showing STATUS "Ready"
# NAME            STATUS   ROLES    AGE    VERSION
# node-pool-0     Ready    <none>   2d     v1.29.1
# node-pool-1     Ready    <none>   2d     v1.29.1
# node-pool-2     Ready    <none>   2d     v1.29.1

2. Verify Pod Status

# Check all pods across namespaces
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
 
# This should return no results (no failing pods)
# If pods are failing, investigate with:
# kubectl describe pod <pod-name> -n <namespace>
# kubectl logs <pod-name> -n <namespace>

3. Verify Control Plane Services

Check each control plane service's health endpoint:

# Using port-forward (if not exposed via ingress)
kubectl port-forward svc/iam-service 8081:8081 -n matih-system &
 
# Check health
curl -s http://localhost:8081/actuator/health | jq .

Expected response:

{
  "status": "UP",
  "components": {
    "db": { "status": "UP" },
    "redis": { "status": "UP" },
    "kafka": { "status": "UP" },
    "diskSpace": { "status": "UP" }
  }
}
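To gate a script on this response rather than reading it by eye, a small helper can test for the overall `UP` status. `check_health` below is a hypothetical helper, shown against an inline sample so it runs without a cluster; it uses grep instead of jq so it works on minimal images.

```shell
# Hypothetical helper: succeed only if the health JSON reports overall "UP".
check_health() {
  echo "$1" | grep -q '"status" *: *"UP"'
}

sample='{ "status": "UP", "components": { "db": { "status": "UP" } } }'
if check_health "$sample"; then
  echo "service healthy"
fi
```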

4. Verify Data Plane Services

# AI Service health (port-forward the service first, as above, e.g.:
#   kubectl port-forward svc/ai-service 8000:8000 -n tenant-acme-corp &)
curl -s http://localhost:8000/health | jq .
 
# Expected:
# {
#   "status": "healthy",
#   "version": "1.0.0",
#   "dependencies": {
#     "database": "connected",
#     "redis": "connected",
#     "kafka": "connected"
#   }
# }

5. Verify Database Connectivity

# Check database pods
kubectl get pods -l app=postgresql -A
 
# Check database service
kubectl get svc -l app=postgresql -A

6. Verify Kafka

# Check Kafka pods
kubectl get pods -l app=kafka -A
 
# Verify topic creation (topics should exist after service deployment)
kubectl exec -it kafka-0 -n matih-shared -- \
  kafka-topics.sh --list --bootstrap-server localhost:9092

7. Verify Ingress and TLS

# Check ingress resources
kubectl get ingress -A
 
# Check certificate status
kubectl get certificates -A
 
# Validate tenant ingress
./scripts/tools/validate-tenant-ingress.sh --tenant acme-corp

Port Validation

Verify that no port conflicts exist:

./scripts/tools/validate-ports.sh

The source of truth for port assignments is scripts/config/components.yaml. This script checks that actual service ports match the defined configuration.
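A manual spot-check follows the same pattern. The YAML excerpt below is illustrative, not the real contents of `scripts/config/components.yaml`, and the lookup uses awk so it runs without extra tooling:

```shell
# Hedged sketch: cross-check one service's port against a components.yaml-style file.
# The excerpt is made up for illustration -- consult the real scripts/config/components.yaml.
cat > /tmp/components-excerpt.yaml <<'EOF'
services:
  iam-service:
    port: 8081
EOF

expected=$(awk '/iam-service:/{f=1;next} f&&/port:/{print $2; exit}' /tmp/components-excerpt.yaml)
actual=8081   # in practice: kubectl get svc iam-service -n matih-system -o jsonpath='{.spec.ports[0].port}'
if [ "$expected" = "$actual" ]; then
  echo "iam-service port matches ($expected)"
fi
```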

Expected Port Assignments

Control Plane:

| Service | Port |
|---|---|
| api-gateway | 8080 |
| iam-service | 8081 |
| tenant-service | 8082 |
| platform-registry | 8084 |
| notification-service | 8085 |
| audit-service | 8086 |
| billing-service | 8087 |
| observability-api | 8088 |
| infrastructure-service | 8089 |
| config-service | 8888 |

Data Plane:

| Service | Port |
|---|---|
| ai-service | 8000 |
| ml-service | 8000 |
| data-quality-service | 8000 |
| query-engine | 8080 |
| bi-service | 8084 |
| catalog-service | 8086 |
| pipeline-service | 8092 |

Smoke Tests

After verifying infrastructure health, run smoke tests to confirm end-to-end functionality:

Authentication Smoke Test

# Register a test user
curl -s -X POST http://localhost:8081/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{
    "email": "test@acme.com",
    "password": "Test@12345",
    "firstName": "Test",
    "lastName": "User"
  }' | jq .status
 
# Login
curl -s -X POST http://localhost:8081/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "email": "test@acme.com",
    "password": "Test@12345"
  }' | jq .tokenType
 
# Expected: "Bearer"
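The AI Service smoke test that follows uses `${ACCESS_TOKEN}`, which the login step above never exports. One way to capture it, assuming the login response carries the token in an `accessToken` field (the field name is an assumption; check your IAM service's actual response shape):

```shell
# Capture the access token from the login response. The accessToken field name
# is assumed, not confirmed; adjust to match the real IAM response. sed keeps
# the sketch dependency-free. Shown against a sample response so it runs as-is.
response='{"tokenType":"Bearer","accessToken":"eyJhbGciOi.example.token"}'
# In practice:
#   response=$(curl -s -X POST http://localhost:8081/api/v1/auth/login \
#     -H "Content-Type: application/json" -d '{"email":"...","password":"..."}')
ACCESS_TOKEN=$(echo "$response" | sed -n 's/.*"accessToken" *: *"\([^"]*\)".*/\1/p')
export ACCESS_TOKEN
echo "token captured: ${#ACCESS_TOKEN} chars"
```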

AI Service Smoke Test

# Test a simple query (requires configured data source)
curl -s -X POST http://localhost:8000/api/v1/query \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How many tables are in the database?",
    "tenantId": "acme-corp"
  }' | jq .status
 
# Expected: "success" or "completed"

Troubleshooting Common Issues

Pods in CrashLoopBackOff

Diagnosis:

# Check pod events
kubectl describe pod <pod-name> -n <namespace>
 
# Check container logs
kubectl logs <pod-name> -n <namespace>
 
# Check previous container logs (if restarting)
kubectl logs <pod-name> -n <namespace> --previous

Common causes:

| Symptom | Cause | Resolution |
|---|---|---|
| "Connection refused" to database | Database not ready or wrong credentials | Check database pod status and secret values |
| "Invalid JWT secret" | JWT secret not configured | Run dev-secrets.sh or check ESO sync |
| OOMKilled | Insufficient memory | Increase memory limits in Helm values |
| "Unrecognized option" | Wrong CLI flags for the image version | Verify image tag and flag compatibility |
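To surface restart-looping pods across the cluster without scanning the full pod list by eye, a filter like the following helps. It is shown against sample `kubectl get pods -A` output so the sketch runs standalone:

```shell
# Print pods with a nonzero RESTARTS count ($5 in `kubectl get pods -A` output).
# Against a live cluster, pipe real output in instead:
#   kubectl get pods -A --no-headers | awk '$5+0 > 0 {print $1"/"$2" restarts="$5}'
sample='matih-system   iam-service-7d9f   1/1   Running            0   2d
matih-system   api-gateway-5c2a   0/1   CrashLoopBackOff   7   15m'
echo "$sample" | awk '$5+0 > 0 {print $1"/"$2" restarts="$5}'
```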

Pods in CreateContainerConfigError

Diagnosis:

kubectl describe pod <pod-name> -n <namespace>
# Look at the Events section for the specific error

Common causes:

| Error Message | Cause | Resolution |
|---|---|---|
| secret "X" not found | Missing Kubernetes secret | Run dev-secrets.sh or check ESO ExternalSecret |
| configmap "X" not found | Missing ConfigMap | Check Helm chart templates |
| container has runAsNonRoot and image will run as root | Security context mismatch | Adjust podSecurityContext in values |

Services Cannot Communicate

Diagnosis:

# Check service endpoints
kubectl get endpoints <service-name> -n <namespace>
 
# Check network policies
kubectl get networkpolicies -n <namespace>
 
# Test connectivity from within a pod
kubectl exec -it <pod-name> -n <namespace> -- \
  curl -s http://<target-service>:<port>/health

Common causes:

| Symptom | Cause | Resolution |
|---|---|---|
| Empty endpoints | No matching pods for service selector | Check pod labels match service selector |
| Connection timeout | NetworkPolicy blocking traffic | Review network policy rules |
| DNS resolution failure | CoreDNS not running | Check kube-dns pods in kube-system |

Database Connection Failures

# Test database connectivity
kubectl exec -it <pod-name> -n <namespace> -- \
  pg_isready -h <db-host> -p 5432 -U matih
 
# Check database secrets
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.password}' | base64 -d
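Note that pg_isready reports via its exit code rather than its output, so scripted checks should branch on `$?`. The exit-code meanings below come from the PostgreSQL pg_isready documentation:

```shell
# pg_isready exit codes: 0 = accepting connections, 1 = rejecting connections
# (e.g. server still starting up), 2 = no response, 3 = invalid parameters.
interpret_pg_rc() {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections (server starting?)" ;;
    2) echo "no response from server" ;;
    *) echo "invalid parameters" ;;
  esac
}

# In practice:
#   kubectl exec -it <pod-name> -n <namespace> -- pg_isready -h <db-host> -p 5432 -U matih
#   interpret_pg_rc $?
interpret_pg_rc 0
```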

Certificate Issues

# Check certificate status
kubectl get certificates -A
kubectl describe certificate <cert-name> -n <namespace>
 
# Check cert-manager logs
kubectl logs -l app=cert-manager -n cert-manager

Monitoring After Verification

Once verification passes, set up ongoing monitoring:

| What to Monitor | Tool | Alert Condition |
|---|---|---|
| Pod restarts | Prometheus + Grafana | Any restart in 5 minutes |
| Service latency | Prometheus + Grafana | P95 above 2 seconds |
| Error rates | Prometheus + Grafana | Above 1% for 5 minutes |
| Disk usage | Prometheus + Grafana | Above 80% capacity |
| Certificate expiry | cert-manager | Within 30 days |
| Database connections | PostgreSQL metrics | Pool exhaustion |
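As one concrete example, the pod-restart condition above could be expressed as a Prometheus alerting rule. This is an illustrative sketch, not part of the platform's shipped configuration; it assumes kube-state-metrics is being scraped, and the group name, labels, and thresholds should be tuned to your setup.

```yaml
# Illustrative Prometheus alerting rule for the pod-restart condition.
groups:
  - name: matih-platform
    rules:
      - alert: PodRestarting
        expr: increase(kube_pod_container_status_restarts_total[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} restarted in the last 5 minutes"
```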

Verification Checklist

| Category | Check | Status |
|---|---|---|
| Cluster | All nodes Ready | |
| Cluster | No failing pods | |
| Control Plane | IAM service healthy | |
| Control Plane | Tenant service healthy | |
| Control Plane | Config service healthy | |
| Control Plane | API gateway healthy | |
| Data Plane | AI service healthy | |
| Data Plane | Query engine healthy | |
| Infrastructure | PostgreSQL connected | |
| Infrastructure | Redis connected | |
| Infrastructure | Kafka brokers available | |
| Networking | Ingress configured | |
| Networking | TLS certificates valid | |
| Smoke Test | Authentication works | |
| Smoke Test | Query execution works | |

Next Steps

With the platform verified and operational, proceed to Chapter 5: Quickstart Tutorials for hands-on guides to your first natural language query, dashboard, and ML model.