MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Data Observability

The data observability layer provides end-to-end visibility into data quality through metrics, lineage, tracing, and alerting. It integrates with Prometheus for metrics, OpenLineage for lineage tracking, OpenTelemetry for tracing, and the MATIH notification service for alerts.

Source: data-plane/data-quality-service/src/observability/


Observability Components

| Component | Module | Purpose |
|---|---|---|
| Metrics exporter | metrics.py | Prometheus metrics for quality scores and validation results |
| Lineage emitter | lineage.py | OpenLineage events for quality check execution |
| Alert manager | alerts.py | Rule-based alerting on quality breaches |
| Tracing | tracing.py | OpenTelemetry spans for profiling and validation runs |

Prometheus Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| dq_validation_total | Counter | dataset, rule_type, status | Total validation executions |
| dq_validation_duration_seconds | Histogram | dataset | Validation run duration |
| dq_score_current | Gauge | dataset, dimension | Current quality score per dimension |
| dq_anomaly_total | Counter | dataset, type, severity | Anomalies detected |
| dq_profile_duration_seconds | Histogram | dataset, engine | Profiling run duration |
| dq_freshness_age_seconds | Gauge | dataset | Current data age in seconds |
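
To make the gauge semantics concrete, here is a minimal sketch of how a dq_freshness_age_seconds sample could be computed and rendered in the Prometheus text exposition format. It uses only the standard library. The helper names (escape_label, freshness_age_seconds, render_gauge) and the timestamps are hypothetical, and the actual metrics.py module may well rely on a Prometheus client library instead.

```python
from datetime import datetime, timezone


def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def freshness_age_seconds(last_arrival: datetime, now: datetime) -> float:
    """Age of the newest data in seconds, clamped at zero to absorb clock skew."""
    return max(0.0, (now - last_arrival).total_seconds())


def render_gauge(name: str, help_text: str, labels: dict, value: float) -> str:
    """Render one gauge sample preceded by its HELP and TYPE lines."""
    label_str = ",".join(
        f'{key}="{escape_label(str(val))}"' for key, val in sorted(labels.items())
    )
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{{{label_str}}} {value}\n"
    )


# Illustrative timestamps: the last batch arrived 15 minutes before "now".
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
last_arrival = datetime(2024, 1, 1, 11, 45, 0, tzinfo=timezone.utc)

age = freshness_age_seconds(last_arrival, now)
text = render_gauge(
    "dq_freshness_age_seconds",
    "Current data age in seconds",
    {"dataset": "analytics.sales.transactions"},
    age,
)
print(text)
```

Because the metric is a gauge, each scrape reports the current age and does not accumulate, so the value drops back toward zero once new data lands.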

OpenLineage Integration

Every validation and profiling run emits OpenLineage events for lineage tracking:

{
  "eventType": "COMPLETE",
  "job": {
    "namespace": "matih-data-quality",
    "name": "validate-analytics.sales.transactions"
  },
  "inputs": [
    {
      "namespace": "iceberg",
      "name": "analytics.sales.transactions",
      "facets": {
        "dataQualityMetrics": {
          "overallScore": 0.94,
          "rowCount": 1250000,
          "rulesEvaluated": 12,
          "rulesPassed": 10
        }
      }
    }
  ],
  "run": {
    "runId": "run-abc-123"
  }
}
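
As a rough illustration, an event of this shape could be assembled as follows. This is a sketch only: build_quality_event is a hypothetical helper, not the lineage.py API, and the consistency check on rule counts is an added assumption. The OpenLineage specification defines runId as a UUID, so the sketch generates one when no run ID is supplied.

```python
import json
import uuid
from typing import Optional


def build_quality_event(
    dataset: str,
    overall_score: float,
    row_count: int,
    rules_evaluated: int,
    rules_passed: int,
    run_id: Optional[str] = None,
) -> dict:
    """Build a COMPLETE event mirroring the example payload above (hypothetical helper)."""
    if rules_passed > rules_evaluated:
        raise ValueError("rules_passed cannot exceed rules_evaluated")
    return {
        "eventType": "COMPLETE",
        "job": {
            "namespace": "matih-data-quality",
            "name": f"validate-{dataset}",
        },
        "inputs": [
            {
                "namespace": "iceberg",
                "name": dataset,
                "facets": {
                    "dataQualityMetrics": {
                        "overallScore": overall_score,
                        "rowCount": row_count,
                        "rulesEvaluated": rules_evaluated,
                        "rulesPassed": rules_passed,
                    }
                },
            }
        ],
        # Generate a UUID when the caller does not pass a run ID.
        "run": {"runId": run_id or str(uuid.uuid4())},
    }


event = build_quality_event(
    dataset="analytics.sales.transactions",
    overall_score=0.94,
    row_count=1250000,
    rules_evaluated=12,
    rules_passed=10,
)
print(json.dumps(event, indent=2))
```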

Alert Rules

Alerts are configured per dataset and dimension:

POST /v1/quality/alerts

Request:
{
  "name": "sales-completeness-alert",
  "dataset": "analytics.sales.transactions",
  "condition": {
    "dimension": "completeness",
    "operator": "less_than",
    "threshold": 0.95
  },
  "severity": "critical",
  "channels": ["slack", "email"],
  "recipients": ["data-engineering-team"],
  "cooldownMinutes": 60
}
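
The sketch below shows one way a rule like this might be evaluated, assuming that cooldownMinutes suppresses repeat notifications for the same rule until the window has elapsed. The should_fire function is hypothetical, and every operator name other than less_than is an assumption rather than a documented value.

```python
import operator
from datetime import datetime, timedelta, timezone
from typing import Optional

# Only "less_than" appears in the request above; the other names are assumed.
OPERATORS = {
    "less_than": operator.lt,
    "greater_than": operator.gt,
    "equals": operator.eq,
}


def should_fire(
    rule: dict,
    score: float,
    now: datetime,
    last_fired: Optional[datetime] = None,
) -> bool:
    """Return True when the condition is breached and the cooldown has elapsed."""
    condition = rule["condition"]
    compare = OPERATORS[condition["operator"]]
    if not compare(score, condition["threshold"]):
        return False  # condition not breached
    if last_fired is None:
        return True  # first breach, no cooldown to honor
    return now - last_fired >= timedelta(minutes=rule["cooldownMinutes"])


rule = {
    "name": "sales-completeness-alert",
    "dataset": "analytics.sales.transactions",
    "condition": {
        "dimension": "completeness",
        "operator": "less_than",
        "threshold": 0.95,
    },
    "severity": "critical",
    "cooldownMinutes": 60,
}

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
first_breach = should_fire(rule, 0.91, now)
within_cooldown = should_fire(rule, 0.91, now, last_fired=now - timedelta(minutes=30))
after_cooldown = should_fire(rule, 0.91, now, last_fired=now - timedelta(minutes=61))
healthy = should_fire(rule, 0.97, now)
print(first_breach, within_cooldown, after_cooldown, healthy)
```

With a 60-minute cooldown, a completeness score of 0.91 fires on the first breach, stays silent 30 minutes later, and fires again once more than an hour has passed.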

Alert Channels

| Channel | Integration | Configuration |
|---|---|---|
| Slack | Webhook | Channel URL per team |
| Email | SMTP via notification-service | Distribution list |
| PagerDuty | Events API | Service key for on-call |
| Kafka | Producer | matih.quality.alerts topic |

Dashboard Integration

Quality metrics are visualized in Grafana dashboards:

| Dashboard | Content |
|---|---|
| Data Quality Overview | Overall scores across all datasets |
| Dataset Detail | Per-dimension scores, trends, anomalies |
| Validation History | Rule pass/fail rates over time |
| Freshness Monitor | Data arrival times and SLA compliance |

Incident Workflow

When a critical quality breach is detected:

1. Anomaly/validation failure detected
2. Alert sent to configured channels
3. Pipeline execution paused (if the failing check is configured as a quality gate)
4. On-call engineer investigates root cause
5. Fix applied to source data or pipeline
6. Quality score re-evaluated
7. Pipeline resumes if score meets SLA
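
The pause and resume steps of this workflow can be sketched as a small gate object. This is a simplified illustration under stated assumptions: QualityGate, its sla_threshold field, and the returned state names are hypothetical, and the real service may track gate state differently.

```python
from dataclasses import dataclass


@dataclass
class QualityGate:
    """Tracks whether a pipeline is paused by a quality gate (hypothetical)."""

    sla_threshold: float
    paused: bool = False

    def evaluate(self, score: float) -> str:
        """Pause on a breach; resume once a re-evaluated score meets the SLA."""
        if score < self.sla_threshold:
            self.paused = True
            return "paused"
        was_paused = self.paused
        self.paused = False
        return "resumed" if was_paused else "running"


gate = QualityGate(sla_threshold=0.95)
states = [
    gate.evaluate(0.91),  # breach detected, pipeline pauses
    gate.evaluate(0.93),  # re-evaluated after a partial fix, still below SLA
    gate.evaluate(0.96),  # fix applied, score meets SLA
    gate.evaluate(0.97),  # normal operation
]
print(states)
```

Keeping the gate paused until a re-evaluation meets the threshold reflects steps 6 and 7: the pipeline does not resume on the fix alone, only on a passing score.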
