MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Quality Scoring

The quality scoring system computes a unified quality score for each dataset by aggregating results from six quality dimensions. Scores are tracked over time, compared against SLA thresholds, and used to gate pipeline execution.

Source: data-plane/data-quality-service/src/scoring/calculator.py


Scoring Dimensions

| Dimension | Default Weight | Calculator |
| --- | --- | --- |
| Completeness | 1.0 | CompletenessCalculator |
| Accuracy | 1.0 | AccuracyCalculator |
| Consistency | 0.8 | ConsistencyCalculator |
| Timeliness | 1.0 | TimelinessCalculator |
| Uniqueness | 0.8 | UniquenessCalculator |
| Validity | 0.9 | ValidityCalculator |

All calculators are defined in data-plane/data-quality-service/src/scoring/dimensions.py.


Score Calculation

The overall score is a weighted average of dimension scores:

overall_score = SUM(dimension_score * weight) / SUM(weight)

Each dimension score ranges from 0.0 (worst) to 1.0 (perfect).
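
The weighted average can be checked by hand with a few lines of Python. This is a minimal sketch, not the code in calculator.py: the function name `overall_score` and the choice to average only over the dimensions that were actually scored are assumptions made for illustration. The weights come from the Scoring Dimensions table and the example scores from the sample API response on this page.

```python
from typing import Mapping

# Default weights from the Scoring Dimensions table.
DEFAULT_WEIGHTS = {
    "completeness": 1.0,
    "accuracy": 1.0,
    "consistency": 0.8,
    "timeliness": 1.0,
    "uniqueness": 0.8,
    "validity": 0.9,
}


def overall_score(
    scores: Mapping[str, float],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """SUM(dimension_score * weight) / SUM(weight) over the scored dimensions."""
    # Assumption: dimensions without a score are left out of both sums.
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("at least one dimension with a non-zero weight is required")
    weighted_sum = sum(score * weights[name] for name, score in scores.items())
    return weighted_sum / total_weight


example = {
    "completeness": 0.99,
    "accuracy": 0.97,
    "consistency": 0.85,
    "timeliness": 1.0,
    "uniqueness": 0.92,
    "validity": 0.88,
}
print(round(overall_score(example), 2))  # 0.94
```

With these inputs the weighted sum is 5.168 and the weights total 5.5, which gives roughly 0.9396 and rounds to the 0.94 reported in the sample response.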

Completeness Score

Measures the proportion of non-null values across the dataset's required (critical) columns:

completeness = 1 - (total_nulls / (row_count * critical_column_count))
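
As an illustration, the formula can be applied to rows held in memory. The function `completeness_score`, the row representation (a list of dicts), and the column names are hypothetical; treating an empty input as fully complete is also an assumption, since the source does not say how that case is handled.

```python
from typing import Any, Iterable, Mapping, Sequence


def completeness_score(
    rows: Sequence[Mapping[str, Any]],
    critical_columns: Iterable[str],
) -> float:
    """1 - (total_nulls / (row_count * critical_column_count))."""
    columns = list(critical_columns)
    cell_count = len(rows) * len(columns)
    if cell_count == 0:
        return 1.0  # assumption: nothing to check counts as complete
    # A missing key is treated the same as an explicit None.
    total_nulls = sum(1 for row in rows for col in columns if row.get(col) is None)
    return 1 - total_nulls / cell_count


rows = [
    {"order_id": 1, "amount": 10.0, "note": None},
    {"order_id": 2, "amount": None, "note": None},
    {"order_id": 3, "amount": 7.5, "note": "gift"},
    {"order_id": 4, "amount": 3.0, "note": None},
]
# Only the critical columns count; nulls in "note" are ignored.
print(completeness_score(rows, ["order_id", "amount"]))  # 0.875
```

One null among eight critical cells gives 1 - 1/8 = 0.875.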

Accuracy Score

Measures the proportion of values passing range, pattern, and enum validation rules:

accuracy = passing_values / total_values
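
The sketch below shows one way the ratio could be computed, assuming each rule is a predicate attached to a single column. The `Rule` alias, the `accuracy_score` function, and the example rules and columns are all hypothetical and do not reflect how AccuracyCalculator defines its rules.

```python
import re
from typing import Any, Callable, Mapping, Sequence

Rule = Callable[[Any], bool]

# Hypothetical rules: one range check, one pattern check, one enum check.
RULES: Mapping[str, Rule] = {
    "amount": lambda v: isinstance(v, (int, float)) and 0 <= v <= 10_000,
    "currency": lambda v: isinstance(v, str) and re.fullmatch(r"[A-Z]{3}", v) is not None,
    "status": lambda v: v in {"NEW", "PAID", "REFUNDED"},
}


def accuracy_score(rows: Sequence[Mapping[str, Any]], rules: Mapping[str, Rule]) -> float:
    """passing_values / total_values over every (row, rule) pair."""
    total = 0
    passing = 0
    for row in rows:
        for column, rule in rules.items():
            total += 1
            if rule(row.get(column)):
                passing += 1
    # Assumption: with no values to validate, the score defaults to 1.0.
    return passing / total if total else 1.0


rows = [
    {"amount": 25.0, "currency": "USD", "status": "PAID"},
    {"amount": -5.0, "currency": "EUR", "status": "PAID"},      # range failure
    {"amount": 100.0, "currency": "usd", "status": "NEW"},      # pattern failure
    {"amount": 40.0, "currency": "GBP", "status": "UNKNOWN"},   # enum failure
]
print(accuracy_score(rows, RULES))  # 0.75
```

Nine of the twelve checked values pass, so the score is 0.75.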

Timeliness Score

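Measures data freshness against the SLA threshold. Data within the SLA scores 1.0; beyond it, the score falls linearly and reaches 0 once the data is twice as old as the SLA allows: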

timeliness = 1.0 if (age_hours <= sla_hours) else max(0, 1 - (age_hours - sla_hours) / sla_hours)
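
The expression above translates directly into a small function. The name `timeliness_score` and the rejection of non-positive SLA values are assumptions added for this sketch; the 24-hour SLA is only an example value.

```python
def timeliness_score(age_hours: float, sla_hours: float) -> float:
    """1.0 within the SLA, then a linear decay that bottoms out at 0."""
    if sla_hours <= 0:
        # Assumption: a non-positive SLA is a configuration error.
        raise ValueError("sla_hours must be positive")
    if age_hours <= sla_hours:
        return 1.0
    return max(0.0, 1 - (age_hours - sla_hours) / sla_hours)


for age in (12, 24, 30, 36, 48, 72):
    print(age, timeliness_score(age, sla_hours=24))
# 12 1.0
# 24 1.0
# 30 0.75
# 36 0.5
# 48 0.0
# 72 0.0
```

With a 24-hour SLA, data that is 30 hours old is 6 hours late and loses 6/24 of its score, and anything 48 hours or older scores 0.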

Score API

GET /v1/quality/scores?dataset=analytics.sales.transactions

Response:
{
  "dataset": "analytics.sales.transactions",
  "overallScore": 0.94,
  "dimensions": {
    "completeness": {"score": 0.99, "weight": 1.0},
    "accuracy": {"score": 0.97, "weight": 1.0},
    "consistency": {"score": 0.85, "weight": 0.8},
    "timeliness": {"score": 1.0, "weight": 1.0},
    "uniqueness": {"score": 0.92, "weight": 0.8},
    "validity": {"score": 0.88, "weight": 0.9}
  },
  "slaStatus": "COMPLIANT",
  "computedAt": "2026-02-12T06:30:00Z"
}
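
A client for this endpoint might look like the sketch below, which uses only the Python standard library. The host `matih.example.com` is a placeholder, the helper names are hypothetical, and authentication is omitted because the source does not describe it. The sample payload is an abbreviated copy of the response above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://matih.example.com"  # placeholder host, not a real deployment


def scores_url(dataset: str, base_url: str = BASE_URL) -> str:
    """Build the GET /v1/quality/scores URL for one dataset."""
    return f"{base_url}/v1/quality/scores?{urlencode({'dataset': dataset})}"


def fetch_scores(dataset: str, base_url: str = BASE_URL) -> dict:
    """Fetch and decode the score document (assumes no authentication is needed)."""
    with urlopen(scores_url(dataset, base_url), timeout=10) as response:
        return json.load(response)


def weakest_dimension(payload: dict) -> str:
    """Return the name of the dimension with the lowest score."""
    dimensions = payload["dimensions"]
    return min(dimensions, key=lambda name: dimensions[name]["score"])


sample = json.loads("""
{
  "dataset": "analytics.sales.transactions",
  "overallScore": 0.94,
  "dimensions": {
    "completeness": {"score": 0.99, "weight": 1.0},
    "consistency": {"score": 0.85, "weight": 0.8},
    "validity": {"score": 0.88, "weight": 0.9}
  },
  "slaStatus": "COMPLIANT"
}
""")
print(scores_url("analytics.sales.transactions"))
print(weakest_dimension(sample))  # consistency
```

In the sample response, consistency (0.85) is the lowest-scoring dimension, which makes it the first place to look when the overall score needs to improve.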

SLA Compliance

SLA thresholds define minimum acceptable quality scores:

| SLA Level | Minimum Score | Behavior on Breach |
| --- | --- | --- |
| Critical | 0.95 | Block pipeline, alert on-call |
| Standard | 0.80 | Alert dataset owner |
| Relaxed | 0.60 | Log warning only |
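
The table can be read as a lookup from SLA level to a minimum score and a breach behavior. The sketch below encodes it that way; the `SlaLevel` class and `breach_action` function are hypothetical, and treating a score exactly equal to the minimum as compliant is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlaLevel:
    min_score: float
    on_breach: str


# Values copied from the SLA level table.
SLA_LEVELS = {
    "critical": SlaLevel(0.95, "Block pipeline, alert on-call"),
    "standard": SlaLevel(0.80, "Alert dataset owner"),
    "relaxed": SlaLevel(0.60, "Log warning only"),
}


def breach_action(level: str, overall_score: float) -> Optional[str]:
    """Return the breach behavior for a level, or None when the score is acceptable."""
    sla = SLA_LEVELS[level]
    # Assumption: a score equal to the minimum is still compliant.
    return sla.on_breach if overall_score < sla.min_score else None


print(breach_action("critical", 0.94))  # Block pipeline, alert on-call
print(breach_action("standard", 0.94))  # None
```

The same score of 0.94 breaches a Critical SLA but comfortably satisfies a Standard one.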

SLA Configuration

POST /v1/quality/sla

Request:
{
  "dataset": "analytics.sales.transactions",
  "overallMinScore": 0.90,
  "dimensionMinScores": {
    "completeness": 0.95,
    "accuracy": 0.90,
    "timeliness": 0.99
  }
}
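
To show how such a configuration could be evaluated, the sketch below compares a score document against the overall and per-dimension minimums. The function `sla_violations` and the message format are hypothetical; the field names follow the request and response examples on this page.

```python
from typing import List


def sla_violations(sla: dict, scores: dict) -> List[str]:
    """List every minimum in the SLA that the score document falls below."""
    violations = []
    if scores["overallScore"] < sla["overallMinScore"]:
        violations.append(
            f"overall {scores['overallScore']} < {sla['overallMinScore']}"
        )
    for name, minimum in sla.get("dimensionMinScores", {}).items():
        actual = scores["dimensions"][name]["score"]
        if actual < minimum:
            violations.append(f"{name} {actual} < {minimum}")
    return violations


sla = {
    "dataset": "analytics.sales.transactions",
    "overallMinScore": 0.90,
    "dimensionMinScores": {"completeness": 0.95, "accuracy": 0.90, "timeliness": 0.99},
}
scores = {
    "overallScore": 0.94,
    "dimensions": {
        "completeness": {"score": 0.99},
        "accuracy": {"score": 0.97},
        "timeliness": {"score": 1.0},
    },
}
print(sla_violations(sla, scores))  # []

# A stale dataset: only the timeliness minimum is missed.
stale = {**scores, "dimensions": {**scores["dimensions"], "timeliness": {"score": 0.75}}}
print(sla_violations(sla, stale))  # ['timeliness 0.75 < 0.99']
```

Checking dimension minimums separately matters because a dataset can pass on its overall score while one dimension, here timeliness, is well below what the SLA requires.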

Score Trends

Historical scores are stored for trend analysis:

GET /v1/quality/scores/trends?dataset=analytics.sales.transactions&days=30

The trend API returns daily scores and identifies improving or degrading dimensions.
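
The source does not say how a dimension is classified as improving or degrading. One plausible approach, sketched here purely as an assumption, is to fit a least-squares line through the daily scores and look at the sign of its slope. The function name, the labels, and the tolerance of 0.001 per day are all hypothetical.

```python
from typing import Sequence


def trend_direction(daily_scores: Sequence[float], tolerance: float = 0.001) -> str:
    """Classify a series of daily scores by the slope of a least-squares line."""
    n = len(daily_scores)
    if n < 2:
        return "stable"  # a single point has no direction
    mean_x = (n - 1) / 2  # days are numbered 0..n-1
    mean_y = sum(daily_scores) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_scores))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance  # score change per day
    if slope > tolerance:
        return "improving"
    if slope < -tolerance:
        return "degrading"
    return "stable"


print(trend_direction([0.90, 0.91, 0.93, 0.95]))  # improving
print(trend_direction([0.99, 0.97, 0.94, 0.90]))  # degrading
print(trend_direction([0.90, 0.90, 0.90, 0.90]))  # stable
```

A slope-based rule is less sensitive to a single bad day than comparing only the first and last values, although the real service may use a different method.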


Pipeline Quality Gates

Pipelines integrate quality scores as execution gates:

quality_checks:
  - name: pre_load_quality
    type: quality_gate
    dataset: analytics.sales.transactions
    min_score: 0.90
    dimensions:
      completeness: 0.95
    severity: critical
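
The gate's behavior can be sketched as a function that a pipeline step calls before loading data. Everything here is illustrative: `QualityGateError` and `enforce_gate` are hypothetical names, the gate is written as a Python dict mirroring the YAML above, and raising an exception only for `severity: critical` is an assumption based on the Critical row of the SLA table.

```python
from typing import List, Mapping


class QualityGateError(RuntimeError):
    """Raised when a critical quality gate fails (hypothetical)."""


def enforce_gate(
    check: Mapping, overall: float, dimension_scores: Mapping[str, float]
) -> List[str]:
    """Return the gate's failures; raise if the gate is critical and any exist."""
    failures = []
    if overall < check["min_score"]:
        failures.append(f"overall {overall} < {check['min_score']}")
    for name, minimum in check.get("dimensions", {}).items():
        # Assumption: a dimension with no score is treated as failing (0.0).
        actual = dimension_scores.get(name, 0.0)
        if actual < minimum:
            failures.append(f"{name} {actual} < {minimum}")
    if failures and check.get("severity") == "critical":
        raise QualityGateError(f"{check['name']} failed: {'; '.join(failures)}")
    return failures


gate = {
    "name": "pre_load_quality",
    "type": "quality_gate",
    "dataset": "analytics.sales.transactions",
    "min_score": 0.90,
    "dimensions": {"completeness": 0.95},
    "severity": "critical",
}
print(enforce_gate(gate, 0.94, {"completeness": 0.99}))  # []
```

With the scores from the sample API response the gate passes and the load proceeds; a completeness score below 0.95 would raise `QualityGateError` and stop the pipeline.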
