MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Validation Rules

The validation rule engine evaluates data against configurable rules organized by quality dimension. Rules are defined in YAML or via the API, stored in PostgreSQL, and executed by the rule engine during pipeline quality gates.

Source: data-plane/data-quality-service/src/validation/rule_engine.py


Rule Types

Completeness Rules

| Rule Type | Description | Example |
|---|---|---|
| null_check | Column must not contain NULL values | amount IS NOT NULL |
| not_empty | String column must not be empty | name != '' |
| required_columns | All specified columns must exist in the dataset | [id, name, email] |
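The completeness rules above can be sketched in plain Python to show what each one counts. This is an illustrative sketch only, not the code in rule_engine.py: the function names mirror the rule types, but the signatures, the dict-based row format, and the choice to treat an absent key as NULL are assumptions.

```python
from typing import Any, Iterable, List, Mapping, Sequence

# Assumption: a row is a plain mapping of column name to value.
Row = Mapping[str, Any]


def null_check(rows: Iterable[Row], column: str) -> int:
    """Count rows where `column` is NULL (None) or absent."""
    return sum(1 for row in rows if row.get(column) is None)


def not_empty(rows: Iterable[Row], column: str) -> int:
    """Count rows where `column` holds an empty string."""
    return sum(1 for row in rows if row.get(column) == "")


def required_columns(present: Sequence[str], required: Sequence[str]) -> List[str]:
    """Return the required columns that are missing from the dataset."""
    present_set = set(present)
    return [name for name in required if name not in present_set]


rows = [
    {"id": 1, "name": "Ada", "amount": 10.0},
    {"id": 2, "name": "", "amount": None},
]
null_failures = null_check(rows, "amount")
empty_failures = not_empty(rows, "name")
missing = required_columns(["id", "name", "amount"], ["id", "name", "email"])
```

Each function returns a count or a list of offenders rather than a boolean, which makes it straightforward to derive fields such as failedRows in a validation report.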

Accuracy Rules

| Rule Type | Description | Example |
|---|---|---|
| range_check | Numeric value within min/max bounds | amount BETWEEN 0 AND 1000000 |
| regex_pattern | String matches a regular expression | email ~ '^[^@]+@[^@]+\.[^@]+$' |
| enum_values | Value is one of an allowed set | status IN ('active', 'inactive') |
| data_type | Column matches expected data type | created_at IS TIMESTAMP |
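As a rough illustration of the accuracy rules, the sketch below checks single values in Python. The function names follow the rule types, but the signatures are hypothetical. Two choices here are assumptions rather than documented engine behavior: a NULL value fails range_check, and regex_pattern requires the whole string to match.

```python
import re
from typing import Any, Iterable, Optional


def range_check(value: Optional[float],
                min_value: Optional[float] = None,
                max_value: Optional[float] = None) -> bool:
    """True when value lies within the bounds; a bound of None is open."""
    if value is None:
        return False  # assumption: NULL fails the range check
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


def regex_pattern(value: Any, pattern: str) -> bool:
    """True when the whole string matches the regular expression."""
    return isinstance(value, str) and re.fullmatch(pattern, value) is not None


def enum_values(value: Any, allowed: Iterable[Any]) -> bool:
    """True when value is one of the allowed values."""
    return value in set(allowed)


def data_type(value: Any, expected: type) -> bool:
    """True when value is an instance of the expected Python type."""
    return isinstance(value, expected)


EMAIL = r"[^@\s]+@[^@\s]+\.[^@\s]+"
in_range = range_check(250.0, min_value=0, max_value=1_000_000)
too_small = range_check(0.005, min_value=0.01)
valid_email = regex_pattern("ada@example.com", EMAIL)
valid_status = enum_values("archived", ["active", "inactive"])
```

Passing None for a bound leaves that side of the range open.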

Consistency Rules

| Rule Type | Description | Example |
|---|---|---|
| referential_integrity | Foreign key exists in reference table | customer_id IN (SELECT id FROM customers) |
| cross_field | Relationship between columns holds | end_date >= start_date |
| aggregate_check | Aggregate value meets threshold | SUM(amount) > 0 |
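The three consistency rules can be illustrated over in-memory rows as follows. This is a minimal sketch under assumed signatures; a real engine would most likely push these checks down to the query layer rather than iterate in Python.

```python
from datetime import date
from typing import Any, Callable, List, Mapping, Sequence, Set

Row = Mapping[str, Any]


def referential_integrity(rows: Sequence[Row], column: str,
                          reference_keys: Set[Any]) -> List[Any]:
    """Return foreign-key values that have no match in the reference set."""
    return [row[column] for row in rows if row[column] not in reference_keys]


def cross_field(rows: Sequence[Row], predicate: Callable[[Row], bool]) -> int:
    """Count rows for which the cross-column relationship does not hold."""
    return sum(1 for row in rows if not predicate(row))


def aggregate_check(rows: Sequence[Row], column: str, threshold: float) -> bool:
    """True when SUM(column) is strictly greater than the threshold."""
    return sum(row[column] for row in rows) > threshold


orders = [
    {"customer_id": 1, "start_date": date(2024, 1, 1),
     "end_date": date(2024, 1, 31), "amount": 50.0},
    {"customer_id": 9, "start_date": date(2024, 2, 10),
     "end_date": date(2024, 2, 1), "amount": 25.0},
]
customer_ids = {1, 2, 3}  # stands in for customers.id

orphans = referential_integrity(orders, "customer_id", customer_ids)
bad_ranges = cross_field(orders, lambda r: r["end_date"] >= r["start_date"])
total_positive = aggregate_check(orders, "amount", 0)
```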

Uniqueness Rules

| Rule Type | Description | Example |
|---|---|---|
| uniqueness | Column values are unique | COUNT(DISTINCT email) = COUNT(*) |
| primary_key | Composite key is unique | UNIQUE(tenant_id, entity_id) |
| duplicate_check | No duplicate rows by key | Fuzzy dedup by similarity |
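Exact-match uniqueness on one column and on a composite key reduce to the same operation: count occurrences of the key tuple and report any that appear more than once. The helper below is a hypothetical sketch of that idea. It does not attempt the similarity-based matching mentioned for duplicate_check.

```python
from collections import Counter
from typing import Any, List, Mapping, Sequence, Tuple

Row = Mapping[str, Any]


def duplicate_keys(rows: Sequence[Row],
                   key_columns: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Return each key tuple that occurs in more than one row."""
    counts = Counter(tuple(row.get(col) for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


rows = [
    {"tenant_id": "t1", "entity_id": "e1", "email": "a@example.com"},
    {"tenant_id": "t1", "entity_id": "e1", "email": "b@example.com"},
    {"tenant_id": "t2", "entity_id": "e1", "email": "c@example.com"},
]

# uniqueness on a single column: an empty list means the rule passes
email_dupes = duplicate_keys(rows, ["email"])

# primary_key on a composite key
pk_dupes = duplicate_keys(rows, ["tenant_id", "entity_id"])
```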

Timeliness Rules

| Rule Type | Description | Example |
|---|---|---|
| freshness | Data is recent relative to SLA | MAX(updated_at) > NOW() - INTERVAL '24 hours' |
| timestamp_valid | Timestamps are within valid range | created_at <= NOW() |
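Both timeliness rules compare timestamps against the current time. The sketch below uses timezone-aware datetime values and accepts an explicit `now` so the checks are deterministic in tests. The function names and parameters are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(latest_update: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """freshness: the newest record is younger than the SLA window."""
    now = now or datetime.now(timezone.utc)
    return latest_update > now - max_age


def timestamp_valid(ts: datetime, now: Optional[datetime] = None) -> bool:
    """timestamp_valid: the timestamp is not in the future."""
    now = now or datetime.now(timezone.utc)
    return ts <= now


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
sla = timedelta(hours=24)

recent = is_fresh(datetime(2024, 5, 31, 18, 0, tzinfo=timezone.utc), sla, now)
stale = is_fresh(datetime(2024, 5, 30, 11, 0, tzinfo=timezone.utc), sla, now)
future = timestamp_valid(datetime(2024, 6, 2, 0, 0, tzinfo=timezone.utc), now)
```

Mixing naive and timezone-aware datetimes raises a TypeError in Python, so normalizing everything to UTC before comparison is the safer convention.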

Custom Rules

| Rule Type | Description |
|---|---|
| sql_expression | Arbitrary SQL expression evaluated against the dataset |
| python_function | Custom Python validation function |
| great_expectations | Great Expectations expectation suite |
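To show how a sql_expression rule might be evaluated, the sketch below loads a few rows into an in-memory SQLite table and counts the rows for which the expression is not true. SQLite, the table name `dataset`, and the two-column schema are stand-ins chosen for a self-contained example. They are not the engine the platform uses.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def count_violations(rows: Iterable[Tuple[Optional[float], Optional[str]]],
                     expression: str) -> int:
    """Count rows where `expression` evaluates to false or NULL.

    The expression is interpolated into the query, so it must come from a
    trusted rule author and never from untrusted input.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE dataset (amount REAL, status TEXT)")
        conn.executemany("INSERT INTO dataset VALUES (?, ?)", rows)
        # COALESCE maps a NULL result to 0 so that it counts as a violation.
        query = (
            "SELECT COUNT(*) FROM dataset "
            f"WHERE COALESCE(({expression}), 0) = 0"
        )
        (violations,) = conn.execute(query).fetchone()
        return violations
    finally:
        conn.close()


sample = [(10.0, "active"), (-5.0, "active"), (None, "inactive")]
violations = count_violations(
    sample, "amount > 0 AND status IN ('active', 'inactive')"
)
```

Counting a NULL result as a violation is a design choice in this sketch. A rule engine could equally treat NULL as "not evaluated" and leave it to a null_check rule.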

Severity Levels

| Severity | Behavior |
|---|---|
| critical | Blocks the pipeline, requires immediate action |
| warning | Logged and alerted, does not block execution |
| info | Logged only, informational |
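The severity table implies a simple gate decision: only a failed critical rule stops the pipeline. The sketch below expresses that as a lookup and a predicate over results shaped like the entries of a validation report. The action labels and function names are hypothetical.

```python
from typing import Any, Dict, List, Mapping

# Assumed mapping from severity to the action taken when a rule fails.
ACTIONS: Dict[str, str] = {
    "critical": "block",
    "warning": "alert",
    "info": "log",
}


def should_block(results: List[Mapping[str, Any]]) -> bool:
    """True when any failed rule has critical severity."""
    return any(
        r["status"] == "FAILED" and ACTIONS[r["severity"]] == "block"
        for r in results
    )


def actions_for(results: List[Mapping[str, Any]]) -> Dict[str, str]:
    """Map each failed rule name to the action its severity implies."""
    return {
        r["ruleName"]: ACTIONS[r["severity"]]
        for r in results
        if r["status"] == "FAILED"
    }


results = [
    {"ruleName": "amount_positive", "status": "FAILED", "severity": "critical"},
    {"ruleName": "email_format", "status": "FAILED", "severity": "warning"},
    {"ruleName": "name_not_empty", "status": "PASSED", "severity": "critical"},
]
blocked = should_block(results)
actions = actions_for(results)
```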

Rule Definition

POST /v1/quality/rules

Request:
{
  "name": "amount_positive",
  "description": "Transaction amount must be positive",
  "ruleType": "range_check",
  "severity": "critical",
  "status": "active",
  "config": {
    "column": "amount",
    "min": 0.01,
    "max": null
  },
  "datasets": ["analytics.sales.transactions"],
  "tags": ["financial", "critical"]
}
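The request above can be assembled with the Python standard library as sketched below. The base URL is a placeholder, authentication headers are omitted, and the helper name is hypothetical. The client-side severity check reflects the severity table, not documented API validation. The request is built but not sent.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

BASE_URL = "https://matih.example.com"  # placeholder host, an assumption
SEVERITIES = {"critical", "warning", "info"}


def build_rule_request(name: str, description: str, rule_type: str,
                       severity: str, config: Dict[str, Any],
                       datasets: List[str],
                       tags: Optional[List[str]] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/quality/rules request."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    payload = {
        "name": name,
        "description": description,
        "ruleType": rule_type,
        "severity": severity,
        "status": "active",
        "config": config,
        "datasets": datasets,
        "tags": tags or [],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/quality/rules",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_rule_request(
    name="amount_positive",
    description="Transaction amount must be positive",
    rule_type="range_check",
    severity="critical",
    config={"column": "amount", "min": 0.01, "max": None},
    datasets=["analytics.sales.transactions"],
    tags=["financial", "critical"],
)
# To send it: urllib.request.urlopen(request)
```

Python's None serializes to JSON null, so the open upper bound in config matches the "max": null field of the example request.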

Rule Execution

The rule engine evaluates all active rules for a dataset and returns a validation report:

POST /v1/quality/validate

Request:
{
  "dataset": "analytics.sales.transactions",
  "ruleIds": null,
  "sampleSize": 10000
}

Response:
{
  "dataset": "analytics.sales.transactions",
  "totalRules": 12,
  "passed": 10,
  "failed": 2,
  "results": [
    {
      "ruleId": "rule-123",
      "ruleName": "amount_positive",
      "status": "FAILED",
      "severity": "critical",
      "failedRows": 42,
      "totalRows": 125000,
      "failRate": 0.000336
    }
  ]
}
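A client consuming this report might extract the failed rules and cross-check the reported numbers, as in the sketch below. It relies only on the fields shown in the example response. The helper names are hypothetical, and reading failRate as failedRows divided by totalRows is inferred from the sample values (42 / 125000 = 0.000336).

```python
import json
import math
from typing import Any, Dict, List

report_json = """
{
  "dataset": "analytics.sales.transactions",
  "totalRules": 12,
  "passed": 10,
  "failed": 2,
  "results": [
    {"ruleId": "rule-123", "ruleName": "amount_positive", "status": "FAILED",
     "severity": "critical", "failedRows": 42, "totalRows": 125000,
     "failRate": 0.000336}
  ]
}
"""


def failed_rules(report: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the result entries whose status is FAILED."""
    return [r for r in report["results"] if r["status"] == "FAILED"]


def fail_rate(result: Dict[str, Any]) -> float:
    """Recompute the failure rate from the row counts."""
    total = result["totalRows"]
    return result["failedRows"] / total if total else 0.0


def counts_consistent(report: Dict[str, Any]) -> bool:
    """Check that passed + failed adds up to totalRules."""
    return report["passed"] + report["failed"] == report["totalRules"]


report = json.loads(report_json)
failures = failed_rules(report)
for result in failures:
    rate = fail_rate(result)
    print(f"{result['ruleName']} ({result['severity']}): {rate:.4%} of rows failed")
```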
