MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Sensitive Data Detection

Sensitive Data Detection in the Data Catalog automatically identifies and classifies data that may contain personally identifiable information (PII), financial data, health records, or other sensitive content. Detection results feed into governance policies for automatic enforcement of access controls, masking, and audit rules.


Classification Levels

The governance system supports the following classification levels, each of which can be assigned to data entities.

| Classification | Description | Typical Content |
| --- | --- | --- |
| PUBLIC | No restrictions on access | Marketing data, public metrics |
| INTERNAL | Restricted to organization members | Internal reports, team metrics |
| CONFIDENTIAL | Limited to authorized personnel | Business strategies, contracts |
| SENSITIVE | Requires explicit authorization | Financial records, HR data |
| PII | Personally identifiable information | Names, emails, SSNs, phone numbers |
| PHI | Protected health information | Medical records, diagnoses |
| PCI | Payment card industry data | Credit card numbers, CVVs |

Detection Rule Types

Governance policies of type CLASSIFICATION define the rules used for automatic detection and classification.

| Rule Type | Description |
| --- | --- |
| AUTO_CLASSIFY | Automatically classify data based on content patterns |
| REQUIRES_CLASSIFICATION | Enforce that data must be classified before access |
| CLASSIFICATION_INHERITANCE | Propagate classification from parent to child entities |
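
One way to picture CLASSIFICATION_INHERITANCE is as a walk down the entity hierarchy. The Python sketch below is illustrative only: the `Entity` class and `propagate_classification` function are hypothetical names, and it assumes that an explicit classification on a child is never overwritten by the parent's. The platform's actual conflict handling may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Hypothetical catalog entity (database, table, column, ...)."""
    name: str
    classification: Optional[str] = None
    children: List["Entity"] = field(default_factory=list)


def propagate_classification(entity: Entity, inherited: Optional[str] = None) -> None:
    """Fill in missing classifications from the nearest classified ancestor.

    Assumption: a classification set explicitly on a child is kept as-is.
    """
    if entity.classification is None:
        entity.classification = inherited
    for child in entity.children:
        propagate_classification(child, entity.classification)


table = Entity("customers", "PII", [
    Entity("email"),                # unclassified: inherits from the table
    Entity("card_number", "PCI"),   # explicitly classified: left untouched
])
propagate_classification(table)
print([(c.name, c.classification) for c in table.children])
# [('email', 'PII'), ('card_number', 'PCI')]
```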

Pattern-Based Detection

The PATTERN_MATCH rule type identifies sensitive data through regular expression patterns.

| Pattern Name | Detects | Example Match |
| --- | --- | --- |
| Email | Email addresses | user@example.com |
| SSN | Social Security Numbers | 123-45-6789 |
| Credit Card | Payment card numbers | 4111-1111-1111-1111 |
| Phone | Phone numbers | +1 (555) 123-4567 |
| IP Address | IP addresses | 192.168.1.1 |
| Date of Birth | Birth dates | 1990-01-15 |
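
To make the pattern categories concrete, the sketch below pairs each pattern name with a simplified regular expression and checks it against the example values in the table. The expressions are assumptions written for illustration, not the platform's built-in patterns, and they are deliberately loose: the date pattern, for instance, cannot tell a birth date from any other ISO date.

```python
import re
from typing import List

# Illustrative patterns only (assumptions, not the platform's built-in regexes).
PATTERNS = {
    "Email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "Credit Card": re.compile(r"\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}"),
    "Phone": re.compile(r"\+?\d{1,3}[ .-]?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"),
    "IP Address": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
    "Date of Birth": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def detect(value: str) -> List[str]:
    """Return the names of all patterns that match the entire value."""
    return [name for name, rx in PATTERNS.items() if rx.fullmatch(value)]


for sample in ["user@example.com", "123-45-6789", "4111-1111-1111-1111",
               "+1 (555) 123-4567", "192.168.1.1", "1990-01-15"]:
    print(sample, "->", detect(sample))
```

In practice, a regex hit is only a hint. Combining it with column names and match rates across many rows reduces false positives.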

Example Detection Policy

```json
{
  "name": "PII Auto-Detection",
  "policyType": "CLASSIFICATION",
  "scopeType": "GLOBAL",
  "enforcementMode": "MONITOR",
  "rules": [
    {
      "name": "Require Classification",
      "ruleType": "REQUIRES_CLASSIFICATION",
      "parameters": {},
      "enabled": true,
      "order": 1
    },
    {
      "name": "Email Pattern",
      "ruleType": "PATTERN_MATCH",
      "parameters": {
        "column": "email",
        "pattern": "^[\\w.+-]+@[\\w-]+\\.[\\w.]+$",
        "minMatchPercent": 80.0
      },
      "enabled": true,
      "order": 2
    }
  ],
  "enforcementActions": [
    {
      "actionType": "LOG",
      "parameters": {
        "logLevel": "WARN"
      },
      "order": 1
    },
    {
      "actionType": "NOTIFY",
      "parameters": {
        "recipients": ["data-stewards"],
        "message": "Unclassified sensitive data detected"
      },
      "order": 2
    }
  ]
}
```
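
The following Python sketch shows one plausible reading of the "Email Pattern" rule in the policy above. It assumes that `minMatchPercent` is compared against the share of non-null values in the named column that match `pattern`; the helper names `pattern_match_percent` and `rule_matches` are hypothetical, and the platform's evaluator may sample or count differently.

```python
import re

# The "Email Pattern" rule from the policy above, expressed as a Python dict.
rule = {
    "ruleType": "PATTERN_MATCH",
    "parameters": {
        "column": "email",
        "pattern": r"^[\w.+-]+@[\w-]+\.[\w.]+$",
        "minMatchPercent": 80.0,
    },
}


def pattern_match_percent(values, pattern):
    """Percentage of non-null values that match the pattern (0.0 if none)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    rx = re.compile(pattern)
    hits = sum(1 for v in non_null if rx.search(str(v)))
    return 100.0 * hits / len(non_null)


def rule_matches(rule, rows):
    """Assumed semantics: the rule fires once the match rate reaches minMatchPercent."""
    params = rule["parameters"]
    values = [row.get(params["column"]) for row in rows]
    return pattern_match_percent(values, params["pattern"]) >= params["minMatchPercent"]


rows = [
    {"email": "ana@example.com"},
    {"email": "bo@example.org"},
    {"email": "cy@example.net"},
    {"email": "dee@example.io"},
    {"email": "n/a"},
]
print(rule_matches(rule, rows))  # True: 4 of 5 values match (80%)
```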

Data Quality Integration

Sensitive data detection integrates with data quality metrics provided through the evaluation context.

| Metric | Description |
| --- | --- |
| column.null_percent | Percentage of null values in the column |
| column.uniqueness_percent | Percentage of unique values |
| column.pattern_match_percent | Percentage of values matching a detection pattern |
| column.min | Minimum value for numeric columns |
| column.max | Maximum value for numeric columns |
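
As a rough illustration of what these metrics measure, the sketch below computes them for a small in-memory column sample. The function name `column_metrics` is hypothetical, and the exact definitions are assumptions: here the uniqueness and pattern-match percentages are taken over non-null values only, and min/max are reported only when every non-null value is numeric.

```python
import re
from typing import Any, Dict, Optional, Sequence


def column_metrics(values: Sequence[Any], pattern: Optional[str] = None) -> Dict[str, Any]:
    """Compute the metrics listed above for one column sample.

    Assumptions: percentages other than null_percent are taken over
    non-null values, and min/max are only reported for numeric columns.
    """
    total = len(values)
    non_null = [v for v in values if v is not None]
    metrics: Dict[str, Any] = {
        "column.null_percent": 100.0 * (total - len(non_null)) / total if total else 0.0,
        "column.uniqueness_percent": 100.0 * len(set(non_null)) / len(non_null) if non_null else 0.0,
    }
    if pattern is not None and non_null:
        rx = re.compile(pattern)
        hits = sum(1 for v in non_null if rx.fullmatch(str(v)))
        metrics["column.pattern_match_percent"] = 100.0 * hits / len(non_null)
    if non_null and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        metrics["column.min"] = min(non_null)
        metrics["column.max"] = max(non_null)
    return metrics


ssn_metrics = column_metrics(["123-45-6789", "987-65-4321", None, "unknown"],
                             r"\d{3}-\d{2}-\d{4}")
print(ssn_metrics)
```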

Freshness Monitoring

The FRESHNESS rule type monitors data age to ensure sensitive data is current and valid.

| Parameter | Description |
| --- | --- |
| maxAgeMinutes | Maximum allowed age of the data in minutes |

When data exceeds the configured freshness threshold, the policy evaluator flags a violation. This is particularly important for sensitive data that must be kept up to date for compliance reasons.
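
The freshness check itself reduces to a timestamp comparison. The sketch below is a minimal illustration, assuming the evaluator has access to a timezone-aware last-updated timestamp; the function name `violates_freshness` is hypothetical, and treating data that is exactly at the threshold as still fresh is an assumption based on the word "exceeds".

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def violates_freshness(last_updated: datetime, max_age_minutes: int,
                       now: Optional[datetime] = None) -> bool:
    """Return True when the data is older than maxAgeMinutes."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > timedelta(minutes=max_age_minutes)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(violates_freshness(now - timedelta(minutes=90), 60, now))  # True
print(violates_freshness(now - timedelta(minutes=30), 60, now))  # False
```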


Enforcement on Detection

When sensitive data is detected, the following actions can be triggered automatically.

| Action | Description |
| --- | --- |
| MASK | Apply automatic masking to detected sensitive columns |
| QUARANTINE | Quarantine the data for review before access |
| ALERT | Alert data stewards about the detection |
| BLOCK | Block access until classification is assigned |
| WORKFLOW | Trigger a classification review workflow |
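
To give a sense of what a MASK action might do to a detected value, here is a small sketch of partial masking. The masking strategy the platform actually applies is not described here, so the function `mask_value` and its keep-the-last-four behavior are purely illustrative assumptions.

```python
def mask_value(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `visible` alphanumeric characters.

    Separators such as '-' are kept so the value's shape stays recognizable.
    Values with no more than `visible` alphanumeric characters are returned
    unchanged, so a real implementation would need a stricter rule for them.
    """
    remaining = sum(ch.isalnum() for ch in value) - visible
    out = []
    for ch in value:
        if ch.isalnum() and remaining > 0:
            out.append(mask_char)
            remaining -= 1
        else:
            out.append(ch)
    return "".join(out)


print(mask_value("4111-1111-1111-1111"))  # ****-****-****-1111
print(mask_value("123-45-6789"))          # ***-**-6789
```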

Best Practices

  • Run detection scans regularly on newly ingested data
  • Combine REQUIRES_CLASSIFICATION with AUTO_CLASSIFY for comprehensive coverage
  • Use MONITOR mode during initial deployment to tune detection patterns
  • Review detection results before switching to HARD_ENFORCE mode
  • Maintain a list of known sensitive column name patterns (e.g., ssn, email, phone)
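
The last practice can be kept as a small, reviewable lookup. The sketch below is one possible shape for it: the hint lists and the `suggest_classification` helper are hypothetical and should be adapted to local naming conventions. A name match is only a suggestion for a steward to review, not a classification decision.

```python
import re

# Hypothetical name hints; extend to match local naming conventions.
SENSITIVE_NAME_HINTS = {
    "PII": re.compile(r"(^|_)(ssn|email|phone|dob|birth_?date)(_|$)", re.IGNORECASE),
    "PCI": re.compile(r"(^|_)(card_?number|cvv|pan)(_|$)", re.IGNORECASE),
}


def suggest_classification(column_name: str):
    """Return the first classification whose name hint matches, else None."""
    for level, rx in SENSITIVE_NAME_HINTS.items():
        if rx.search(column_name):
            return level
    return None


for name in ["customer_email", "SSN", "card_number", "emailed_at", "created_at"]:
    print(name, "->", suggest_classification(name))
```

Anchoring on underscores or string boundaries keeps names such as `emailed_at` from being flagged merely because they contain `email`.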