MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Tags and Classification

Data classification is the foundation of the MATIH security and governance model. Every column, table, and data asset can carry classification tags that drive data masking rules, access policies, retention schedules, and compliance controls. This section covers the classification taxonomy, automatic PII detection, tag management, and the classification rules engine.


Classification Taxonomy

MATIH uses a multi-dimensional classification system:

Sensitivity Levels

| Level | Label | Description | Example Data |
|-------|-------|-------------|--------------|
| 0 | PUBLIC | No restrictions on access | Product names, public company info |
| 1 | INTERNAL | Internal use only | Employee names, department structures |
| 2 | CONFIDENTIAL | Business-sensitive | Revenue figures, customer counts |
| 3 | RESTRICTED | Personally Identifiable Information | Email, phone, date of birth |
| 4 | SECRET | Highly sensitive | SSN, financial account numbers, passwords |

PII Categories

| Category | Tag | Example Columns |
|----------|-----|-----------------|
| Email address | PII:EMAIL | email, contact_email, user_email |
| Phone number | PII:PHONE | phone, mobile, contact_phone |
| National ID | PII:NATIONAL_ID | ssn, sin, national_id |
| Date of birth | PII:DOB | date_of_birth, dob, birth_date |
| Physical address | PII:ADDRESS | address, street, zip_code |
| Financial account | PII:FINANCIAL | account_number, iban, routing_number |
| Health information | PHI:MEDICAL | diagnosis, prescription, medical_record |
| Payment card | PCI:CARD | card_number, cvv, expiry_date |
| IP address | PII:IP | ip_address, client_ip, source_ip |
| Biometric data | PII:BIOMETRIC | fingerprint_hash, face_encoding |

Business Domain Tags

| Domain | Tags |
|--------|------|
| Finance | finance, revenue, billing, payment |
| Sales | sales, orders, customers, pipeline |
| Marketing | marketing, campaigns, leads, attribution |
| Engineering | engineering, metrics, logs, infrastructure |
| HR | hr, employees, compensation, recruitment |

Automatic PII Detection

The PiiDetectionService automatically scans columns for PII patterns:

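import java.util.List;
import java.util.UUID;
import org.springframework.stereotype.Service; // assumes Spring's @Service stereotype

@Service
public class PiiDetectionService {

    public List<PiiDetectionResult> detectPii(UUID tenantId, String tableFqn, int sampleSize) {
        // 1. Fetch sample data from the table (default: 1000 rows)
        // 2. For each column, apply pattern matching
        // 3. Score confidence for each PII type
        // 4. Return results above confidence threshold
        throw new UnsupportedOperationException("body elided in this overview");
    }
}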

Detection Methods

| Method | Approach | Accuracy |
|--------|----------|----------|
| Column name heuristics | Match column names against known PII patterns | High for standard names |
| Regex pattern matching | Apply regex patterns to sample data | High for structured PII (SSN, email) |
| Statistical analysis | Analyze value distributions for PII characteristics | Medium for unstructured PII |
| Data type analysis | Correlate column type with PII likelihood | Low (supplementary signal) |
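As an illustration of the column name heuristic, the sketch below matches lower-cased column names against patterns built from the example columns in the PII Categories table. The class `ColumnNameHeuristics`, the method `suggestTag`, and the exact patterns are assumptions made for this example; they are not taken from the `PiiDetectionService` implementation.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical column-name heuristic; not the actual PiiDetectionService logic. */
final class ColumnNameHeuristics {

    // Tag -> pattern over the lower-cased column name. Each token must be bounded by
    // the start/end of the name or an underscore, so "sin" does not hit "business".
    private static final Map<String, Pattern> NAME_PATTERNS = new LinkedHashMap<>();
    static {
        NAME_PATTERNS.put("PII:EMAIL", Pattern.compile("(^|_)email($|_)"));
        NAME_PATTERNS.put("PII:PHONE", Pattern.compile("(^|_)(phone|mobile)($|_)"));
        NAME_PATTERNS.put("PII:NATIONAL_ID", Pattern.compile("(^|_)(ssn|sin|national_id)($|_)"));
        NAME_PATTERNS.put("PII:DOB", Pattern.compile("(^|_)(dob|date_of_birth|birth_date)($|_)"));
        NAME_PATTERNS.put("PII:IP", Pattern.compile("(^|_)ip($|_)"));
    }

    /** Returns the first PII tag whose pattern matches the column name, if any. */
    static Optional<String> suggestTag(String columnName) {
        String normalized = columnName.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, Pattern> entry : NAME_PATTERNS.entrySet()) {
            if (entry.getValue().matcher(normalized).find()) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }
}
```

A name-only match is a weak signal on its own, which is why the table lists it alongside sample-based methods: a column called `email` may be empty or hold unrelated text.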

Detection Patterns

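import java.util.regex.Pattern;

// Wrapper class and constant names are illustrative; the patterns are unchanged.
final class PiiPatterns {
    // Email detection
    static final Pattern EMAIL = Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");

    // SSN detection (US)
    static final Pattern US_SSN = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

    // Phone detection (US)
    static final Pattern US_PHONE = Pattern.compile("\\(?\\d{3}\\)?[-.\\s]?\\d{3}[-.\\s]?\\d{4}");

    // Credit card detection (13-19 digit candidates; the regex alone cannot verify the Luhn checksum)
    static final Pattern CARD = Pattern.compile("\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{1,7}");

    // IPv4 address detection (octets are not range-checked, so 999.1.1.1 also matches)
    static final Pattern IPV4 = Pattern.compile("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}");
}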
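The credit card pattern only selects 13-19 digit candidates; a regular expression cannot verify the Luhn checksum. The sketch below shows one way to combine the two steps. `CardNumberCheck`, `passesLuhn`, and `isLikelyCardNumber` are hypothetical names introduced for this example.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: the card regex selects candidates, the Luhn check filters them. */
final class CardNumberCheck {

    // Same pattern as the credit card entry in the detection list.
    private static final Pattern CARD_CANDIDATE =
            Pattern.compile("\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{1,7}");

    /** Luhn checksum over a digit string (separators already removed). */
    static boolean passesLuhn(String digits) {
        int sum = 0;
        boolean doubleDigit = false; // the rightmost digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (d < 0 || d > 9) {
                return false; // non-digit input
            }
            if (doubleDigit) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as summing the two digits of the product
                }
            }
            sum += d;
            doubleDigit = !doubleDigit;
        }
        return !digits.isEmpty() && sum % 10 == 0;
    }

    /** True when the whole value looks like a card number and passes the Luhn check. */
    static boolean isLikelyCardNumber(String value) {
        String trimmed = value.trim();
        if (!CARD_CANDIDATE.matcher(trimmed).matches()) {
            return false;
        }
        return passesLuhn(trimmed.replaceAll("[- ]", ""));
    }
}
```

With this filter, the well-known test number `4111 1111 1111 1111` is accepted, while `1234-5678-9012-3456` matches the regex but is rejected by the checksum.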

Detection Confidence Scoring

Each detection produces a confidence score:

| Score Range | Interpretation | Action |
|-------------|----------------|--------|
| 0.95 - 1.00 | Definite PII | Auto-classify and apply masking |
| 0.80 - 0.95 | Likely PII | Auto-classify, flag for review |
| 0.50 - 0.80 | Possible PII | Flag for manual review |
| 0.00 - 0.50 | Unlikely PII | No action |
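The score bands can be expressed as a small lookup. The ranges as listed share their boundary values (0.95, 0.80, 0.50), so this sketch assumes each band includes its lower bound; that choice, along with the names `ConfidenceBands`, `Action`, and `actionFor`, is an assumption and not confirmed platform behavior.

```java
/** Hypothetical mapping from a confidence score to the action for its band. */
final class ConfidenceBands {

    enum Action { AUTO_CLASSIFY_AND_MASK, AUTO_CLASSIFY_AND_FLAG, FLAG_FOR_REVIEW, NO_ACTION }

    static Action actionFor(double confidence) {
        if (Double.isNaN(confidence) || confidence < 0.0 || confidence > 1.0) {
            throw new IllegalArgumentException("confidence must be in [0, 1]: " + confidence);
        }
        // Assumption: a boundary value belongs to the higher band.
        if (confidence >= 0.95) {
            return Action.AUTO_CLASSIFY_AND_MASK;
        }
        if (confidence >= 0.80) {
            return Action.AUTO_CLASSIFY_AND_FLAG;
        }
        if (confidence >= 0.50) {
            return Action.FLAG_FOR_REVIEW;
        }
        return Action.NO_ACTION;
    }
}
```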

PII Detection API

POST /v1/catalog/classification/detect-pii

Request:
{
  "tableFqn": "analytics.public.customers",
  "sampleSize": 1000,
  "autoClassify": false
}

Response:
{
  "tableFqn": "analytics.public.customers",
  "detectedAt": "2026-02-12T10:30:00Z",
  "results": [
    {
      "column": "email",
      "piiType": "PII:EMAIL",
      "confidence": 0.99,
      "sampleMatches": 987,
      "sampleSize": 1000,
      "currentClassification": null,
      "suggestedClassification": "RESTRICTED",
      "suggestedTags": ["PII", "PII:EMAIL"]
    },
    {
      "column": "ssn",
      "piiType": "PII:NATIONAL_ID",
      "confidence": 0.97,
      "sampleMatches": 965,
      "sampleSize": 1000,
      "currentClassification": null,
      "suggestedClassification": "SECRET",
      "suggestedTags": ["PII", "PII:NATIONAL_ID"]
    },
    {
      "column": "notes",
      "piiType": "PII:EMAIL",
      "confidence": 0.35,
      "sampleMatches": 45,
      "sampleSize": 1000,
      "currentClassification": null,
      "suggestedClassification": null,
      "suggestedTags": []
    }
  ]
}
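A client could assemble this request with the JDK's built-in HTTP client, as in the sketch below. The base URL is a placeholder, authentication headers are omitted because this section does not describe them, and `DetectPiiRequests` is a hypothetical helper rather than part of any MATIH SDK.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds (but does not send) a detect-pii request. */
final class DetectPiiRequests {

    static HttpRequest build(String baseUrl, String tableFqn, int sampleSize, boolean autoClassify) {
        // Naive JSON assembly: assumes tableFqn contains no characters that need escaping.
        String body = "{\"tableFqn\": \"" + tableFqn + "\", "
                + "\"sampleSize\": " + sampleSize + ", "
                + "\"autoClassify\": " + autoClassify + "}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/catalog/classification/detect-pii"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The resulting request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Keeping `autoClassify` set to `false` returns suggestions only, which matches the request shown above.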

Classification Rules Engine

The ClassificationRulesEngine applies rule-based classification to data assets:

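import java.util.List;
import java.util.UUID;
import org.springframework.stereotype.Service; // assumes Spring's @Service stereotype

@Service
public class ClassificationRulesEngine {

    public List<ClassificationResult> applyRules(UUID tenantId, String tableFqn) {
        // 1. Fetch column metadata from catalog
        // 2. Apply name-based rules
        // 3. Apply type-based rules
        // 4. Apply PII detection rules
        // 5. Apply custom tenant rules
        // 6. Merge and resolve conflicts (highest sensitivity wins)
        throw new UnsupportedOperationException("body elided in this overview");
    }
}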
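Step 6 (highest sensitivity wins) can be sketched as a per-column merge. The types below are hypothetical stand-ins, not the real `ClassificationResult`. Taking the union of tags from all matching rules is also an assumption; the only behavior stated here is that the highest sensitivity wins.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical conflict resolution for classification candidates. */
final class ClassificationMerger {

    // Declared in ascending order, so compareTo follows levels 0-4.
    enum Sensitivity { PUBLIC, INTERNAL, CONFIDENTIAL, RESTRICTED, SECRET }

    static final class Candidate {
        final String column;
        final Sensitivity sensitivity;
        final Set<String> tags;

        Candidate(String column, Sensitivity sensitivity, Set<String> tags) {
            this.column = column;
            this.sensitivity = sensitivity;
            this.tags = new TreeSet<>(tags);
        }
    }

    /** Merges candidates per column, preserving first-seen column order. */
    static Map<String, Candidate> merge(List<Candidate> candidates) {
        Map<String, Candidate> merged = new LinkedHashMap<>();
        for (Candidate c : candidates) {
            merged.merge(c.column, c, ClassificationMerger::combine);
        }
        return merged;
    }

    private static Candidate combine(Candidate a, Candidate b) {
        Sensitivity highest =
                a.sensitivity.compareTo(b.sensitivity) >= 0 ? a.sensitivity : b.sensitivity;
        Set<String> tags = new TreeSet<>(a.tags);
        tags.addAll(b.tags); // assumption: tags from all rules are kept
        return new Candidate(a.column, highest, tags);
    }
}
```

For example, if a table-level rule marks `email` as CONFIDENTIAL and a name-based rule marks it as RESTRICTED, the merged result is RESTRICTED.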

Rule Types

| Rule Type | Input | Description |
|-----------|-------|-------------|
| Name pattern | Column name | Classify based on column name matching regex |
| Data type | Column type | Classify based on SQL data type |
| PII detection | Sample data | Classify based on PII detection results |
| Table pattern | Table name | Classify all columns in matching tables |
| Custom expression | Column metadata | Tenant-defined rules with custom logic |
| Inheritance | Table/schema tag | Propagate classification from parent to child |

Rule Configuration

{
  "rules": [
    {
      "name": "email-columns",
      "type": "NAME_PATTERN",
      "pattern": "(?i)(email|e_mail|email_address|contact_email)",
      "classification": "RESTRICTED",
      "tags": ["PII", "PII:EMAIL"],
      "priority": 100
    },
    {
      "name": "ssn-columns",
      "type": "NAME_PATTERN",
      "pattern": "(?i)(ssn|social_security|sin|national_id)",
      "classification": "SECRET",
      "tags": ["PII", "PII:NATIONAL_ID"],
      "priority": 200
    },
    {
      "name": "financial-tables",
      "type": "TABLE_PATTERN",
      "pattern": "(?i)(transactions|payments|invoices|billing)",
      "classification": "CONFIDENTIAL",
      "tags": ["finance"],
      "priority": 50
    }
  ]
}
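To show how such a configuration might be evaluated, the sketch below tests `NAME_PATTERN` rules against the column name and `TABLE_PATTERN` rules against the table name. Two points are assumptions because the configuration does not define them: the pattern must match the whole name (`matches()` rather than `find()`), and a higher `priority` value sorts first. `RuleMatcher` and `Rule` are hypothetical types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical evaluator for NAME_PATTERN and TABLE_PATTERN rules. */
final class RuleMatcher {

    enum RuleType { NAME_PATTERN, TABLE_PATTERN }

    static final class Rule {
        final String name;
        final RuleType type;
        final Pattern pattern;
        final String classification;
        final int priority;

        Rule(String name, RuleType type, String regex, String classification, int priority) {
            this.name = name;
            this.type = type;
            this.pattern = Pattern.compile(regex);
            this.classification = classification;
            this.priority = priority;
        }
    }

    /** Returns the rules that match, highest priority first (assumed ordering). */
    static List<Rule> matchingRules(List<Rule> rules, String tableName, String columnName) {
        List<Rule> hits = new ArrayList<>();
        for (Rule rule : rules) {
            String input = rule.type == RuleType.NAME_PATTERN ? columnName : tableName;
            // Full match: with find(), "sin" would also hit "business_name".
            if (rule.pattern.matcher(input).matches()) {
                hits.add(rule);
            }
        }
        hits.sort(Comparator.comparingInt((Rule r) -> r.priority).reversed());
        return hits;
    }
}
```

Full matching trades recall for precision: `user_email` would not match the `email-columns` pattern as written, so a deployment relying on substring matching would need bounded patterns instead.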

Tag Management

Applying Tags

POST /v1/catalog/tags

Request:
{
  "entityType": "COLUMN",
  "entityFqn": "analytics.public.customers.email",
  "tags": ["PII", "PII:EMAIL"],
  "classification": "RESTRICTED",
  "appliedBy": "auto-classification"
}

Listing Tags

GET /v1/catalog/tags?entityType=TABLE&entityFqn=analytics.public.customers

Response:
{
  "entityFqn": "analytics.public.customers",
  "tags": [
    {"tag": "customer-data", "source": "manual", "appliedBy": "data-steward", "appliedAt": "2026-01-15"},
    {"tag": "PII-containing", "source": "auto-classification", "appliedBy": "system", "appliedAt": "2026-02-01"}
  ],
  "columns": [
    {
      "column": "email",
      "classification": "RESTRICTED",
      "tags": ["PII", "PII:EMAIL"],
      "source": "auto-classification"
    },
    {
      "column": "ssn",
      "classification": "SECRET",
      "tags": ["PII", "PII:NATIONAL_ID"],
      "source": "auto-classification"
    }
  ]
}

Tag Propagation

Classification tags propagate through the governance system:

Classification applied to column
    |
    v
[Catalog Service] -- CatalogEvent --> [Query Engine]
    |                                      |
    |                                 Update masking rules
    |
    v
[Governance Service] -- Policy update --> [OPA]
    |
    |
Update access policies

Classification and Downstream Effects

| Classification Tag | Query Engine Effect | Governance Effect | Quality Effect |
|--------------------|---------------------|-------------------|----------------|
| PUBLIC | No masking | No access restriction | Standard validation |
| INTERNAL | No masking for internal users | Internal role required | Standard validation |
| CONFIDENTIAL | Partial masking for viewers | Department match required | Enhanced monitoring |
| RESTRICTED | Heavy masking for non-stewards | Data steward or admin required | PII monitoring rules |
| SECRET | Full redaction for non-admins | Admin only, audit trail | Critical validation rules |
| PII:EMAIL | Email masking function | GDPR controls apply | Email format validation |
| PII:FINANCIAL | Account number masking | PCI DSS controls apply | Financial integrity checks |
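The table names an email masking function without defining its output. As a purely illustrative sketch, one common approach keeps the first character of the local part and the domain, and hides the rest. The class `EmailMasking` and the exact mask format are assumptions, not the Query Engine's actual behavior.

```java
/** Hypothetical email mask; the real Query Engine function may differ. */
final class EmailMasking {

    /** Keeps the first character and the domain, e.g. a***@example.com. */
    static String maskEmail(String email) {
        if (email == null) {
            return null;
        }
        int at = email.indexOf('@');
        if (at <= 0) {
            return "***"; // not a recognizable address: redact entirely
        }
        return email.charAt(0) + "***" + email.substring(at);
    }
}
```

A fixed-length mask avoids revealing the length of the local part, which a one-asterisk-per-character mask would expose.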
