Data Masking
Data masking protects sensitive column data by transforming column values in query results based on the user's role and the column's classification level. Unlike Row-Level Security (which restricts which rows a user can see), data masking restricts how column values are displayed, allowing users to see the existence of data without revealing its actual content.
Masking Architecture
Data masking operates between the RLS injection phase and the engine execution phase:
SQL Query
|
v
[RLS Filter Injection] -- Row-level restrictions
|
v
[Data Masking Service] -- Column-level transformations
|
v
[Engine Execution] -- Executes modified SQL
|
v
[Masked Results] -- User sees transformed valuesThe masking service modifies the SELECT clause of SQL queries, replacing column references with masking function expressions. This approach ensures masking is applied at the database level rather than post-processing, preventing raw data from ever reaching the application layer.
Classification-Driven Masking
Masking rules are driven by data classification tags assigned in the Catalog Service. Each column can carry a classification tag that determines the masking strategy:
| Classification | Level | Description | Default Masking |
|---|---|---|---|
PUBLIC | 0 | No restrictions | None (visible to all) |
INTERNAL | 1 | Internal use only | None for internal users; redacted for external |
CONFIDENTIAL | 2 | Business-sensitive data | Partial masking |
RESTRICTED | 3 | Personally Identifiable Information (PII) | Heavy masking or hashing |
SECRET | 4 | Credentials, financial account numbers | Full redaction |
Role-Based Visibility Matrix
Each user role has a visibility level that determines which classifications they can see unmasked:
| Role | Visibility Level | Sees Unmasked |
|---|---|---|
VIEWER | 0 | PUBLIC only |
ANALYST | 1 | PUBLIC, INTERNAL |
DATA_ENGINEER | 2 | PUBLIC, INTERNAL, CONFIDENTIAL |
DATA_STEWARD | 3 | PUBLIC, INTERNAL, CONFIDENTIAL, RESTRICTED |
ADMIN | 4 | All classifications |
If a user's visibility level is lower than the column's classification level, masking is applied.
Masking Strategies
The masking service supports multiple masking strategies, each appropriate for different data types:
Full Redaction
Replaces the entire value with a constant:
-- Original
SELECT ssn FROM customers
-- Masked (for ANALYST role)
SELECT '***-**-****' AS ssn FROM customersPartial Masking
Reveals a portion of the value while hiding the rest:
-- Email: show first 3 characters and domain
SELECT CONCAT(SUBSTR(email, 1, 3), '***@', SPLIT_PART(email, '@', 2)) AS email FROM customers
-- Phone: show last 4 digits
SELECT CONCAT('***-***-', SUBSTR(phone, -4)) AS phone FROM customers
-- Credit card: show last 4 digits
SELECT CONCAT('****-****-****-', SUBSTR(card_number, -4)) AS card_number FROM transactionsHashing
Replaces the value with a deterministic hash, preserving equality comparisons but hiding actual values:
-- SHA-256 hash preserves grouping and joining on masked columns
SELECT SHA256(CAST(customer_id AS VARCHAR)) AS customer_id_hash, SUM(amount) AS total
FROM orders GROUP BY SHA256(CAST(customer_id AS VARCHAR))Bucketing
Replaces numeric values with range buckets:
-- Salary: replace with range
SELECT CASE
WHEN salary < 50000 THEN '< 50K'
WHEN salary < 100000 THEN '50K-100K'
WHEN salary < 150000 THEN '100K-150K'
ELSE '150K+'
END AS salary_range FROM employeesDate Truncation
Reduces date precision:
-- Date of birth: show only year
SELECT DATE_TRUNC('year', date_of_birth) AS birth_year FROM customersNullification
Replaces the value with NULL:
-- Completely hide the column value
SELECT NULL AS medical_record_number FROM patientsMasking Rule Configuration
Masking rules are stored in the catalog service and associated with column metadata:
{
"table": "customers",
"column": "ssn",
"classification": "SECRET",
"maskingRules": [
{
"roleLevel": 3,
"strategy": "PARTIAL",
"config": {
"showLast": 4,
"maskChar": "*",
"format": "***-**-{last4}"
}
},
{
"roleLevel": 0,
"strategy": "REDACT",
"config": {
"replacement": "***-**-****"
}
}
]
}Rules are evaluated in order of decreasing role level. The first rule where the user's role level meets or exceeds the rule's roleLevel is applied.
Masking Audit Trail
Every masking operation is logged to the MaskingAuditLogRepository:
@Entity
public class MaskingAuditLog {
private UUID id;
private UUID tenantId;
private UUID userId;
private UUID executionId;
private String tableName;
private String columnName;
private String classification;
private String maskingStrategy;
private String userRole;
private Instant timestamp;
}The MaskingAuditController exposes endpoints for querying the audit trail:
GET /v1/queries/masking/audit?tenantId={tenantId}&table=customers&startDate=2026-02-01
Response:
{
"content": [
{
"id": "...",
"userId": "user-456",
"tableName": "customers",
"columnName": "ssn",
"classification": "SECRET",
"maskingStrategy": "PARTIAL",
"userRole": "DATA_STEWARD",
"timestamp": "2026-02-12T10:30:00Z"
}
],
"totalElements": 234
}Masking and Query Semantics
Aggregation Compatibility
When masking is applied to columns used in GROUP BY or aggregate functions, the masking service must ensure query semantics are preserved:
| Operation | Behavior with Masking |
|---|---|
GROUP BY masked_column | Uses hash masking to preserve grouping |
COUNT(DISTINCT masked_column) | Applies hash to preserve count accuracy |
SUM(masked_numeric_column) | Bucketing applied; sum is approximate |
WHERE masked_column = value | Filter evaluated before masking; masking is SELECT-only |
ORDER BY masked_column | Ordering preserved if hash masking; lost if redaction |
JOIN Key Handling
When a masked column is used as a JOIN key, the masking service applies consistent hashing on both sides of the join to preserve join correctness:
-- Both sides use the same hash function
SELECT a.*, b.order_total
FROM customers a
JOIN orders b ON SHA256(CAST(a.customer_id AS VARCHAR)) = SHA256(CAST(b.customer_id AS VARCHAR))Performance Considerations
| Concern | Impact | Mitigation |
|---|---|---|
| Masking function overhead | 5-15% execution time increase | Functions are simple string/hash operations |
| Cache partitioning | Different masking per role means different cache entries | Role hash included in cache key |
| Query plan complexity | Additional function calls in SELECT | Query plan cache stores post-masking plan |
Related Sections
- Row-Level Security -- Row-level filtering that complements data masking
- SQL Execution -- Masking in the execution pipeline
- Classification -- Data classification tag management
- Governance Service -- Policy management for masking rules