MATIH Platform is in active MVP development. Documentation reflects current implementation status.
9. Query Engine & SQL
Caching
Semantic Cache

Semantic Cache

The SemanticCacheService enables cache hits for queries that are structurally identical but differ in literal values, whitespace, or formatting. This dramatically improves cache hit rates for parameterized dashboard queries and repeated analytical patterns.


How It Works

Traditional caching uses an exact SHA-256 hash of the SQL text. Two queries that differ only in a WHERE clause value produce different hashes and cannot share cached results. The semantic cache addresses this by normalizing queries to extract structural templates:

Original:  SELECT name FROM users WHERE id = 42
Normalized: SELECT name FROM users WHERE id = ?NUM?
Template:   SELECT name FROM users WHERE id = ?

Original:  SELECT name FROM users WHERE id = 99
Normalized: SELECT name FROM users WHERE id = ?NUM?
Template:   SELECT name FROM users WHERE id = ?

Both queries produce the same template hash, enabling a semantic cache hit.


Normalization Process

The normalizeQuery() method applies several transformations:

public String normalizeQuery(String query) {
    String normalized = query.trim();
 
    // Normalize whitespace
    normalized = normalized.replaceAll("\\s+", " ");
 
    // Normalize SQL keywords to uppercase
    normalized = normalizeKeywords(normalized);
 
    // Remove comments
    normalized = normalized.replaceAll("--[^\n]*", "");
    normalized = normalized.replaceAll("/\\*.*?\\*/", "");
 
    // Replace string literals with placeholders
    normalized = STRING_LITERAL.matcher(normalized).replaceAll("?STRING?");
 
    // Replace number literals with placeholders
    normalized = NUMBER_LITERAL.matcher(normalized).replaceAll("?NUM?");
 
    // Replace date literals with placeholders
    normalized = DATE_LITERAL.matcher(normalized).replaceAll("?DATE?");
 
    return normalized.trim();
}

Normalization Steps

StepBeforeAfter
WhitespaceSELECT name FROM usersSELECT name FROM users
Keywordsselect name from usersSELECT name FROM users
CommentsSELECT name -- get namesSELECT name
StringsWHERE city = 'New York'WHERE city = ?STRING?
NumbersWHERE id = 42WHERE id = ?NUM?
DatesWHERE date = '2026-01-15'WHERE date = ?DATE?

Template Hashing

After normalization, a template hash is computed by removing the placeholder types:

public String computeTemplateHash(String normalizedQuery) {
    String template = normalizedQuery
            .replaceAll("\\?STRING\\?", "?")
            .replaceAll("\\?NUM\\?", "?")
            .replaceAll("\\?DATE\\?", "?");
    return Integer.toHexString(template.hashCode());
}

Parameter Extraction

The service extracts and classifies parameters from the original SQL:

public List<QueryParameter> extractParameters(String query) {
    List<QueryParameter> params = new ArrayList<>();
    // Extract string literals
    Matcher stringMatcher = STRING_LITERAL.matcher(query);
    while (stringMatcher.find()) {
        params.add(QueryParameter.builder()
                .type(ParameterType.STRING)
                .value(stringMatcher.group(1))
                .position(stringMatcher.start())
                .build());
    }
    // Extract numbers, dates, LIMIT, OFFSET...
}

Parameter Types

TypePatternExample
STRING'...''New York'
NUMBER\d+(\.\d+)?42, 3.14
DATE\d{4}-\d{2}-\d{2}...2026-01-15
LIMITLIMIT \d+LIMIT 100
OFFSETOFFSET \d+OFFSET 50

Fuzzy Matching

When exact template matching fails, the semantic cache can attempt fuzzy matching using Jaccard similarity:

private double calculateSimilarity(String query1, String query2) {
    Set<String> tokens1 = new HashSet<>(Arrays.asList(query1.split("\\s+")));
    Set<String> tokens2 = new HashSet<>(Arrays.asList(query2.split("\\s+")));
 
    Set<String> intersection = new HashSet<>(tokens1);
    intersection.retainAll(tokens2);
 
    Set<String> union = new HashSet<>(tokens1);
    union.addAll(tokens2);
 
    return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
}

Fuzzy matching requires a minimum similarity score of 90% and is used as a fallback when exact template matching misses.


API: Normalize Query (Debug)

# Normalize a query for debugging
curl -X POST http://query-engine:8080/v1/cache/semantic/normalize \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -d '{"query": "SELECT name FROM users WHERE id = 42 AND city = '\''NYC'\''"}'
{
  "original": "SELECT name FROM users WHERE id = 42 AND city = 'NYC'",
  "normalized": "SELECT name FROM users WHERE id = ?NUM? AND city = ?STRING?",
  "templateHash": "a3f2b9c1"
}

API: Popular Templates

# Get the 20 most popular query templates
curl http://query-engine:8080/v1/cache/semantic/templates?limit=20 \
  -H "Authorization: Bearer $JWT_TOKEN"
[
  {
    "templateHash": "a3f2b9c1",
    "sampleQuery": "SELECT name FROM users WHERE id = ?NUM?",
    "firstSeen": "2026-02-01T00:00:00Z",
    "lastSeen": "2026-02-12T10:00:00Z",
    "hitCount": 1542
  }
]

Semantic Cache Statistics

curl http://query-engine:8080/v1/cache/semantic/stats \
  -H "Authorization: Bearer $JWT_TOKEN"
{
  "enabled": true,
  "totalEntries": 2450,
  "templateCount": 180,
  "uniqueTemplates": 180
}

Metrics

MetricTypeDescription
query.cache.semantic.hitCounterExact template match hits
query.cache.semantic.fuzzy_hitCounterFuzzy match hits (>=90% similarity)
query.cache.semantic.missCounterNo semantic match found
query.cache.semantic.indexedCounterQueries indexed for future matching
query.cache.semantic.cleanupCounterExpired entries removed
query.cache.semantic.errorCounterLookup errors
query.cache.semantic.lookupTimerSemantic lookup latency