MATIH Platform is in active MVP development. Documentation reflects current implementation status.
9. Query Engine & SQL
Caching
Multi-Level Cache

The MultiLevelCacheService implements a two-tier caching strategy: L1 (in-memory via Caffeine) for sub-millisecond local access, and L2 (Redis) for distributed durability across Query Engine replicas.


L1 Cache: Caffeine In-Memory

The L1 cache uses Ben Manes' Caffeine library, configured with weight-based eviction and TTL expiration:

l1Cache = Caffeine.newBuilder()
        .maximumWeight(cacheConfig.getL1().getMaxSizeMb() * 1024L * 1024L)
        .weigher((Weigher<String, CacheEntry>) (key, entry) ->
                (int) Math.min(entry.getSizeBytes(), Integer.MAX_VALUE))
        .expireAfterWrite(cacheConfig.getL1().getTtl())
        .expireAfterAccess(cacheConfig.getL1().getExpireAfterAccess())
        .recordStats()
        .removalListener((key, entry, cause) -> {
            if (cause == RemovalCause.SIZE || cause == RemovalCause.EXPIRED) {
                meterRegistry.counter("query.cache.l1.eviction",
                        "cause", cause.name()).increment();
            }
        })
        .build();

L1 Configuration Defaults

| Property | Default | Description |
|---|---|---|
| query.cache.l1.enabled | true | Enable L1 cache |
| query.cache.l1.max-entries | 1000 | Maximum cache entries |
| query.cache.l1.max-size-mb | 256 | Maximum total size in MB |
| query.cache.l1.ttl | 10 minutes | Time-to-live after write |
| query.cache.l1.expire-after-access | 5 minutes | Idle expiration |
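Assuming these properties bind through Spring Boot configuration (the binding mechanism is not shown in this excerpt), the defaults above would correspond to an application.yml fragment roughly like:

```yaml
query:
  cache:
    l1:
      enabled: true
      max-entries: 1000
      max-size-mb: 256
      ttl: 10m                  # Spring duration shorthand: 10 minutes
      expire-after-access: 5m   # idle expiration: 5 minutes
```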

L2 Cache: Redis with Compression

The L2 cache stores serialized CacheEntry objects in Redis with configurable TTL and GZIP compression for large entries:

public void put(UUID tenantId, String queryHash, QueryResponse response,
                Set<String> dependencies, String originalQuery) {
    String cacheKey = buildKey(tenantId, queryHash);
    long sizeBytes = estimateSize(response);
 
    // Skip if too large
    if (sizeBytes > cacheConfig.getL2().getMaxEntrySizeMb() * 1024L * 1024L) {
        meterRegistry.counter("query.cache.skip", "reason", "too_large").increment();
        return;
    }
 
    CacheEntry entry = CacheEntry.builder()
            .response(response)
            .tenantId(tenantId)
            .queryHash(queryHash)
            .createdAt(Instant.now())
            .sizeBytes(sizeBytes)
            .dependencies(dependencies)
            .build();
 
    // Put in L1
    if (cacheConfig.getL1().isEnabled()) {
        l1Cache.put(cacheKey, entry);
    }
 
    // Put in L2 with TTL; serialize() can fail, and a serialization failure
    // must not fail the query, so the L2 write is skipped on error
    if (cacheConfig.getL2().isEnabled()) {
        try {
            String serialized = serialize(entry);
            redisTemplate.opsForValue().set(cacheKey, serialized, cacheConfig.getL2().getTtl());
        } catch (IOException e) {
            meterRegistry.counter("query.cache.skip", "reason", "serialization_error").increment();
        }
    }
}

GZIP Compression

Entries larger than the compression threshold (default: 100 KB) are compressed with GZIP before storage in Redis. The prefix GZIP: indicates compressed data:

private String serialize(CacheEntry entry) throws IOException {
    String json = objectMapper.writeValueAsString(entry);
    if (cacheConfig.getL2().isCompressionEnabled() &&
            json.length() > cacheConfig.getL2().getCompressionThresholdKb() * 1024) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(baos)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return "GZIP:" + Base64.getEncoder().encodeToString(baos.toByteArray());
    }
    return json;
}
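The matching read path is not shown in this excerpt, but a deserializer consistent with this format would check for the GZIP: marker, Base64-decode, and gunzip before JSON parsing. A self-contained sketch of the round-trip (class and method names here are illustrative, not from the codebase):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCodec {

    // Mirrors the serialize() compression path: gzip the JSON,
    // Base64-encode it, and tag the result with a "GZIP:" marker.
    static String compress(String json) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(baos)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return "GZIP:" + Base64.getEncoder().encodeToString(baos.toByteArray());
    }

    // Inverse path: entries without the marker are plain JSON; entries
    // with it are Base64-decoded and gunzipped back to the original JSON.
    static String decompress(String stored) throws IOException {
        if (!stored.startsWith("GZIP:")) {
            return stored;
        }
        byte[] compressed = Base64.getDecoder().decode(stored.substring(5));
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gzip.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString(StandardCharsets.UTF_8);
        }
    }
}
```

Storing the marker inside the value (rather than in a separate flag) keeps each Redis entry self-describing, so readers never need out-of-band metadata to know whether to decompress.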

L2 Configuration Defaults

| Property | Default | Description |
|---|---|---|
| query.cache.l2.enabled | true | Enable L2 cache |
| query.cache.l2.key-prefix | query:cache: | Redis key prefix |
| query.cache.l2.ttl | 1 hour | Time-to-live in Redis |
| query.cache.l2.max-entry-size-mb | 50 | Maximum single entry size |
| query.cache.l2.compression-threshold-kb | 100 | Compress entries above this size |
| query.cache.l2.compression-enabled | true | Enable GZIP compression |
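Under the same assumed Spring Boot property binding, the L2 defaults would look like this in application.yml:

```yaml
query:
  cache:
    l2:
      enabled: true
      key-prefix: "query:cache:"
      ttl: 1h                        # Spring duration shorthand: 1 hour
      max-entry-size-mb: 50
      compression-threshold-kb: 100
      compression-enabled: true
```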

Cache Lookup Flow

The get() method performs a tiered lookup, checking L1 first, then L2, with L2-to-L1 promotion on an L2 hit:

public Optional<QueryResponse> get(UUID tenantId, String queryHash) {
    String cacheKey = buildKey(tenantId, queryHash);
 
    // Try L1 first
    if (cacheConfig.getL1().isEnabled()) {
        CacheEntry l1Entry = l1Cache.getIfPresent(cacheKey);
        if (l1Entry != null && !l1Entry.isExpired()) {
            l1Entry.recordHit();
            return Optional.of(l1Entry.getResponse());
        }
    }
 
    // Try L2
    if (cacheConfig.getL2().isEnabled()) {
        String serialized = redisTemplate.opsForValue().get(cacheKey);
        if (serialized != null) {
            CacheEntry l2Entry = deserialize(serialized);
            if (l2Entry != null && !l2Entry.isExpired()) {
                // Promote to L1
                if (cacheConfig.getL1().isEnabled()) {
                    l1Cache.put(cacheKey, l2Entry);
                }
                return Optional.of(l2Entry.getResponse());
            }
        }
    }
 
    return Optional.empty();
}

The key insight is L2-to-L1 promotion: when a cache hit occurs at L2 but not L1, the entry is copied into L1 for faster subsequent access.


Cache Key Structure

Cache keys follow the pattern: {prefix}{tenantId}:{queryHash}

query:cache:550e8400-e29b-41d4-a716-446655440000:a3f2b9c1d4e5f6...

Dependency tracking keys: {prefix}dep:{tenantId}:{dependency}

query:cache:dep:550e8400-e29b-41d4-a716-446655440000:orders
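Helpers that produce keys matching these patterns could be sketched as follows (the actual buildKey implementation is not shown in this excerpt, and the hard-coded prefix stands in for the configurable query.cache.l2.key-prefix):

```java
import java.util.UUID;

public class CacheKeys {

    // Assumed default; configurable via query.cache.l2.key-prefix.
    private static final String PREFIX = "query:cache:";

    // Result cache key: {prefix}{tenantId}:{queryHash}
    static String buildKey(UUID tenantId, String queryHash) {
        return PREFIX + tenantId + ":" + queryHash;
    }

    // Dependency tracking key: {prefix}dep:{tenantId}:{dependency}
    static String buildDependencyKey(UUID tenantId, String dependency) {
        return PREFIX + "dep:" + tenantId + ":" + dependency;
    }
}
```

Embedding the tenant ID in every key keeps tenants isolated in a shared Redis instance and lets invalidation target a single tenant's entries.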

Priority Calculation

Cache entries are assigned a priority score (0-100) that influences eviction and warming decisions:

private int calculatePriority(QueryResponse response) {
    int priority = 50;
    if (response.getExecutionTimeMs() > 10000) priority -= 20;
    else if (response.getExecutionTimeMs() < 1000) priority += 20;
    if (response.getRowCount() > 10000) priority -= 10;
    else if (response.getRowCount() < 100) priority += 10;
    return Math.max(0, Math.min(100, priority));
}

Small, fast queries receive higher priority in the cache, while large, slow queries receive lower priority (but still benefit from caching due to their high computation cost).
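To make the scoring concrete, here is the same logic as a standalone method with the QueryResponse replaced by plain parameters for illustration:

```java
public class PriorityDemo {

    // Same scoring rules as calculatePriority(), parameterized directly.
    static int priority(long executionTimeMs, long rowCount) {
        int priority = 50;                                // neutral baseline
        if (executionTimeMs > 10000) priority -= 20;      // very slow query
        else if (executionTimeMs < 1000) priority += 20;  // fast query
        if (rowCount > 10000) priority -= 10;             // large result set
        else if (rowCount < 100) priority += 10;          // small result set
        return Math.max(0, Math.min(100, priority));      // clamp to 0-100
    }
}
```

For example, a 500 ms query returning 50 rows scores 50 + 20 + 10 = 80, while a 15 s query returning 50,000 rows scores 50 - 20 - 10 = 20; a query in the middle of both ranges stays at the baseline of 50.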