MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Memory Retrieval

The MemoryRetrievalService provides efficient memory recall for agents by combining recency (temporal decay), relevance (semantic similarity via vector embeddings), and entity proximity (graph distance to context entities). It implements the Zep-style episode-mention reranking pattern for improved retrieval quality.


Overview

When an agent needs to recall relevant memories for a task, the retrieval service selects the most useful memories from the temporal memory store using a multi-factor scoring approach. This avoids both information overload (too many memories) and information gaps (missing critical context).

Source: data-plane/ai-service/src/context_graph/services/memory_retrieval_service.py


Retrieval Modes

| Mode | Description | Best For |
| --- | --- | --- |
| RECENCY | Prioritize recent memories via temporal decay | Continuation of recent conversation |
| RELEVANCE | Prioritize semantic relevance via embeddings | Topic-specific recall |
| ENTITY_FOCUSED | Prioritize entity-linked memories | Entity-specific context |
| BALANCED | Balance all factors equally | General-purpose retrieval |
| EPISODE | Episode-based retrieval (Zep pattern) | Conversational memory |
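The modes in the table map naturally onto an enum. A minimal sketch (member names follow the table; the string values are assumptions, not the service's actual definitions):

```python
from enum import Enum

class RetrievalMode(Enum):
    """Retrieval strategies for memory recall (values are illustrative)."""
    RECENCY = "recency"            # temporal decay only
    RELEVANCE = "relevance"        # semantic similarity only
    ENTITY_FOCUSED = "entity_focused"  # entity graph proximity
    BALANCED = "balanced"          # weighted combination of all factors
    EPISODE = "episode"            # Zep-style episode retrieval
```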

Recency Decay Functions

| Function | Formula | Behavior |
| --- | --- | --- |
| EXPONENTIAL | exp(-t / tau) | Fast decay, strong recency bias |
| LINEAR | max(0, 1 - t / max_age) | Steady linear decay |
| LOGARITHMIC | 1 / (1 + log(1 + t)) | Slow decay, retains older memories |
| STEP | 1 if t < threshold else 0 | Binary cutoff at threshold |
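The four decay functions can be sketched directly from their formulas. The default `tau`, `max_age`, and `threshold` values below are assumptions for illustration, not the service's configured defaults:

```python
import math

def recency_score(age_seconds: float, fn: str = "EXPONENTIAL",
                  tau: float = 3600.0, max_age: float = 86400.0,
                  threshold: float = 3600.0) -> float:
    """Compute a recency score in [0, 1] for a memory of the given age.

    Parameter defaults here are illustrative assumptions.
    """
    t = age_seconds
    if fn == "EXPONENTIAL":
        # Fast decay with a strong recency bias; tau sets the half-life scale.
        return math.exp(-t / tau)
    if fn == "LINEAR":
        # Steady decay reaching zero at max_age.
        return max(0.0, 1.0 - t / max_age)
    if fn == "LOGARITHMIC":
        # Slow decay that retains older memories.
        return 1.0 / (1.0 + math.log(1.0 + t))
    if fn == "STEP":
        # Binary cutoff: full weight before the threshold, zero after.
        return 1.0 if t < threshold else 0.0
    raise ValueError(f"unknown decay function: {fn}")
```

All four functions score a brand-new memory (`t = 0`) at 1.0, so they are interchangeable inputs to the weighted scoring formula below.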

Scoring Formula

The final retrieval score combines three factors:

score = recency_weight * recency_score
      + relevance_weight * semantic_similarity
      + proximity_weight * entity_proximity_score

Default weights: recency=0.3, relevance=0.5, proximity=0.2
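The weighted sum is straightforward to express in code; this sketch uses the documented default weights, with all three inputs assumed to be normalized to [0, 1]:

```python
def retrieval_score(recency: float, similarity: float, proximity: float,
                    recency_weight: float = 0.3,
                    relevance_weight: float = 0.5,
                    proximity_weight: float = 0.2) -> float:
    """Combine the three factors into a single retrieval score.

    Defaults match the documented weights (recency=0.3, relevance=0.5,
    proximity=0.2); inputs are assumed normalized to [0, 1].
    """
    return (recency_weight * recency
            + relevance_weight * similarity
            + proximity_weight * proximity)
```

With the default weights, a memory with recency 0.5, similarity 0.8, and proximity 0.2 scores 0.15 + 0.40 + 0.04 = 0.59.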


Episode-Based Retrieval

The Zep-style episode pattern groups memories into episodes (typically one per session). Retrieval then:

  1. Selects the most relevant episodes based on semantic similarity
  2. Collects all memories within those episodes, maintaining temporal order
  3. Re-ranks the selected episodes by mention frequency of key entities

This preserves conversational coherence that pure semantic search would lose.
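The three steps above can be sketched as a small rerank function. This is a simplified illustration, not the service's implementation: field names (`similarity`, `memories`, `entities`, `ts`) are assumed, and episode similarity scores are taken as precomputed.

```python
from collections import Counter

def rerank_episode_memories(episodes: list[dict], key_entities: list[str],
                            top_k: int = 2) -> list[dict]:
    """Zep-style episode retrieval sketch (illustrative schema).

    1. Pick the top_k episodes by semantic similarity.
    2. Keep memories within each episode in temporal order.
    3. Re-rank the selected episodes by key-entity mention frequency.
    """
    # Step 1: most relevant episodes by (precomputed) similarity.
    top = sorted(episodes, key=lambda ep: ep["similarity"], reverse=True)[:top_k]

    # Step 3: re-rank those episodes by how often key entities are mentioned.
    def mention_count(ep: dict) -> int:
        counts = Counter(e for mem in ep["memories"] for e in mem["entities"])
        return sum(counts[e] for e in key_entities)
    top.sort(key=mention_count, reverse=True)

    # Step 2: within each episode, memories keep their temporal order.
    return [mem for ep in top
            for mem in sorted(ep["memories"], key=lambda m: m["ts"])]
```

Because whole episodes are returned in temporal order, a reply and the question that prompted it stay adjacent, which is exactly the coherence pure semantic search loses.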


API Usage

service = get_memory_retrieval_service()

# Retrieve up to 20 memories, balancing recency, relevance, and entity proximity
memories = await service.retrieve(
    query="What do I know about the customer churn model?",
    tenant_id="acme",
    session_id="sess-123",
    mode=RetrievalMode.BALANCED,
    max_memories=20,
    entity_urns=["urn:matih:model:acme:churn_predictor"],
)

Integration Points

| Component | Purpose |
| --- | --- |
| TemporalMemoryStore | Source of memory facts |
| HybridStore | Vector + graph search for relevance |
| PineconeVectorStore | Semantic similarity search |
| ContextGraphStore | Entity proximity via graph distance |

Performance

| Operation | Typical Latency | Notes |
| --- | --- | --- |
| Recency-only retrieval | 5-10 ms | Direct database query |
| Semantic retrieval | 50-100 ms | Includes embedding generation and vector search |
| Entity-focused retrieval | 20-50 ms | Graph distance computation |
| Balanced retrieval | 80-150 ms | Combines all three factors |
| Episode retrieval | 100-200 ms | Episode identification + member retrieval |