Memory Retrieval
The MemoryRetrievalService provides efficient memory recall for agents by combining recency (temporal decay), relevance (semantic similarity via vector embeddings), and entity proximity (graph distance to context entities). It implements the Zep-style episode-mention reranking pattern for improved retrieval quality.
Overview
When an agent needs to recall relevant memories for a task, the retrieval service selects the most useful memories from the temporal memory store using a multi-factor scoring approach. This avoids both information overload (too many memories) and information gaps (missing critical context).
Source: data-plane/ai-service/src/context_graph/services/memory_retrieval_service.py
Retrieval Modes
| Mode | Description | Best For |
|---|---|---|
| RECENCY | Prioritize recent memories via temporal decay | Continuation of recent conversation |
| RELEVANCE | Prioritize semantic relevance via embeddings | Topic-specific recall |
| ENTITY_FOCUSED | Prioritize entity-linked memories | Entity-specific context |
| BALANCED | Balance all factors equally | General-purpose retrieval |
| EPISODE | Episode-based retrieval (Zep pattern) | Conversational memory |
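The modes above can be sketched as an enum paired with per-mode weight presets. This is a minimal illustration: the enum values mirror the table, but the preset weights (other than BALANCED, which uses the documented defaults) are invented here for illustration and may differ from the actual service.

```python
from enum import Enum

class RetrievalMode(Enum):
    RECENCY = "recency"
    RELEVANCE = "relevance"
    ENTITY_FOCUSED = "entity_focused"
    BALANCED = "balanced"
    EPISODE = "episode"

# Hypothetical (recency, relevance, proximity) weight presets per mode.
# Only BALANCED matches the documented defaults; the rest are assumptions.
MODE_WEIGHTS = {
    RetrievalMode.RECENCY: (0.7, 0.2, 0.1),
    RetrievalMode.RELEVANCE: (0.1, 0.8, 0.1),
    RetrievalMode.ENTITY_FOCUSED: (0.1, 0.2, 0.7),
    RetrievalMode.BALANCED: (0.3, 0.5, 0.2),
}
```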
Recency Decay Functions
| Function | Formula | Behavior |
|---|---|---|
| EXPONENTIAL | exp(-t/tau) | Fast decay, strong recency bias |
| LINEAR | max(0, 1 - t/max_age) | Steady linear decay |
| LOGARITHMIC | 1 / (1 + log(1 + t)) | Slow decay, retains older memories |
| STEP | 1 if t under threshold else 0 | Binary cutoff at threshold |
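The four decay functions translate directly into code. A minimal sketch, assuming `t` is the memory's age in seconds; the function signature and parameter defaults (`tau`, `max_age`, `threshold`) are illustrative, not taken from the service:

```python
import math

def recency_score(t: float, fn: str = "exponential", *,
                  tau: float = 3600.0, max_age: float = 86400.0,
                  threshold: float = 3600.0) -> float:
    """Score a memory by age t (seconds). Parameter names are assumptions."""
    if fn == "exponential":
        return math.exp(-t / tau)                # fast decay, strong recency bias
    if fn == "linear":
        return max(0.0, 1.0 - t / max_age)       # steady linear decay
    if fn == "logarithmic":
        return 1.0 / (1.0 + math.log(1.0 + t))   # slow decay, retains older memories
    if fn == "step":
        return 1.0 if t < threshold else 0.0     # binary cutoff at threshold
    raise ValueError(f"unknown decay function: {fn}")
```

All four return 1.0 at t = 0 and decrease monotonically, so they compose cleanly with the weighted scoring formula below.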
Scoring Formula
The final retrieval score combines three factors:
score = recency_weight * recency_score
      + relevance_weight * semantic_similarity
      + proximity_weight * entity_proximity_score

Default weights: recency=0.3, relevance=0.5, proximity=0.2
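The weighted combination is straightforward to implement. A minimal sketch, assuming each factor has already been normalised to [0, 1]; the defaults match the documented weights, but the function name and signature are illustrative:

```python
def retrieval_score(recency: float, similarity: float, proximity: float,
                    w_recency: float = 0.3, w_relevance: float = 0.5,
                    w_proximity: float = 0.2) -> float:
    # Inputs are assumed normalised to [0, 1]; weights default to the
    # documented values (recency=0.3, relevance=0.5, proximity=0.2).
    return (w_recency * recency
            + w_relevance * similarity
            + w_proximity * proximity)
```

With weights summing to 1 and normalised inputs, the final score also stays in [0, 1], which keeps scores comparable across retrieval modes.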
Episode-Based Retrieval
The Zep-style episode pattern groups memories into episodes (typically one per session) and retrieves:
- The most relevant episodes based on semantic similarity
- All memories within those episodes, maintaining temporal order
- Re-ranked by mention frequency of key entities
This preserves conversational coherence that pure semantic search would lose.
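The three steps above can be sketched as follows. All data shapes here are assumptions for illustration: memories are dicts with `episode`, `ts`, and `entities` keys, and `episode_scores` maps episode id to query similarity; the real service's types will differ.

```python
def episode_retrieve(memories, episode_scores, key_entities, top_episodes=2):
    """Zep-style episode retrieval sketch (invented data shapes)."""
    # 1. Pick the most relevant episodes by semantic similarity.
    chosen = sorted(episode_scores, key=episode_scores.get,
                    reverse=True)[:top_episodes]
    # 2. Count mentions of key entities per chosen episode.
    mentions = {ep: 0 for ep in chosen}
    for m in memories:
        if m["episode"] in mentions:
            mentions[m["episode"]] += sum(
                1 for e in m["entities"] if e in key_entities)
    # 3. Re-rank episodes by mention frequency; within each episode,
    #    keep members in temporal order to preserve coherence.
    result = []
    for ep in sorted(chosen, key=lambda e: mentions[e], reverse=True):
        result.extend(sorted((m for m in memories if m["episode"] == ep),
                             key=lambda m: m["ts"]))
    return result
```

Re-ranking happens at the episode level while member order stays temporal, which is what keeps the conversational thread intact.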
API Usage
service = get_memory_retrieval_service()
memories = await service.retrieve(
query="What do I know about the customer churn model?",
tenant_id="acme",
session_id="sess-123",
mode=RetrievalMode.BALANCED,
max_memories=20,
entity_urns=["urn:matih:model:acme:churn_predictor"],
)

Integration Points
| Component | Purpose |
|---|---|
| TemporalMemoryStore | Source of memory facts |
| HybridStore | Vector + graph search for relevance |
| PineconeVectorStore | Semantic similarity search |
| ContextGraphStore | Entity proximity via graph distance |
Performance
| Operation | Typical Latency | Notes |
|---|---|---|
| Recency-only retrieval | 5-10ms | Direct database query |
| Semantic retrieval | 50-100ms | Includes embedding generation and vector search |
| Entity-focused retrieval | 20-50ms | Graph distance computation |
| Balanced retrieval | 80-150ms | Combines all three factors |
| Episode retrieval | 100-200ms | Episode identification + member retrieval |