Agent Memory Manager
Production - Short-term buffer, long-term persistent, episodic recall, hybrid memory
The Agent Memory Manager provides multi-tier memory for agent conversations. It supports short-term conversation buffers, long-term persistent storage, episodic recall of past interactions, and hybrid strategies that combine all three. Defined in `data-plane/ai-service/src/agents/memory_stores.py`.
12.2.5.1 Memory Architecture
```
+----------------------------------------------------------+
|                       HybridMemory                       |
|  +------------------+  +----------------+  +----------+  |
|  |   Short-Term     |  |   Long-Term    |  | Episodic |  |
|  |   (Buffer)       |  |  (Persistent)  |  | (Vector) |  |
|  |                  |  |                |  |          |  |
|  |  Recent N msgs   |  |  PostgreSQL    |  | Qdrant   |  |
|  |  Context window  |  |  Summarization |  | Semantic |  |
|  |  Fast access     |  |  Full history  |  | Recall   |  |
|  +------------------+  +----------------+  +----------+  |
+----------------------------------------------------------+
```

Memory Types
| Type | Storage | Capacity | Access Pattern |
|---|---|---|---|
| Short-Term | In-memory buffer | Last N messages (configurable) | Fast sequential access |
| Long-Term | PostgreSQL | Unlimited with summarization | Query by session/tenant |
| Episodic | Qdrant vectors | Semantically indexed memories | Similarity search |
| Hybrid | All three | Combines all tiers | Adaptive retrieval |
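The short-term tier in the table above is essentially a bounded FIFO buffer: the newest message is appended, and once capacity is reached the oldest is evicted. A minimal self-contained sketch of that eviction behavior, using a `collections.deque` (the class name and `max_messages` parameter here are illustrative, not from the real module):

```python
from collections import deque

class ShortTermBuffer:
    """Illustrative sketch of a bounded short-term message buffer."""

    def __init__(self, max_messages: int = 20):
        # deque with maxlen evicts the oldest entry automatically on append
        self._buffer: deque[str] = deque(maxlen=max_messages)

    def add(self, message: str) -> None:
        self._buffer.append(message)

    def recent(self, limit: int = 20) -> list[str]:
        # Fast sequential access to the tail of the buffer
        return list(self._buffer)[-limit:]

buf = ShortTermBuffer(max_messages=3)
for text in ["m1", "m2", "m3", "m4"]:
    buf.add(text)
print(buf.recent())  # ['m2', 'm3', 'm4'] -- oldest message evicted
```

The actual `HybridMemory.add` shown below implements the same policy with a plain list (`append` plus `pop(0)` past capacity); a `deque(maxlen=...)` is simply the idiomatic stdlib shortcut for the same behavior.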
12.2.5.2 MemoryManager
```python
class MemoryManager:
    """Manages per-session memory instances."""

    async def get_memory(self, session_id: str) -> HybridMemory:
        """Get or create memory for a session."""
        ...

    async def save_memory(self, session_id: str) -> None:
        """Persist memory state to long-term storage."""
        ...

    async def summarize_if_needed(self, session_id: str) -> None:
        """Summarize conversation if buffer exceeds threshold."""
        ...
```

HybridMemory
```python
class HybridMemory:
    """Multi-tier memory combining short-term, long-term, and episodic."""

    def add(self, message: AgentMessage) -> None:
        """Add a message to the buffer."""
        self._buffer.append(message)
        if len(self._buffer) > self._max_buffer_size:
            self._buffer.pop(0)

    def get_messages(self, limit: int = 20) -> list[AgentMessage]:
        """Get recent messages from buffer."""
        return self._buffer[-limit:]

    def get_context(self) -> str:
        """Build context string from all memory tiers."""
        parts = []
        # Long-term summary
        if self._summary:
            parts.append(f"[Previous Context Summary]\n{self._summary}")
        # Episodic recall
        if self._episodic_memories:
            parts.append(f"[Relevant Past Interactions]\n{self._format_episodes()}")
        return "\n\n".join(parts)
```

12.2.5.3 Configuration
Memory behavior is controlled through settings:
```python
# src/config/settings.py
memory_retention_days: int = 90          # History retention period
memory_summary_threshold: int = 50       # Messages before auto-summarization
memory_max_context_messages: int = 20    # Max messages injected into LLM context

# Session settings
session_store_type: str = "hybrid"       # memory | redis | hybrid
session_ttl_hours: int = 24              # Session time-to-live
session_max_ttl_hours: int = 168         # Maximum TTL (7 days)
session_max_messages_before_summary: int = 30
```

Memory Summarization
When a conversation exceeds the summary threshold, the memory manager uses the LLM to generate a concise summary:
```python
async def summarize_if_needed(self, session_id: str):
    memory = await self.get_memory(session_id)
    if len(memory._buffer) >= self._summary_threshold:
        messages_text = "\n".join(m.content for m in memory._buffer[:-10])
        summary = await self._llm.chat([
            {"role": "system", "content": "Summarize this conversation concisely."},
            {"role": "user", "content": messages_text},
        ])
        memory._summary = summary["content"]
        memory._buffer = memory._buffer[-10:]  # Keep last 10 messages
```
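The threshold-and-trim flow above can be exercised end to end with a stub in place of the real LLM client. In this sketch, messages are reduced to plain strings and `StubLLM` is a hypothetical stand-in that only reports how many lines it was asked to summarize; the buffer handling mirrors the method above (summarize everything except the 10 most recent messages, then keep only those 10):

```python
import asyncio

class StubLLM:
    """Hypothetical stand-in for the real LLM client."""

    async def chat(self, messages: list[dict]) -> dict:
        # Pretend to summarize: count the lines handed to us
        lines = messages[1]["content"].splitlines()
        return {"content": f"summary of {len(lines)} messages"}

async def summarize_if_needed(buffer: list[str], threshold: int, llm: StubLLM):
    if len(buffer) >= threshold:
        # Summarize everything except the 10 most recent messages
        messages_text = "\n".join(buffer[:-10])
        reply = await llm.chat([
            {"role": "system", "content": "Summarize this conversation concisely."},
            {"role": "user", "content": messages_text},
        ])
        return reply["content"], buffer[-10:]  # new summary, trimmed buffer
    return None, buffer

summary, buffer = asyncio.run(
    summarize_if_needed([f"msg {i}" for i in range(50)], threshold=50, llm=StubLLM())
)
print(summary, len(buffer))  # summary of 40 messages 10
```

With `memory_summary_threshold = 50`, a 50-message buffer triggers summarization of the 40 older messages while the 10 most recent stay verbatim in the buffer, which is the trade-off the hybrid design relies on: cheap full-fidelity recall for the recent tail, compressed context for everything older.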