Agent Memory Manager
Production - Short-term buffer, long-term persistent, episodic recall, hybrid memory
The Agent Memory Manager provides multi-tier memory for agent conversations. It supports short-term conversation buffers, long-term persistent storage, episodic recall of past interactions, and hybrid strategies that combine all three. Defined in `data-plane/ai-service/src/agents/memory_stores.py`.
12.2.5.1 Memory Architecture
```
+----------------------------------------------------------+
|                       HybridMemory                       |
|  +------------------+  +----------------+  +----------+  |
|  |   Short-Term     |  |   Long-Term    |  | Episodic |  |
|  |   (Buffer)       |  |  (Persistent)  |  | (Vector) |  |
|  |                  |  |                |  |          |  |
|  |  Recent N msgs   |  |  PostgreSQL    |  | Qdrant   |  |
|  |  Context window  |  |  Summarization |  | Semantic |  |
|  |  Fast access     |  |  Full history  |  | Recall   |  |
|  +------------------+  +----------------+  +----------+  |
+----------------------------------------------------------+
```

Memory Types
| Type | Storage | Capacity | Access Pattern |
|---|---|---|---|
| Short-Term | In-memory buffer | Last N messages (configurable) | Fast sequential access |
| Long-Term | PostgreSQL | Unlimited with summarization | Query by session/tenant |
| Episodic | Qdrant vectors | Semantically indexed memories | Similarity search |
| Hybrid | All three | Combines all tiers | Adaptive retrieval |
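The short-term tier in the table above is essentially a bounded FIFO buffer: the newest message is appended, and once capacity is reached the oldest is evicted. A minimal self-contained sketch of that eviction behavior, using a `collections.deque` (the class name and `max_messages` parameter here are illustrative, not from the real module):

```python
from collections import deque

class ShortTermBuffer:
    """Illustrative sketch of a bounded short-term message buffer."""

    def __init__(self, max_messages: int = 20):
        # deque with maxlen evicts the oldest entry automatically on append
        self._buffer: deque[str] = deque(maxlen=max_messages)

    def add(self, message: str) -> None:
        self._buffer.append(message)

    def recent(self, limit: int = 20) -> list[str]:
        # Fast sequential access to the tail of the buffer
        return list(self._buffer)[-limit:]

buf = ShortTermBuffer(max_messages=3)
for text in ["m1", "m2", "m3", "m4"]:
    buf.add(text)
print(buf.recent())  # ['m2', 'm3', 'm4'] -- oldest message evicted
```

The actual `HybridMemory.add` shown below implements the same policy with a plain list (`append` plus `pop(0)` past capacity); a `deque(maxlen=...)` is simply the idiomatic stdlib shortcut for the same behavior.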
12.2.5.2 MemoryManager
```python
class MemoryManager:
    """Manages per-session memory instances."""

    async def get_memory(self, session_id: str) -> HybridMemory:
        """Get or create memory for a session."""
        ...

    async def save_memory(self, session_id: str) -> None:
        """Persist memory state to long-term storage."""
        ...

    async def summarize_if_needed(self, session_id: str) -> None:
        """Summarize conversation if buffer exceeds threshold."""
        ...
```

HybridMemory
```python
class HybridMemory:
    """Multi-tier memory combining short-term, long-term, and episodic."""

    def add(self, message: AgentMessage) -> None:
        """Add a message to the buffer."""
        self._buffer.append(message)
        if len(self._buffer) > self._max_buffer_size:
            self._buffer.pop(0)

    def get_messages(self, limit: int = 20) -> list[AgentMessage]:
        """Get recent messages from buffer."""
        return self._buffer[-limit:]

    def get_context(self) -> str:
        """Build context string from all memory tiers."""
        parts = []
        # Long-term summary
        if self._summary:
            parts.append(f"[Previous Context Summary]\n{self._summary}")
        # Episodic recall
        if self._episodic_memories:
            parts.append(f"[Relevant Past Interactions]\n{self._format_episodes()}")
        return "\n\n".join(parts)
```

12.2.5.3 Configuration
Memory behavior is controlled through settings:
```python
# src/config/settings.py
memory_retention_days: int = 90          # History retention period
memory_summary_threshold: int = 50       # Messages before auto-summarization
memory_max_context_messages: int = 20    # Max messages injected into LLM context

# Session settings
session_store_type: str = "hybrid"       # memory | redis | hybrid
session_ttl_hours: int = 24              # Session time-to-live
session_max_ttl_hours: int = 168         # Maximum TTL (7 days)
session_max_messages_before_summary: int = 30
```

Memory Summarization
When a conversation exceeds the summary threshold, the memory manager uses the LLM to generate a concise summary:
```python
async def summarize_if_needed(self, session_id: str):
    memory = await self.get_memory(session_id)
    if len(memory._buffer) >= self._summary_threshold:
        messages_text = "\n".join(m.content for m in memory._buffer[:-10])
        summary = await self._llm.chat([
            {"role": "system", "content": "Summarize this conversation concisely."},
            {"role": "user", "content": messages_text},
        ])
        memory._summary = summary["content"]
        memory._buffer = memory._buffer[-10:]  # Keep last 10 messages
```
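The threshold-and-trim flow above can be exercised end to end with a stub in place of the real LLM client. In this sketch, messages are reduced to plain strings and `StubLLM` is a hypothetical stand-in that only reports how many lines it was asked to summarize; the buffer handling mirrors the method above (summarize everything except the 10 most recent messages, then keep only those 10):

```python
import asyncio

class StubLLM:
    """Hypothetical stand-in for the real LLM client."""

    async def chat(self, messages: list[dict]) -> dict:
        # Pretend to summarize: count the lines handed to us
        lines = messages[1]["content"].splitlines()
        return {"content": f"summary of {len(lines)} messages"}

async def summarize_if_needed(buffer: list[str], threshold: int, llm: StubLLM):
    if len(buffer) >= threshold:
        # Summarize everything except the 10 most recent messages
        messages_text = "\n".join(buffer[:-10])
        reply = await llm.chat([
            {"role": "system", "content": "Summarize this conversation concisely."},
            {"role": "user", "content": messages_text},
        ])
        return reply["content"], buffer[-10:]  # new summary, trimmed buffer
    return None, buffer

summary, buffer = asyncio.run(
    summarize_if_needed([f"msg {i}" for i in range(50)], threshold=50, llm=StubLLM())
)
print(summary, len(buffer))  # summary of 40 messages 10
```

With `memory_summary_threshold = 50`, a 50-message buffer triggers summarization of the 40 older messages while the 10 most recent stay verbatim in the buffer, which is the trade-off the hybrid design relies on: cheap full-fidelity recall for the recent tail, compressed context for everything older.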