Thinking Embeddings
The ThinkingEmbeddingService generates vector embeddings for agent thinking traces, enabling similarity search across past reasoning patterns. Three types of embeddings are generated per trace: input embeddings (user query), output embeddings (agent response), and thinking embeddings (concatenated reasoning steps).
Overview
Thinking embeddings enable powerful analytical queries like "find traces where the agent reasoned similarly" and "show me past successful approaches to this type of question." They are stored in the Pinecone vector store within tenant-scoped namespaces.
Source: data-plane/ai-service/src/context_graph/embeddings/thinking_embeddings.py
Embedding Types
| Type | Input Text | Use Case |
|---|---|---|
| Input Embedding | User query / goal text | Find similar user questions |
| Output Embedding | Agent response / final output | Find traces with similar results |
| Thinking Embedding | Concatenated reasoning steps | Find traces with similar reasoning patterns |
Generation Process
- Extract the relevant text from the thinking trace
- Generate an embedding vector using the configured embedding model
- Store the vector in Pinecone with trace metadata
- Update the trace record with the embedding IDs
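The extraction step above can be sketched as follows. The trace field names (`goal`, `final_output`, `reasoning_steps`) are illustrative assumptions, not the service's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical minimal trace shape for illustration only;
# the real ThinkingTrace model lives in the ai-service codebase.
@dataclass
class ThinkingTrace:
    trace_id: str
    goal: str
    final_output: str
    reasoning_steps: list[str] = field(default_factory=list)

def extract_embedding_texts(trace: ThinkingTrace) -> dict[str, str]:
    """Step 1 of the pipeline: pull the three texts to embed."""
    return {
        "input": trace.goal,
        "output": trace.final_output,
        # Reasoning steps are joined into one document so the
        # thinking embedding captures the full chain.
        "thinking": "\n".join(trace.reasoning_steps),
    }
```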
```python
service = ThinkingEmbeddingService(
    embedding_store=store,
    embedding_model=model,
)
embeddings = await service.generate_embeddings(trace)
# ThinkingEmbeddings(
#     trace_id="trace-123",
#     input_embedding_id="emb-input-123",
#     output_embedding_id="emb-output-123",
#     thinking_embedding_id="emb-thinking-123",
# )
```
Similarity Search
Find Similar Traces
```python
similar = await service.find_similar_traces(
    query_text="Show me revenue by product category",
    tenant_id="acme",
    top_k=10,
    embedding_type="input",  # Search by similar questions
)
```
Returns a list of SimilarTrace objects:
| Field | Type | Description |
|---|---|---|
| trace_id | string | ID of the similar trace |
| similarity_score | float | Cosine similarity (0-1) |
| goal | string | The goal of the similar trace |
| path_taken | list | Steps taken in the similar trace |
| outcome | string | Outcome of the similar trace |
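A common post-processing step is to keep only high-similarity traces that ended well. The sketch below uses a lightweight stand-in mirroring the SimilarTrace fields above; the 0.8 threshold and the "success" outcome value are illustrative choices, not service defaults:

```python
from dataclasses import dataclass

# Stand-in mirroring the SimilarTrace fields documented above
# (the real class lives in the ai-service codebase).
@dataclass
class SimilarTrace:
    trace_id: str
    similarity_score: float
    goal: str
    path_taken: list
    outcome: str

def successful_matches(results: list[SimilarTrace],
                       min_score: float = 0.8) -> list[SimilarTrace]:
    """Keep strong matches that had a successful outcome."""
    return [
        t for t in results
        if t.similarity_score >= min_score and t.outcome == "success"
    ]
```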
Metadata Storage
Each embedding vector is stored with metadata for filtered search:
| Metadata Field | Description |
|---|---|
| trace_id | Source trace identifier |
| tenant_id | Tenant scope |
| session_id | Session identifier |
| actor_urn | Agent or user URN |
| embedding_type | input, output, or thinking |
| outcome | Trace outcome for filtering |
| model_ids | Models used in the trace |
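With these fields in place, filtered search can be expressed as a Pinecone-style metadata filter built from the table above. Whether the service exposes such a filter parameter directly is an assumption; this only shows the filter shape:

```python
from typing import Optional

def build_trace_filter(tenant_id: str,
                       embedding_type: str,
                       outcome: Optional[str] = None) -> dict:
    """Build a Pinecone-style metadata filter over the fields above.

    tenant_id and embedding_type are always required so searches
    stay tenant-scoped and within one embedding type; outcome is
    an optional extra constraint.
    """
    filter_: dict = {
        "tenant_id": {"$eq": tenant_id},
        "embedding_type": {"$eq": embedding_type},
    }
    if outcome is not None:
        filter_["outcome"] = {"$eq": outcome}
    return filter_
```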
Feature Flag Control
Thinking embedding generation is controlled by the context_graph_embeddings feature flag. When disabled, traces are still captured but embeddings are not generated:
```python
from context_graph.services.semantic_feature_flags import (
    SemanticFeature,
    get_feature_flag_service,
)

flag_service = get_feature_flag_service()
is_enabled = flag_service.is_enabled(
    SemanticFeature.CONTEXT_GRAPH_EMBEDDINGS,
    tenant_id="acme",
)
```
Performance
| Operation | Typical Latency | Notes |
|---|---|---|
| Input embedding generation | 100-200ms | Single short text |
| Output embedding generation | 100-200ms | Single short text |
| Thinking embedding generation | 200-500ms | Concatenated reasoning (longer text) |
| Vector upsert to Pinecone | 50-100ms | Single vector with metadata |
| Similarity search | 30-80ms | Top-k ANN search |
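The table suggests per-trace latency is dominated by three embedding-model calls. If those calls run concurrently, wall-clock time is bounded by the slowest one (the thinking embedding) rather than the sum of all three. Whether ThinkingEmbeddingService parallelizes internally is not stated here; this is a sketch of the idea with a caller-supplied async embed function:

```python
import asyncio
from typing import Awaitable, Callable

async def embed_all(
    embed: Callable[[str], Awaitable[list[float]]],
    texts: dict[str, str],
) -> dict[str, list[float]]:
    """Run the input/output/thinking embedding calls concurrently.

    `embed` is a hypothetical async function mapping text to a
    vector; gather() overlaps the three calls so total latency is
    roughly max(call latencies) instead of their sum.
    """
    keys = list(texts)
    vectors = await asyncio.gather(*(embed(texts[k]) for k in keys))
    return dict(zip(keys, vectors))
```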