Thinking Embeddings
The ThinkingEmbeddingService generates vector embeddings for agent thinking traces, enabling similarity search across past reasoning patterns. Three types of embeddings are generated per trace: input embeddings (user query), output embeddings (agent response), and thinking embeddings (concatenated reasoning steps).
Overview
Thinking embeddings enable powerful analytical queries like "find traces where the agent reasoned similarly" and "show me past successful approaches to this type of question." They are stored in the Pinecone vector store within tenant-scoped namespaces.
Source: data-plane/ai-service/src/context_graph/embeddings/thinking_embeddings.py
Embedding Types
| Type | Input Text | Use Case |
|---|---|---|
| Input Embedding | User query / goal text | Find similar user questions |
| Output Embedding | Agent response / final output | Find traces with similar results |
| Thinking Embedding | Concatenated reasoning steps | Find traces with similar reasoning patterns |
Generation Process
- Extract the relevant text from the thinking trace
- Generate an embedding vector using the configured embedding model
- Store the vector in Pinecone with trace metadata
- Update the trace record with the embedding IDs
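The extraction step above can be sketched as follows. The trace field names (`goal`, `final_output`, `reasoning_steps`) are illustrative assumptions, not the service's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical minimal trace shape for illustration only;
# the real ThinkingTrace model lives in the ai-service codebase.
@dataclass
class ThinkingTrace:
    trace_id: str
    goal: str
    final_output: str
    reasoning_steps: list[str] = field(default_factory=list)

def extract_embedding_texts(trace: ThinkingTrace) -> dict[str, str]:
    """Step 1 of the pipeline: pull the three texts to embed."""
    return {
        "input": trace.goal,
        "output": trace.final_output,
        # Reasoning steps are joined into one document so the
        # thinking embedding captures the full chain.
        "thinking": "\n".join(trace.reasoning_steps),
    }
```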
```python
service = ThinkingEmbeddingService(
    embedding_store=store,
    embedding_model=model,
)
embeddings = await service.generate_embeddings(trace)
# ThinkingEmbeddings(
#     trace_id="trace-123",
#     input_embedding_id="emb-input-123",
#     output_embedding_id="emb-output-123",
#     thinking_embedding_id="emb-thinking-123",
# )
```
Similarity Search
Find Similar Traces
```python
similar = await service.find_similar_traces(
    query_text="Show me revenue by product category",
    tenant_id="acme",
    top_k=10,
    embedding_type="input",  # Search by similar questions
)
```
Returns a list of SimilarTrace objects:
| Field | Type | Description |
|---|---|---|
| trace_id | string | ID of the similar trace |
| similarity_score | float | Cosine similarity (0-1) |
| goal | string | The goal of the similar trace |
| path_taken | list | Steps taken in the similar trace |
| outcome | string | Outcome of the similar trace |
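A common post-processing step is to keep only high-similarity traces that ended well. The sketch below uses a lightweight stand-in mirroring the SimilarTrace fields above; the 0.8 threshold and the "success" outcome value are illustrative choices, not service defaults:

```python
from dataclasses import dataclass

# Stand-in mirroring the SimilarTrace fields documented above
# (the real class lives in the ai-service codebase).
@dataclass
class SimilarTrace:
    trace_id: str
    similarity_score: float
    goal: str
    path_taken: list
    outcome: str

def successful_matches(results: list[SimilarTrace],
                       min_score: float = 0.8) -> list[SimilarTrace]:
    """Keep strong matches that had a successful outcome."""
    return [
        t for t in results
        if t.similarity_score >= min_score and t.outcome == "success"
    ]
```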
Metadata Storage
Each embedding vector is stored with metadata for filtered search:
| Metadata Field | Description |
|---|---|
| trace_id | Source trace identifier |
| tenant_id | Tenant scope |
| session_id | Session identifier |
| actor_urn | Agent or user URN |
| embedding_type | input, output, or thinking |
| outcome | Trace outcome for filtering |
| model_ids | Models used in the trace |
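With these fields in place, filtered search can be expressed as a Pinecone-style metadata filter built from the table above. Whether the service exposes such a filter parameter directly is an assumption; this only shows the filter shape:

```python
from typing import Optional

def build_trace_filter(tenant_id: str,
                       embedding_type: str,
                       outcome: Optional[str] = None) -> dict:
    """Build a Pinecone-style metadata filter over the fields above.

    tenant_id and embedding_type are always required so searches
    stay tenant-scoped and within one embedding type; outcome is
    an optional extra constraint.
    """
    filter_: dict = {
        "tenant_id": {"$eq": tenant_id},
        "embedding_type": {"$eq": embedding_type},
    }
    if outcome is not None:
        filter_["outcome"] = {"$eq": outcome}
    return filter_
```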
Feature Flag Control
Thinking embedding generation is controlled by the context_graph_embeddings feature flag. When disabled, traces are still captured but embeddings are not generated:
```python
from context_graph.services.semantic_feature_flags import (
    SemanticFeature,
    get_feature_flag_service,
)

flag_service = get_feature_flag_service()
is_enabled = flag_service.is_enabled(
    SemanticFeature.CONTEXT_GRAPH_EMBEDDINGS,
    tenant_id="acme",
)
```
Performance
| Operation | Typical Latency | Notes |
|---|---|---|
| Input embedding generation | 100-200ms | Single short text |
| Output embedding generation | 100-200ms | Single short text |
| Thinking embedding generation | 200-500ms | Concatenated reasoning (longer text) |
| Vector upsert to Pinecone | 50-100ms | Single vector with metadata |
| Similarity search | 30-80ms | Top-k ANN search |
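The table suggests per-trace latency is dominated by three embedding-model calls. If those calls run concurrently, wall-clock time is bounded by the slowest one (the thinking embedding) rather than the sum of all three. Whether ThinkingEmbeddingService parallelizes internally is not stated here; this is a sketch of the idea with a caller-supplied async embed function:

```python
import asyncio
from typing import Awaitable, Callable

async def embed_all(
    embed: Callable[[str], Awaitable[list[float]]],
    texts: dict[str, str],
) -> dict[str, list[float]]:
    """Run the input/output/thinking embedding calls concurrently.

    `embed` is a hypothetical async function mapping text to a
    vector; gather() overlaps the three calls so total latency is
    roughly max(call latencies) instead of their sum.
    """
    keys = list(texts)
    vectors = await asyncio.gather(*(embed(texts[k]) for k in keys))
    return dict(zip(keys, vectors))
```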