MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Thinking Embeddings

The ThinkingEmbeddingService generates vector embeddings for agent thinking traces, enabling similarity search across past reasoning patterns. Three types of embeddings are generated per trace: input embeddings (user query), output embeddings (agent response), and thinking embeddings (concatenated reasoning steps).


Overview

Thinking embeddings enable powerful analytical queries like "find traces where the agent reasoned similarly" and "show me past successful approaches to this type of question." They are stored in the Pinecone vector store within tenant-scoped namespaces.

Source: data-plane/ai-service/src/context_graph/embeddings/thinking_embeddings.py


Embedding Types

| Type | Input Text | Use Case |
| --- | --- | --- |
| Input Embedding | User query / goal text | Find similar user questions |
| Output Embedding | Agent response / final output | Find traces with similar results |
| Thinking Embedding | Concatenated reasoning steps | Find traces with similar reasoning patterns |

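The "thinking" text in the third row is simply the trace's reasoning steps joined into a single string before embedding. A minimal sketch of that assembly, assuming a simplified trace shape (`goal`, `steps`, `output`) that stands in for the real trace schema:

```python
from dataclasses import dataclass, field

# Illustrative trace shape only; the actual trace model lives in the
# ai-service codebase and is richer than this.
@dataclass
class ThinkingTrace:
    goal: str                                        # -> input embedding
    steps: list[str] = field(default_factory=list)   # -> thinking embedding
    output: str = ""                                 # -> output embedding

def thinking_text(trace: ThinkingTrace) -> str:
    """Concatenate reasoning steps into one string for embedding."""
    return "\n".join(trace.steps)

trace = ThinkingTrace(
    goal="Show me revenue by product category",
    steps=[
        "Identify the revenue table",
        "Group by category",
        "Aggregate SUM(revenue)",
    ],
    output="Revenue by category: ...",
)
print(thinking_text(trace))
```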
Generation Process

  1. Extract the relevant text from the thinking trace
  2. Generate an embedding vector using the configured embedding model
  3. Store the vector in Pinecone with trace metadata
  4. Update the trace record with the embedding IDs

Example:
service = ThinkingEmbeddingService(
    embedding_store=store,
    embedding_model=model,
)
 
embeddings = await service.generate_embeddings(trace)
# ThinkingEmbeddings(
#     trace_id="trace-123",
#     input_embedding_id="emb-input-123",
#     output_embedding_id="emb-output-123",
#     thinking_embedding_id="emb-thinking-123",
# )

Similarity Search

Find Similar Traces

similar = await service.find_similar_traces(
    query_text="Show me revenue by product category",
    tenant_id="acme",
    top_k=10,
    embedding_type="input",  # Search by similar questions
)

Returns a list of SimilarTrace objects:

| Field | Type | Description |
| --- | --- | --- |
| trace_id | string | ID of the similar trace |
| similarity_score | float | Cosine similarity (0-1) |
| goal | string | The goal of the similar trace |
| path_taken | list | Steps taken in the similar trace |
| outcome | string | Outcome of the similar trace |
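A typical consumer filters these results by score and outcome before reusing a past approach. The `SimilarTrace` dataclass below is a stand-in mirroring the field table, not the actual class from the service:

```python
from dataclasses import dataclass

# Stand-in for the service's SimilarTrace result type (see field table above).
@dataclass
class SimilarTrace:
    trace_id: str
    similarity_score: float  # cosine similarity, 0-1
    goal: str
    path_taken: list
    outcome: str

def successful_matches(results, min_score=0.8):
    """Keep only high-similarity traces that ended successfully."""
    return [
        t for t in results
        if t.similarity_score >= min_score and t.outcome == "success"
    ]

results = [
    SimilarTrace("trace-1", 0.92, "Revenue by category", ["plan", "query"], "success"),
    SimilarTrace("trace-2", 0.75, "Revenue by region", ["plan"], "success"),
    SimilarTrace("trace-3", 0.88, "Revenue by category", ["plan", "query"], "failure"),
]
print([t.trace_id for t in successful_matches(results)])  # ['trace-1']
```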

Metadata Storage

Each embedding vector is stored with metadata for filtered search:

| Metadata Field | Description |
| --- | --- |
| trace_id | Source trace identifier |
| tenant_id | Tenant scope |
| session_id | Session identifier |
| actor_urn | Agent or user URN |
| embedding_type | input, output, or thinking |
| outcome | Trace outcome for filtering |
| model_ids | Models used in the trace |
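The metadata payload attached to each vector follows the table above. A sketch of assembling it before upsert; the helper itself is illustrative and not part of the service's actual API:

```python
# Illustrative helper: builds the per-vector metadata dict described in the
# table above. Field names match the docs; the function is an assumption.
def build_embedding_metadata(trace_id, tenant_id, session_id, actor_urn,
                             embedding_type, outcome, model_ids):
    assert embedding_type in {"input", "output", "thinking"}
    return {
        "trace_id": trace_id,
        "tenant_id": tenant_id,        # vectors live in tenant-scoped namespaces
        "session_id": session_id,
        "actor_urn": actor_urn,
        "embedding_type": embedding_type,
        "outcome": outcome,            # enables outcome-filtered search
        "model_ids": model_ids,
    }

meta = build_embedding_metadata(
    "trace-123", "acme", "sess-9", "urn:agent:analyst",
    "thinking", "success", ["gpt-4o"],
)
```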

Feature Flag Control

Thinking embedding generation is controlled by the context_graph_embeddings feature flag. When disabled, traces are still captured but embeddings are not generated:

from context_graph.services.semantic_feature_flags import (
    SemanticFeatureFlagService,
    SemanticFeature,
    get_feature_flag_service,
)

flag_service = get_feature_flag_service()
is_enabled = flag_service.is_enabled(
    SemanticFeature.CONTEXT_GRAPH_EMBEDDINGS,
    tenant_id="acme",
)
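The capture-always, embed-conditionally behavior can be sketched as follows. The flag service and generator here are minimal stubs standing in for the real services, so the wiring is illustrative only:

```python
import asyncio

# Stub flag service; the real one is SemanticFeatureFlagService.
class FlagService:
    def __init__(self, enabled):
        self._enabled = enabled

    def is_enabled(self, feature, tenant_id):
        return self._enabled

async def capture_trace(trace_id, tenant_id, flags, generate):
    captured = {"trace": trace_id, "embeddings": None}  # trace always captured
    # Embeddings are generated only when the flag is enabled for this tenant.
    if flags.is_enabled("context_graph_embeddings", tenant_id):
        captured["embeddings"] = await generate(trace_id)
    return captured

async def fake_generate(trace_id):
    return {"trace_id": trace_id, "input_embedding_id": "emb-input"}

result = asyncio.run(
    capture_trace("trace-123", "acme", FlagService(False), fake_generate)
)
print(result["embeddings"])  # None: flag disabled, trace still captured
```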

Performance

| Operation | Typical Latency | Notes |
| --- | --- | --- |
| Input embedding generation | 100-200ms | Single short text |
| Output embedding generation | 100-200ms | Single short text |
| Thinking embedding generation | 200-500ms | Concatenated reasoning (longer text) |
| Vector upsert to Pinecone | 50-100ms | Single vector with metadata |
| Similarity search | 30-80ms | Top-k ANN search |
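A back-of-envelope worst-case budget per trace, taking the upper bounds from the table and assuming the three embeddings and three upserts run sequentially (real latency depends on the embedding model, batching, and network):

```python
# Upper-bound figures from the table above; sequential execution assumed.
embed_ms = 200 + 200 + 500   # input + output + thinking generation
upsert_ms = 3 * 100          # one Pinecone upsert per embedding
total_ms = embed_ms + upsert_ms
print(total_ms)  # 1200 ms worst case per trace
```

Generating the three embeddings concurrently would cut the embedding portion to roughly the slowest single call (~500ms).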