Vector Stores (Pinecone/Qdrant)
The Context Graph uses vector stores for semantic similarity search across entity embeddings, decision rationale vectors, agent trajectory embeddings, and query patterns. The primary production implementation uses Pinecone. It sits behind a VectorStoreBase abstraction that also supports Qdrant and a mock store for testing.
Architecture
All vector store implementations extend VectorStoreBase, which defines a standard interface for vector operations, tenant namespace isolation, and lifecycle management.
Source: data-plane/ai-service/src/context_graph/storage/pinecone_vector_store.py
Embedding Namespaces
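Each embedding namespace maps to an index with a fixed dimensionality and distance metric. As listed below, DECISION_RATIONALE and DECISION_CONTEXT share the decision-rationale index suffix: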
| Namespace | Index Suffix | Dimensions | Metric | Description |
|---|---|---|---|---|
| ENTITY_SEMANTIC | entity-semantic | 1536 | cosine | Entity description embeddings |
| ENTITY_STRUCTURAL | entity-structural | 128 | cosine | Graph structure embeddings (node2vec) |
| DECISION_RATIONALE | decision-rationale | 1536 | cosine | Decision rationale embeddings |
| DECISION_CONTEXT | decision-rationale | 1536 | cosine | Decision context embeddings |
| TRAJECTORY | trajectory | 256 | cosine | Agent trajectory embeddings |
| QUERY_PATTERN | query-pattern | 1536 | cosine | User query pattern embeddings |
| GRAPH_COMMUNITY | graph-community | 256 | cosine | Graph community summary embeddings |
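Because each namespace has a fixed dimensionality, a vector of the wrong length should be rejected before it reaches the backend. The following is a minimal sketch of such a check, not the service's actual code. The `EmbeddingNamespace` enum values and the `check_dimension` helper are assumptions made for illustration; only the dimensions are taken from the table above.

```python
from enum import Enum
from typing import Sequence


class EmbeddingNamespace(str, Enum):
    # Assumption: lowercase snake_case values, modeled on "entity_semantic".
    ENTITY_SEMANTIC = "entity_semantic"
    ENTITY_STRUCTURAL = "entity_structural"
    DECISION_RATIONALE = "decision_rationale"
    DECISION_CONTEXT = "decision_context"
    TRAJECTORY = "trajectory"
    QUERY_PATTERN = "query_pattern"
    GRAPH_COMMUNITY = "graph_community"


# Dimensions copied from the Embedding Namespaces table.
DIMENSIONS = {
    EmbeddingNamespace.ENTITY_SEMANTIC: 1536,
    EmbeddingNamespace.ENTITY_STRUCTURAL: 128,
    EmbeddingNamespace.DECISION_RATIONALE: 1536,
    EmbeddingNamespace.DECISION_CONTEXT: 1536,
    EmbeddingNamespace.TRAJECTORY: 256,
    EmbeddingNamespace.QUERY_PATTERN: 1536,
    EmbeddingNamespace.GRAPH_COMMUNITY: 256,
}


def check_dimension(namespace: EmbeddingNamespace, values: Sequence[float]) -> None:
    """Hypothetical helper: raise if the vector length does not match the namespace."""
    expected = DIMENSIONS[namespace]
    if len(values) != expected:
        raise ValueError(
            f"{namespace.value} expects {expected} dimensions, got {len(values)}"
        )
```

Running a check like this before an upsert turns a backend-side dimension error into an immediate, readable exception.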
Tenant Namespace Isolation
Each tenant gets isolated namespaces within indexes. The format is:
```
{tenant_id}:{embedding_namespace}
```
For example, tenant acme querying entity semantic embeddings uses the namespace acme:entity_semantic.
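The namespace format can be sketched as a small helper. The function name `tenant_namespace` and the colon check are assumptions for illustration; the documented behavior is only the `{tenant_id}:{embedding_namespace}` format itself.

```python
def tenant_namespace(tenant_id: str, embedding_namespace: str) -> str:
    """Build the per-tenant namespace string '{tenant_id}:{embedding_namespace}'."""
    if not tenant_id or ":" in tenant_id:
        # Assumption: reject ids that are empty or would make the separator ambiguous.
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"{tenant_id}:{embedding_namespace}"


if __name__ == "__main__":
    print(tenant_namespace("acme", "entity_semantic"))  # acme:entity_semantic
```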
Pinecone Implementation
Initialization
```python
from context_graph.storage.pinecone_vector_store import create_pinecone_store

# Must be awaited from within an async function.
store = await create_pinecone_store(
    api_key=None,       # Falls back to PINECONE_API_KEY env var
    environment=None,   # Falls back to PINECONE_ENVIRONMENT env var
    index_prefix=None,  # Falls back to PINECONE_INDEX_PREFIX env var
)
```
During initialization, the store creates any missing indexes using Pinecone serverless mode.
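The "create any missing indexes" step amounts to comparing the expected index names against those that already exist. The sketch below computes that plan without calling Pinecone. The `missing_indexes` helper is hypothetical, and joining the prefix and suffix with a hyphen is an assumption; the six suffixes and their dimensions come from the Embedding Namespaces table.

```python
from typing import Dict, List, Set

# Distinct index suffixes and dimensions from the Embedding Namespaces table.
INDEX_SPECS: Dict[str, int] = {
    "entity-semantic": 1536,
    "entity-structural": 128,
    "decision-rationale": 1536,
    "trajectory": 256,
    "query-pattern": 1536,
    "graph-community": 256,
}


def missing_indexes(existing: Set[str], index_prefix: str = "matih-context") -> List[dict]:
    """Hypothetical helper: list the indexes that still need to be created."""
    plan = []
    for suffix, dimension in INDEX_SPECS.items():
        # Assumption: index name is "<prefix>-<suffix>".
        name = f"{index_prefix}-{suffix}"
        if name not in existing:
            plan.append({"name": name, "dimension": dimension, "metric": "cosine"})
    return plan


if __name__ == "__main__":
    for spec in missing_indexes({"matih-context-trajectory"}):
        print(spec)
```

Each entry in the returned plan carries the parameters a create-index call would need, so the same function can be reused for a dry run.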
Upsert Vectors
```python
count = await store.upsert(
    vectors=[VectorRecord(id="vec-1", values=[...], metadata=metadata)],
    namespace=EmbeddingNamespace.ENTITY_SEMANTIC,
    tenant_id="acme",
    batch_size=100,
)
```
Query for Similar Vectors
```python
response = await store.query(
    query_vector=[0.1, 0.2, ...],
    namespace=EmbeddingNamespace.ENTITY_SEMANTIC,
    tenant_id="acme",
    top_k=10,
    min_score=0.5,
    filter={"entity_type": {"$in": ["dataset", "model"]}},
)
```
Fetch by ID
```python
records = await store.fetch(
    ids=["vec-1", "vec-2"],
    namespace=EmbeddingNamespace.ENTITY_SEMANTIC,
    tenant_id="acme",
)
```
Delete Vectors
```python
count = await store.delete(
    ids=["vec-1"],
    namespace=EmbeddingNamespace.ENTITY_SEMANTIC,
    tenant_id="acme",
)
```
Configuration
| Variable | Description | Default |
|---|---|---|
| PINECONE_API_KEY | Pinecone API key (required for production) | -- |
| PINECONE_ENVIRONMENT | Pinecone cloud region | us-east-1-aws |
| PINECONE_INDEX_PREFIX | Prefix for all index names | matih-context |
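The fallback order (explicit argument, then environment variable, then default) can be sketched as follows. `PineconeSettings` and `resolve_settings` are hypothetical names used only for illustration; the variable names and defaults are taken from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PineconeSettings:
    api_key: Optional[str]
    environment: str
    index_prefix: str


def resolve_settings(
    api_key: Optional[str] = None,
    environment: Optional[str] = None,
    index_prefix: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> PineconeSettings:
    """Hypothetical helper: explicit arguments win, then env vars, then defaults."""
    env = os.environ if env is None else env
    return PineconeSettings(
        # No default for the API key: it stays None when nothing is configured.
        api_key=api_key or env.get("PINECONE_API_KEY"),
        environment=environment or env.get("PINECONE_ENVIRONMENT", "us-east-1-aws"),
        index_prefix=index_prefix or env.get("PINECONE_INDEX_PREFIX", "matih-context"),
    )


if __name__ == "__main__":
    print(resolve_settings(env={"PINECONE_INDEX_PREFIX": "staging-context"}))
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.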
Graceful Degradation
When the Pinecone API key is not configured or the pinecone-client package is not installed, the store falls back to mock mode:
- All write operations return success without persisting
- All read operations return empty results
- A warning is logged at startup indicating mock mode
This enables local development without a Pinecone account.
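The fallback decision can be sketched as a check for the API key and for an importable client module. This is an illustration under stated assumptions rather than the store's actual logic: `should_use_mock` and `MockVectorStore` are hypothetical names, and returning the submitted count from a no-op upsert is one possible reading of "return success".

```python
import asyncio
import importlib.util
import logging
from typing import Optional

logger = logging.getLogger("vector_store")


def should_use_mock(api_key: Optional[str], client_installed: Optional[bool] = None) -> bool:
    """Hypothetical helper: decide whether to fall back to mock mode."""
    if client_installed is None:
        # The pinecone-client package is imported as "pinecone".
        client_installed = importlib.util.find_spec("pinecone") is not None
    if api_key and client_installed:
        return False
    logger.warning("Pinecone is not configured or not installed; running in mock mode")
    return True


class MockVectorStore:
    """No-op store: writes report success, reads return nothing."""

    async def upsert(self, vectors, **kwargs) -> int:
        # Assumption: "success" means reporting the submitted count without persisting.
        return len(vectors)

    async def query(self, query_vector, **kwargs) -> list:
        return []


async def demo() -> None:
    store = MockVectorStore()
    written = await store.upsert(vectors=[{"id": "vec-1"}])
    results = await store.query(query_vector=[0.1, 0.2])
    print(written, results)


if __name__ == "__main__":
    if should_use_mock(api_key=None):
        asyncio.run(demo())
```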
Statistics
```python
stats = await store.get_stats(tenant_id="acme")
# VectorStoreStats(
#     provider="pinecone",
#     total_vectors=15234,
#     index_count=6,
#     by_namespace={"entity_semantic": 10000, "decision_rationale": 5234},
# )
```
VectorStoreBase Interface
Custom vector store implementations must extend VectorStoreBase and implement:
| Method | Description |
|---|---|
| connect() | Establish connection to the backend |
| disconnect() | Close the connection |
| initialize() | Set up indexes and configuration |
| upsert() | Batch upsert vectors |
| query() | Similarity search |
| fetch() | Retrieve vectors by ID |
| delete() | Delete vectors by ID |
| get_stats() | Retrieve store statistics |
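To show what a custom implementation might look like, the sketch below defines a stand-in base class with the eight methods from the table and a small in-memory store that implements them. The real VectorStoreBase signatures and return types are not shown in this document, so everything here is an assumption loosely modeled on the usage examples above, including the simplified `VectorRecord` and the plain dict returned by `get_stats()`.

```python
import asyncio
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VectorRecord:  # Simplified stand-in for the real record type.
    id: str
    values: List[float]
    metadata: dict = field(default_factory=dict)


class VectorStoreBase(ABC):  # Stand-in: the real signatures may differ.
    @abstractmethod
    async def connect(self) -> None: ...
    @abstractmethod
    async def disconnect(self) -> None: ...
    @abstractmethod
    async def initialize(self) -> None: ...
    @abstractmethod
    async def upsert(self, vectors, namespace, tenant_id) -> int: ...
    @abstractmethod
    async def query(self, query_vector, namespace, tenant_id, top_k=10, min_score=0.0): ...
    @abstractmethod
    async def fetch(self, ids, namespace, tenant_id) -> List[VectorRecord]: ...
    @abstractmethod
    async def delete(self, ids, namespace, tenant_id) -> int: ...
    @abstractmethod
    async def get_stats(self, tenant_id) -> Dict[str, int]: ...


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore(VectorStoreBase):
    def __init__(self) -> None:
        # Keyed by "{tenant_id}:{namespace}" to keep tenants isolated.
        self._data: Dict[str, Dict[str, VectorRecord]] = {}

    def _bucket(self, namespace: str, tenant_id: str) -> Dict[str, VectorRecord]:
        return self._data.setdefault(f"{tenant_id}:{namespace}", {})

    async def connect(self) -> None:
        pass

    async def disconnect(self) -> None:
        pass

    async def initialize(self) -> None:
        pass

    async def upsert(self, vectors, namespace, tenant_id) -> int:
        bucket = self._bucket(namespace, tenant_id)
        for record in vectors:
            bucket[record.id] = record
        return len(vectors)

    async def query(self, query_vector, namespace, tenant_id, top_k=10, min_score=0.0) -> List[Tuple[str, float]]:
        bucket = self._bucket(namespace, tenant_id)
        scored = [(r.id, cosine(query_vector, r.values)) for r in bucket.values()]
        hits = [hit for hit in scored if hit[1] >= min_score]
        return sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_k]

    async def fetch(self, ids, namespace, tenant_id) -> List[VectorRecord]:
        bucket = self._bucket(namespace, tenant_id)
        return [bucket[i] for i in ids if i in bucket]

    async def delete(self, ids, namespace, tenant_id) -> int:
        bucket = self._bucket(namespace, tenant_id)
        return sum(1 for i in ids if bucket.pop(i, None) is not None)

    async def get_stats(self, tenant_id) -> Dict[str, int]:
        prefix = f"{tenant_id}:"
        return {k[len(prefix):]: len(v) for k, v in self._data.items() if k.startswith(prefix)}


async def demo() -> None:
    store = InMemoryVectorStore()
    await store.upsert([VectorRecord("vec-1", [1.0, 0.0])], "entity_semantic", "acme")
    print(await store.query([1.0, 0.0], "entity_semantic", "acme", top_k=1))


if __name__ == "__main__":
    asyncio.run(demo())
```

An in-memory store of this kind is mainly useful in unit tests, where it exercises the same call sequence as a real backend without network access.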