Hybrid Store (GraphRAG)
The HybridStore combines the ContextGraphStore (graph database) with the VectorStore (Pinecone) to implement graph-aware retrieval-augmented generation (GraphRAG). It supports multiple retrieval strategies based on the Microsoft GraphRAG architecture, including local search, global search, drift search, and hybrid scoring.
Overview
The Hybrid Store bridges structural knowledge (entity relationships, lineage paths) with semantic knowledge (embedding similarity) to provide richer context for LLM-augmented queries.
Source: data-plane/ai-service/src/context_graph/storage/hybrid_store.py
Retrieval Strategies
| Strategy | Description | When to Use |
|---|---|---|
| LOCAL | Vector similarity + 1-hop graph neighbors | Quick entity lookup with local context |
| GLOBAL | Community detection + summary embeddings | Broad topic exploration across the graph |
| DRIFT | Temporal trajectory patterns | Detecting behavioral changes over time |
| HYBRID | Weighted combination of vector and graph scores | Default for most queries |
| GRAPH_FIRST | Graph traversal then vector reranking | When structural relationships matter most |
| VECTOR_FIRST | Vector search then graph expansion | When semantic similarity matters most |
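To make the HYBRID row concrete, the sketch below works through the weighted combination using the default weights and the scoring formula documented in the Configuration and Scoring Formula sections of this page. It is an illustration only, not the HybridStore implementation: `ScoringConfig`, `combined_score`, and `final_score` are hypothetical names, and only the arithmetic is taken from this page.

```python
from dataclasses import dataclass


@dataclass
class ScoringConfig:
    """Hypothetical stand-in for the scoring fields of HybridStoreConfig."""
    vector_weight: float = 0.6
    graph_weight: float = 0.4
    enable_temporal_decay: bool = True
    temporal_decay_days: float = 30.0  # half-life of the decay, in days


def combined_score(vector_score: float, graph_score: float, cfg: ScoringConfig) -> float:
    """Weighted sum of the vector similarity and graph relevance scores."""
    return cfg.vector_weight * vector_score + cfg.graph_weight * graph_score


def final_score(vector_score: float, graph_score: float,
                age_days: float, cfg: ScoringConfig) -> float:
    """Apply the documented temporal decay factor to the combined score."""
    score = combined_score(vector_score, graph_score, cfg)
    if cfg.enable_temporal_decay:
        # Exponential decay that halves every `temporal_decay_days`.
        decay = 0.5 ** (age_days / cfg.temporal_decay_days)
        # Only 30% of the score is subject to decay; 70% is always kept.
        score *= 0.7 + 0.3 * decay
    return score


if __name__ == "__main__":
    cfg = ScoringConfig()
    print(f"{combined_score(0.8, 0.5, cfg):.3f}")   # 0.680
    print(f"{final_score(0.8, 0.5, 30, cfg):.3f}")  # 0.578
```

With these numbers, a 30-day-old entity (one half-life) keeps 85% of its combined score, and even a very old entity never drops below 70% of it.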
Search Modes
| Mode | Description |
|---|---|
| ENTITY_SEMANTIC | Find similar entities by description embedding |
| ENTITY_STRUCTURAL | Find structurally similar entities by graph position |
| DECISION_PRECEDENT | Find precedent decisions by rationale embedding |
| LINEAGE_AWARE | Vector search with lineage context expansion |
| IMPACT_ANALYSIS | Find entities affected by a change |
Configuration
from context_graph.storage.hybrid_store import HybridStoreConfig

config = HybridStoreConfig(
    vector_weight=0.6,                 # Weight for vector similarity (0-1)
    graph_weight=0.4,                  # Weight for graph relevance (0-1)
    default_hop_distance=2,            # Default hops for graph expansion
    max_hop_distance=5,                # Maximum hops for graph expansion
    min_combined_score=0.3,            # Minimum combined score threshold
    enable_community_detection=False,
    enable_temporal_decay=True,
    temporal_decay_days=30,            # Half-life for temporal decay
)
Hybrid Search
The primary search method combines vector and graph scoring:
response = await hybrid_store.search(
    query_vector=embedding,
    tenant_id="acme",
    strategy=RetrievalStrategy.HYBRID,
    top_k=10,
    hop_distance=2,
    entity_types=[EntityType.DATASET],
    min_score=0.3,
    include_graph_context=True,
)

for result in response.results:
    print(f"{result.entity_urn}: {result.combined_score:.3f}")
Scoring Formula
The combined score is calculated as:
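combined_score = vector_weight * vector_score + graph_weight * graph_score
When temporal decay is enabled, a decay factor with a half-life of temporal_decay_days is applied. Because only 30% of the score decays, an entity of any age keeps at least 70% of its combined score:
decay = 0.5 ^ (age_days / temporal_decay_days)
final_score = combined_score * (0.7 + 0.3 * decay)
GraphRAG Context Assembly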
For LLM augmentation, the store assembles a complete GraphRAGContext:
context = await hybrid_store.get_graphrag_context(
    query="Show me total sales by region",
    query_vector=embedding,
    tenant_id="acme",
    top_k=5,
    hop_distance=2,
    include_decisions=True,
    include_lineage=True,
)
The returned context includes:
- Primary matching entities with scores
- Extended graph context (neighbors, relationships)
- Relevant past decisions (precedents)
- Lineage paths for data provenance
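As a rough picture of how these four parts could feed an LLM prompt, the sketch below models them with plain dataclasses and renders them as text. Every name here (`ScoredEntity`, `ContextSketch`, `render_prompt_context`, the field names, and the `DERIVED_FROM` relation) is hypothetical and chosen for illustration; the actual fields of GraphRAGContext are not documented on this page and may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ScoredEntity:
    """Hypothetical primary match: an entity URN with its combined score."""
    entity_urn: str
    combined_score: float


@dataclass
class ContextSketch:
    """Hypothetical container mirroring the four parts listed above."""
    entities: List[ScoredEntity] = field(default_factory=list)
    # Each relationship is (source_urn, relation, target_urn).
    relationships: List[Tuple[str, str, str]] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    lineage_paths: List[List[str]] = field(default_factory=list)


def render_prompt_context(ctx: ContextSketch) -> str:
    """Flatten the context into a text block, highest-scoring entities first."""
    lines = ["Entities:"]
    for entity in sorted(ctx.entities, key=lambda e: e.combined_score, reverse=True):
        lines.append(f"- {entity.entity_urn} (score {entity.combined_score:.3f})")
    lines.append("Relationships:")
    for source, relation, target in ctx.relationships:
        lines.append(f"- {source} -[{relation}]-> {target}")
    lines.append("Precedent decisions:")
    lines.extend(f"- {decision}" for decision in ctx.decisions)
    lines.append("Lineage paths:")
    lines.extend("- " + " -> ".join(path) for path in ctx.lineage_paths)
    return "\n".join(lines)


if __name__ == "__main__":
    ctx = ContextSketch(
        entities=[ScoredEntity("urn:matih:dataset:acme:sales_data", 0.82)],
        decisions=["urn:matih:decision:acme:deploy-model-v2"],
    )
    print(render_prompt_context(ctx))
```

Keeping the rendering separate from retrieval makes it easy to trim or reorder sections when the prompt budget is tight.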
Entity Operations
Upsert Entity Embedding
success = await hybrid_store.upsert_entity_embedding(
    entity_urn="urn:matih:dataset:acme:sales_data",
    embedding=[0.1, 0.2, ...],
    tenant_id="acme",
    namespace=EmbeddingNamespace.ENTITY_SEMANTIC,
    entity_type="dataset",
)
Find Similar Entities
similar = await hybrid_store.find_similar_entities(
    entity_urn="urn:matih:dataset:acme:sales_data",
    tenant_id="acme",
    top_k=10,
    include_structural=True,
)
The result combines semantic and structural similarity with a 60/40 weighting.
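One way to picture a 60/40 blend is sketched below. It assumes the 60% share goes to semantic similarity, uses cosine similarity over embeddings for the semantic part, and swaps in Jaccard overlap of graph neighbor sets as a simple stand-in for structural similarity. The page does not say how HybridStore measures either component, so all function names and both measures are assumptions.

```python
import math
from typing import Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(neighbors_a: Set[str], neighbors_b: Set[str]) -> float:
    """Share of graph neighbors the two entities have in common (assumed measure)."""
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0
    return len(neighbors_a & neighbors_b) / len(union)


def blended_similarity(semantic: float, structural: float,
                       semantic_weight: float = 0.6) -> float:
    """Blend the two similarities; the structural part gets the remaining weight."""
    return semantic_weight * semantic + (1.0 - semantic_weight) * structural


if __name__ == "__main__":
    semantic = cosine_similarity([1.0, 0.0], [1.0, 0.0])                    # identical direction
    structural = jaccard_similarity({"urn:x", "urn:y"}, {"urn:y", "urn:z"})  # one of three shared
    print(f"{blended_similarity(semantic, structural):.3f}")
```

Under this weighting, two entities with identical embeddings but no shared neighbors would still score 0.6, so semantic similarity dominates the ranking.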
Find Decision Precedents
precedents = await hybrid_store.find_decision_precedents(
    decision_urn="urn:matih:decision:acme:deploy-model-v2",
    tenant_id="acme",
    top_k=5,
    min_relevance=0.5,
)
Initialization
from context_graph.storage.hybrid_store import create_hybrid_store

store = await create_hybrid_store(
    graph_store=graph_store,    # Optional, uses singleton
    vector_store=vector_store,  # Optional, uses mock if None
    config=config,              # Optional HybridStoreConfig
)