Hybrid Store (GraphRAG)

The HybridStore combines the ContextGraphStore (graph database) with the VectorStore (Pinecone) to implement retrieval-augmented generation with graph context. It supports six retrieval strategies. Local, global, and DRIFT search are modeled on the Microsoft GraphRAG architecture. Hybrid, graph-first, and vector-first scoring complete the set.

Overview

The Hybrid Store bridges structural knowledge (entity relationships, lineage paths) with semantic knowledge (embedding similarity) to provide richer context for LLM-augmented queries.

Source: data-plane/ai-service/src/context_graph/storage/hybrid_store.py

Retrieval Strategies

Strategy	Description	When to Use
`LOCAL`	Vector similarity + 1-hop graph neighbors	Quick entity lookup with local context
`GLOBAL`	Community detection + summary embeddings	Broad topic exploration across the graph
`DRIFT`	Temporal trajectory patterns	Detecting behavioral changes over time
`HYBRID`	Weighted combination of vector and graph scores	Default for most queries
`GRAPH_FIRST`	Graph traversal then vector reranking	When structural relationships matter most
`VECTOR_FIRST`	Vector search then graph expansion	When semantic similarity matters most
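The LOCAL row describes "vector similarity + 1-hop graph neighbors". Here is a minimal, self-contained sketch of that idea over in-memory data. The function names, the dict-based adjacency list, and the toy vectors are assumptions for illustration. The real HybridStore queries Pinecone and the graph database instead.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def local_search(query: list[float],
                 embeddings: dict[str, list[float]],
                 neighbors: dict[str, set[str]],
                 top_k: int = 2) -> dict[str, set[str]]:
    """LOCAL-style lookup: top_k entities by vector similarity, each with its 1-hop neighbors."""
    ranked = sorted(embeddings, key=lambda urn: cosine(query, embeddings[urn]), reverse=True)
    return {urn: neighbors.get(urn, set()) for urn in ranked[:top_k]}


# Hypothetical toy data: two sales-like entities and one unrelated entity.
embeddings = {"sales": [1.0, 0.0], "orders": [0.9, 0.1], "hr": [0.0, 1.0]}
neighbors = {"sales": {"orders", "regions"}, "orders": {"customers"}}
print(local_search([1.0, 0.0], embeddings, neighbors))
```

GRAPH_FIRST and VECTOR_FIRST differ mainly in which of these two steps produces the candidate set and which one reorders or extends it.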

Search Modes

Mode	Description
`ENTITY_SEMANTIC`	Find similar entities by description embedding
`ENTITY_STRUCTURAL`	Find structurally similar entities by graph position
`DECISION_PRECEDENT`	Find precedent decisions by rationale embedding
`LINEAGE_AWARE`	Vector search with lineage context expansion
`IMPACT_ANALYSIS`	Find entities affected by a change
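In essence, IMPACT_ANALYSIS follows lineage edges downstream from the changed entity. The sketch below assumes a plain dict of downstream edges and a breadth-first walk capped at a hop limit. The default of 5 echoes max_hop_distance. The function name and data shape are hypothetical.

```python
from collections import deque


def impacted_entities(changed: str,
                      downstream: dict[str, list[str]],
                      max_hops: int = 5) -> dict[str, int]:
    """Breadth-first walk of downstream lineage edges.

    Returns every reachable entity mapped to the hop count at which it was first reached.
    """
    hops = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for child in downstream.get(node, []):
            if child not in hops:  # in BFS the first visit is the shortest path; also breaks cycles
                hops[child] = hops[node] + 1
                queue.append(child)
    del hops[changed]  # the changed entity itself is not reported as affected
    return hops


# Hypothetical lineage: raw -> clean -> {report, model}, model -> dashboard
lineage = {"raw": ["clean"], "clean": ["report", "model"], "model": ["dashboard"]}
print(impacted_entities("raw", lineage, max_hops=2))
```

Breadth-first order records each entity at its shortest hop distance. That distance is a natural proxy for how directly a change reaches it.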

Configuration

from context_graph.storage.hybrid_store import HybridStoreConfig
 
config = HybridStoreConfig(
    vector_weight=0.6,           # Weight for vector similarity (0-1)
    graph_weight=0.4,            # Weight for graph relevance (0-1)
    default_hop_distance=2,      # Default hops for graph expansion
    max_hop_distance=5,          # Maximum hops for graph expansion
    min_combined_score=0.3,      # Minimum combined score threshold
    enable_community_detection=False,
    enable_temporal_decay=True,
    temporal_decay_days=30,      # Half-life for temporal decay
)
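The example gives a 0-1 range for each weight but does not say whether HybridStoreConfig enforces it. The hypothetical stand-in dataclass below checks the constraints that the scoring formula implies. The class name is invented. The rule that the two weights sum to 1.0 is an assumption: it keeps combined_score on a 0 to 1 scale.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridWeightsSketch:
    """Hypothetical mirror of a few HybridStoreConfig fields, validated on construction."""
    vector_weight: float = 0.6
    graph_weight: float = 0.4
    default_hop_distance: int = 2
    max_hop_distance: int = 5
    temporal_decay_days: float = 30.0

    def __post_init__(self) -> None:
        for name in ("vector_weight", "graph_weight"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        # Assumption: weights that sum to 1 keep combined_score on the same 0-1 scale.
        if not math.isclose(self.vector_weight + self.graph_weight, 1.0):
            raise ValueError("vector_weight + graph_weight should equal 1.0")
        if not 1 <= self.default_hop_distance <= self.max_hop_distance:
            raise ValueError("need 1 <= default_hop_distance <= max_hop_distance")
        if self.temporal_decay_days <= 0:
            raise ValueError("temporal_decay_days (the half-life) must be positive")


print(HybridWeightsSketch())
```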

Hybrid Search

The primary search method combines vector and graph scoring:

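# Run inside an async function; hybrid_store, embedding, RetrievalStrategy, and EntityType
# are assumed to be created or imported beforehand.
response = await hybrid_store.search(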
    query_vector=embedding,
    tenant_id="acme",
    strategy=RetrievalStrategy.HYBRID,
    top_k=10,
    hop_distance=2,
    entity_types=[EntityType.DATASET],
    min_score=0.3,
    include_graph_context=True,
)
 
for result in response.results:
    print(f"{result.entity_urn}: {result.combined_score:.3f}")
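The HYBRID strategy can be read as fusing two candidate lists: entities found by vector search and entities found by graph expansion. The sketch below assumes that an entity missing from one list scores 0 for that component. It then applies the weighted sum, the min_score filter, and top_k. Only entity_urn and combined_score correspond to fields used in the loop above. The other names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredEntity:
    entity_urn: str
    vector_score: float
    graph_score: float
    combined_score: float


def fuse(vector_hits: dict[str, float], graph_hits: dict[str, float],
         vector_weight: float = 0.6, graph_weight: float = 0.4,
         min_score: float = 0.3, top_k: int = 10) -> list[ScoredEntity]:
    """Merge vector and graph candidates, score each entity, filter, and rank."""
    results = []
    for urn in vector_hits.keys() | graph_hits.keys():
        v = vector_hits.get(urn, 0.0)  # assumption: absent from a source means 0 for that part
        g = graph_hits.get(urn, 0.0)
        combined = vector_weight * v + graph_weight * g
        if combined >= min_score:
            results.append(ScoredEntity(urn, v, g, combined))
    results.sort(key=lambda r: r.combined_score, reverse=True)
    return results[:top_k]


for r in fuse({"sales": 0.9, "orders": 0.5}, {"orders": 1.0, "regions": 0.5}):
    print(f"{r.entity_urn}: {r.combined_score:.3f}")
```

In this toy run "regions" is dropped. Its combined score, 0.4 * 0.5 = 0.2, falls below min_score.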

Scoring Formula

The combined score is calculated as:

combined_score = vector_weight * vector_score + graph_weight * graph_score

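When enable_temporal_decay is True, the combined score is scaled by a decay factor. The factor depends on the result's age in days (age_days), and temporal_decay_days acts as the half-life: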

decay = 0.5 ^ (age_days / temporal_decay_days)
final_score = combined_score * (0.7 + 0.3 * decay)
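The formulas above translate directly into Python. The multiplier 0.7 + 0.3 * decay runs from 1.0 for a brand-new entity down toward 0.7, so temporal decay can remove at most 30% of a score. Take vector_weight 0.6 and graph_weight 0.4, as in the configuration example. A 30-day-old entity with vector_score 0.8 and graph_score 0.5 then has combined_score 0.68, decay 0.5, and final_score 0.68 * 0.85 = 0.578. The function name is an assumption; the arithmetic follows the documented formulas.

```python
def final_score(vector_score: float, graph_score: float, age_days: float,
                vector_weight: float = 0.6, graph_weight: float = 0.4,
                temporal_decay_days: float = 30.0) -> float:
    """Weighted combination followed by the documented temporal decay."""
    combined = vector_weight * vector_score + graph_weight * graph_score
    decay = 0.5 ** (age_days / temporal_decay_days)  # 1.0 when fresh, 0.5 after one half-life
    return combined * (0.7 + 0.3 * decay)             # never below 70% of the combined score


print(round(final_score(0.8, 0.5, age_days=30), 3))
```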

GraphRAG Context Assembly

For LLM augmentation, the store assembles a complete GraphRAGContext:

context = await hybrid_store.get_graphrag_context(
    query="Show me total sales by region",
    query_vector=embedding,
    tenant_id="acme",
    top_k=5,
    hop_distance=2,
    include_decisions=True,
    include_lineage=True,
)

The returned context includes:

- Primary matching entities with scores
- Extended graph context (neighbors, relationships)
- Relevant past decisions (precedents)
- Lineage paths for data provenance
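Before an LLM can use these four parts, they must be serialized into text. The sketch below renders them as plain-text sections ahead of the question. ContextSketch and its field names are hypothetical stand-ins, because this page does not document the actual GraphRAGContext fields or prompt format.

```python
from dataclasses import dataclass, field


@dataclass
class ContextSketch:
    """Hypothetical mirror of the four parts of a GraphRAGContext listed above."""
    entities: list[tuple[str, float]]  # (urn, score)
    relationships: list[tuple[str, str, str]] = field(default_factory=list)  # (src, rel, dst)
    decisions: list[str] = field(default_factory=list)
    lineage_paths: list[list[str]] = field(default_factory=list)


def to_prompt(query: str, ctx: ContextSketch) -> str:
    """Render the context as plain-text sections, skipping empty ones, then the question."""
    lines = ["Entities:"]
    lines += [f"- {urn} (score {score:.2f})" for urn, score in ctx.entities]
    if ctx.relationships:
        lines.append("Relationships:")
        lines += [f"- {s} -[{r}]-> {d}" for s, r, d in ctx.relationships]
    if ctx.decisions:
        lines.append("Precedent decisions:")
        lines += [f"- {d}" for d in ctx.decisions]
    if ctx.lineage_paths:
        lines.append("Lineage:")
        lines += [f"- {' -> '.join(path)}" for path in ctx.lineage_paths]
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)


ctx = ContextSketch(entities=[("urn:matih:dataset:acme:sales_data", 0.91)],
                    lineage_paths=[["raw_orders", "clean_orders", "sales_data"]])
print(to_prompt("Show me total sales by region", ctx))
```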

Entity Operations

Upsert Entity Embedding

success = await hybrid_store.upsert_entity_embedding(
    entity_urn="urn:matih:dataset:acme:sales_data",
    embedding=[0.1, 0.2, ...],  # truncated here; pass the full vector (length must match the index dimension)
    tenant_id="acme",
    namespace=EmbeddingNamespace.ENTITY_SEMANTIC,
    entity_type="dataset",
)
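upsert_entity_embedding scopes each vector by tenant_id and namespace. The toy index below illustrates that partitioning and the usual upsert semantics, where writing the same URN again overwrites it. The class, the SketchNamespace enum, and the dimension check are all hypothetical. They do not reproduce the real Pinecone-backed VectorStore.

```python
from enum import Enum
from typing import Optional


class SketchNamespace(str, Enum):
    """Hypothetical stand-in; only ENTITY_SEMANTIC is taken from the example above."""
    ENTITY_SEMANTIC = "entity_semantic"


class InMemoryVectorIndex:
    """Toy index that partitions vectors by (tenant_id, namespace)."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self._data: dict = {}  # (tenant_id, namespace) -> {urn: (embedding, metadata)}

    def upsert(self, urn: str, embedding: list[float], tenant_id: str,
               namespace: SketchNamespace, metadata: Optional[dict] = None) -> bool:
        if len(embedding) != self.dimension:
            return False  # reject vectors whose length does not match the index
        partition = self._data.setdefault((tenant_id, namespace.value), {})
        partition[urn] = (embedding, metadata or {})  # same urn overwrites: an upsert
        return True

    def count(self, tenant_id: str, namespace: SketchNamespace) -> int:
        return len(self._data.get((tenant_id, namespace.value), {}))


index = InMemoryVectorIndex(dimension=3)
ok = index.upsert("urn:matih:dataset:acme:sales_data", [0.1, 0.2, 0.3], "acme",
                  SketchNamespace.ENTITY_SEMANTIC, {"entity_type": "dataset"})
print(ok, index.count("acme", SketchNamespace.ENTITY_SEMANTIC))
```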

Find Similar Entities

similar = await hybrid_store.find_similar_entities(
    entity_urn="urn:matih:dataset:acme:sales_data",
    tenant_id="acme",
    top_k=10,
    include_structural=True,
)

The result combines semantic (embedding) similarity and structural (graph-position) similarity, weighted 60% and 40% respectively.
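This page does not say how structural similarity is measured. The sketch below uses cosine similarity for the semantic part. As a swapped-in stand-in for graph position, it uses Jaccard overlap of the two entities' neighbor sets. The two parts are weighted 60/40 as described. The function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of neighbors the two entities have in common; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(emb_a: list[float], emb_b: list[float],
               nbrs_a: set[str], nbrs_b: set[str],
               semantic_weight: float = 0.6, structural_weight: float = 0.4) -> float:
    """60/40 blend of embedding similarity and (assumed) neighbor-overlap similarity."""
    return semantic_weight * cosine(emb_a, emb_b) + structural_weight * jaccard(nbrs_a, nbrs_b)


print(similarity([1.0, 0.0], [0.8, 0.6], {"orders", "regions"}, {"orders", "customers"}))
```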

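Find Decision Precedents

Returns past decisions whose rationale embeddings are similar to the given decision, dropping matches whose relevance falls below min_relevance: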

precedents = await hybrid_store.find_decision_precedents(
    decision_urn="urn:matih:decision:acme:deploy-model-v2",
    tenant_id="acme",
    top_k=5,
    min_relevance=0.5,
)
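Precedent search can be pictured as a thresholded nearest-neighbor lookup over past decisions' rationale embeddings. This sketch assumes cosine similarity as the relevance measure and an in-memory dict of embeddings. Apart from top_k and min_relevance, the function and parameter names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_precedents(target_urn: str, target_vec: list[float],
                    past: dict[str, list[float]],
                    top_k: int = 5, min_relevance: float = 0.5) -> list[tuple[str, float]]:
    """Rank past decisions by rationale similarity, excluding the target decision itself."""
    scored = [(urn, cosine(target_vec, vec)) for urn, vec in past.items() if urn != target_urn]
    kept = [pair for pair in scored if pair[1] >= min_relevance]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


history = {"deploy-model-v1": [1.0, 0.0], "rotate-keys": [0.0, 1.0], "deploy-api-v3": [1.0, 1.0]}
print(find_precedents("deploy-model-v2", [1.0, 0.1], history))
```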

Initialization

from context_graph.storage.hybrid_store import create_hybrid_store
 
store = await create_hybrid_store(
    graph_store=graph_store,    # Optional; defaults to the shared singleton graph store
    vector_store=vector_store,  # Optional; a mock store is used if None
    config=config,              # Optional HybridStoreConfig
)
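The comments on create_hybrid_store describe two fallbacks: a shared graph store when none is passed, and a mock vector store when vector_store is None. The sketch below shows that pattern with placeholder classes, and every name in it is hypothetical. It also shows why a store built without a vector_store will not return results from Pinecone.

```python
import asyncio
from typing import Optional


class GraphStoreSketch:
    """Placeholder for the graph store."""


class MockVectorStoreSketch:
    """Placeholder for the mock vector store used when none is supplied."""


_graph_singleton: Optional[GraphStoreSketch] = None


def get_graph_store() -> GraphStoreSketch:
    """Lazily create one shared graph store per process."""
    global _graph_singleton
    if _graph_singleton is None:
        _graph_singleton = GraphStoreSketch()
    return _graph_singleton


async def create_store_sketch(graph_store=None, vector_store=None):
    # "is None" rather than "or", so a valid but falsy store object is never replaced
    if graph_store is None:
        graph_store = get_graph_store()
    if vector_store is None:
        vector_store = MockVectorStoreSketch()
    return graph_store, vector_store


async def main() -> None:
    graph, vectors = await create_store_sketch()
    print(type(graph).__name__, type(vectors).__name__)


asyncio.run(main())
```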
