MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Hybrid Store (GraphRAG)

The HybridStore combines the ContextGraphStore (graph database) with the VectorStore (Pinecone) to implement retrieval-augmented generation with graph context. Its retrieval strategies are based on the Microsoft GraphRAG architecture and include local search, global search, drift search, and hybrid scoring.


Overview

The Hybrid Store bridges structural knowledge (entity relationships, lineage paths) with semantic knowledge (embedding similarity) to provide richer context for LLM-augmented queries.

Source: data-plane/ai-service/src/context_graph/storage/hybrid_store.py


Retrieval Strategies

| Strategy | Description | When to Use |
|---|---|---|
| LOCAL | Vector similarity + 1-hop graph neighbors | Quick entity lookup with local context |
| GLOBAL | Community detection + summary embeddings | Broad topic exploration across the graph |
| DRIFT | Temporal trajectory patterns | Detecting behavioral changes over time |
| HYBRID | Weighted combination of vector and graph scores | Default for most queries |
| GRAPH_FIRST | Graph traversal then vector reranking | When structural relationships matter most |
| VECTOR_FIRST | Vector search then graph expansion | When semantic similarity matters most |
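
To make the LOCAL row concrete, here is a minimal in-memory sketch of "vector similarity + 1-hop graph neighbors". It does not use the HybridStore API: `cosine`, `local_search`, and the toy `embeddings` and `neighbors` dictionaries are hypothetical names chosen for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def local_search(query_vector, embeddings, neighbors, top_k=2):
    """Rank entities by vector similarity, then attach their 1-hop neighbors."""
    scored = [(cosine(query_vector, vec), urn) for urn, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [
        {"urn": urn, "score": score, "neighbors": sorted(neighbors.get(urn, ()))}
        for score, urn in scored[:top_k]
    ]


# Toy data (hypothetical): three entities and an adjacency map.
embeddings = {
    "sales": [1.0, 0.0],
    "orders": [0.8, 0.6],
    "hr": [0.0, 1.0],
}
neighbors = {"sales": {"orders", "regions"}, "orders": {"sales"}}

hits = local_search([1.0, 0.0], embeddings, neighbors, top_k=2)
for hit in hits:
    print(hit["urn"], round(hit["score"], 3), hit["neighbors"])
```

The vector step picks the candidates and the graph step only enriches them, which is why this strategy suits quick lookups: the cost of graph expansion is bounded by `top_k`.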

Search Modes

| Mode | Description |
|---|---|
| ENTITY_SEMANTIC | Find similar entities by description embedding |
| ENTITY_STRUCTURAL | Find structurally similar entities by graph position |
| DECISION_PRECEDENT | Find precedent decisions by rationale embedding |
| LINEAGE_AWARE | Vector search with lineage context expansion |
| IMPACT_ANALYSIS | Find entities affected by a change |

Configuration

from context_graph.storage.hybrid_store import HybridStoreConfig
 
config = HybridStoreConfig(
    vector_weight=0.6,           # Weight for vector similarity (0-1)
    graph_weight=0.4,            # Weight for graph relevance (0-1)
    default_hop_distance=2,      # Default hops for graph expansion
    max_hop_distance=5,          # Maximum hops for graph expansion
    min_combined_score=0.3,      # Minimum combined score threshold
    enable_community_detection=False,
    enable_temporal_decay=True,
    temporal_decay_days=30,      # Half-life for temporal decay
)
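
The comments in the configuration imply a few constraints: weights lie between 0 and 1, and the default hop distance should not exceed the maximum. The sketch below checks them on a stand-in dataclass. `ConfigSketch` and `validate` are hypothetical and are not part of `hybrid_store.py`; the [0, 1] range for `min_combined_score` and the requirement that `temporal_decay_days` be positive are assumptions, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical stand-in mirroring the numeric HybridStoreConfig fields."""
    vector_weight: float = 0.6
    graph_weight: float = 0.4
    default_hop_distance: int = 2
    max_hop_distance: int = 5
    min_combined_score: float = 0.3
    temporal_decay_days: int = 30


def validate(cfg):
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    for name in ("vector_weight", "graph_weight", "min_combined_score"):
        value = getattr(cfg, name)
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be in [0, 1], got {value}")
    if cfg.default_hop_distance > cfg.max_hop_distance:
        problems.append("default_hop_distance exceeds max_hop_distance")
    if cfg.temporal_decay_days <= 0:
        problems.append("temporal_decay_days must be positive")
    return problems


print(validate(ConfigSketch()))
print(validate(ConfigSketch(vector_weight=1.5, default_hop_distance=6)))
```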

Hybrid Search

The primary method, search, combines vector and graph scoring. It is a coroutine, so call it with await from async code:

response = await hybrid_store.search(
    query_vector=embedding,
    tenant_id="acme",
    strategy=RetrievalStrategy.HYBRID,
    top_k=10,
    hop_distance=2,
    entity_types=[EntityType.DATASET],
    min_score=0.3,
    include_graph_context=True,
)
 
for result in response.results:
    print(f"{result.entity_urn}: {result.combined_score:.3f}")

Scoring Formula

The combined score is calculated as:

combined_score = vector_weight * vector_score + graph_weight * graph_score

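When temporal decay is enabled (enable_temporal_decay=True), an exponential decay factor with a half-life of temporal_decay_days is applied. Decay affects only 30% of the score, so even a very old entity retains at least 70% of its combined score: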

decay = 0.5 ^ (age_days / temporal_decay_days)
final_score = combined_score * (0.7 + 0.3 * decay)
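
As a worked example, the two formulas above can be written as plain functions. This is a sketch of the arithmetic only, not the store's code: the function names are hypothetical, and the default arguments copy the values from the Configuration section.

```python
def combined_score(vector_score, graph_score, vector_weight=0.6, graph_weight=0.4):
    """Weighted sum of the vector and graph scores."""
    return vector_weight * vector_score + graph_weight * graph_score


def apply_temporal_decay(score, age_days, temporal_decay_days=30):
    """Scale a score by 0.7 + 0.3 * decay, where decay halves every temporal_decay_days."""
    decay = 0.5 ** (age_days / temporal_decay_days)
    return score * (0.7 + 0.3 * decay)


base = combined_score(vector_score=0.9, graph_score=0.5)  # 0.6*0.9 + 0.4*0.5 = 0.74
fresh = apply_temporal_decay(base, age_days=0)            # decay = 1.0, score unchanged
aged = apply_temporal_decay(base, age_days=30)            # decay = 0.5, factor = 0.85

print(f"base={base:.3f} fresh={fresh:.3f} aged={aged:.3f}")
```

With a 30-day half-life, an entity exactly one half-life old keeps 85% of its combined score, and the factor approaches 0.7 as age grows.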

GraphRAG Context Assembly

For LLM augmentation, the store assembles a complete GraphRAGContext:

context = await hybrid_store.get_graphrag_context(
    query="Show me total sales by region",
    query_vector=embedding,
    tenant_id="acme",
    top_k=5,
    hop_distance=2,
    include_decisions=True,
    include_lineage=True,
)

The returned context includes:

  • Primary matching entities with scores
  • Extended graph context (neighbors, relationships)
  • Relevant past decisions (precedents)
  • Lineage paths for data provenance
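
One way to use the four parts listed above is to flatten them into a text block for the LLM prompt. The source does not show the attribute names of GraphRAGContext, so the sketch below uses a hypothetical stand-in: `ContextSketch`, its fields, `render_context`, the `DERIVED_FROM` relation, and the `raw_orders` URN are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContextSketch:
    """Hypothetical container for the four parts of an assembled context."""
    entities: list = field(default_factory=list)       # (urn, score) pairs
    relationships: list = field(default_factory=list)  # (source, relation, target) triples
    decisions: list = field(default_factory=list)      # short precedent summaries
    lineage_paths: list = field(default_factory=list)  # each path is a list of URNs


def render_context(ctx):
    """Flatten the four context parts into plain text suitable for a prompt."""
    lines = ["Entities:"]
    lines += [f"- {urn} (score {score:.2f})" for urn, score in ctx.entities]
    lines.append("Relationships:")
    lines += [f"- {src} -[{rel}]-> {dst}" for src, rel, dst in ctx.relationships]
    lines.append("Precedent decisions:")
    lines += [f"- {summary}" for summary in ctx.decisions]
    lines.append("Lineage:")
    lines += [f"- {' -> '.join(path)}" for path in ctx.lineage_paths]
    return "\n".join(lines)


sales = "urn:matih:dataset:acme:sales_data"
raw = "urn:matih:dataset:acme:raw_orders"  # hypothetical upstream dataset
ctx = ContextSketch(
    entities=[(sales, 0.82)],
    relationships=[(sales, "DERIVED_FROM", raw)],
    decisions=["Example precedent summary"],
    lineage_paths=[[raw, sales]],
)
prompt_context = render_context(ctx)
print(prompt_context)
```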

Entity Operations

Upsert Entity Embedding

success = await hybrid_store.upsert_entity_embedding(
    entity_urn="urn:matih:dataset:acme:sales_data",
    embedding=[0.1, 0.2, ...],
    tenant_id="acme",
    namespace=EmbeddingNamespace.ENTITY_SEMANTIC,
    entity_type="dataset",
)

Find Similar Entities

similar = await hybrid_store.find_similar_entities(
    entity_urn="urn:matih:dataset:acme:sales_data",
    tenant_id="acme",
    top_k=10,
    include_structural=True,
)

find_similar_entities combines semantic and structural similarity with a 60/40 weighting.

Find Decision Precedents

precedents = await hybrid_store.find_decision_precedents(
    decision_urn="urn:matih:decision:acme:deploy-model-v2",
    tenant_id="acme",
    top_k=5,
    min_relevance=0.5,
)

Initialization

from context_graph.storage.hybrid_store import create_hybrid_store
 
store = await create_hybrid_store(
    graph_store=graph_store,    # Optional, uses singleton
    vector_store=vector_store,  # Optional, uses mock if None
    config=config,              # Optional HybridStoreConfig
)