Discovery Search
Discovery search explores the knowledge graph outward from a starting entity to find related entities, uncover hidden connections, and map the neighborhood of a specific data asset. Unlike targeted semantic or structural search, discovery search is exploratory and designed for data catalog browsing.
Overview
Discovery search answers questions like "What else is related to this entity?" and "What does the neighborhood around this dataset look like?" It combines graph expansion with relevance scoring to surface the most relevant related entities.
How It Works
- Start from a seed entity (identified by URN)
- Expand outward through relationships up to a configurable depth
- Score each discovered entity based on relationship distance, entity importance, and access frequency
- Return the top-k most relevant discovered entities with their relationship paths
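
The steps above can be sketched as a breadth-first expansion with a depth limit. This is a minimal illustration, not the service's implementation: the in-memory `GRAPH`, the `discover` function, and the placeholder score (inverse hop distance only) are assumptions made for the example. The recorded path lists only the edges taken, without a terminal entry for the discovered entity.

```python
from collections import deque

# Hypothetical in-memory graph: URN -> list of (relationship, neighbor URN).
GRAPH = {
    "urn:matih:dataset:acme:customer_events": [
        ("CONSUMED_BY", "urn:matih:feature:acme:customer_features"),
    ],
    "urn:matih:feature:acme:customer_features": [
        ("TRAINED_ON", "urn:matih:model:acme:churn_predictor"),
    ],
    "urn:matih:model:acme:churn_predictor": [],
}


def discover(graph, seed_urn, max_hops=3, top_k=20):
    """Expand breadth-first from seed_urn and return up to top_k results."""
    visited = {seed_urn}
    queue = deque([(seed_urn, 0, [])])  # (urn, hops so far, edges taken)
    results = []
    while queue:
        urn, hops, path = queue.popleft()
        if hops > 0:  # the seed itself is not a discovery result
            results.append({
                "entity_urn": urn,
                "hop_distance": hops,
                "path": path,
                # Placeholder score (assumption): inverse hop distance only.
                "score": 1.0 / hops,
            })
        if hops == max_hops:
            continue  # depth limit reached; do not expand this node
        for rel, neighbor in graph.get(urn, []):
            if neighbor not in visited:
                visited.add(neighbor)
                edge = {"from": urn, "rel": rel}
                queue.append((neighbor, hops + 1, path + [edge]))
    # Highest score first, then keep the top_k entries.
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k]
```

Tracking `visited` keeps cycles in the graph from causing repeated expansion, and breadth-first order means each entity is first reached by a shortest path from the seed.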
API Usage
    results = await service.search(SearchQuery(
        query="urn:matih:dataset:acme:customer_events",  # seed entity URN
        tenant_id="acme",
        mode=SearchMode.DISCOVERY,
        top_k=20,  # number of discovered entities to return
        filters=SearchFilters(
            max_hops=3,  # maximum traversal depth
        ),
    ))

Discovery Scoring
Discovered entities are scored using a combination of factors:
| Factor | Description |
|---|---|
| Hop Distance | Closer entities score higher (inverse of distance) |
| Relationship Strength | Stronger relationships (e.g., DERIVED_FROM) score higher than weaker ones |
| Entity Importance | Entities with more connections score higher |
| Access Recency | Recently accessed entities get a recency boost |
| Type Relevance | Entities of types commonly associated with the seed entity score higher |
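
One way such factors could be combined is a weighted sum. The source does not state the actual formula, so the weights, the `Factors` container, and the assumption that every factor other than hop distance is already normalized to the range 0 to 1 are all hypothetical.

```python
from dataclasses import dataclass

# Assumed weights for illustration only; they sum to 1.0.
WEIGHTS = {
    "hop_distance": 0.35,
    "relationship_strength": 0.25,
    "entity_importance": 0.20,
    "access_recency": 0.10,
    "type_relevance": 0.10,
}


@dataclass
class Factors:
    hop_distance: int             # number of edges from the seed, >= 1
    relationship_strength: float  # assumed normalized to 0..1
    entity_importance: float      # assumed normalized to 0..1
    access_recency: float         # assumed normalized to 0..1
    type_relevance: float         # assumed normalized to 0..1


def discovery_score(f: Factors) -> float:
    """Combine the factors into a single score using a weighted sum."""
    if f.hop_distance < 1:
        raise ValueError("hop_distance must be at least 1")
    components = {
        "hop_distance": 1.0 / f.hop_distance,  # closer entities score higher
        "relationship_strength": f.relationship_strength,
        "entity_importance": f.entity_importance,
        "access_recency": f.access_recency,
        "type_relevance": f.type_relevance,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())
```

With these assumed weights, two entities that differ only in hop distance are ranked by proximity, while a strongly related entity at two hops can still outrank a weakly related one at one hop.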
Relationship Path Tracking
Each discovery result includes the full path from the seed entity:
    {
      "entity_urn": "urn:matih:model:acme:churn_predictor",
      "score": 0.85,
      "path": [
        {"from": "urn:matih:dataset:acme:customer_events", "rel": "CONSUMED_BY"},
        {"from": "urn:matih:feature:acme:customer_features", "rel": "TRAINED_ON"},
        {"from": "urn:matih:model:acme:churn_predictor", "rel": null}
      ],
      "hop_distance": 2
    }

Filtering Options
| Filter | Description |
|---|---|
| entity_types | Only discover entities of specified types |
| relationship_types | Only follow specified relationship types |
| max_hops | Maximum traversal depth (default: 3) |
| exclude_urns | Exclude specific entities from results |
| min_importance | Minimum entity importance score |
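
The filter semantics in the table can be sketched as two checks: one deciding whether traversal may follow an edge, and one deciding whether a discovered entity is kept. The `FilterSketch` class below is a hypothetical stand-in that mirrors the filter names; it is not the `SearchFilters` type, and the split between traversal-time and result-time filtering is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FilterSketch:
    """Hypothetical mirror of the filter options, for illustration only."""
    entity_types: Optional[Set[str]] = None        # None means any type
    relationship_types: Optional[Set[str]] = None  # None means any relationship
    max_hops: int = 3
    exclude_urns: Set[str] = field(default_factory=set)
    min_importance: float = 0.0

    def follows(self, rel: str, hops: int) -> bool:
        """Whether traversal may take an edge of type rel from depth hops."""
        if hops >= self.max_hops:
            return False  # taking the edge would exceed the depth limit
        return self.relationship_types is None or rel in self.relationship_types

    def keeps(self, urn: str, entity_type: str, importance: float) -> bool:
        """Whether a discovered entity is included in the results."""
        if urn in self.exclude_urns:
            return False
        if self.entity_types is not None and entity_type not in self.entity_types:
            return False
        return importance >= self.min_importance


# Example: only follow DERIVED_FROM edges and only keep models.
filters = FilterSketch(
    entity_types={"model"},
    relationship_types={"DERIVED_FROM"},
    max_hops=2,
    exclude_urns={"urn:matih:model:acme:legacy_model"},  # hypothetical URN
    min_importance=0.3,
)
```

In this sketch an entity of an excluded type is dropped from the results but does not stop traversal, so entities beyond it can still be reached. Whether the real service behaves the same way is not stated in the source.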
Use Cases
| Use Case | Starting Entity | What Gets Discovered |
|---|---|---|
| Data catalog browsing | A dataset | Related features, models, dashboards, and pipelines |
| Impact analysis | A schema being changed | All downstream consumers and dependents |
| Dependency mapping | A deployed model | All upstream data sources and feature stores |
| Knowledge exploration | A business metric | Related dimensions, facts, and computed measures |