Embedding Features
The embedding pipeline stores feature data as vectors, so ML models can run semantic similarity search over it. This is useful for recommendation systems, content matching, and anomaly detection.
Configuration
```python
from src.features.unified_feature_store import EmbeddingConfig

config = EmbeddingConfig(
    model_type="openai",  # openai, sentence_transformers, custom
    model_name="text-embedding-ada-002",
    dimension=1536,  # output size of text-embedding-ada-002
    distance_metric="cosine",  # cosine, euclidean, dot_product
    batch_size=32,
)
```

Creating Embeddings
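
Every embedding created here has to fit the configuration above: its vector needs exactly `dimension` components (1536 for `text-embedding-ada-002`), and `model_type` and `distance_metric` are presumably restricted to the options listed in the comments. The sketch below models those constraints with a hypothetical `SketchEmbeddingConfig` dataclass. It is not the real `EmbeddingConfig`, whose validation rules are not documented here.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical stand-in for EmbeddingConfig, for illustration only.
# Allowed values are taken from the comments in the configuration example.
ALLOWED_MODEL_TYPES = {"openai", "sentence_transformers", "custom"}
ALLOWED_METRICS = {"cosine", "euclidean", "dot_product"}


@dataclass(frozen=True)
class SketchEmbeddingConfig:
    model_type: str
    model_name: str
    dimension: int
    distance_metric: str
    batch_size: int

    def __post_init__(self) -> None:
        # Reject options outside the documented lists.
        if self.model_type not in ALLOWED_MODEL_TYPES:
            raise ValueError(f"unknown model_type: {self.model_type!r}")
        if self.distance_metric not in ALLOWED_METRICS:
            raise ValueError(f"unknown distance_metric: {self.distance_metric!r}")
        if self.dimension <= 0 or self.batch_size <= 0:
            raise ValueError("dimension and batch_size must be positive")

    def check_vector(self, vector: Sequence[float]) -> None:
        # A stored vector must have exactly `dimension` components.
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(vector)}")


config = SketchEmbeddingConfig(
    model_type="openai",
    model_name="text-embedding-ada-002",
    dimension=1536,
    distance_metric="cosine",
    batch_size=32,
)
config.check_vector([0.0] * 1536)  # passes; a shorter vector raises ValueError
```

Catching a dimension mismatch before a write is cheaper than discovering it at query time, when vectors of different lengths cannot be compared.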
Single Record
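```python
# Assumes `store` is an initialized UnifiedFeatureStore and that this code
# runs inside an async function (embed_single is awaited).
vector = await store.embed_single(
    tenant_id="acme-corp",
    feature_view="product_embeddings",
    entity_id="prod-123",
    text="Wireless noise-canceling headphones with 30-hour battery life",
    metadata={"category": "electronics", "price": 299.99},
)
```

Batch Embedding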
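
Batch embedding takes each record's ID from the field named by `entity_id_field` and the text to embed from the field named by `text_field`. Given the `batch_size=32` setting, it is reasonable to assume that texts are sent to the model in groups of at most that size. The sketch below shows that grouping with a hypothetical `iter_batches` helper, which is not part of the documented API. It uses `batch_size=2` so that three records split into two batches.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_batches(
    records: List[Dict[str, Any]],
    entity_id_field: str,
    text_field: str,
    batch_size: int,
) -> Iterator[List[Tuple[str, str]]]:
    """Hypothetical helper: yield (entity_id, text) pairs in chunks of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Pull the ID and the text to embed out of each record.
    pairs = [(str(r[entity_id_field]), str(r[text_field])) for r in records]
    # Slice into consecutive chunks; the last chunk may be shorter.
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]


sample = [
    {"product_id": "p1", "description": "Wireless headphones"},
    {"product_id": "p2", "description": "Running shoes"},
    {"product_id": "p3", "description": "Coffee maker"},
]
batches = list(iter_batches(sample, "product_id", "description", batch_size=2))
# batches == [[("p1", "Wireless headphones"), ("p2", "Running shoes")],
#             [("p3", "Coffee maker")]]
```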
```python
records = [
    {"product_id": "p1", "description": "Wireless headphones", "category": "electronics"},
    {"product_id": "p2", "description": "Running shoes", "category": "sports"},
    {"product_id": "p3", "description": "Coffee maker", "category": "kitchen"},
]

# entity_id_field names the key holding each record's ID;
# text_field names the key whose text is embedded.
count = await store.create_embedding_feature(
    tenant_id="acme-corp",
    feature_view="product_embeddings",
    records=records,
    entity_id_field="product_id",
    text_field="description",
)
```

Similarity Search
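
Conceptually, a similarity search scores stored vectors against the query, skips entities whose metadata does not match `filters`, and returns the `top_k` highest-scoring matches. The sketch below does this by brute force over an in-memory dict. It assumes cosine scoring and exact-match filters, and it takes a precomputed query vector, whereas the real `similarity_search` accepts query text and embeds it. The names `brute_force_search` and `Index` are hypothetical. A production store would normally use a vector index rather than a linear scan.

```python
import math
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical in-memory feature view: entity_id -> (vector, metadata).
Index = Dict[str, Tuple[List[float], Dict[str, Any]]]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    norms = math.hypot(*a) * math.hypot(*b)
    if norms == 0:
        return 0.0  # zero vectors never match
    return sum(x * y for x, y in zip(a, b)) / norms


def brute_force_search(
    index: Index,
    query_vector: List[float],
    top_k: int,
    filters: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    filters = filters or {}
    hits = []
    for entity_id, (vector, metadata) in index.items():
        # Exact-match filter: skip entities whose metadata differs on any key.
        if any(metadata.get(key) != value for key, value in filters.items()):
            continue
        hits.append({
            "entity_id": entity_id,
            "score": cosine_similarity(query_vector, vector),
            "metadata": metadata,
        })
    # Best matches first; keep at most top_k of them.
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:top_k]


index: Index = {
    "p1": ([1.0, 0.0], {"category": "electronics"}),
    "p2": ([0.0, 1.0], {"category": "sports"}),
    "p5": ([0.6, 0.8], {"category": "electronics"}),
}
results = brute_force_search(index, [1.0, 0.0], top_k=5, filters={"category": "electronics"})
# results lists p1 (score 1.0) then p5 (score about 0.6); p2 is filtered out.
```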
```python
results = await store.similarity_search(
    tenant_id="acme-corp",
    feature_view="product_embeddings",
    query="noise canceling earbuds",
    top_k=5,
    filters={"category": "electronics"},  # restrict matches by metadata
)
# Example return value:
# [
#     {"entity_id": "p1", "score": 0.92, "metadata": {"category": "electronics", ...}},
#     {"entity_id": "p5", "score": 0.87, "metadata": {...}},
# ]
```

Distance Metrics

| Metric | Score formula | Use case |
|---|---|---|
| `cosine` | `1 - cosine_distance(a, b)`, i.e. `cos(a, b)` | Text similarity, normalized vectors |
| `euclidean` | `1 / (1 + dist(a, b))` | Spatial data, continuous features |
| `dot_product` | `a · b` | When vector magnitude matters |

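
The metric formulas can be written as score functions keyed by the names accepted for `distance_metric`. This sketch assumes every score is oriented so that higher means more similar, which matches the descending scores in the search example. Under that assumption, the cosine score is the cosine similarity, that is, one minus the cosine distance. The function names are illustrative and are not the service's API.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def cosine_score(a: Vector, b: Vector) -> float:
    # Cosine similarity (= 1 - cosine distance); ignores vector magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0  # treat zero vectors as dissimilar


def euclidean_score(a: Vector, b: Vector) -> float:
    # Maps distance in [0, inf) to a score in (0, 1]; identical vectors score 1.
    return 1.0 / (1.0 + math.dist(a, b))


def dot_product_score(a: Vector, b: Vector) -> float:
    # Raw inner product; grows with the magnitude of either vector.
    return sum(x * y for x, y in zip(a, b))


# Keyed by the distance_metric values accepted in the configuration.
SCORERS: Dict[str, Callable[[Vector, Vector], float]] = {
    "cosine": cosine_score,
    "euclidean": euclidean_score,
    "dot_product": dot_product_score,
}

a, b = [1.0, 0.0], [2.0, 0.0]
for name, score in SCORERS.items():
    print(name, score(a, b))
```

Comparing `[1, 0]` with `[2, 0]` gives a cosine score of 1.0 (same direction), a Euclidean score of 0.5, and a dot product of 2.0. Only the dot product reflects the difference in length, which is why it fits cases where magnitude matters.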
Source Files
| File | Path |
|---|---|
| Embedding Feature Service | data-plane/ml-service/src/features/embedding_feature_service.py |
| UnifiedFeatureStore | data-plane/ml-service/src/features/unified_feature_store.py |