MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Embedding Features

The embedding pipeline manages features as vectors, so ML models can run semantic similarity search over feature data. This is particularly useful for recommendation systems, content matching, and anomaly detection.


Configuration

from src.features.unified_feature_store import EmbeddingConfig
 
config = EmbeddingConfig(
    model_type="openai",           # openai, sentence_transformers, custom
    model_name="text-embedding-ada-002",
    dimension=1536,                # must match the model's output size (1536 for text-embedding-ada-002)
    distance_metric="cosine",      # cosine, euclidean, dot_product
    batch_size=32,
)

Creating Embeddings

Single Record

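# Assumes `store` is an initialized UnifiedFeatureStore and that this runs inside an async function.
vector = await store.embed_single(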
    tenant_id="acme-corp",
    feature_view="product_embeddings",
    entity_id="prod-123",
    text="Wireless noise-canceling headphones with 30-hour battery life",
    metadata={"category": "electronics", "price": 299.99},
)

Batch Embedding

records = [
    {"product_id": "p1", "description": "Wireless headphones", "category": "electronics"},
    {"product_id": "p2", "description": "Running shoes", "category": "sports"},
    {"product_id": "p3", "description": "Coffee maker", "category": "kitchen"},
]
 
count = await store.create_embedding_feature(
    tenant_id="acme-corp",
    feature_view="product_embeddings",
    records=records,
    entity_id_field="product_id",
    text_field="description",
)
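How the store groups records internally is not shown on this page. The following is a minimal sketch of one plausible reading of batch_size, assuming it caps how many texts go into a single embedding request. The helpers iter_batches and extract_texts are hypothetical names used for illustration only and are not part of UnifiedFeatureStore.

```python
from typing import Iterator


def iter_batches(records: list[dict], batch_size: int = 32) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size records (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def extract_texts(batch: list[dict], entity_id_field: str, text_field: str) -> list[tuple[str, str]]:
    """Return (entity_id, text) pairs, skipping records whose text is missing or empty."""
    return [(r[entity_id_field], r[text_field]) for r in batch if r.get(text_field)]


# 70 synthetic records, split with the batch_size from the configuration example.
records = [{"product_id": f"p{i}", "description": f"Product {i}"} for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(records, batch_size=32)]
print(batch_sizes)  # [32, 32, 6]
```

Under this assumption, a larger batch_size means fewer calls to the embedding model per ingest, at the cost of larger individual requests.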

Similarity Search

results = await store.similarity_search(
    tenant_id="acme-corp",
    feature_view="product_embeddings",
    query="noise canceling earbuds",
    top_k=5,
    filters={"category": "electronics"},
)
 
# Returns:
# [
#   {"entity_id": "p1", "score": 0.92, "metadata": {"category": "electronics", ...}},
#   {"entity_id": "p5", "score": 0.87, "metadata": {...}},
# ]
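To make the roles of top_k and filters concrete, here is a brute-force sketch of filtered similarity search over an in-memory index. It illustrates the general technique only, not the service's implementation: the index layout, the hand-written three-dimensional vectors, and the function brute_force_search are all assumptions, and a production store would typically delegate this to a vector index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(index, query_vector, top_k=5, filters=None):
    """Score every entry that passes the metadata filters and return the top_k best."""
    filters = filters or {}
    hits = []
    for entity_id, (vector, metadata) in index.items():
        # An entry is kept only if every filter key matches its metadata exactly.
        if any(metadata.get(key) != value for key, value in filters.items()):
            continue
        hits.append({
            "entity_id": entity_id,
            "score": cosine_similarity(query_vector, vector),
            "metadata": metadata,
        })
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:top_k]


# Toy index: entity_id -> (vector, metadata). Real embeddings would have 1536 dimensions.
index = {
    "p1": ([1.0, 0.0, 0.0], {"category": "electronics"}),
    "p2": ([0.0, 1.0, 0.0], {"category": "sports"}),
    "p5": ([0.8, 0.6, 0.0], {"category": "electronics"}),
}
results = brute_force_search(index, [1.0, 0.0, 0.0], top_k=5, filters={"category": "electronics"})
print([r["entity_id"] for r in results])  # ['p1', 'p5']
```

In this sketch the filter is applied before scoring, so top_k counts only entries that match the filter.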

Distance Metrics

Metric        Formula          Use Case
cosine        1 - cos(a, b)    Text similarity, normalized vectors
euclidean     1 / (1 + dist)   Spatial data, continuous features
dot_product   a . b            When magnitude matters
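The formulas in the table can be written out directly. The sketch below takes them at face value; whether the service returns exactly these values is not confirmed on this page, and the function names are illustrative. Note that 1 - cos(a, b) is a distance (lower means closer), whereas 1 / (1 + dist) and a . b grow as vectors become more similar.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product a . b; grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cos(a, b): 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_score(a: list[float], b: list[float]) -> float:
    """1 / (1 + dist): maps Euclidean distance in [0, inf) to a score in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


# b points in the same direction as a but is twice as long.
a = [3.0, 4.0]
b = [6.0, 8.0]

print(cosine_distance(a, b))   # 0.0, direction is identical so magnitude is ignored
print(euclidean_score(a, b))   # 1 / (1 + 5), the points are 5 units apart
print(dot(a, b))               # 50.0, reflects the larger magnitude of b
```

The example shows why cosine suits normalized text embeddings while dot_product is the choice when magnitude carries meaning.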

Source Files

File                        Path
Embedding Feature Service   data-plane/ml-service/src/features/embedding_feature_service.py
UnifiedFeatureStore         data-plane/ml-service/src/features/unified_feature_store.py