Feature Engineering
The Feature Engineering integration provides access to the platform's feature store through the AI Service, enabling users to create, manage, and serve feature sets for machine learning workflows. Features are defined declaratively and materialized from SQL transformations run against the data warehouse.
Feature Store Architecture
The feature store is powered by Feast and integrated into the ML pipeline:
```
Data Warehouse --> Feature Transformations --> Feature Store (Feast) --> ML Training / Serving
                                                         ^
                                                         |
                                             AI Service (Feature API)
```
Feature Set Definition
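Each feature set is defined by an entity key, a list of features (each with a name, type, description, and SQL expression), a timestamp field, and a time-to-live in days: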
```json
{
  "name": "customer_features",
  "entity": "customer_id",
  "features": [
    {
      "name": "tenure_months",
      "type": "int64",
      "description": "Number of months as a customer",
      "sql": "DATEDIFF(month, signup_date, CURRENT_DATE)"
    },
    {
      "name": "total_spend",
      "type": "float64",
      "description": "Total lifetime spend",
      "sql": "SUM(amount) FROM orders WHERE customer_id = entity.customer_id"
    },
    {
      "name": "order_frequency",
      "type": "float64",
      "description": "Average orders per month",
      "sql": "COUNT(*) * 1.0 / NULLIF(DATEDIFF(month, MIN(order_date), CURRENT_DATE), 0)"
    }
  ],
  "timestamp_field": "event_timestamp",
  "ttl_days": 90
}
```
Feature API
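Every endpoint in this section exchanges JSON over HTTP. As a minimal sketch, assuming a hypothetical base URL `http://ai-service:8000` and assuming the Create endpoint accepts a feature set definition like the one above as its request body (neither is specified here), a Python client could sanity-check a definition locally and build the requests with only the standard library. The validation rules (required keys, unique feature names, positive `ttl_days`) are inferred from the example definition and are not the service's actual schema; all function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical AI Service address; replace with your deployment's URL.
BASE_URL = "http://ai-service:8000"

# Keys present in the example definition, ASSUMED here to be required.
REQUIRED_KEYS = ("name", "entity", "features", "timestamp_field")
FEATURE_KEYS = ("name", "type", "sql")


def validate_definition(defn: dict) -> list:
    """Return human-readable problems with a feature set definition (empty if none)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in defn]
    seen = set()
    for i, feat in enumerate(defn.get("features", [])):
        problems += [f"features[{i}] missing key: {k}" for k in FEATURE_KEYS if k not in feat]
        name = feat.get("name")
        if name is not None and name in seen:
            problems.append(f"duplicate feature name: {name}")
        seen.add(name)
    if defn.get("ttl_days", 1) <= 0:
        problems.append("ttl_days must be positive")
    return problems


def _post(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_request(defn: dict) -> urllib.request.Request:
    """POST /api/v1/ml/features/sets with the definition as the body (assumed)."""
    return _post("/api/v1/ml/features/sets", defn)


def list_request(tenant_id: str) -> urllib.request.Request:
    """GET /api/v1/ml/features/sets?tenant_id=..."""
    query = urllib.parse.urlencode({"tenant_id": tenant_id})
    return urllib.request.Request(f"{BASE_URL}/api/v1/ml/features/sets?{query}")


def features_request(mode, feature_set, entity_ids, features, timestamp=None):
    """Build an online or offline retrieval request; `timestamp` is sent for offline only."""
    if mode not in ("online", "offline"):
        raise ValueError("mode must be 'online' or 'offline'")
    body = {"feature_set": feature_set, "entity_ids": list(entity_ids), "features": list(features)}
    if mode == "offline" and timestamp is not None:
        body["timestamp"] = timestamp
    return _post(f"/api/v1/ml/features/{mode}", body)
```

Nothing is sent until a request object is passed to `urllib.request.urlopen`, which keeps validation and request construction testable without a running service.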
Create Feature Set
```
POST /api/v1/ml/features/sets
```
List Feature Sets
```
GET /api/v1/ml/features/sets?tenant_id=acme-corp
```
Get Features Online
Retrieves the latest feature values for real-time inference:
```
POST /api/v1/ml/features/online
```
```json
{
  "feature_set": "customer_features",
  "entity_ids": ["cust-001", "cust-002"],
  "features": ["tenure_months", "total_spend", "order_frequency"]
}
```
Get Features Offline
Retrieves historical feature values for training with point-in-time correctness:
```
POST /api/v1/ml/features/offline
```
```json
{
  "feature_set": "customer_features",
  "entity_ids": ["cust-001", "cust-002"],
  "features": ["tenure_months", "total_spend"],
  "timestamp": "2025-01-01T00:00:00Z"
}
```
Feature Transformations
| Transformation Type | Description | Example |
|---|---|---|
| Aggregation | Statistical aggregates over a window | AVG(amount) OVER 30 days |
| Time-based | Temporal calculations | DATEDIFF(month, signup_date, NOW()) |
| Categorical encoding | One-hot or label encoding | contract_type to numeric |
| Normalization | Min-max or z-score scaling | (value - min) / (max - min) |
| Windowed | Rolling window computations | SUM(amount) OVER last 7 days |
| Cross-feature | Interactions between features | revenue / num_employees |
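The transformation types in the table can be illustrated with plain Python arithmetic. This is a hedged sketch only: the platform expresses features as SQL against the warehouse, and the function names, the `(as_of - window, as_of]` window boundaries, and the use of the population standard deviation for z-scores are all assumptions. Because `windowed_sum` ignores events after `as_of`, a windowed feature computed this way only uses data that was available at that moment, which is what point-in-time correctness requires.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def windowed_sum(events, as_of, window):
    """SUM(amount) over the window (as_of - window, as_of].

    events: iterable of (event_timestamp, amount). Events after `as_of` are
    ignored, so the result only uses data available at `as_of`.
    """
    start = as_of - window
    return sum(amount for ts, amount in events if start < ts <= as_of)


def min_max(values):
    """(value - min) / (max - min); a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def z_score(values):
    """(value - mean) / population standard deviation; a constant column maps to 0.0."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def label_encode(values):
    """Map each distinct category to an integer, in sorted category order."""
    codes = {c: i for i, c in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def one_hot(values):
    """One 0/1 column per distinct category, in sorted category order."""
    cats = sorted(set(values))
    return [[1 if v == c else 0 for c in cats] for v in values]


def ratio(numerator, denominator):
    """Cross-feature ratio such as revenue / num_employees; None if undefined."""
    return None if not denominator else numerator / denominator
```

Guarding the zero-span and zero-denominator cases keeps a single new customer or an empty column from turning into a division error at training time.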
Materialization
Features are materialized in batch on a schedule, from streaming events, or on demand:
| Mode | Trigger | Latency |
|---|---|---|
| Batch | Scheduled (hourly, daily) | Minutes |
| Streaming | Kafka event trigger | Seconds |
| On-demand | API request | Seconds to minutes |
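Batch materialization amounts to taking the latest feature row per entity within a time range and writing it to the online store. The sketch below models both stores as in-memory dictionaries, which is a simplification (in this platform Feast manages the stores). Each incremental run starts where the previous one ended, the same pattern Feast exposes as `FeatureStore.materialize_incremental`; the `materialize` function here is a hypothetical illustration, not Feast's implementation.

```python
from datetime import datetime


def materialize(offline_rows, online_store, start, end):
    """Write the latest row per entity with start <= event_timestamp < end.

    offline_rows: iterable of (entity_id, event_timestamp, feature_dict)
    online_store: dict of entity_id -> (event_timestamp, feature_dict), updated in place
    Returns the number of entities written.
    """
    latest = {}
    for entity_id, ts, features in offline_rows:
        if not (start <= ts < end):
            continue
        if entity_id not in latest or ts > latest[entity_id][0]:
            latest[entity_id] = (ts, features)

    written = 0
    for entity_id, (ts, features) in latest.items():
        current = online_store.get(entity_id)
        # Never replace a newer value, e.g. one already written by a streaming update.
        if current is None or ts > current[0]:
            online_store[entity_id] = (ts, features)
            written += 1
    return written


# Two hourly-style incremental runs: the second window starts where the first ended.
store = {}
rows = [
    ("cust-001", datetime(2025, 1, 1, 10), {"total_spend": 10.0}),
    ("cust-001", datetime(2025, 1, 1, 11), {"total_spend": 25.0}),
    ("cust-002", datetime(2025, 1, 1, 13), {"total_spend": 7.0}),
]
first = materialize(rows, store, datetime(2025, 1, 1, 0), datetime(2025, 1, 1, 12))
second = materialize(rows, store, datetime(2025, 1, 1, 12), datetime(2025, 1, 2, 0))
```

Comparing timestamps before writing makes reruns idempotent: replaying an old window never overwrites a fresher value.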
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| FEAST_FEATURE_STORE_URL | http://feast:6566 | Feast feature serving URL |
| FEAST_REGISTRY_PATH | /data/feast/registry.db | Path to the Feast registry database |
| FEATURE_CACHE_TTL | 300 | Time-to-live for cached online features, in seconds |
| FEATURE_MAX_ENTITIES | 1000 | Maximum number of entity IDs per online request |
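A service or client reading these settings might load them as follows. This is a sketch under stated assumptions: `FeatureConfig`, `load_config`, `chunk_entities`, and `TTLCache` are hypothetical names; it assumes that requests for more than `FEATURE_MAX_ENTITIES` entity IDs must be split by the caller, and that `FEATURE_CACHE_TTL` behaves as a simple per-key expiry. The actual cache behavior is not documented here.

```python
import os
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureConfig:
    store_url: str
    registry_path: str
    cache_ttl_seconds: int
    max_entities: int


def load_config(env=None) -> FeatureConfig:
    """Read the settings from `env` (default: os.environ), using the documented defaults."""
    env = os.environ if env is None else env
    return FeatureConfig(
        store_url=env.get("FEAST_FEATURE_STORE_URL", "http://feast:6566"),
        registry_path=env.get("FEAST_REGISTRY_PATH", "/data/feast/registry.db"),
        cache_ttl_seconds=int(env.get("FEATURE_CACHE_TTL", "300")),
        max_entities=int(env.get("FEATURE_MAX_ENTITIES", "1000")),
    )


def chunk_entities(entity_ids, max_entities):
    """Split entity IDs into batches of at most `max_entities` each."""
    if max_entities <= 0:
        raise ValueError("max_entities must be positive")
    ids = list(entity_ids)
    return [ids[i:i + max_entities] for i in range(0, len(ids), max_entities)]


class TTLCache:
    """Per-key cache whose entries expire `ttl_seconds` after they are stored (assumed semantics)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = (self.clock(), value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value
```

With the defaults, a request for 2,500 entities would be split into batches of 1,000, 1,000, and 500.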