MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Feature Engineering

The Feature Engineering integration provides access to the platform feature store through the AI Service, enabling users to create, manage, and serve feature sets for machine learning workflows. Features are defined declaratively and materialized from SQL transformations against the data warehouse.


Feature Store Architecture

The feature store is powered by Feast and integrated into the ML pipeline:

Data Warehouse --> Feature Transformations --> Feature Store (Feast) --> ML Training / Serving
                                                      ^
                                                      |
                                              AI Service (Feature API)

Feature Set Definition

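Each feature set is defined in JSON: the entity key, a list of features (each with a name, type, description, and SQL expression), the event timestamp column, and a time-to-live in days: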

{
  "name": "customer_features",
  "entity": "customer_id",
  "features": [
    {
      "name": "tenure_months",
      "type": "int64",
      "description": "Number of months as a customer",
      "sql": "DATEDIFF(month, signup_date, CURRENT_DATE)"
    },
    {
      "name": "total_spend",
      "type": "float64",
      "description": "Total lifetime spend",
      "sql": "SUM(amount) FROM orders WHERE customer_id = entity.customer_id"
    },
    {
      "name": "order_frequency",
      "type": "float64",
      "description": "Average orders per month",
      "sql": "COUNT(*) * 1.0 / NULLIF(DATEDIFF(month, MIN(order_date), CURRENT_DATE), 0)"
    }
  ],
  "timestamp_field": "event_timestamp",
  "ttl_days": 90
}
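A definition like this can be sanity-checked on the client before it is submitted. The sketch below parses it into Python dataclasses and rejects duplicate feature names, unknown types, and a non-positive TTL. The allowed type set and the validation rules are assumptions for illustration (only `int64` and `float64` appear in the example above), not the AI Service's documented schema.

```python
import json
from dataclasses import dataclass

# Hypothetical: type names assumed to be accepted. Only int64 and float64
# appear in the documented example; the others are assumptions.
ALLOWED_TYPES = {"int32", "int64", "float32", "float64", "string", "bool"}


@dataclass(frozen=True)
class Feature:
    name: str
    type: str
    description: str
    sql: str


@dataclass(frozen=True)
class FeatureSet:
    name: str
    entity: str
    features: tuple[Feature, ...]
    timestamp_field: str
    ttl_days: int


def parse_feature_set(raw: str) -> FeatureSet:
    """Parse a JSON feature set definition and apply basic sanity checks."""
    doc = json.loads(raw)
    features = tuple(
        Feature(f["name"], f["type"], f.get("description", ""), f["sql"])
        for f in doc["features"]
    )
    names = [f.name for f in features]
    if len(names) != len(set(names)):
        raise ValueError("duplicate feature names")
    for f in features:
        if f.type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type {f.type!r} for {f.name}")
    if doc["ttl_days"] <= 0:
        raise ValueError("ttl_days must be positive")
    return FeatureSet(
        name=doc["name"],
        entity=doc["entity"],
        features=features,
        timestamp_field=doc["timestamp_field"],
        ttl_days=doc["ttl_days"],
    )
```

Catching these errors locally gives faster feedback than a failed API call, though the service remains the authority on what it accepts.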

Feature API

Create Feature Set

POST /api/v1/ml/features/sets

List Feature Sets

GET /api/v1/ml/features/sets?tenant_id=acme-corp
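A minimal standard-library client for the create and list endpoints. The base URL `http://ai-service:8000`, the JSON request and response bodies, and the absence of authentication headers are all assumptions; adapt them to your deployment. The builders return `urllib.request.Request` objects so a request can be inspected before it is sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL for the AI Service; substitute your deployment's address.
BASE_URL = "http://ai-service:8000"


def create_feature_set_request(definition: dict) -> urllib.request.Request:
    """Build a POST request that registers a feature set definition."""
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/ml/features/sets",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_feature_sets_request(tenant_id: str) -> urllib.request.Request:
    """Build a GET request listing the feature sets of one tenant."""
    query = urllib.parse.urlencode({"tenant_id": tenant_id})
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/ml/features/sets?{query}", method="GET"
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the (assumed JSON) response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(list_feature_sets_request("acme-corp"))` would issue the list call shown above. `urlencode` takes care of escaping tenant IDs that contain reserved characters.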

Get Features Online

Retrieves the latest feature values for real-time inference:

POST /api/v1/ml/features/online
{
  "feature_set": "customer_features",
  "entity_ids": ["cust-001", "cust-002"],
  "features": ["tenure_months", "total_spend", "order_frequency"]
}
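Online requests are capped at `FEATURE_MAX_ENTITIES` entities (1000 by default, listed under Configuration), so a caller scoring a larger batch has to split it. The sketch below yields one request body per chunk, in the shape shown above and preserving input order. The helper name is hypothetical.

```python
from typing import Iterator

# Default per-request entity cap (FEATURE_MAX_ENTITIES).
MAX_ENTITIES = 1000


def online_request_bodies(
    feature_set: str,
    entity_ids: list[str],
    features: list[str],
    max_entities: int = MAX_ENTITIES,
) -> Iterator[dict]:
    """Yield bodies for POST /api/v1/ml/features/online, each carrying
    at most max_entities entity IDs, in the original order."""
    if max_entities <= 0:
        raise ValueError("max_entities must be positive")
    for start in range(0, len(entity_ids), max_entities):
        yield {
            "feature_set": feature_set,
            "entity_ids": entity_ids[start:start + max_entities],
            "features": features,
        }
```

With 2,500 entity IDs this produces three bodies of 1,000, 1,000 and 500 IDs. Concatenating the responses in order recovers a result per input entity.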

Get Features Offline

Retrieves historical feature values for training with point-in-time correctness:

POST /api/v1/ml/features/offline
{
  "feature_set": "customer_features",
  "entity_ids": ["cust-001", "cust-002"],
  "features": ["tenure_months", "total_spend"],
  "timestamp": "2025-01-01T00:00:00Z"
}
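Point-in-time correctness means each entity receives the most recent feature row whose `event_timestamp` is at or before the requested timestamp, so values from the future never leak into training data. The sketch below shows those semantics on in-memory rows. Treating rows older than the TTL (`ttl_days`, 90 in the example definition) as missing mirrors how Feast applies a feature view TTL in historical retrieval; that the AI Service behaves the same way is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def point_in_time_lookup(
    rows: list[dict],
    entity_ids: list[str],
    as_of: datetime,
    ttl: timedelta,
    entity_key: str = "customer_id",
    timestamp_field: str = "event_timestamp",
) -> dict[str, Optional[dict]]:
    """Return, per entity, the latest row with as_of - ttl <= timestamp <= as_of.

    Entities without a qualifying row map to None (feature values missing).
    """
    result: dict[str, Optional[dict]] = {eid: None for eid in entity_ids}
    for row in rows:
        eid = row[entity_key]
        ts = row[timestamp_field]
        if eid not in result or ts > as_of or ts < as_of - ttl:
            continue  # unrequested entity, future value, or expired by TTL
        best = result[eid]
        if best is None or ts > best[timestamp_field]:
            result[eid] = row
    return result
```

A production offline store performs the same logic as a join against the warehouse rather than a scan in memory.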

Feature Transformations

| Transformation type | Description | Example |
|---|---|---|
| Aggregation | Statistical aggregates over a window | AVG(amount) OVER 30 days |
| Time-based | Temporal calculations | DATEDIFF(month, signup_date, NOW()) |
| Categorical encoding | One-hot or label encoding | contract_type to numeric |
| Normalization | Min-max or z-score scaling | (value - min) / (max - min) |
| Windowed | Rolling window computations | SUM(amount) OVER last 7 days |
| Cross-feature | Interactions between features | revenue / num_employees |
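Several of these transformations are written out below in plain Python to make the arithmetic concrete: min-max and z-score scaling, label encoding, and a trailing-window sum. On the platform they would be expressed as SQL in a feature set definition, so these helpers are illustrative only. The use of the population standard deviation for the z-score and the sorted-order label codes are assumptions.

```python
import statistics
from datetime import date, timedelta


def min_max(values: list[float]) -> list[float]:
    """Scale to [0, 1] via (value - min) / (max - min); constant input maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def z_score(values: list[float]) -> list[float]:
    """Standardize via (value - mean) / stdev, using the population stdev."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [0.0 if sd == 0 else (v - mean) / sd for v in values]


def label_encode(categories: list[str]) -> dict[str, int]:
    """Map each distinct category to an integer code, assigned in sorted order."""
    return {c: i for i, c in enumerate(sorted(set(categories)))}


def trailing_sum(events: list[tuple[date, float]], as_of: date, days: int) -> float:
    """Sum amounts dated in (as_of - days, as_of], e.g. the last 7 days."""
    start = as_of - timedelta(days=days)
    return sum(amount for d, amount in events if start < d <= as_of)
```

For example, `min_max([10.0, 20.0, 30.0])` returns `[0.0, 0.5, 1.0]`. Fitting the min, max, mean and standard deviation on training data and reusing them at serving time keeps the two paths consistent.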

Materialization

Features are materialized in batch on a schedule, continuously from streaming events, or on demand:

| Mode | Trigger | Latency |
|---|---|---|
| Batch | Scheduled (hourly, daily) | Minutes |
| Streaming | Kafka event trigger | Seconds |
| On-demand | API request | Seconds to minutes |
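In batch mode, a scheduler typically records where the last materialized interval ended and processes only the time since then, similar in spirit to Feast's `FeatureStore.materialize_incremental`. The sketch below shows that bookkeeping for an hourly schedule. The function and the rule that a partial trailing interval waits for the next run are assumptions, not documented platform behavior.

```python
from datetime import datetime, timedelta


def batch_windows(
    last_end: datetime, now: datetime, interval: timedelta = timedelta(hours=1)
) -> list[tuple[datetime, datetime]]:
    """Split [last_end, now) into consecutive [start, end) windows of `interval`.

    Only complete windows are returned; a partial trailing interval is left
    for the next scheduled run.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    windows: list[tuple[datetime, datetime]] = []
    start = last_end
    while start + interval <= now:
        windows.append((start, start + interval))
        start += interval
    return windows
```

If a run is skipped, the next run catches up automatically because every missed window is still returned.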

Configuration

| Environment variable | Default | Description |
|---|---|---|
| FEAST_FEATURE_STORE_URL | http://feast:6566 | Feast serving URL |
| FEAST_REGISTRY_PATH | /data/feast/registry.db | Registry database path |
| FEATURE_CACHE_TTL | 300 | Online feature cache TTL, in seconds |
| FEATURE_MAX_ENTITIES | 1000 | Maximum entities per online request |
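The sketch below reads these variables in a Python process and falls back to the defaults listed above. The dataclass, its field names, and the range checks are illustrative assumptions, not the AI Service's actual settings module.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureStoreSettings:
    feast_url: str
    registry_path: str
    cache_ttl_seconds: int
    max_entities: int


def load_settings(env: Mapping[str, str] = os.environ) -> FeatureStoreSettings:
    """Read the feature store variables, falling back to the documented defaults."""
    settings = FeatureStoreSettings(
        feast_url=env.get("FEAST_FEATURE_STORE_URL", "http://feast:6566"),
        registry_path=env.get("FEAST_REGISTRY_PATH", "/data/feast/registry.db"),
        cache_ttl_seconds=int(env.get("FEATURE_CACHE_TTL", "300")),
        max_entities=int(env.get("FEATURE_MAX_ENTITIES", "1000")),
    )
    # Hypothetical sanity checks; the service's own validation may differ.
    if settings.cache_ttl_seconds < 0 or settings.max_entities <= 0:
        raise ValueError("cache TTL must be >= 0 and max entities > 0")
    return settings
```

Passing a plain dict instead of `os.environ` makes the loader easy to test, for example `load_settings({"FEATURE_MAX_ENTITIES": "250"})`.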