MATIH Platform is in active MVP development. Documentation reflects current implementation status.
13. ML Service & MLOps
Offline Store (Iceberg)

The offline store uses Apache Iceberg tables for historical feature storage, providing time travel queries, schema evolution, and point-in-time correct feature retrieval for training datasets.


Iceberg Configuration

class IcebergOfflineStore(OfflineStoreBackend):
    def __init__(
        self,
        catalog_name: str = "default",
        catalog_type: str = "hive",
        warehouse_location: str = "s3://bucket/warehouse",
        hive_metastore_uri: Optional[str] = None,
    ): ...
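As a minimal sketch of the configuration surface above, the constructor parameters can be modeled as a plain dataclass. `IcebergConfig` is a hypothetical helper for illustration, not part of the MATIH codebase; the metastore URI is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical config holder mirroring the IcebergOfflineStore constructor
# defaults shown above. The thrift URI below is a placeholder value.
@dataclass
class IcebergConfig:
    catalog_name: str = "default"
    catalog_type: str = "hive"
    warehouse_location: str = "s3://bucket/warehouse"
    hive_metastore_uri: Optional[str] = None

cfg = IcebergConfig(hive_metastore_uri="thrift://metastore.internal:9083")
```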

Table Creation

When a feature view is registered with offline_enabled=True, an Iceberg table is automatically created:

table_id = await offline_store.create_table(
    tenant_id="acme-corp",
    feature_view="customer_features",
    schema=[
        FeatureField(name="total_purchases", dtype="float64"),
        FeatureField(name="avg_order_value", dtype="float64"),
    ],
    entity_keys=[EntityKey(name="customer_id", dtype="string")],
    partition_by=["event_timestamp"],
)

Tables are created in a tenant-specific database: features_{tenant_id}.{feature_view}.
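The naming scheme can be expressed directly; `table_identifier` is an illustrative helper, not an actual MATIH function.

```python
def table_identifier(tenant_id: str, feature_view: str) -> str:
    # Tenant isolation via per-tenant databases, following the
    # features_{tenant_id}.{feature_view} scheme described above.
    return f"features_{tenant_id}.{feature_view}"

table_identifier("acme-corp", "customer_features")
# "features_acme-corp.customer_features"
```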


Type Mapping

| MATIH Type | Iceberg Type |
|------------|--------------|
| int64      | long         |
| int32      | int          |
| float64    | double       |
| float32    | float        |
| string     | string       |
| bool       | boolean      |
| timestamp  | timestamp    |
| bytes      | binary       |
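The mapping above can be captured as a lookup table. The helper and its error message are illustrative, not the actual conversion code in the MATIH source.

```python
# Type mapping from the table above as a plain dict.
MATIH_TO_ICEBERG = {
    "int64": "long", "int32": "int",
    "float64": "double", "float32": "float",
    "string": "string", "bool": "boolean",
    "timestamp": "timestamp", "bytes": "binary",
}

def to_iceberg_type(dtype: str) -> str:
    # Fail loudly on dtypes outside the supported set.
    try:
        return MATIH_TO_ICEBERG[dtype]
    except KeyError:
        raise ValueError(f"unsupported MATIH dtype: {dtype}") from None
```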

Historical Feature Retrieval

Point-in-time correct retrieval returns, for each entity, the most recent feature value observed at or before the given timestamp, so that no future data leaks into training examples:

results = await offline_store.read_historical(
    table_id="...",
    entity_keys=[{"customer_id": "cust-123"}],
    timestamps=[datetime(2026, 1, 15)],
    feature_names=["total_purchases", "avg_order_value"],
)
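The lookup semantics can be sketched in memory: for a sorted feature history, find the latest row whose timestamp is at or before the requested point. The row layout and `as_of` helper are illustrative only, not the store's internal implementation.

```python
from bisect import bisect_right
from datetime import datetime

def as_of(rows, ts):
    """rows: list of (event_timestamp, features) sorted ascending by time.

    Returns the features of the most recent row at or before ts,
    or None if no row qualifies.
    """
    i = bisect_right([r[0] for r in rows], ts)
    return rows[i - 1][1] if i else None

history = [
    (datetime(2026, 1, 10), {"total_purchases": 10.0}),
    (datetime(2026, 1, 14), {"total_purchases": 12.0}),
]
as_of(history, datetime(2026, 1, 15))  # {"total_purchases": 12.0}
```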

Time Travel Queries

Iceberg's snapshot-based architecture enables querying the table as of any historical point:

historical_data = await offline_store.time_travel_query(
    table_id="...",
    as_of=datetime(2026, 1, 1),
    filters={"is_premium": True},
)
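Conceptually, a time travel query resolves `as_of` to the most recent snapshot committed at or before that instant. The sketch below models this selection over a list of (snapshot_id, committed_at) pairs; in Iceberg itself this information lives in table metadata, and the data shapes here are illustrative.

```python
from datetime import datetime

def snapshot_for(snapshots, as_of):
    """snapshots: list of (snapshot_id, committed_at), ascending by time.

    Returns the id of the latest snapshot committed at or before as_of,
    or None if the table did not yet exist.
    """
    eligible = [s for s in snapshots if s[1] <= as_of]
    return eligible[-1][0] if eligible else None

snaps = [
    (101, datetime(2025, 12, 20)),
    (102, datetime(2025, 12, 31)),
    (103, datetime(2026, 1, 5)),
]
snapshot_for(snaps, datetime(2026, 1, 1))  # 102
```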

Point-in-Time Joins for Training

The PointInTimeJoinEngine retrieves training features with temporal correctness:

result = await store.get_training_features(
    tenant_id="acme-corp",
    entity_df=[
        {"customer_id": "c1", "event_timestamp": "2026-01-15T10:00:00"},
        {"customer_id": "c2", "event_timestamp": "2026-01-15T11:00:00"},
    ],
    feature_refs=["customer_features:total_purchases", "customer_features:avg_order_value"],
    event_timestamp_column="event_timestamp",
)
# result.records: [{customer_id, event_timestamp, customer_features:total_purchases, ...}]
# result.features_found: 4
# result.features_missing: 0
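The join above can be sketched in miniature: for every entity row, attach the latest feature values observed at or before that row's event timestamp. The data shapes and `pit_join` function are illustrative, not the actual `PointInTimeJoinEngine` API.

```python
from datetime import datetime

def pit_join(entity_rows, feature_history, key, ts_col):
    # For each entity row, pick the newest feature snapshot whose
    # timestamp does not exceed the row's event timestamp.
    out = []
    for row in entity_rows:
        ts = row[ts_col]
        candidates = [
            (f_ts, feats)
            for f_ts, feats in feature_history.get(row[key], [])
            if f_ts <= ts
        ]
        feats = max(candidates, key=lambda c: c[0])[1] if candidates else {}
        out.append({**row, **feats})
    return out

entities = [{"customer_id": "c1",
             "event_timestamp": datetime(2026, 1, 15, 10)}]
history = {"c1": [(datetime(2026, 1, 14), {"total_purchases": 12.0})]}
joined = pit_join(entities, history, "customer_id", "event_timestamp")
```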

Source Files

| File | Path |
|------|------|
| Iceberg Offline Store | data-plane/ml-service/src/features/iceberg_offline_store.py |
| Feast Offline Store | data-plane/ml-service/src/features/feast_offline_store.py |
| UnifiedFeatureStore | data-plane/ml-service/src/features/unified_feature_store.py |