Healthcare & Life Sciences
End-to-end walkthroughs showing how a regional health system uses the MATIH Platform to unify clinical data, predict patient outcomes, optimize operations, and maintain strict HIPAA compliance across every data interaction.
Industry Context
Healthcare organizations sit on some of the richest and most sensitive data in any industry. A single patient encounter generates records across electronic health records (EHR), laboratory information systems, pharmacy dispensing, billing and claims, imaging archives, and patient-reported outcomes. The challenge is not volume alone -- it is fragmentation across dozens of siloed clinical and administrative systems, each with its own data model, access controls, and regulatory obligations.
Most health system analytics teams operate in reactive mode: pulling CSV extracts from the EHR, manually reconciling claims data, and building one-off reports for each regulatory submission. Data scientists struggle to access de-identified datasets for research. Clinicians lack real-time operational visibility. Executives receive monthly reports that are already stale. The MATIH Platform consolidates these workflows into a single governed environment where clinical researchers, operational leaders, and executive decision-makers work from the same trusted, HIPAA-compliant data layer.
Company Profile: Pinnacle Health System
All walkthroughs in this section follow employees at Pinnacle Health System, a fictional regional health system with the following profile:
| Attribute | Value |
|---|---|
| Hospitals | 12 acute care facilities |
| Annual Patient Volume | 200,000 unique patients/year |
| Beds | 3,200 licensed beds across system |
| Annual Revenue | $4.2B |
| Employed Physicians | 1,800 |
| Payer Mix | 42% Medicare, 28% Commercial, 18% Medicaid, 12% Self-pay |
| EHR System | Epic (7 hospitals), Cerner (5 hospitals) |
| Data Team | 6 data scientists, 3 ML engineers, 5 BI analysts, CMO, CMIO |
Sample Datasets
These are the core datasets used across all four walkthroughs. In a production deployment, these tables live in their respective source systems and are ingested into the platform via Airbyte connectors, FHIR APIs, or file imports.
| Dataset | Source | Rows | Description |
|---|---|---|---|
| patients | EHR (FHIR) | 200K | Demographics -- patient_id, mrn, birth_date, gender, race, ethnicity, zip_code, insurance_type |
| encounters | EHR (FHIR) | 2M | Admissions, ED visits, outpatient -- encounter_id, patient_id, admit_date, discharge_date, facility_id, discharge_disposition |
| lab_results | LIS | 5M | Lab values -- result_id, encounter_id, loinc_code, test_name, result_value, result_units, reference_range, collected_at |
| prescriptions | EHR (FHIR) | 1.5M | Medications -- rx_id, patient_id, ndc_code, drug_name, dosage, frequency, prescriber_id, start_date, end_date |
| claims | Claims DB | 3M | Billing -- claim_id, encounter_id, payer_id, drg_code, billed_amount, allowed_amount, paid_amount, denial_code |
| clinical_trials | CTMS | 500 | Active trials -- trial_id, nct_number, title, phase, therapeutic_area, pi_name, status, target_enrollment |
| trial_enrollments | CTMS | 50K | Enrollment records -- enrollment_id, trial_id, patient_id, consent_date, status, site_id |
| imaging_metadata | PACS | 800K | Radiology metadata -- study_id, patient_id, modality, body_part, study_date, reading_physician, findings_summary |
Data Sources
Pinnacle Health's data lives across clinical, administrative, and research systems. The platform connects to all of them through the Ingestion Service (Airbyte connectors), FHIR APIs, and file imports.
| Source | Type | Connector | Sync Mode | Frequency |
|---|---|---|---|---|
| Epic EHR (7 hospitals) | FHIR R4 API | Airbyte FHIR Connector | Incremental (lastUpdated) | Every 15 min |
| Cerner EHR (5 hospitals) | FHIR R4 API | Airbyte FHIR Connector | Incremental (lastUpdated) | Every 15 min |
| Claims Clearinghouse | PostgreSQL | Airbyte PostgreSQL CDC | Incremental (WAL) | Hourly |
| Lab Information System | HL7v2 / FHIR | Airbyte FHIR Connector | Incremental | Every 30 min |
| CTMS (Clinical Trial Mgmt) | PostgreSQL | Airbyte PostgreSQL | Incremental (timestamp) | Daily |
| REDCap Surveys | CSV Export | File Import (Data Workbench) | One-time / on-demand | Weekly |
| ClinicalTrials.gov | REST API | Airbyte HTTP API | Full refresh | Weekly |
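The `Incremental (lastUpdated)` sync mode in the table maps naturally onto FHIR's standard `_lastUpdated` search parameter. The following Python sketch shows how a cursor-based poll might build its request and advance its cursor. It is an illustration only, not the Airbyte connector's implementation; the function names and the example base URL are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_incremental_url(base_url: str, resource_type: str,
                          cursor: datetime, page_size: int = 100) -> str:
    """Build a FHIR search URL for resources updated after `cursor`.

    `_lastUpdated`, `_count`, and `_sort` are standard FHIR search parameters.
    """
    if cursor.tzinfo is None:
        raise ValueError("cursor must be timezone-aware")
    stamp = cursor.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = {
        "_lastUpdated": "gt" + stamp,   # only resources changed since the cursor
        "_count": str(page_size),
        "_sort": "_lastUpdated",        # oldest changes first
    }
    return f"{base_url.rstrip('/')}/{resource_type}?{urlencode(params)}"


def advance_cursor(cursor: datetime, bundle: dict) -> datetime:
    """Return the newest meta.lastUpdated seen in a search Bundle.

    Falls back to the existing cursor when the Bundle has no entries.
    Assumes timestamps without unusual fractional-second precision.
    """
    newest = cursor
    for entry in bundle.get("entry", []):
        stamp = entry.get("resource", {}).get("meta", {}).get("lastUpdated")
        if stamp is None:
            continue
        updated = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        newest = max(newest, updated)
    return newest
```

A poller would call `build_incremental_url("https://fhir.example.org/r4", "Encounter", cursor)` on each 15-minute tick, fetch the result, and persist the value returned by `advance_cursor` for the next run.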
Data Flow Architecture
Pinnacle Health System Data Flow
```text
┌────────────────────────────────────────────────────────────────────┐
│                       CLINICAL DATA SOURCES                        │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │
│  │Epic EHR  │ │Cerner EHR│ │  Claims  │ │   LIS    │ │   CTMS   │  │
│  │(FHIR R4) │ │(FHIR R4) │ │(Postgres)│ │(HL7/FHIR)│ │(Postgres)│  │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘  │
└───────┼────────────┼────────────┼────────────┼────────────┼────────┘
        │            │            │            │            │
        ▼            ▼            ▼            ▼            ▼
┌────────────────────────────────────────────────────────────────────┐
│           INGESTION SERVICE (Airbyte + FHIR Connectors)            │
│      De-identification | FHIR-to-relational | Schema mapping       │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │
                                  ▼
┌────────────────────────────────────────────────────────────────────┐
│                HIPAA-COMPLIANT PLATFORM DATA LAYER                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────────────┐  │
│  │ Catalog  │  │  Query   │  │ Quality  │  │     Governance     │  │
│  │ Service  │  │  Engine  │  │ Service  │  │  (masking, audit,  │  │
│  │          │  │ (Trino)  │  │   (GX)   │  │   HIPAA policies)  │  │
│  └──────────┘  └──────────┘  └──────────┘  └────────────────────┘  │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │
              ┌───────────────────┼─────────────────────┐
              ▼                   ▼                     ▼
       ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐
       │ ML Workbench │    │ BI Workbench │    │ Agentic Workbench│
       │              │    │              │    │                  │
       │ Readmission  │    │ Operations   │    │ NL clinical      │
       │ models,      │    │ Command      │    │ queries,         │
       │ Trial        │    │ Center,      │    │ Evidence-based   │
       │ matching     │    │ Quality      │    │ decision         │
       │              │    │ dashboards   │    │ support          │
       └──────────────┘    └──────────────┘    └──────────────────┘
```
Compliance Framework
Healthcare data is governed by strict federal and state regulations. The MATIH Platform enforces compliance at every layer -- from ingestion to visualization.
| Regulation | Scope | Platform Enforcement |
|---|---|---|
| HIPAA Privacy Rule | Protected Health Information (PHI) -- 18 identifiers | Column-level masking, role-based access, minimum necessary enforcement |
| HIPAA Security Rule | Electronic PHI (ePHI) safeguards | Encryption at rest and in transit, audit logging, access controls |
| HITECH Act | Breach notification, EHR meaningful use | Automated audit trails, data access reporting |
| FDA 21 CFR Part 11 | Electronic records in clinical trials | Immutable audit trails, electronic signatures, data integrity validation |
| CMS Conditions of Participation | Quality reporting, readmission penalties | Automated metric computation matching CMS methodology |
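The column-level masking and role-based access listed for the HIPAA Privacy Rule can be illustrated with a short Python sketch. The mask types and roles mirror those in the `hipaa_phi_masking` policy in the next subsection, but the masking functions themselves are assumptions made for illustration, as are the `SECRET` key and the `mask_row` helper; the Governance Service's actual implementation is not shown in this chapter.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would load this from a secret store.
SECRET = b"demo-only-key"


def _keyed_digest(value: str) -> str:
    """Deterministic keyed digest, so equal inputs map to equal tokens."""
    return hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


# One illustrative function per mask_type named in the policy.
MASKERS = {
    "hash": lambda v: _keyed_digest(str(v)),
    "tokenize": lambda v: "tok_" + _keyed_digest(str(v)),
    "redact": lambda v: "[REDACTED]",
    "generalize_year": lambda v: str(v)[:4],         # assumes ISO 'YYYY-MM-DD'
    # Simplified: HIPAA Safe Harbor additionally requires 000 for
    # three-digit ZIP areas with 20,000 or fewer residents.
    "generalize_zip3": lambda v: str(v)[:3] + "**",
}

# A subset of the policy's rules, restricted to columns in the patients table.
RULES = [
    {"columns": ["birth_date", "dob"], "mask_type": "generalize_year",
     "allowed_roles": ["treating_physician", "clinical_researcher"]},
    {"columns": ["social_security_number", "ssn"], "mask_type": "redact",
     "allowed_roles": ["hipaa_officer"]},
    {"columns": ["medical_record_number", "mrn"], "mask_type": "tokenize",
     "allowed_roles": ["treating_physician", "clinical_researcher"]},
    {"columns": ["zip_code"], "mask_type": "generalize_zip3",
     "allowed_roles": ["hipaa_officer"]},
]


def mask_row(row: dict, role: str, rules: list = RULES) -> dict:
    """Return a copy of `row` with every column masked unless `role` is allowed."""
    masked = dict(row)
    for rule in rules:
        if role in rule["allowed_roles"]:
            continue  # this role may see the raw value
        masker = MASKERS[rule["mask_type"]]
        for col in rule["columns"]:
            if masked.get(col) is not None:
                masked[col] = masker(masked[col])
    return masked
```

Under these assumptions, `mask_row(row, "bi_analyst")` masks every governed column, while `mask_row(row, "treating_physician")` leaves `birth_date` and `mrn` readable but still generalizes `zip_code`.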
HIPAA PHI Identifiers -- Governance Rules
The Governance Service automatically detects all 18 HIPAA identifiers and enforces masking on them. The following platform-level governance policy shows the rules for five identifier groups, together with the audit settings:
```json
{
  "policy_name": "hipaa_phi_masking",
  "policy_type": "column_masking",
  "description": "Mask all 18 HIPAA identifiers for non-privileged roles",
  "rules": [
    {
      "identifier": "patient_name",
      "columns": ["first_name", "last_name", "full_name"],
      "mask_type": "hash",
      "allowed_roles": ["treating_physician", "hipaa_officer", "data_steward"]
    },
    {
      "identifier": "date_of_birth",
      "columns": ["birth_date", "dob"],
      "mask_type": "generalize_year",
      "allowed_roles": ["treating_physician", "clinical_researcher"]
    },
    {
      "identifier": "ssn",
      "columns": ["social_security_number", "ssn"],
      "mask_type": "redact",
      "allowed_roles": ["hipaa_officer"]
    },
    {
      "identifier": "mrn",
      "columns": ["medical_record_number", "mrn"],
      "mask_type": "tokenize",
      "allowed_roles": ["treating_physician", "clinical_researcher"]
    },
    {
      "identifier": "geographic",
      "columns": ["street_address", "zip_code"],
      "mask_type": "generalize_zip3",
      "allowed_roles": ["hipaa_officer"]
    }
  ],
  "audit": {
    "log_all_access": true,
    "retention_days": 2190,
    "alert_on_bulk_access": true,
    "bulk_threshold": 500
  }
}
```
Business KPIs
Pinnacle Health tracks these key performance indicators across all workbenches and dashboards. Each walkthrough shows how the platform computes, monitors, and acts on these metrics.
| KPI | Definition | Current | Target | CMS Benchmark |
|---|---|---|---|---|
| 30-Day Readmission Rate | % of discharges readmitted within 30 days | 14.2% | < 12.0% | 15.5% national avg |
| Average Length of Stay (ALOS) | Mean inpatient days per admission | 4.8 days | 4.2 days | 4.5 days |
| Patient Satisfaction (HCAHPS) | Hospital Consumer Assessment scores | 72/100 | 80/100 | 71/100 national avg |
| Clinical Trial Enrollment Rate | Eligible patients enrolled / eligible identified | 8.3% | 15.0% | 5-10% industry avg |
| Claims Denial Rate | Denied claims / total claims submitted | 11.4% | < 8.0% | 10% industry avg |
| Bed Utilization Rate | Occupied bed-days / available bed-days | 78% | 82-88% | Industry optimal |
| ED Boarding Time | Time from ED disposition to inpatient bed | 4.2 hours | < 2 hours | -- |
| Mortality Index (O/E) | Observed / Expected mortality ratio | 1.04 | < 1.00 | 1.00 expected |
| OR Utilization | Scheduled OR minutes used / available | 68% | 75-85% | 70-80% benchmark |
| CMS Star Rating | Overall hospital quality rating | 3.2 stars | 4.0 stars | 3.0 median |
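To make the KPI definitions concrete, the following Python sketch computes three of them from rows shaped like the `encounters` and `claims` sample tables. It is a deliberately simplified illustration, not the platform's metric logic. It assumes the encounters have already been filtered to inpatient stays, and it ignores the exclusions (such as planned readmissions) and risk adjustment that CMS methodology applies. The function names are hypothetical.

```python
from datetime import date, timedelta


def readmission_rate(encounters: list, window_days: int = 30) -> float:
    """Share of discharges followed by another admission within the window."""
    by_patient = {}
    for enc in encounters:
        by_patient.setdefault(enc["patient_id"], []).append(enc)

    discharges = readmitted = 0
    for stays in by_patient.values():
        stays = sorted(stays, key=lambda e: e["admit_date"])
        for i, stay in enumerate(stays):
            discharges += 1
            deadline = stay["discharge_date"] + timedelta(days=window_days)
            # Any later stay admitted between discharge and the deadline counts.
            if any(stay["discharge_date"] <= nxt["admit_date"] <= deadline
                   for nxt in stays[i + 1:]):
                readmitted += 1
    return readmitted / discharges if discharges else 0.0


def average_length_of_stay(encounters: list) -> float:
    """Mean inpatient days per admission (discharge date minus admit date)."""
    if not encounters:
        return 0.0
    days = [(e["discharge_date"] - e["admit_date"]).days for e in encounters]
    return sum(days) / len(days)


def denial_rate(claims: list) -> float:
    """Denied claims / total claims; a claim is denied if denial_code is set."""
    if not claims:
        return 0.0
    denied = sum(1 for c in claims if c.get("denial_code"))
    return denied / len(claims)


# Tiny hand-made sample (not the Pinnacle Health dataset).
sample_encounters = [
    {"patient_id": 1, "admit_date": date(2024, 1, 1), "discharge_date": date(2024, 1, 5)},
    {"patient_id": 1, "admit_date": date(2024, 1, 20), "discharge_date": date(2024, 1, 22)},
    {"patient_id": 2, "admit_date": date(2024, 1, 3), "discharge_date": date(2024, 1, 4)},
    {"patient_id": 2, "admit_date": date(2024, 3, 10), "discharge_date": date(2024, 3, 12)},
]
print(readmission_rate(sample_encounters))        # 0.25
print(average_length_of_stay(sample_encounters))  # 2.25
```

In the sample, only patient 1's first discharge is followed by an admission within 30 days, so one of four discharges counts as a readmission.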
Persona Walkthroughs
Each walkthrough follows one persona through all eight lifecycle stages, using the Pinnacle Health sample data and realistic scenarios. Start with the role closest to yours, or read all four to see how the platform enables cross-functional collaboration.
| Walkthrough | Persona | Scenario | Primary Workbenches |
|---|---|---|---|
| Data Scientist Journey | Dr. Maya Chen, Clinical Data Scientist | Predicting 30-day hospital readmissions to reduce CMS penalties | ML Workbench, Data Workbench |
| ML Engineer Journey | Jordan Park, ML Engineer | Building a clinical trial patient matching engine at scale | ML Workbench, Pipeline Service |
| BI Lead Journey | Aisha Williams, BI Lead | Creating a hospital operations command center for 12 facilities | BI Workbench, Semantic Layer |
| Executive Leadership Journey | Dr. Robert Kim, CMO | Using AI-assisted analysis for clinical quality strategy and CMS performance | Agentic Workbench, BI Dashboards |
How the Walkthroughs Connect
These four personas work on the same data at Pinnacle Health. Their work products feed into each other:
```text
Dr. Maya Chen (Data Scientist)        Jordan Park (ML Engineer)
┌────────────────────────┐            ┌────────────────────────┐
│ Readmission risk       │            │ Clinical trial patient │
│ model (C-stat 0.72)    │───────────▶│ matching engine        │
│                        │   model    │                        │
│ Feature engineering,   │  registry  │ Ray Serve deployment,  │
│ cohort analysis        │            │ EHR integration        │
└──────────┬─────────────┘            └──────────┬─────────────┘
           │ risk scores                         │ match alerts
           ▼                                     ▼
┌────────────────────────┐            ┌────────────────────────┐
│ Aisha Williams         │            │ Dr. Robert Kim (CMO)   │
│ (BI Lead)              │◀───────────│                        │
│                        │ dashboard  │ AI-driven quality      │
│ Operations Command     │   access   │ strategy, CMS Star     │
│ Center, Quality        │            │ Rating projections,    │
│ dashboards             │            │ board presentations    │
└────────────────────────┘            └────────────────────────┘
```
Maya's readmission risk scores power Aisha's quality dashboards and trigger care coordinator interventions. Jordan's trial matching engine feeds enrollment metrics that Dr. Kim reviews in strategic planning. The semantic layer ensures all four personas use the same CMS-aligned metric definitions.
Prerequisites
Before following these walkthroughs, ensure you have:
- A running MATIH Platform instance (see Installation)
- The Pinnacle Health sample dataset loaded (available in the platform's sample data catalog)
- Completed the Quickstart Tutorials for the workbenches you plan to use
- HIPAA-compliant environment configured (see Security)
Related Chapters
- Data Ingestion -- Configuring Airbyte connectors, FHIR APIs, and file imports
- Query Engine -- SQL federation across clinical and administrative sources
- Data Catalog -- Metadata management, HIPAA tagging, and lineage
- Pipelines -- Temporal-based clinical data orchestration
- ML Service -- Model training, registry, and clinical model deployment
- AI Service -- Text-to-SQL and clinical decision support agents
- Security & Governance -- HIPAA compliance, encryption, and access controls