Excel Import
The Matih platform supports importing data from Microsoft Excel files (.xlsx and .xls). Excel import handles multi-sheet workbooks, mixed data types within columns, merged cells, and formula evaluation.
Supported Formats
| Format | Extension | Max File Size | Notes |
|---|---|---|---|
| Excel 2007+ | .xlsx | 100 MB | OpenXML format. Recommended. |
| Excel 97-2003 | .xls | 50 MB | Legacy binary format. |
Multi-Sheet Support
When an Excel file contains multiple sheets, each sheet is treated as a separate stream during preview. You can select which sheets to import.
Preview Response
{
"fileId": "a1b2c3d4-...",
"fileName": "financial_report.xlsx",
"columns": [
{ "name": "region", "type": "STRING", "nullable": false, "sampleValues": ["North", "South", "East"] },
{ "name": "q1_revenue", "type": "DECIMAL", "nullable": false, "sampleValues": ["1250000.00", "980000.00"] },
{ "name": "q2_revenue", "type": "DECIMAL", "nullable": true, "sampleValues": ["1310000.00", "1020000.00"] }
],
"previewRows": [
{ "region": "North", "q1_revenue": 1250000.00, "q2_revenue": 1310000.00 },
{ "region": "South", "q1_revenue": 980000.00, "q2_revenue": 1020000.00 }
],
"totalRows": 48,
"inferredTypes": {
"region": "STRING",
"q1_revenue": "DECIMAL",
"q2_revenue": "DECIMAL"
}
}For multi-sheet workbooks, each sheet can be imported as a separate table by uploading once and requesting the preview for each sheet.
Data Type Handling
Excel cells carry explicit type information. The import engine maps Excel types to Iceberg types.
| Excel Cell Type | Iceberg Type | Notes |
|---|---|---|
| Text | STRING | Direct mapping |
| Number (integer) | LONG | Whole numbers without decimal places |
| Number (decimal) | DOUBLE | Numbers with decimal places |
| Date | DATE | Excel serial dates are converted to ISO 8601 |
| DateTime | TIMESTAMP | Time-zone naive timestamps |
| Boolean | BOOLEAN | TRUE/FALSE cell values |
| Formula | Evaluated type | Formulas are evaluated; the result type is used |
Error (#N/A, #REF!) | NULL | Error cells are imported as null values |
| Empty | NULL | Empty cells are imported as null values |
Date Handling
Excel stores dates as serial numbers (days since January 1, 1900). The import engine detects date-formatted cells and converts them to ISO 8601 date strings. This handles:
- Date-only values:
2024-03-15 - DateTime values:
2024-03-15T14:30:00 - Custom date formats: The underlying serial number is converted regardless of the display format
Mixed-Type Columns
Excel columns can contain mixed types (e.g., some cells are numbers, others are text). The import engine handles this by:
- Sampling all non-empty cells in the column
- If more than 90% of values are the same type, using that type (with exceptions converted to null or string)
- If types are mixed, widening to
STRING
Header Row Detection
The import engine detects the header row using these heuristics:
| Heuristic | Weight | Description |
|---|---|---|
| First row is all text | High | All cells in row 1 contain text values while subsequent rows contain mixed types |
| First row values are unique | Medium | Header values are typically distinct column names |
| First row has no empty cells | Low | Headers usually do not contain blanks |
| Formatting differs | Medium | Bold, different font size, or background color on first row |
If the header is not in row 1 (e.g., the file has title rows or metadata rows above the data), the preview will show incorrect results. In this case, you can specify the header row offset during the schema configuration step.
Merged Cells
Merged cells in Excel are handled as follows:
| Merge Direction | Behavior |
|---|---|
| Horizontal merge (across columns) | The value is placed in the leftmost column; other columns in the merge receive null |
| Vertical merge (across rows) | The value is placed in the topmost row; other rows in the merge receive the same value (forward fill) |
Formulas
Excel formulas are evaluated during import. The result value is used, not the formula text.
| Formula Type | Handling |
|---|---|
Simple formulas (=A1+B1) | Evaluated, result imported |
Cross-sheet references (=Sheet2!A1) | Evaluated if the referenced sheet is present |
External references (=[other.xlsx]Sheet1!A1) | Evaluate to #REF! error, imported as null |
| Array formulas | Evaluated, results imported |
Volatile functions (=NOW(), =RAND()) | Evaluated at import time |
Common Issues
| Issue | Cause | Resolution |
|---|---|---|
| Dates appear as numbers (e.g., 45371) | Date column not detected as date-formatted | The column was stored as a plain number in Excel without date formatting. Import as INTEGER and convert post-ingestion using DATE_ADD('1899-12-30', INTERVAL value DAY). |
| Empty columns appear in preview | Hidden columns or trailing empty cells | Exclude empty columns in the column mapping step |
| First row data imported as header | Header detection false positive | Specify that the file has no header row in the schema configuration |
#N/A values in source appear as null | Expected behavior | Error cells are intentionally mapped to null. If you need to preserve error indicators, export the Excel file as CSV first. |
| Large file upload timeout | File exceeds 100 MB | Split the workbook into smaller files, or export individual sheets as CSV/Parquet |
| Number precision loss | Excel stores all numbers as IEEE 754 doubles | Very large integers (>2^53) may lose precision. For financial data, verify decimal precision in the preview. |
Example: Multi-Sheet Excel Import
1. Upload: POST /api/v1/files/upload
File: annual_report_2024.xlsx (8.5 MB, 3 sheets)
2. Preview Sheet 1: GET /api/v1/files/{fileId}/preview
Sheet: "Revenue"
Columns: region (STRING), month (DATE), amount (DECIMAL)
Total rows: 600
3. Configure: PUT /api/v1/files/{fileId}/schema
{
"targetTableName": "revenue_2024",
"targetSchema": "finance"
}
4. Import: POST /api/v1/files/{fileId}/import
Result: 600 records imported into iceberg.finance.revenue_2024
5. Repeat steps 2-4 for remaining sheets (Expenses, Headcount)