Key Terminology
This glossary defines the terms and concepts used throughout the MATIH Platform documentation. Terms are organized by domain. Where a term has both a general industry meaning and a MATIH-specific meaning, the MATIH-specific usage is noted.
Platform Architecture Terms
| Term | Definition |
|---|---|
| Control Plane | The set of 10 Java/Spring Boot services that manage platform-wide concerns: identity, tenant lifecycle, configuration, billing, auditing, and observability. The Control Plane runs once per platform installation and is shared across all tenants. |
| Data Plane | The set of 14 services (Java, Python, Node.js) that execute tenant workloads: AI orchestration, query execution, ML operations, pipeline management, and BI operations. Data Plane services are logically isolated per tenant. |
| Tenant | An isolated organizational unit within the MATIH Platform. Each tenant has its own Kubernetes namespace, database schemas, DNS zone, TLS certificate, and resource quotas. Tenants share the physical cluster but are isolated at the network, compute, and data layers. |
| Workbench | A purpose-built React/TypeScript frontend application designed for a specific user persona and workflow. MATIH includes 8 workbenches (BI, ML, Data, Agentic, Control Plane, Data Plane, Ops, Onboarding). |
| Context Graph | A Neo4j-backed knowledge graph that maintains relationships between all platform entities: users, queries, tables, dashboards, models, and pipelines. Used for impact analysis, recommendations, and query enrichment. |
| Semantic Layer | An abstraction layer that defines business metrics, dimensions, and hierarchies on top of raw data schemas. Ensures consistent metric definitions across dashboards, queries, and reports. |
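As a rough illustration of how the Context Graph supports impact analysis, the sketch below walks downstream relationships breadth-first. It uses an in-memory dictionary as a stand-in for Neo4j, and the entity names (`table:orders`, `dashboard:sales_overview`, and so on) are hypothetical examples rather than real platform objects.

```python
from collections import deque

# Hypothetical edges: each key feeds the entities listed as its value.
# A plain dict stands in for the graph database in this sketch.
DOWNSTREAM = {
    "table:orders": ["query:daily_revenue", "pipeline:orders_to_lake"],
    "query:daily_revenue": ["dashboard:sales_overview"],
    "pipeline:orders_to_lake": ["model:churn_prediction"],
    "dashboard:sales_overview": [],
    "model:churn_prediction": [],
}


def impacted_entities(start: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every entity reachable downstream of `start`, in BFS order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

Calling `impacted_entities("table:orders", DOWNSTREAM)` lists the queries, pipelines, dashboards, and models that a change to the table could affect.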
AI and ML Terms
| Term | Definition |
|---|---|
| Agent | A specialized AI component that performs a specific task within the multi-agent orchestrator. MATIH defines five agents: Router, SQL, Analysis, Visualization, and Documentation. |
| Orchestrator | The LangGraph-based state machine that coordinates multiple agents to process a user's conversational request. Manages agent sequencing, state transitions, and error handling. |
| Intent | The classified purpose of a user's message, determined by the Router Agent. Intents include SQL query, documentation lookup, analysis request, and visualization request. |
| Text-to-SQL | The process of converting a natural language question into a valid SQL query. In MATIH, this involves question parsing, schema retrieval (via Qdrant), LLM-based generation, and syntactic validation (via sqlglot). |
| RAG (Retrieval-Augmented Generation) | A technique that enhances LLM responses by retrieving relevant context from a vector database before generation. MATIH uses RAG to provide schema context, documentation, and query history to the AI agents. |
| Vector Embedding | A numerical representation of text (schema descriptions, queries, documentation) in a high-dimensional vector space. Stored in Qdrant for semantic similarity search. |
| LangGraph | An open-source framework for building stateful, multi-agent applications as directed graphs. MATIH uses LangGraph to define the agent orchestration workflow. |
| Drift Detection | The monitoring of changes in data distributions or model prediction patterns over time. Used to detect when data quality degrades or when a deployed model's performance deteriorates. |
| Feature Store | A centralized repository of computed features (derived data attributes) used for ML model training and serving. Ensures consistent feature computation between training and inference. |
| Model Registry | A versioned catalog of trained ML models with metadata including training parameters, performance metrics, lineage, and deployment history. Models progress through stages: Staging, Production, Archived. |
| Experiment | A named collection of ML training runs that share a common objective (e.g., "churn_prediction_v3"). Each experiment contains multiple runs with different hyperparameters and results. |
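The schema-retrieval step behind RAG and Text-to-SQL relies on semantic similarity search over vector embeddings. The sketch below shows the idea with cosine similarity over toy three-dimensional vectors. It does not use the Qdrant client, and the embeddings and table descriptions are made-up values for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical schema descriptions with toy embeddings (real embeddings
# have hundreds or thousands of dimensions).
SCHEMA_EMBEDDINGS = {
    "orders(order_id, customer_id, total)": [0.9, 0.1, 0.0],
    "customers(customer_id, region)": [0.2, 0.9, 0.1],
    "web_events(session_id, url)": [0.0, 0.1, 0.9],
}


def top_k(query_vec: list[float], embeddings: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k descriptions most similar to the query vector."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

A query vector close to the `orders` embedding, such as `[0.8, 0.3, 0.0]`, ranks `orders` first and `customers` second; the retrieved descriptions would then be passed to the LLM as context.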
Data Engineering Terms
| Term | Definition |
|---|---|
| Federated Query | A SQL query that joins data across multiple data sources (e.g., PostgreSQL and S3/Parquet) in a single statement. Enabled by Trino's connector architecture. |
| Catalog (Trino) | A named connection to a data source in Trino. Each tenant has one or more catalogs configured with tenant-specific credentials and access permissions. Not to be confused with the Data Catalog service. |
| Data Catalog | The catalog-service that provides a centralized metadata repository with search, lineage tracking, data classification, and business glossary capabilities. |
| Pipeline | An automated workflow that moves and transforms data from source to destination. Pipelines can be batch (Airflow), streaming (Flink), or hybrid. |
| CDC (Change Data Capture) | A pattern for capturing and propagating database changes (inserts, updates, deletes) in real time, typically via Kafka. Used for incremental data loading and event-driven processing. |
| Data Lineage | The tracking of data flow from its origin through transformations to its consumption points (dashboards, models, reports). Maintained by the catalog-service and visualized in the Context Graph. |
| Schema Drift | An unplanned change in the structure of a data source (new columns, type changes, dropped columns) that can break downstream pipelines and queries. |
| Data Quality Score | A numerical measure (0-100) of data trustworthiness computed by the data-quality-service based on completeness, accuracy, timeliness, consistency, and validity checks. |
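Schema drift can be detected by comparing the expected structure of a source against what is actually observed. The following is a minimal sketch under the assumption that a schema is represented as a mapping of column name to type name; the function name and the example columns are hypothetical, not part of any MATIH service API.

```python
def detect_schema_drift(expected: dict[str, str], observed: dict[str, str]) -> dict[str, list[str]]:
    """Compare two column-name -> type mappings and report the differences."""
    expected_cols = set(expected)
    observed_cols = set(observed)
    return {
        # Columns that appeared in the source but were not expected.
        "added": sorted(observed_cols - expected_cols),
        # Columns that were expected but are no longer present.
        "dropped": sorted(expected_cols - observed_cols),
        # Columns present in both whose declared type differs.
        "type_changed": sorted(
            col for col in expected_cols & observed_cols
            if expected[col] != observed[col]
        ),
    }
```

For example, comparing `{"id": "bigint", "email": "varchar", "signup_date": "date"}` with `{"id": "bigint", "email": "text", "referrer": "varchar"}` reports `referrer` as added, `signup_date` as dropped, and `email` as type-changed.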
Infrastructure Terms
| Term | Definition |
|---|---|
| Helm Chart | A package of Kubernetes resource definitions (Deployments, Services, ConfigMaps, etc.) with parameterized values. MATIH uses 55+ Helm charts to deploy all platform and infrastructure components. |
| Namespace | A Kubernetes resource that provides a scope for names and a boundary for resource quotas and network policies. Each MATIH tenant gets a dedicated namespace (matih-tenant-{id}). |
| ResourceQuota | A Kubernetes resource that limits the total CPU, memory, and storage a namespace (tenant) can consume. Enforced by the Kubernetes API server. |
| NetworkPolicy | A Kubernetes resource that defines allowed network traffic between pods and namespaces. Used to enforce tenant isolation at the network layer. |
| Ingress | A Kubernetes resource that manages external HTTP/HTTPS access to services. MATIH deploys per-tenant NGINX ingress controllers with TLS termination. |
| cert-manager | A Kubernetes add-on that automates TLS certificate issuance and renewal. MATIH uses cert-manager with Let's Encrypt for automatic HTTPS. |
| External Secrets Operator (ESO) | A Kubernetes operator that synchronizes secrets from external secret managers (Azure Key Vault, AWS Secrets Manager, GCP Secret Manager) into Kubernetes Secrets. |
| HPA (Horizontal Pod Autoscaler) | A Kubernetes resource that automatically scales the number of pod replicas based on observed CPU, memory, or custom metrics. |
| Terraform | An infrastructure-as-code tool for provisioning cloud resources declaratively. MATIH provides Terraform modules for Azure, AWS, and GCP. |
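To make the HPA entry concrete, the sketch below reproduces the core scaling formula from the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The real controller also applies a tolerance band and stabilization windows, which are omitted here; the function name and the default bounds are assumptions for illustration.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """ceil(current_replicas * current_metric / target_metric), clamped to bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, raw))
```

With 4 replicas averaging 90% CPU against a 60% target, the function returns 6; at 30% CPU it returns 2.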
Observability Terms
| Term | Definition |
|---|---|
| Trace | A distributed trace that follows a request as it flows through multiple services. Consists of spans, each representing a unit of work in a specific service. Collected via OpenTelemetry and stored in Tempo. |
| Span | A single unit of work within a distributed trace, representing an operation in a specific service (e.g., "SQL generation in ai-service", "query execution in Trino"). |
| Metric | A numerical measurement collected over time (e.g., request count, latency percentile, CPU usage). Collected by Prometheus and visualized in Grafana. |
| SLO (Service Level Objective) | A target for a service-level indicator (e.g., "99.9% of requests complete in under 500ms"). Used to define and measure platform reliability commitments. |
| SLI (Service Level Indicator) | A quantitative measure of service performance (e.g., "p99 latency", "error rate"). SLIs feed into SLO calculations. |
| Structured Logging | A logging practice where log entries are emitted as machine-parseable JSON objects with standardized fields (timestamp, level, service, tenant_id, trace_id) rather than free-text strings. All MATIH services use structured logging. |
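A structured log entry with the standardized fields listed above might be produced as follows. This is a sketch using only the Python standard library; the `message` field and the helper name `log_entry` are assumptions, and actual MATIH services may use a logging framework with a different field layout.

```python
import json
from datetime import datetime, timezone


def log_entry(level: str, service: str, tenant_id: str, trace_id: str, message: str) -> str:
    """Serialize one log record as a single-line JSON object."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "service": service,
        "tenant_id": tenant_id,
        # Including the trace ID lets a log line be correlated with its trace.
        "trace_id": trace_id,
        "message": message,  # assumed field, not named in the glossary
    }
    return json.dumps(record)
```

Because every entry is a JSON object with the same keys, log lines can be filtered by `tenant_id` or joined to traces by `trace_id` without parsing free text.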
Security Terms
| Term | Definition |
|---|---|
| JWT (JSON Web Token) | A compact, signed token used for authentication and authorization. MATIH JWTs contain user identity, tenant ID, and role claims. Validated by every service on every request. |
| RBAC (Role-Based Access Control) | An access control model where permissions are assigned to roles, and roles are assigned to users. MATIH supports platform-level roles (Platform Admin, Tenant Admin) and tenant-level roles (Data Engineer, Analyst, Viewer). |
| OIDC (OpenID Connect) | An authentication protocol built on OAuth 2.0 that enables single sign-on with corporate identity providers (Azure AD, Okta, Keycloak). MATIH supports OIDC for user authentication. |
| Row-Level Security (RLS) | A data access control mechanism that filters query results based on user attributes (e.g., a regional manager sees only data for their region). Enforced at the semantic layer and query engine. |
| Column Masking | A data protection mechanism that hides or obfuscates sensitive column values (e.g., masking SSN as ***-**-1234) based on user permissions. |
| Tenant Isolation | The set of mechanisms that prevent one tenant from accessing another tenant's data, configuration, or resources. Enforced at the network, compute, storage, and application layers. |
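Column masking can be pictured as a transformation applied to a result row before it is returned to a user without the required permission. The sketch below masks an SSN in the `***-**-1234` style from the table above; the function names and the `can_view_sensitive` flag are hypothetical, and where the platform actually applies masking is not specified here.

```python
def mask_ssn(ssn: str) -> str:
    """Keep only the last four characters, e.g. 123-45-1234 -> ***-**-1234."""
    return "***-**-" + ssn[-4:]


def apply_column_masking(row: dict[str, str], masked_columns: set[str], can_view_sensitive: bool) -> dict[str, str]:
    """Return a copy of the row with sensitive columns masked when required."""
    if can_view_sensitive:
        return dict(row)
    return {
        col: mask_ssn(value) if col in masked_columns else value
        for col, value in row.items()
    }
```

A user without the permission who reads `{"name": "Ada", "ssn": "123-45-1234"}` with `ssn` configured as a masked column would see the `ssn` value as `***-**-1234`, while `name` is returned unchanged.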
Business Intelligence Terms
| Term | Definition |
|---|---|
| Dashboard | A collection of visual widgets (charts, tables, KPIs, filters) arranged on a canvas that displays data from one or more queries. Managed by the bi-service. |
| Widget | A single visual component within a dashboard: chart, table, KPI card, text block, or filter control. Widgets are composable and configurable. |
| Scheduled Refresh | An automatic re-execution of dashboard queries on a configurable schedule (e.g., every 4 hours) to keep displayed data current. |
| Cross-Filter | An interactive behavior where clicking a data point in one widget filters all other widgets in the same dashboard. |
| Drill-Through | Navigation from a summary view to a detail view by clicking a data point. Defined by dimension hierarchies in the semantic layer. |
Abbreviations
| Abbreviation | Full Form |
|---|---|
| MATIH | (Platform name -- not an acronym) |
| AI | Artificial Intelligence |
| BI | Business Intelligence |
| CDC | Change Data Capture |
| CLI | Command-Line Interface |
| CRUD | Create, Read, Update, Delete |
| DAG | Directed Acyclic Graph |
| DTO | Data Transfer Object |
| ESO | External Secrets Operator |
| ETL | Extract, Transform, Load |
| HPA | Horizontal Pod Autoscaler |
| IAM | Identity and Access Management |
| JWT | JSON Web Token |
| LLM | Large Language Model |
| ML | Machine Learning |
| MTTR | Mean Time to Resolution |
| OIDC | OpenID Connect |
| OTel | OpenTelemetry |
| RAG | Retrieval-Augmented Generation |
| RBAC | Role-Based Access Control |
| RLS | Row-Level Security |
| SLI | Service Level Indicator |
| SLO | Service Level Objective |
| SRE | Site Reliability Engineering |
| TLS | Transport Layer Security |
| UDF | User-Defined Function |
Conventions Used in This Documentation
Throughout this documentation, we use the following conventions:
| Convention | Meaning |
|---|---|
| `monospace text` | Code, commands, file paths, configuration keys, and service names |
| Bold text | Terms being defined, emphasis, or important notes |
| Italic text | First use of a term, or emphasis in narrative context |
| `service-name (port)` | A MATIH service with its default port number |
| `{tenant_id}` | A placeholder for a tenant-specific value |
| Tables with "Planned" | Features that are designed but not yet implemented in the current release |
| `{/* Diagram: ... */}` | Placeholder for a diagram that will be added in a future documentation update |
Further Reading
- Platform Capabilities -- Detailed descriptions of each capability referenced in this glossary
- Architecture Deep Dive -- Technical architecture using the terms defined here
- Security and Multi-Tenancy -- Detailed explanation of tenant isolation, RBAC, and security mechanisms
- AI Service -- Deep dive into the agent orchestrator, text-to-SQL, and RAG systems