Key Terminology

This glossary defines the terms and concepts used throughout the MATIH Platform documentation. Terms are organized by domain. Where a term has both a general industry meaning and a MATIH-specific meaning, the MATIH-specific usage is noted.

Platform Architecture Terms

Term	Definition
Control Plane	The set of 10 Java/Spring Boot services that manage platform-wide concerns: identity, tenant lifecycle, configuration, billing, auditing, and observability. The Control Plane runs once per platform installation and is shared across all tenants.
Data Plane	The set of 14 services (Java, Python, Node.js) that execute tenant workloads: AI orchestration, query execution, ML operations, pipeline management, and BI operations. Data Plane services are logically isolated per tenant.
Tenant	An isolated organizational unit within the MATIH Platform. Each tenant has its own Kubernetes namespace, database schemas, DNS zone, TLS certificate, and resource quotas. Tenants share the physical cluster but are isolated at the network, compute, and data layers.
Workbench	A purpose-built React/TypeScript frontend application designed for a specific user persona and workflow. MATIH includes 8 workbenches (BI, ML, Data, Agentic, Control Plane, Data Plane, Ops, Onboarding).
Context Graph	A Neo4j-backed knowledge graph that maintains relationships between all platform entities: users, queries, tables, dashboards, models, and pipelines. Used for impact analysis, recommendations, and query enrichment.
Semantic Layer	An abstraction layer that defines business metrics, dimensions, and hierarchies on top of raw data schemas. Ensures consistent metric definitions across dashboards, queries, and reports.
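
To make the Semantic Layer entry concrete, here is a minimal sketch of the idea: a metric is defined once and every consumer compiles it to the same SQL. The `Metric` class, the `METRICS` registry, the `compile_metric` function, and the `orders` table are hypothetical names chosen for illustration; they are not part of the MATIH API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A business metric defined once and reused everywhere (hypothetical shape)."""
    name: str
    expression: str  # SQL aggregate over the raw table
    table: str


# Hypothetical metric registry; names and SQL are illustrative only.
METRICS = {
    "total_revenue": Metric("total_revenue", "SUM(amount)", "orders"),
    "order_count": Metric("order_count", "COUNT(*)", "orders"),
}


def compile_metric(metric_name: str, dimensions: list[str]) -> str:
    """Render a metric plus optional dimensions into a SQL string."""
    metric = METRICS[metric_name]
    select_cols = ", ".join([*dimensions, f"{metric.expression} AS {metric.name}"])
    sql = f"SELECT {select_cols} FROM {metric.table}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql


print(compile_metric("total_revenue", ["region"]))
```

Because dashboards, queries, and reports would all go through the same definition, the aggregate expression cannot silently diverge between them.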

AI and ML Terms

Term	Definition
Agent	A specialized AI component that performs a specific task within the multi-agent orchestrator. MATIH defines five agents: Router, SQL, Analysis, Visualization, and Documentation.
Orchestrator	The LangGraph-based state machine that coordinates multiple agents to process a user's conversational request. Manages agent sequencing, state transitions, and error handling.
Intent	The classified purpose of a user's message, determined by the Router Agent. Intents include SQL query, documentation lookup, analysis request, and visualization request.
Text-to-SQL	The process of converting a natural language question into a valid SQL query. In MATIH, this involves question parsing, schema retrieval (via Qdrant), LLM-based generation, and syntactic validation (via sqlglot).
RAG (Retrieval-Augmented Generation)	A technique that enhances LLM responses by retrieving relevant context from a vector database before generation. MATIH uses RAG to provide schema context, documentation, and query history to the AI agents.
Vector Embedding	A numerical representation of text (schema descriptions, queries, documentation) in a high-dimensional vector space. Stored in Qdrant for semantic similarity search.
LangGraph	An open-source framework for building stateful, multi-agent applications as directed graphs. MATIH uses LangGraph to define the agent orchestration workflow.
Drift Detection	The monitoring of changes in data distributions or model prediction patterns over time. Used to detect when data quality degrades or when a deployed model's performance deteriorates.
Feature Store	A centralized repository of computed features (derived data attributes) used for ML model training and serving. Ensures consistent feature computation between training and inference.
Model Registry	A versioned catalog of trained ML models with metadata including training parameters, performance metrics, lineage, and deployment history. Models progress through stages: Staging, Production, Archived.
Experiment	A named collection of ML training runs that share a common objective (e.g., "churn_prediction_v3"). Each experiment contains multiple runs with different hyperparameters and results.
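
The Vector Embedding and RAG entries both rest on semantic similarity search. The sketch below shows that retrieval step with plain cosine similarity over an in-memory index. It is not how Qdrant is called; the three-dimensional vectors, the `SCHEMA_INDEX` contents, and the function names are made up for illustration, and real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k entries most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical embeddings of table descriptions (illustrative values only).
SCHEMA_INDEX = {
    "orders": [0.9, 0.1, 0.0],
    "customers": [0.1, 0.9, 0.0],
    "web_events": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a user question about order revenue.
query_vector = [0.8, 0.3, 0.0]
print(top_k(query_vector, SCHEMA_INDEX))
```

In a RAG flow, the retrieved entries would then be placed in the LLM prompt as context before generation.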

Data Engineering Terms

Term	Definition
Federated Query	A SQL query that joins data across multiple data sources (e.g., PostgreSQL and S3/Parquet) in a single statement. Enabled by Trino's connector architecture.
Catalog (Trino)	A named connection to a data source in Trino. Each tenant has one or more catalogs configured with tenant-specific credentials and access permissions. Not to be confused with the Data Catalog service.
Data Catalog	The MATIH service (`catalog-service`) that provides a centralized metadata repository with search, lineage tracking, data classification, and business glossary capabilities.
Pipeline	An automated workflow that moves and transforms data from source to destination. Pipelines can be batch (Airflow), streaming (Flink), or hybrid.
CDC (Change Data Capture)	A pattern for capturing and propagating database changes (inserts, updates, deletes) in real time, typically via Kafka. Used for incremental data loading and event-driven processing.
Data Lineage	The tracking of data flow from its origin through transformations to its consumption points (dashboards, models, reports). Maintained by the `catalog-service` and visualized in the Context Graph.
Schema Drift	An unplanned change in the structure of a data source (new columns, type changes, dropped columns) that can break downstream pipelines and queries.
Data Quality Score	A numerical measure (0-100) of data trustworthiness computed by the `data-quality-service` based on completeness, accuracy, timeliness, consistency, and validity checks.
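
The three kinds of change named in the Schema Drift entry (new columns, type changes, dropped columns) can be detected by comparing an expected schema with an observed one. The following is a minimal sketch under the assumption that a schema is a simple column-name-to-type mapping; `detect_schema_drift` and the example columns are hypothetical and do not describe a MATIH service.

```python
def detect_schema_drift(
    expected: dict[str, str], observed: dict[str, str]
) -> dict[str, list[str]]:
    """Compare two column->type mappings and report structural differences."""
    expected_cols, observed_cols = set(expected), set(observed)
    return {
        "added": sorted(observed_cols - expected_cols),
        "dropped": sorted(expected_cols - observed_cols),
        "type_changed": sorted(
            col
            for col in expected_cols & observed_cols
            if expected[col] != observed[col]
        ),
    }


# Hypothetical schemas for illustration.
expected_schema = {"id": "bigint", "amount": "decimal", "region": "varchar"}
observed_schema = {"id": "bigint", "amount": "varchar", "channel": "varchar"}

print(detect_schema_drift(expected_schema, observed_schema))
```

A pipeline could run a check like this before loading and stop, or alert, when any of the three lists is non-empty.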

Infrastructure Terms

Term	Definition
Helm Chart	A package of Kubernetes resource definitions (Deployments, Services, ConfigMaps, etc.) with parameterized values. MATIH uses 55+ Helm charts to deploy all platform and infrastructure components.
Namespace	A Kubernetes resource that provides a scope for names and a boundary for resource quotas and network policies. Each MATIH tenant gets a dedicated namespace (`matih-tenant-{id}`).
ResourceQuota	A Kubernetes resource that limits the total CPU, memory, and storage a namespace (tenant) can consume. Enforced by the Kubernetes API server.
NetworkPolicy	A Kubernetes resource that defines allowed network traffic between pods and namespaces. Used to enforce tenant isolation at the network layer.
Ingress	A Kubernetes resource that manages external HTTP/HTTPS access to services. MATIH deploys per-tenant NGINX ingress controllers with TLS termination.
cert-manager	A Kubernetes add-on that automates TLS certificate issuance and renewal. MATIH uses cert-manager with Let's Encrypt for automatic HTTPS.
External Secrets Operator (ESO)	A Kubernetes operator that synchronizes secrets from external secret managers (Azure Key Vault, AWS Secrets Manager, GCP Secret Manager) into Kubernetes Secrets.
HPA (Horizontal Pod Autoscaler)	A Kubernetes resource that automatically scales the number of pod replicas based on observed CPU, memory, or custom metrics.
Terraform	An infrastructure-as-code tool for provisioning cloud resources declaratively. MATIH provides Terraform modules for Azure, AWS, and GCP.
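
As an illustration of how the Namespace and ResourceQuota entries fit together, the sketch below builds a ResourceQuota manifest scoped to a tenant namespace. The `matih-tenant-{id}` pattern comes from the table above; the quota name `tenant-quota`, the function names, and the limit values are assumptions made for this example, not MATIH defaults.

```python
import json


def tenant_namespace(tenant_id: str) -> str:
    """Apply the `matih-tenant-{id}` naming pattern."""
    return f"matih-tenant-{tenant_id}"


def resource_quota_manifest(tenant_id: str, cpu: str, memory: str, storage: str) -> dict:
    """Build a Kubernetes ResourceQuota manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {
            "name": "tenant-quota",  # hypothetical name
            "namespace": tenant_namespace(tenant_id),
        },
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "requests.storage": storage,
            }
        },
    }


# Illustrative limits only.
manifest = resource_quota_manifest("acme", cpu="8", memory="32Gi", storage="500Gi")
print(json.dumps(manifest, indent=2))
```

In practice a manifest like this would more likely be rendered from a Helm chart template; the dict form is used here only to keep the example self-contained.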

Observability Terms

Term	Definition
Trace	A distributed trace that follows a request as it flows through multiple services. Consists of spans, each representing a unit of work in a specific service. Collected via OpenTelemetry and stored in Tempo.
Span	A single unit of work within a distributed trace, representing an operation in a specific service (e.g., "SQL generation in ai-service", "query execution in Trino").
Metric	A numerical measurement collected over time (e.g., request count, latency percentile, CPU usage). Collected by Prometheus and visualized in Grafana.
SLO (Service Level Objective)	A target for a service-level indicator (e.g., "99.9% of requests complete in under 500 ms"). Used to define and measure platform reliability commitments.
SLI (Service Level Indicator)	A quantitative measure of service performance (e.g., "p99 latency", "error rate"). SLIs feed into SLO calculations.
Structured Logging	A logging practice where log entries are emitted as machine-parseable JSON objects with standardized fields (timestamp, level, service, tenant_id, trace_id) rather than free-text strings. All MATIH services use structured logging.
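
The Structured Logging entry lists the standardized fields (timestamp, level, service, tenant_id, trace_id). A minimal sketch of emitting such entries with Python's standard `logging` module might look like the following. The `JsonFormatter` class, the `message` field, the logger name, and the example values are assumptions for illustration and are not taken from MATIH's own logging code.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with standardized fields."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            # tenant_id and trace_id arrive through the `extra` argument, if given.
            "tenant_id": getattr(record, "tenant_id", None),
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter(service="ai-service"))
logger = logging.getLogger("matih.example")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("query completed", extra={"tenant_id": "acme", "trace_id": "abc123"})
```

Carrying `trace_id` in every entry is what lets a log line be joined to the corresponding distributed trace.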

Security Terms

Term	Definition
JWT (JSON Web Token)	A compact, signed token used for authentication and authorization. MATIH JWTs contain user identity, tenant ID, and role claims. Validated by every service on every request.
RBAC (Role-Based Access Control)	An access control model where permissions are assigned to roles, and roles are assigned to users. MATIH supports platform-level roles (Platform Admin, Tenant Admin) and tenant-level roles (Data Engineer, Analyst, Viewer).
OIDC (OpenID Connect)	An authentication protocol built on OAuth 2.0 that enables single sign-on with corporate identity providers (Azure AD, Okta, Keycloak). MATIH supports OIDC for user authentication.
Row-Level Security (RLS)	A data access control mechanism that filters query results based on user attributes (e.g., a regional manager sees only data for their region). Enforced at the semantic layer and query engine.
Column Masking	A data protection mechanism that hides or obfuscates sensitive column values (e.g., masking an SSN as `***-**-1234`) based on user permissions.
Tenant Isolation	The set of mechanisms that prevent one tenant from accessing another tenant's data, configuration, or resources. Enforced at the network, compute, storage, and application layers.
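
Column Masking can be sketched as a function applied to a value at read time, depending on the caller's permissions. The example below keeps the last four digits of an SSN and hides the rest. The permission string `pii:read` and both function names are hypothetical, and the sketch says nothing about where MATIH actually applies masking.

```python
def mask_ssn(ssn: str) -> str:
    """Keep the last four digits of an SSN and replace the rest with asterisks."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return "***-**-" + "".join(digits[-4:])


def read_ssn_column(value: str, user_permissions: set[str]) -> str:
    """Return the raw value only to users holding the (hypothetical) pii:read permission."""
    if "pii:read" in user_permissions:
        return value
    return mask_ssn(value)


print(read_ssn_column("123-45-6789", {"dashboard:view"}))
print(read_ssn_column("123-45-6789", {"dashboard:view", "pii:read"}))
```

The first call prints the masked value and the second prints the original, since only the second caller holds the required permission.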

Business Intelligence Terms

Term	Definition
Dashboard	A collection of visual widgets (charts, tables, KPIs, filters) arranged on a canvas that displays data from one or more queries. Managed by the `bi-service`.
Widget	A single visual component within a dashboard: chart, table, KPI card, text block, or filter control. Widgets are composable and configurable.
Scheduled Refresh	An automatic re-execution of dashboard queries on a configurable schedule (e.g., every 4 hours) to keep displayed data current.
Cross-Filter	An interactive behavior where clicking a data point in one widget filters all other widgets in the same dashboard.
Drill-Through	Navigation from a summary view to a detail view by clicking a data point. Defined by dimension hierarchies in the semantic layer.

Abbreviations

Abbreviation	Full Form
MATIH	(Platform name; not an acronym)
AI	Artificial Intelligence
BI	Business Intelligence
CDC	Change Data Capture
CLI	Command-Line Interface
CRUD	Create, Read, Update, Delete
DAG	Directed Acyclic Graph
DTO	Data Transfer Object
ESO	External Secrets Operator
ETL	Extract, Transform, Load
HPA	Horizontal Pod Autoscaler
IAM	Identity and Access Management
JWT	JSON Web Token
LLM	Large Language Model
ML	Machine Learning
MTTR	Mean Time to Resolution
OIDC	OpenID Connect
OTel	OpenTelemetry
RAG	Retrieval-Augmented Generation
RBAC	Role-Based Access Control
RLS	Row-Level Security
SLI	Service Level Indicator
SLO	Service Level Objective
SRE	Site Reliability Engineering
TLS	Transport Layer Security
UDF	User-Defined Function

Conventions Used in This Documentation

Throughout this documentation, we use the following conventions:

Convention	Meaning
`monospace text`	Code, commands, file paths, configuration keys, and service names
Bold text	Terms being defined, emphasis, or important notes
Italic text	First use of a term, or emphasis in narrative context
`service-name (port)`	A MATIH service with its default port number
`{tenant_id}`	A placeholder for a tenant-specific value
Tables with "Planned"	Features that are designed but not yet implemented in the current release
`{/* Diagram: ... */}`	Placeholder for a diagram that will be added in a future documentation update