MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Key Terminology

This glossary defines the terms and concepts used throughout the MATIH Platform documentation. Terms are organized by domain. Where a term has both a general industry meaning and a MATIH-specific meaning, the MATIH-specific usage is noted.


Platform Architecture Terms

| Term | Definition |
| --- | --- |
| Control Plane | The set of 10 Java/Spring Boot services that manage platform-wide concerns: identity, tenant lifecycle, configuration, billing, auditing, and observability. The Control Plane runs once per platform installation and is shared across all tenants. |
| Data Plane | The set of 14 services (Java, Python, Node.js) that execute tenant workloads: AI orchestration, query execution, ML operations, pipeline management, and BI operations. Data Plane services are logically isolated per tenant. |
| Tenant | An isolated organizational unit within the MATIH Platform. Each tenant has its own Kubernetes namespace, database schemas, DNS zone, TLS certificate, and resource quotas. Tenants share the physical cluster but are isolated at the network, compute, and data layers. |
| Workbench | A purpose-built React/TypeScript frontend application designed for a specific user persona and workflow. MATIH includes 8 workbenches (BI, ML, Data, Agentic, Control Plane, Data Plane, Ops, Onboarding). |
| Context Graph | A Neo4j-backed knowledge graph that maintains relationships between all platform entities: users, queries, tables, dashboards, models, and pipelines. Used for impact analysis, recommendations, and query enrichment. |
| Semantic Layer | An abstraction layer that defines business metrics, dimensions, and hierarchies on top of raw data schemas. Ensures consistent metric definitions across dashboards, queries, and reports. |
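
To make the Context Graph's impact analysis more concrete, the following minimal sketch walks a small in-memory dependency graph and lists everything downstream of a table. It is a stand-in only: the platform stores these relationships in Neo4j, and the entity names, the `EDGES` mapping, and the `downstream_of` helper are all hypothetical.

```python
from collections import deque

# Hypothetical edges: each key feeds the entities listed as its value.
# A plain dict replaces the Neo4j graph to keep the sketch self-contained.
EDGES = {
    "table:orders": ["query:daily_revenue", "model:churn"],
    "query:daily_revenue": ["dashboard:sales_overview"],
    "model:churn": ["dashboard:retention"],
    "dashboard:sales_overview": [],
    "dashboard:retention": [],
}


def downstream_of(entity, edges):
    """Return every entity reachable from `entity`, in breadth-first order."""
    seen = []
    queue = deque(edges.get(entity, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.append(node)
        queue.extend(edges.get(node, []))
    return seen


# Which queries, models, and dashboards are affected if the table changes?
impacted = downstream_of("table:orders", EDGES)
print(impacted)
```

A breadth-first walk lists direct dependents before indirect ones, which is usually the order in which an owner would want to review them.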

AI and ML Terms

| Term | Definition |
| --- | --- |
| Agent | A specialized AI component that performs a specific task within the multi-agent orchestrator. MATIH defines five agents: Router, SQL, Analysis, Visualization, and Documentation. |
| Orchestrator | The LangGraph-based state machine that coordinates multiple agents to process a user's conversational request. Manages agent sequencing, state transitions, and error handling. |
| Intent | The classified purpose of a user's message, determined by the Router Agent. Intents include SQL query, documentation lookup, analysis request, and visualization request. |
| Text-to-SQL | The process of converting a natural language question into a valid SQL query. In MATIH, this involves question parsing, schema retrieval (via Qdrant), LLM-based generation, and syntactic validation (via sqlglot). |
| RAG (Retrieval-Augmented Generation) | A technique that enhances LLM responses by retrieving relevant context from a vector database before generation. MATIH uses RAG to provide schema context, documentation, and query history to the AI agents. |
| Vector Embedding | A numerical representation of text (schema descriptions, queries, documentation) in a high-dimensional vector space. Stored in Qdrant for semantic similarity search. |
| LangGraph | An open-source framework for building stateful, multi-agent applications as directed graphs. MATIH uses LangGraph to define the agent orchestration workflow. |
| Drift Detection | The monitoring of changes in data distributions or model prediction patterns over time. Used to detect when data quality degrades or when a deployed model's performance deteriorates. |
| Feature Store | A centralized repository of computed features (derived data attributes) used for ML model training and serving. Ensures consistent feature computation between training and inference. |
| Model Registry | A versioned catalog of trained ML models with metadata including training parameters, performance metrics, lineage, and deployment history. Models progress through stages: Staging, Production, Archived. |
| Experiment | A named collection of ML training runs that share a common objective (e.g., "churn_prediction_v3"). Each experiment contains multiple runs with different hyperparameters and results. |
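
The semantic similarity search behind the Vector Embedding and RAG entries can be illustrated with a few lines of plain Python. The sketch below ranks schema descriptions by cosine similarity to a question vector. The three-dimensional vectors, the table descriptions, and the `top_k` helper are made up for illustration; in the platform the embeddings come from an embedding model and the search is delegated to Qdrant.

```python
import math

# Toy 3-dimensional embeddings for schema descriptions (illustrative values).
SCHEMA_EMBEDDINGS = {
    "orders(order_id, customer_id, total)": [0.9, 0.1, 0.0],
    "customers(customer_id, name, region)": [0.2, 0.9, 0.1],
    "web_events(event_id, url, ts)": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, embeddings, k=2):
    """Return the k entries most similar to the query, best match first."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in embeddings.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Stand-in for the embedding of "total revenue per customer".
question_vector = [0.8, 0.5, 0.0]
context = top_k(question_vector, SCHEMA_EMBEDDINGS)
print(context)
```

The retrieved descriptions are what a Text-to-SQL prompt would receive as schema context before the LLM generates a query.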

Data Engineering Terms

| Term | Definition |
| --- | --- |
| Federated Query | A SQL query that joins data across multiple data sources (e.g., PostgreSQL and S3/Parquet) in a single statement. Enabled by Trino's connector architecture. |
| Catalog (Trino) | A named connection to a data source in Trino. Each tenant has one or more catalogs configured with tenant-specific credentials and access permissions. Not to be confused with the Data Catalog service. |
| Data Catalog | The catalog-service that provides a centralized metadata repository with search, lineage tracking, data classification, and business glossary capabilities. |
| Pipeline | An automated workflow that moves and transforms data from source to destination. Pipelines can be batch (Airflow), streaming (Flink), or hybrid. |
| CDC (Change Data Capture) | A pattern for capturing and propagating database changes (inserts, updates, deletes) in real time, typically via Kafka. Used for incremental data loading and event-driven processing. |
| Data Lineage | The tracking of data flow from its origin through transformations to its consumption points (dashboards, models, reports). Maintained by the catalog-service and visualized in the Context Graph. |
| Schema Drift | An unplanned change in the structure of a data source (new columns, type changes, dropped columns) that can break downstream pipelines and queries. |
| Data Quality Score | A numerical measure (0-100) of data trustworthiness computed by the data-quality-service based on completeness, accuracy, timeliness, consistency, and validity checks. |
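
The three kinds of change named under Schema Drift (new columns, type changes, dropped columns) map directly onto a set comparison. The sketch below is one possible way to detect them, assuming schemas are available as simple column-to-type mappings. The `detect_schema_drift` function and the example schemas are hypothetical and do not describe how any MATIH service implements the check.

```python
def detect_schema_drift(expected, observed):
    """Compare two {column: type} mappings and report structural changes."""
    expected_cols = set(expected)
    observed_cols = set(observed)
    return {
        # Columns present now that were not expected.
        "added": sorted(observed_cols - expected_cols),
        # Columns that were expected but have disappeared.
        "dropped": sorted(expected_cols - observed_cols),
        # Columns present in both whose declared type differs.
        "retyped": sorted(
            col for col in expected_cols & observed_cols
            if expected[col] != observed[col]
        ),
    }


expected_schema = {"id": "bigint", "email": "varchar", "signup_date": "date"}
observed_schema = {
    "id": "bigint",
    "email": "varchar",
    "signup_date": "timestamp",
    "plan": "varchar",
}

drift = detect_schema_drift(expected_schema, observed_schema)
print(drift)
```

A pipeline could run such a comparison before each load and stop early when `dropped` or `retyped` is non-empty, since those are the changes most likely to break downstream queries.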

Infrastructure Terms

| Term | Definition |
| --- | --- |
| Helm Chart | A package of Kubernetes resource definitions (Deployments, Services, ConfigMaps, etc.) with parameterized values. MATIH uses 55+ Helm charts to deploy all platform and infrastructure components. |
| Namespace | A Kubernetes resource that provides a scope for names and a boundary for resource quotas and network policies. Each MATIH tenant gets a dedicated namespace (`matih-tenant-{id}`). |
| ResourceQuota | A Kubernetes resource that limits the total CPU, memory, and storage a namespace (tenant) can consume. Enforced by the Kubernetes API server. |
| NetworkPolicy | A Kubernetes resource that defines allowed network traffic between pods and namespaces. Used to enforce tenant isolation at the network layer. |
| Ingress | A Kubernetes resource that manages external HTTP/HTTPS access to services. MATIH deploys per-tenant NGINX ingress controllers with TLS termination. |
| cert-manager | A Kubernetes add-on that automates TLS certificate issuance and renewal. MATIH uses cert-manager with Let's Encrypt for automatic HTTPS. |
| External Secrets Operator (ESO) | A Kubernetes operator that synchronizes secrets from external secret managers (Azure Key Vault, AWS Secrets Manager, GCP Secret Manager) into Kubernetes Secrets. |
| HPA (Horizontal Pod Autoscaler) | A Kubernetes resource that automatically scales the number of pod replicas based on observed CPU, memory, or custom metrics. |
| Terraform | An infrastructure-as-code tool for provisioning cloud resources declaratively. MATIH provides Terraform modules for Azure, AWS, and GCP. |
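
For the HPA entry, it can help to see the proportional rule that Kubernetes documents for choosing a replica count: the desired count is the current count scaled by the ratio of the observed metric to its target, rounded up. The sketch below reproduces only that core rule. It leaves out the controller's tolerance band and stabilization window, and the `min_replicas`/`max_replicas` defaults are assumptions chosen for the example, not MATIH settings.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Proportional HPA scaling rule, clamped to the replica bounds.

    Simplified: ignores the tolerance band and stabilization window
    that the real controller applies.
    """
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


# 4 pods averaging 90% CPU against a 60% target: scale out.
print(desired_replicas(4, 90, 60))  # 6

# 5 pods averaging 30% CPU against a 60% target: scale in.
print(desired_replicas(5, 30, 60))  # 3
```

Rounding up rather than to the nearest integer means the autoscaler errs toward having slightly more capacity than the ratio strictly requires.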

Observability Terms

| Term | Definition |
| --- | --- |
| Trace | A distributed trace that follows a request as it flows through multiple services. Consists of spans, each representing a unit of work in a specific service. Collected via OpenTelemetry and stored in Tempo. |
| Span | A single unit of work within a distributed trace, representing an operation in a specific service (e.g., "SQL generation in ai-service", "query execution in Trino"). |
| Metric | A numerical measurement collected over time (e.g., request count, latency percentile, CPU usage). Collected by Prometheus and visualized in Grafana. |
| SLO (Service Level Objective) | A target for a service-level indicator (e.g., "99.9% of requests complete in under 500ms"). Used to define and measure platform reliability commitments. |
| SLI (Service Level Indicator) | A quantitative measure of service performance (e.g., "p99 latency", "error rate"). SLIs feed into SLO calculations. |
| Structured Logging | A logging practice where log entries are emitted as machine-parseable JSON objects with standardized fields (timestamp, level, service, tenant_id, trace_id) rather than free-text strings. All MATIH services use structured logging. |
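
As an illustration of the Structured Logging entry, the sketch below uses Python's standard `logging` module to emit one JSON object per log entry with the fields listed above. The `JsonFormatter` class, the added `message` field, and the example values are assumptions made for this sketch. It is not the logging setup used by the MATIH services.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "service": self.service,
            # tenant_id and trace_id are supplied per call via `extra=`.
            "tenant_id": getattr(record, "tenant_id", None),
            "trace_id": getattr(record, "trace_id", None),
            # `message` is an addition for this sketch.
            "message": record.getMessage(),
        }
        return json.dumps(entry)


logger = logging.getLogger("ai-service")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter(service="ai-service"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("query executed", extra={"tenant_id": "acme", "trace_id": "abc123"})
```

Because every entry carries `tenant_id` and `trace_id` as separate keys, a log backend can filter by tenant or join log lines to a trace without parsing free text.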

Security Terms

| Term | Definition |
| --- | --- |
| JWT (JSON Web Token) | A compact, signed token used for authentication and authorization. MATIH JWTs contain user identity, tenant ID, and role claims. Validated by every service on every request. |
| RBAC (Role-Based Access Control) | An access control model where permissions are assigned to roles, and roles are assigned to users. MATIH supports platform-level roles (Platform Admin, Tenant Admin) and tenant-level roles (Data Engineer, Analyst, Viewer). |
| OIDC (OpenID Connect) | An authentication protocol built on OAuth 2.0 that enables single sign-on with corporate identity providers (Azure AD, Okta, Keycloak). MATIH supports OIDC for user authentication. |
| Row-Level Security (RLS) | A data access control mechanism that filters query results based on user attributes (e.g., a regional manager sees only data for their region). Enforced at the semantic layer and query engine. |
| Column Masking | A data protection mechanism that hides or obfuscates sensitive column values (e.g., masking SSN as ***-**-1234) based on user permissions. |
| Tenant Isolation | The set of mechanisms that prevent one tenant from accessing another tenant's data, configuration, or resources. Enforced at the network, compute, storage, and application layers. |
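
The Column Masking example above (an SSN shown as `***-**-1234`) can be sketched as a small function applied to result rows. The helper names and the `can_view_sensitive` flag below are hypothetical. In the platform the decision would come from the user's role claims, and masking would be enforced by the query path rather than by application code like this.

```python
def mask_ssn(ssn):
    """Keep the last four digits of an SSN and mask the rest."""
    return "***-**-" + ssn[-4:]


def apply_column_masking(row, masked_columns, can_view_sensitive):
    """Return a copy of `row` with sensitive columns masked unless permitted."""
    if can_view_sensitive:
        return dict(row)
    return {
        col: mask_ssn(value) if col in masked_columns else value
        for col, value in row.items()
    }


row = {"name": "Ada", "ssn": "123-45-1234"}

# A user without permission sees the masked value.
print(apply_column_masking(row, {"ssn"}, can_view_sensitive=False))
# {'name': 'Ada', 'ssn': '***-**-1234'}

# A permitted user sees the original row.
print(apply_column_masking(row, {"ssn"}, can_view_sensitive=True))
# {'name': 'Ada', 'ssn': '123-45-1234'}
```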

Business Intelligence Terms

| Term | Definition |
| --- | --- |
| Dashboard | A collection of visual widgets (charts, tables, KPIs, filters) arranged on a canvas that displays data from one or more queries. Managed by the bi-service. |
| Widget | A single visual component within a dashboard: chart, table, KPI card, text block, or filter control. Widgets are composable and configurable. |
| Scheduled Refresh | An automatic re-execution of dashboard queries on a configurable schedule (e.g., every 4 hours) to keep displayed data current. |
| Cross-Filter | An interactive behavior where clicking a data point in one widget filters all other widgets in the same dashboard. |
| Drill-Through | Navigation from a summary view to a detail view by clicking a data point. Defined by dimension hierarchies in the semantic layer. |

Abbreviations

| Abbreviation | Full Form |
| --- | --- |
| MATIH | (Platform name, not an acronym) |
| AI | Artificial Intelligence |
| BI | Business Intelligence |
| CDC | Change Data Capture |
| CLI | Command-Line Interface |
| CRUD | Create, Read, Update, Delete |
| DAG | Directed Acyclic Graph |
| DTO | Data Transfer Object |
| ESO | External Secrets Operator |
| ETL | Extract, Transform, Load |
| HPA | Horizontal Pod Autoscaler |
| IAM | Identity and Access Management |
| JWT | JSON Web Token |
| LLM | Large Language Model |
| ML | Machine Learning |
| MTTR | Mean Time to Resolution |
| OIDC | OpenID Connect |
| OTel | OpenTelemetry |
| RAG | Retrieval-Augmented Generation |
| RBAC | Role-Based Access Control |
| RLS | Row-Level Security |
| SLI | Service Level Indicator |
| SLO | Service Level Objective |
| SRE | Site Reliability Engineering |
| TLS | Transport Layer Security |
| UDF | User-Defined Function |

Conventions Used in This Documentation

Throughout this documentation, we use the following conventions:

| Convention | Meaning |
| --- | --- |
| `monospace text` | Code, commands, file paths, configuration keys, and service names |
| **Bold text** | Terms being defined, emphasis, or important notes |
| *Italic text* | First use of a term, or emphasis in narrative context |
| `service-name (port)` | A MATIH service with its default port number |
| `{tenant_id}` | A placeholder for a tenant-specific value |
| Tables with "Planned" | Features that are designed but not yet implemented in the current release |
| `{/* Diagram: ... */}` | Placeholder for a diagram that will be added in a future documentation update |

Further Reading