Your AI Agent Is Guessing. Ontology Is the Fix.

March 2026 · 14 min read

The 23% Problem

A global pharmaceutical company deploys an AI agent on its data lake. The agent has access to 14,000 tables across clinical trials, manufacturing, supply chain, and commercial operations. It generates syntactically perfect SQL. It returns confident answers in natural language with proper citations.

It is wrong 23% of the time.

Not because of hallucination. Not because the SQL is broken. Because the term "adverse event" means three different things across three departments. In clinical trials, it means a patient-reported side effect during a study. In pharmacovigilance, it means a post-market safety signal requiring regulatory reporting. In manufacturing quality, it means a batch contamination incident.

The agent picks one definition. It does not know the others exist. It generates a perfectly valid query against the wrong table, with the wrong filters, for the wrong business context. The result reads perfectly — and it is completely wrong.

This is not an AI problem. This is a knowledge problem.

The agent does not lack intelligence. It lacks understanding. It has no formal model of what the business means — only what the database contains. And those are fundamentally different things.

Thesis: Ontology is a business modeling problem, not a database design problem. The solution is not better SQL generation, more sophisticated RAG, or larger context windows. The solution is giving AI agents the same thing you give new employees on day one — a shared vocabulary that defines what your business means.

What Is an Ontology, Really?

An ontology is not a database schema. It is not an ER diagram. It is not a data dictionary (though it subsumes one). An ontology is a formal representation of the concepts, relationships, and rules that define a business domain.

An ontology has three components:

1. Entities — The Things Your Business Cares About

Not tables. Not columns. Business objects. A "Customer" is an entity. A "Revenue Transaction" is an entity. A "Clinical Trial Endpoint" is an entity. These exist independent of how they are stored in any database.

2. Relationships — How They Connect

Not foreign keys. Business relationships. A Customer "places" an Order. An Order "contains" Products. A Clinical Trial "measures" Endpoints. These relationships carry business meaning (cardinality, temporality, and constraints) that a foreign key alone cannot express.

3. Definitions — What They Actually Mean

This is where most organizations fail. A definition is not a column comment. It is a precise, unambiguous, organizationally agreed statement of meaning.

"Revenue" means total net sales minus returns and allowances, calculated at the point of delivery confirmation, in the reporting currency of the entity, excluding intercompany transactions.

That definition is 26 words. Without it, every AI agent, every analyst, and every dashboard is free to interpret "revenue" however it wants. With it, there is exactly one truth.
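To make the three components concrete, here is a minimal in-memory sketch. The class names (`Concept`, `Relationship`, `Ontology`) and the wording of the Order definition are illustrative assumptions, not a standard vocabulary such as OWL; the point is only that definitions and relationships live in a structure an agent can query, with no table or column in sight.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A business entity or metric with its agreed definition, independent of storage."""
    name: str
    definition: str


@dataclass(frozen=True)
class Relationship:
    """A named business relationship, e.g. Customer 'places' Order."""
    subject: str
    predicate: str
    target: str
    cardinality: str  # a plain label such as "1:N" in this sketch


@dataclass
class Ontology:
    concepts: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add(self, concept: Concept) -> None:
        self.concepts[concept.name] = concept

    def relate(self, rel: Relationship) -> None:
        # Reject relationships that mention a concept nobody has defined yet.
        for name in (rel.subject, rel.target):
            if name not in self.concepts:
                raise KeyError(f"undefined concept: {name}")
        self.relationships.append(rel)

    def define(self, name: str) -> str:
        return self.concepts[name].definition


onto = Ontology()
onto.add(Concept(
    "Revenue",
    "Total net sales minus returns and allowances, calculated at the point of "
    "delivery confirmation, in the reporting currency of the entity, "
    "excluding intercompany transactions.",
))
onto.add(Concept(
    "Customer",
    "An organization or individual that has completed at least one purchase, "
    "excluding internal test accounts and partner evaluation accounts.",
))
onto.add(Concept("Order", "A purchase placed by a Customer."))  # hypothetical wording
onto.relate(Relationship("Customer", "places", "Order", "1:N"))
```

An agent asking `onto.define("Revenue")` gets the agreed sentence back verbatim, and an attempt to relate an undefined concept fails loudly instead of silently inventing one.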

The critical distinction: A database schema tells you what data exists. An ontology tells you what the data means. Schema is structure. Ontology is semantics. An AI agent with schema access can generate SQL. An AI agent with ontology access can generate correct SQL — because it knows which table, which column, and which filter corresponds to the business concept the user is actually asking about.

Why AI Agents Need Ontology

Without ontology, an AI agent is a confident idiot. It has learned patterns from training data. It can match column names to natural language queries. It can infer JOINs from foreign keys. And it will do all of this with supreme confidence, even when the answer is wrong.

With ontology, an AI agent is an informed reasoner. It does not guess which table contains "revenue." It looks up the definition, resolves to the canonical metric, follows the ontology-defined join path, and generates SQL that is semantically correct — not just syntactically valid.

The Difference in Practice

Dimension	Without Ontology	With Ontology
Table selection	Pattern match on column names	Lookup canonical entity mapping
Join paths	Infer from foreign keys	Follow declared business relationships
Metric calculation	Guess from column name ("revenue" → `SUM(amount)`)	Use precise formula: `SUM(net_sales) - SUM(returns)`
Disambiguation	Pick first match	Resolve by business context (department, use case)
Confidence	Always high	Calibrated — asks for clarification when ambiguous
Error rate	15-35% on complex queries	Under 5% with well-maintained ontology
Auditability	"I found a column called revenue"	"Revenue is defined as [definition], mapped to [table.column], calculated as [formula]"
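The "Disambiguation" and "Confidence" rows describe behavior rather than a simple lookup. A minimal sketch, assuming a hypothetical `SENSES` table that stores each context-scoped meaning of "adverse event" from the opening example: the resolver answers when the business context decides the meaning, and returns a clarification request instead of picking the first match when it does not.

```python
from typing import Optional

# Hypothetical sense inventory: one business term, several context-scoped meanings.
SENSES = {
    "adverse event": {
        "clinical_trials": "Patient-reported side effect during a study",
        "pharmacovigilance": "Post-market safety signal requiring regulatory reporting",
        "manufacturing_quality": "Batch contamination incident",
    },
}


def resolve(term: str, context: Optional[str] = None) -> dict:
    senses = SENSES.get(term.lower())
    if senses is None:
        return {"status": "unknown_term", "term": term}
    if context in senses:
        return {"status": "resolved", "context": context, "definition": senses[context]}
    if len(senses) == 1:
        only_context, definition = next(iter(senses.items()))
        return {"status": "resolved", "context": only_context, "definition": definition}
    # Several meanings and nothing to choose between them: ask, do not guess.
    return {"status": "needs_clarification", "term": term, "options": sorted(senses)}
```

Called without a department, `resolve("adverse event")` returns the three options for the user to pick from; called with `"pharmacovigilance"`, it returns the post-market safety definition.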

The auditability point matters more than accuracy in many enterprises. A CFO will not trust an AI-generated financial report that says "I found a column called revenue." They will trust one that says "I used the finance-approved revenue definition, which maps to fact_sales.net_amount - fact_returns.return_amount, filtered by entity_currency = reporting_currency and transaction_type != 'intercompany'."
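One way to produce the second kind of answer is to make provenance part of the metric record, so the explanation is generated from the same fields that build the query rather than written after the fact. A sketch with an illustrative `MetricProvenance` type; the expression and filters are the ones quoted above, and the type itself is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricProvenance:
    metric: str
    approved_by: str
    expression: str
    filters: tuple

    def explain(self) -> str:
        # Render the audit trail from the same fields used to build the query.
        where = " and ".join(self.filters) if self.filters else "no filters"
        return (f"I used the {self.approved_by}-approved {self.metric} definition, "
                f"which maps to {self.expression}, filtered by {where}.")


REVENUE = MetricProvenance(
    metric="revenue",
    approved_by="finance",
    expression="fact_sales.net_amount - fact_returns.return_amount",
    filters=("entity_currency = reporting_currency",
             "transaction_type != 'intercompany'"),
)
print(REVENUE.explain())
```

Because the explanation and the SQL share one source, they cannot drift apart: changing a filter changes both.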

Ontology vs. Semantic Layer: A Sharp Architectural Boundary

These terms are often confused. They should not be. Ontology and the semantic layer serve fundamentally different purposes, and both are required for AI agents to work correctly.

Ontology: What Does It Mean?

The ontology defines business concepts and their relationships. It is descriptive and domain-driven. It answers:

What is a "Customer"? (An entity that has purchased at least once, excluding internal test accounts)
How does "Customer" relate to "Order"? (One-to-many via customer_id, temporal — the relationship has a start date)
What are the valid business rules? (Revenue excludes intercompany transactions; a Customer in "churned" status has no active subscriptions for 90+ days)

The ontology does not know about SQL. It does not know about tables. It is a pure business model.
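Because the ontology knows nothing about tables, its business rules can be written as plain predicates over business facts. A minimal sketch of the Customer definition and the churn rule above; the function names and parameters are hypothetical, and the 90-day window comes straight from the churn rule.

```python
from datetime import date

CHURN_WINDOW_DAYS = 90  # from the rule: no active subscription for 90+ days


def is_customer(completed_purchases: int, is_internal_test_account: bool) -> bool:
    """'Customer': purchased at least once, excluding internal test accounts."""
    return completed_purchases >= 1 and not is_internal_test_account


def is_churned(has_active_subscription: bool, last_active_on: date, today: date) -> bool:
    """'Churned': no active subscription for at least CHURN_WINDOW_DAYS days."""
    if has_active_subscription:
        return False
    return (today - last_active_on).days >= CHURN_WINDOW_DAYS
```

Deriving `last_active_on` from a subscriptions table is the semantic layer's job; the rule itself never names one.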

Semantic Layer: How Do I Calculate It?

The semantic layer translates business concepts into executable queries. It is prescriptive and technical. It answers:

How do I compute "Monthly Recurring Revenue"? (SUM(subscription_amount) WHERE status = 'active' GROUP BY month)
What dimensions can I slice by? (Region, segment, product line, time period)
What join paths connect these tables? (fact_subscriptions → dim_customers → dim_regions)
Who is allowed to see this data? (Row-level security: sales reps see their territory only)
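The four questions above map naturally onto a metric record that a semantic layer compiles into SQL. A sketch under stated assumptions: the `Metric` type, the column names (`customer_id`, `region_id`, `region_name`, `territory_id`), and the `:territory` named-parameter style are hypothetical, and the output is a SQL string plus bound parameters rather than what any particular product emits.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Metric:
    name: str
    expression: str                                  # how to compute it
    base_table: str
    filters: list = field(default_factory=list)
    dimensions: dict = field(default_factory=dict)   # name -> (joins, column)


MRR = Metric(
    name="mrr",
    expression="SUM(s.subscription_amount)",
    base_table="fact_subscriptions s",
    filters=["s.status = 'active'"],
    dimensions={
        "month": ([], "s.month"),
        # Join path: fact_subscriptions -> dim_customers -> dim_regions
        "region": (["JOIN dim_customers c ON c.customer_id = s.customer_id",
                    "JOIN dim_regions r ON r.region_id = c.region_id"],
                   "r.region_name"),
    },
)


def compile_metric(metric: Metric, dimension: str, territory: Optional[str] = None):
    joins, column = metric.dimensions[dimension]  # KeyError means an unsupported slice
    where = list(metric.filters)
    params = {}
    if territory is not None:
        # Row-level security: a sales rep only sees their own territory.
        where.append("s.territory_id = :territory")
        params["territory"] = territory
    lines = [f"SELECT {column} AS {dimension}, {metric.expression} AS {metric.name}",
             f"FROM {metric.base_table}", *joins]
    if where:
        lines.append("WHERE " + " AND ".join(where))
    lines.append(f"GROUP BY {column}")
    return "\n".join(lines), params
```

Passing the territory as a bound parameter, rather than splicing it into the string, keeps the security filter safe from injection.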

Why Both Are Required

The ontology without a semantic layer is a glossary — useful for humans, useless for machines. The semantic layer without an ontology is a calculation engine that does not know what it is calculating — it can compute SUM(amount) but cannot tell you whether "amount" means revenue, cost, or something else entirely.

The text-to-SQL graveyard is full of systems that tried to skip the ontology and go directly from natural language to SQL via a semantic layer. They work beautifully on demos with 5 tables. They fail catastrophically at enterprise scale because the semantic layer encodes how to query but not what the query means. When a user asks "show me revenue by region," the semantic layer can compute SUM(net_sales) GROUP BY region — but only if something upstream has already resolved that "revenue" maps to net_sales (not gross_sales, not amount, not total), and "region" maps to dim_geography.sales_region (not shipping_region, not billing_region).

That "something upstream" is the ontology.

The Knowledge Graph: Ontology at Runtime

If the ontology is the schema, the knowledge graph is the database. The ontology defines that "Customer" is an entity type with properties like name, segment, and lifetime_value. The knowledge graph contains the actual instances: "Acme Corp is a Customer in the Enterprise segment with $2.4M lifetime value."

Concept	Ontology	Knowledge Graph
Nature	Schema / Type system	Instance data / Runtime state
Contains	Entity types, relationship types, rules	Actual entities, actual relationships, actual values
Changes	Slowly (business model evolution)	Constantly (new customers, new orders, updated statuses)
Scale	Hundreds of types	Millions of instances
Validated by	SHACL shapes, business rule engines	Constraints from the ontology
Used for	Definition lookup, disambiguation	Entity resolution, contextual search, lineage tracing
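The "Validated by" row can be illustrated with a plain-Python stand-in for what a SHACL shape typically expresses: required properties with datatypes, and a closed set of allowed values. This is not SHACL (a real deployment would run a SHACL engine over RDF data), and the segment values other than "Enterprise" are assumptions.

```python
# Ontology side: constraints on the Customer type (hypothetical shape).
CUSTOMER_SHAPE = {
    "required": {"name": str, "segment": str, "lifetime_value": float},
    "allowed": {"segment": {"Enterprise", "Mid-Market", "SMB"}},
}


def validate(instance: dict, shape: dict) -> list:
    """Check one knowledge-graph instance against its ontology type."""
    errors = []
    for prop, expected_type in shape["required"].items():
        if prop not in instance:
            errors.append(f"missing {prop}")
        elif not isinstance(instance[prop], expected_type):
            errors.append(f"{prop} should be {expected_type.__name__}")
    for prop, values in shape.get("allowed", {}).items():
        if prop in instance and instance[prop] not in values:
            errors.append(f"{prop}={instance[prop]!r} not allowed")
    return errors


acme = {"name": "Acme Corp", "segment": "Enterprise", "lifetime_value": 2_400_000.0}
```

The shape changes rarely and lives with the ontology; the check runs constantly, on every instance written to the graph.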

The knowledge graph enables a critical capability: entity resolution across systems. When a user asks "What is Acme Corp's revenue?", the agent needs to know that "Acme Corp" in the CRM is the same entity as "Acme Corporation" in the ERP and "ACME_CORP_001" in the billing system. The knowledge graph maintains these cross-system identity links — something no database schema or semantic layer can do.
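One simple way to maintain cross-system identity links is a union-find structure over (system, local ID) pairs, where each set is one real-world entity. A minimal sketch; it assumes the links themselves already exist (from matching rules or data-steward review), since "ACME_CORP_001" cannot be matched to "Acme Corp" by string similarity alone.

```python
class IdentityGraph:
    """Union-find over (system, local_id) records; each set is one real-world entity."""

    def __init__(self) -> None:
        self._parent = {}

    def _find(self, record):
        self._parent.setdefault(record, record)
        while self._parent[record] != record:
            self._parent[record] = self._parent[self._parent[record]]  # path halving
            record = self._parent[record]
        return record

    def same_as(self, a, b) -> None:
        """Record that two source records describe the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def aliases(self, record) -> set:
        root = self._find(record)
        return {r for r in list(self._parent) if self._find(r) == root}


graph = IdentityGraph()
graph.same_as(("crm", "Acme Corp"), ("erp", "Acme Corporation"))
graph.same_as(("erp", "Acme Corporation"), ("billing", "ACME_CORP_001"))
```

Links are transitive: the billing record was never linked to the CRM record directly, yet `graph.aliases(("billing", "ACME_CORP_001"))` returns all three records.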

Building Ontology: The Business-First Approach

Ontology cannot be reverse-engineered from a database. It must be built from business conversations. Here is the approach that works:

Step 1: Start with Business Questions, Not Data

Do not open a database browser. Open a conversation with domain experts. Ask: "What are the 10 most important questions your team answers every week?" Those questions reveal the entities, relationships, and definitions that matter.

"What was our revenue by region last quarter?" → Entities: Revenue, Region, Quarter. Relationships: Revenue is measured per Region per Quarter. Definitions: What exactly is "revenue"? Which regions? Fiscal or calendar quarter?

Step 2: Define Entities from the Business, Not the Schema

For each concept that emerges, define it in business terms first. A "Customer" is not customers.id — it is "an organization or individual that has completed at least one purchase, excluding internal test accounts and partner evaluation accounts." The database mapping comes later.

Step 3: Nail the Definitions — This Is Where Value Lives

Every entity, every metric, every dimension gets a precise, unambiguous definition approved by the relevant business stakeholders. This is the hard part. It requires getting the VP of Sales and the VP of Finance to agree on what "revenue" means. It requires the clinical operations team and pharmacovigilance team to acknowledge that "adverse event" means different things in their contexts — and to formally define both.

Step 4: Map to Physical Data — Last, Not First

Only after the business model is agreed upon do you map entities to physical tables and columns. This mapping is where the semantic layer takes over — translating ontology concepts into executable SQL.
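Mapping last also makes gaps easy to measure. A sketch of a simple coverage check, assuming the agreed concepts and the physical mappings are both available as data (the names reuse examples from this article; the dictionary shape is hypothetical). Concepts without a mapping are work for data engineers; mappings without a concept are schema-first leftovers worth questioning.

```python
# Concepts the business has agreed on (hypothetical subset).
CONCEPTS = {"Revenue", "Region", "Fiscal Quarter", "Customer"}

# Physical mappings written so far (hypothetical).
MAPPINGS = {
    "Revenue": "fact_sales.net_amount - fact_returns.return_amount",
    "Region": "dim_geography.sales_region",
    "Customer": "dim_customers",
    "amount": "fact_sales.amount",  # mapped because the column exists, not because it means anything
}


def coverage_report(concepts: set, mappings: dict) -> dict:
    mapped = set(mappings)
    return {
        "unmapped_concepts": sorted(concepts - mapped),
        "orphan_mappings": sorted(mapped - concepts),
    }
```

Here the report would flag "Fiscal Quarter" as agreed but not yet mapped, and "amount" as a mapping with no business meaning behind it.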

Step 5: Validate and Iterate

Deploy the ontology. Let AI agents use it. Measure accuracy. When the agent gets answers wrong, trace back: was the definition incomplete? Was a relationship missing? Was the physical mapping incorrect? Feed the corrections back into the ontology. This is a continuous process, not a one-time project.
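The trace-back loop needs little more than a labeled test set. A minimal sketch, assuming a reviewer triages each failing case into one of the three causes named above (definition, relationship, mapping); the `EvalCase` type and the tolerance are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

CAUSES = {"definition", "relationship", "mapping"}


@dataclass
class EvalCase:
    question: str
    expected: float
    actual: float
    cause: Optional[str] = None  # filled in by a reviewer for wrong answers


def summarize(cases, tolerance: float = 1e-6):
    """Return (accuracy, failure counts by cause) for a batch of agent answers."""
    wrong = [c for c in cases if abs(c.expected - c.actual) > tolerance]
    for c in wrong:
        if c.cause is not None and c.cause not in CAUSES:
            raise ValueError(f"unknown cause: {c.cause}")
    accuracy = 1 - len(wrong) / len(cases)
    by_cause = Counter(c.cause or "untriaged" for c in wrong)
    return accuracy, by_cause
```

A growing "mapping" count points at the semantic layer; a growing "definition" count sends the question back to the business owners.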

Anti-Patterns to Avoid

Anti-Pattern	Why It Fails
Reverse-engineering ontology from a schema	You get a data model, not a business model. Column names are not business definitions.
One team owns ontology	IT alone cannot define business meaning. Business alone cannot map to physical data. It requires both.
Perfect ontology before launch	Ontology is iterative. Start with the 20 most-asked business questions. Expand as agents encounter new concepts.
Treating ontology as documentation	If the ontology is a PDF that nobody reads, it has zero value. It must be machine-readable and consumed by agents at query time.

How MATIH Achieves This

MATIH implements ontology as a live, machine-readable system that AI agents consult on every query. This is not a governance artifact sitting in Confluence — it is a runtime service that participates in every data interaction.

The Three-Service Architecture

Ontology Service — The source of truth for business meaning. Domain experts define entities, relationships, and definitions through the Data Workbench UI. Definitions are versioned, validated against SHACL shapes (the W3C Shapes Constraint Language, a standard for validating RDF graph data), and published as an API that any service can consume. When a business definition changes (say, "revenue" is updated to exclude a new transaction type), the change propagates to every downstream consumer automatically.
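The versioning-and-propagation behavior can be sketched as a small publish/subscribe registry. This is an illustrative in-process stand-in that assumes plain callbacks in place of whatever API or event transport the service actually uses.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, int, str], None]


class DefinitionRegistry:
    """Keeps every version of each definition and pushes changes to subscribers."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def publish(self, term: str, text: str) -> int:
        history = self._versions.setdefault(term, [])
        history.append(text)
        version = len(history)  # versions are numbered 1, 2, 3, ...
        for notify in self._listeners:
            notify(term, version, text)
        return version

    def current(self, term: str) -> str:
        return self._versions[term][-1]

    def history(self, term: str) -> List[str]:
        return list(self._versions[term])
```

Keeping the full history, not just the latest text, is what lets an answer cite the definition version it was computed under.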

Semantic Layer (WrenAI) — The translation engine. Once the ontology defines what "revenue" means, the semantic layer defines how to compute it: which tables, which columns, which joins, which filters. WrenAI provides a modeling API where data engineers define metrics and dimensions, each linked to an ontology entity. When an AI agent needs to generate SQL, it does not query the database directly — it queries WrenAI, which generates semantically correct SQL based on the ontology-linked metric definitions.

Context Graph — The runtime knowledge base. Built on a graph database, the context graph stores the actual instances of ontology-defined entities and their relationships. When a user asks "Show me Acme Corp's revenue," the context graph resolves "Acme Corp" to the correct entity across CRM, ERP, and billing systems, then passes the resolved entity to the semantic layer for query generation.

How an Agent Query Flows

User asks: "What was the revenue for our top 10 enterprise customers last quarter?"
Agent consults ontology: Resolves "revenue" → finance.net_revenue (definition: net sales minus returns, excluding intercompany). Resolves "enterprise customers" → customer.segment = 'Enterprise'. Resolves "last quarter" → fiscal Q4 based on the company's fiscal calendar definition.
Agent consults context graph: Identifies the top 10 enterprise customers by current lifetime_value ranking. Resolves each customer entity across source systems.
Agent consults semantic layer: WrenAI generates the SQL using the correct metric formula, join paths, and filters — incorporating the ontology definitions at every step.
Agent validates: The generated SQL is checked against the ontology constraints. Does it use the correct revenue definition? Does it filter intercompany transactions? Does it use the fiscal calendar?
Agent responds: With the answer, the definition used, and the full query lineage — so the user (and their CFO) can trust the result.
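The validation step is easiest when the semantic layer hands back a structured query plan rather than only a SQL string. A minimal sketch under that assumption: the plan shape, the `REQUIRED` constraint table, and exact string matching of filters are all simplifications (a real check would compare parsed predicates), but it shows the agent refusing a plan that drops an ontology-mandated filter or uses the wrong calendar.

```python
# Hypothetical constraints the ontology attaches to a canonical metric.
REQUIRED = {
    "finance.net_revenue": {
        "filters": {"transaction_type != 'intercompany'"},
        "calendar": "fiscal",
    },
}


def validate_plan(plan: dict) -> list:
    """Return a list of violations; an empty list means the plan may run."""
    rules = REQUIRED.get(plan["metric"], {})
    problems = []
    missing = rules.get("filters", set()) - set(plan.get("filters", []))
    for flt in sorted(missing):
        problems.append(f"missing required filter: {flt}")
    expected_calendar = rules.get("calendar")
    if expected_calendar and plan.get("calendar") != expected_calendar:
        problems.append(f"expected {expected_calendar} calendar, got {plan.get('calendar')}")
    return problems


good = {
    "metric": "finance.net_revenue",
    "filters": ["customer.segment = 'Enterprise'", "transaction_type != 'intercompany'"],
    "calendar": "fiscal",
}
```

Returning every violation, rather than stopping at the first, gives the agent enough detail to repair the plan or explain why it declined to answer.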

Key Takeaways

Ontology is a business modeling problem, not a database design problem. It cannot be reverse-engineered from schemas. It must be built from business conversations with domain experts who define what concepts mean.
Without ontology, AI agents are confident idiots. They generate syntactically perfect queries against the wrong tables, with the wrong definitions, and present wrong answers with high confidence. Ontology is what transforms pattern matching into reasoning.
Ontology and the semantic layer serve different purposes, and both are required. Ontology defines what things mean. The semantic layer defines how to compute them. Without the ontology, the system is ungrounded; without the semantic layer, it is incomputable.
The knowledge graph is ontology at runtime. It stores the actual instances of business entities and their cross-system relationships. Entity resolution — knowing that "Acme Corp" in the CRM is the same as "ACME_CORP_001" in billing — is impossible without it.
Start with 20 questions, not 20,000 tables. Ask domain experts what they need to know every week. Those questions reveal the entities, relationships, and definitions that matter. Build the ontology around those questions first, then expand.
Ontology must be machine-readable and consumed at query time. A PDF glossary has zero impact on AI agent accuracy. A live ontology service that agents consult on every query transforms accuracy from 65-85% to 95%+.

MATIH treats ontology as infrastructure, not documentation. Every AI agent query flows through the ontology service, the semantic layer, and the context graph, ensuring that answers are grounded in business-approved definitions, not guesses.
