Hallucination Classifier
Production - Hallucination detection, confidence scoring, feedback-driven improvement
The Hallucination Classifier detects agent responses that contain fabricated, unsupported, or contradictory information. It combines fact-checking against source data, consistency analysis, and confidence scoring to flag potential hallucinations.
12.2.10.1 Classification Architecture
Implemented in data-plane/ai-service/src/agents/hallucination_classifier/:
| Component | Purpose |
|---|---|
| HallucinationClassifierService | Core detection engine |
| HallucinationRoutes | REST API endpoints |
| PostgresHallucinationStore | Persistent classification storage |
Detection Strategies
| Strategy | Description |
|---|---|
| Source Grounding | Verifies claims against source data and SQL results |
| Self-Consistency | Generates multiple responses and checks for contradictions |
| Confidence Calibration | Compares stated confidence with actual accuracy |
| Factual Verification | Cross-references facts with known data |
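As an illustration of the Source Grounding strategy in the table above, the following is a minimal sketch and not the service's actual implementation. It assumes that claims can be reduced to percentages and dollar amounts found by regular expressions, and that a claim is grounded when it matches any numeric source value within a relative tolerance. The names `extract_numeric_claims`, `is_grounded`, `check_source_grounding`, and the `rel_tol` parameter are hypothetical.

```python
import re

# Hypothetical helpers for illustration; not part of the documented service.
_PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")
_DOLLARS = re.compile(r"\$(\d+(?:\.\d+)?)([KMB])?")
_SCALE = {"": 1, "K": 1e3, "M": 1e6, "B": 1e9}


def extract_numeric_claims(text):
    """Return (kind, value) pairs for percentages and dollar amounts in text."""
    # Percentages are converted to fractions so 25% becomes 0.25.
    claims = [("percent", float(m.group(1)) / 100) for m in _PERCENT.finditer(text)]
    for m in _DOLLARS.finditer(text):
        suffix = m.group(2) or ""
        claims.append(("amount", float(m.group(1)) * _SCALE[suffix]))
    return claims


def is_grounded(value, source_values, rel_tol=0.01):
    """Assumed rule: grounded if any source value matches within rel_tol."""
    return any(abs(value - s) <= rel_tol * abs(s) for s in source_values)


def check_source_grounding(response, source_data):
    """Compare each numeric claim in the response with the numeric source values."""
    source_values = [
        v for v in source_data.values()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return [
        {"kind": kind, "value": value, "supported": is_grounded(value, source_values)}
        for kind, value in extract_numeric_claims(response)
    ]
```

Matching against any source value is deliberately crude: it ignores which field a claim refers to, so a real grounding check would also need to map each claim to the relevant source field.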
12.2.10.2 API Endpoints
# Classify a response for hallucinations
curl -X POST http://localhost:8000/api/v1/hallucination-classifier/classify \
-H "Content-Type: application/json" \
-H "X-Tenant-ID: acme-corp" \
-d '{
"response": "Revenue grew 25% in Q3, reaching $4.2M total.",
"source_data": {"total_revenue": 3800000, "growth_rate": 0.18},
"context": "User asked about Q3 revenue performance"
}'
Classification Result
{
"classification_id": "cls-uuid-123",
"hallucination_detected": true,
"confidence": 0.92,
"claims": [
{
"claim": "Revenue grew 25% in Q3",
"supported": false,
"evidence": "Source data shows 18% growth rate",
"severity": "high"
},
{
"claim": "reaching $4.2M total",
"supported": false,
"evidence": "Source data shows $3.8M total revenue",
"severity": "high"
}
],
"overall_score": 0.15,
"recommendation": "Response contains factual errors. Regenerate with source data verification."
}
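A caller might build the same request from Python and act on the result as sketched below. This is a hedged example using only the standard library: the base URL, headers, and field names are copied from the curl example and result shown above, while `build_classify_request`, `unsupported_claims`, and `should_regenerate` are hypothetical helpers, and the regeneration policy is an assumption rather than documented behavior.

```python
import json
import urllib.request

# Base URL taken from the curl example; adjust for the actual deployment.
BASE_URL = "http://localhost:8000/api/v1/hallucination-classifier"


def build_classify_request(response, source_data, context, tenant_id):
    """Build (but do not send) the POST request for the classify endpoint."""
    body = json.dumps(
        {"response": response, "source_data": source_data, "context": context}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/classify",
        data=body,
        headers={"Content-Type": "application/json", "X-Tenant-ID": tenant_id},
        method="POST",
    )


def unsupported_claims(result):
    """Return the claims that the classifier marked as not supported."""
    return [c for c in result.get("claims", []) if not c.get("supported", False)]


def should_regenerate(result):
    """Assumed policy: regenerate when a hallucination is flagged and at
    least one unsupported claim has severity "high"."""
    return bool(result.get("hallucination_detected")) and any(
        c.get("severity") == "high" for c in unsupported_claims(result)
    )
```

To send the request, pass the object to `urllib.request.urlopen` and decode the body with `json.load`. The parsed dictionary can then be given to `should_regenerate` to decide whether the agent response should be regenerated.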