Hallucination Classifier

Production - Hallucination detection, confidence scoring, feedback-driven improvement

The Hallucination Classifier detects when agent responses contain fabricated, unsupported, or contradictory information. It combines fact-checking against source data, consistency analysis, and confidence scoring to flag potential hallucinations.

12.2.10.1 Classification Architecture

The classifier is implemented in `data-plane/ai-service/src/agents/hallucination_classifier/`:

Component	Purpose
`HallucinationClassifierService`	Core detection engine
`HallucinationRoutes`	REST API endpoints
`PostgresHallucinationStore`	Persistent classification storage

Detection Strategies

Strategy	Description
Source Grounding	Verifies claims against source data and SQL results
Self-Consistency	Generates multiple responses and checks for contradictions
Confidence Calibration	Compares stated confidence with actual accuracy
Factual Verification	Cross-references facts with known data
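
As an illustration of the Source Grounding strategy, the sketch below extracts numeric claims from a response and compares them with the source data. It is a minimal approximation, not the service's implementation: the helper names `extract_numeric_claims` and `check_grounding`, the regular expressions, the mapping of percentages to `growth_rate` and dollar amounts to `total_revenue`, and the 1% tolerance are all assumptions made for this example.

```python
import math
import re


def extract_numeric_claims(response: str) -> dict:
    """Pull the first percentage and dollar figure out of free text.

    Hypothetical helper: the field names it maps to are assumed here
    and would normally come from a proper claim extraction step.
    """
    claims = {}
    pct = re.search(r"(\d+(?:\.\d+)?)\s*%", response)
    if pct:
        claims["growth_rate"] = float(pct.group(1)) / 100
    usd = re.search(r"\$(\d+(?:\.\d+)?)\s*([MK])?", response)
    if usd:
        scale = {"M": 1_000_000, "K": 1_000, None: 1}[usd.group(2)]
        claims["total_revenue"] = float(usd.group(1)) * scale
    return claims


def check_grounding(response: str, source_data: dict, rel_tol: float = 0.01) -> list:
    """Compare each extracted claim with the matching source field."""
    results = []
    for field, claimed in extract_numeric_claims(response).items():
        if field not in source_data:
            continue  # nothing to verify against
        actual = source_data[field]
        results.append({
            "field": field,
            "claimed": claimed,
            "actual": actual,
            # Assumed tolerance: values within 1% count as supported.
            "supported": math.isclose(claimed, actual, rel_tol=rel_tol),
        })
    return results
```

Calling `check_grounding` with a response that states 25% growth and $4.2M revenue against source data of 18% and $3.8M marks both claims as unsupported.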

12.2.10.2 API Endpoints

# Classify a response for hallucinations
curl -X POST http://localhost:8000/api/v1/hallucination-classifier/classify \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: acme-corp" \
  -d '{
    "response": "Revenue grew 25% in Q3, reaching $4.2M total.",
    "source_data": {"total_revenue": 3800000, "growth_rate": 0.18},
    "context": "User asked about Q3 revenue performance"
  }'
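
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call above. The function names `build_classify_request` and `classify` and the 10-second timeout are assumptions for illustration, and the response is assumed to be a JSON body.

```python
import json
import urllib.request

# Base URL taken from the curl example; adjust for your deployment.
BASE_URL = "http://localhost:8000/api/v1/hallucination-classifier"


def build_classify_request(response_text, source_data, context, tenant_id):
    """Build the POST request for the classify endpoint (hypothetical helper)."""
    payload = {
        "response": response_text,
        "source_data": source_data,
        "context": context,
    }
    return urllib.request.Request(
        f"{BASE_URL}/classify",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Tenant-ID": tenant_id,
        },
        method="POST",
    )


def classify(response_text, source_data, context, tenant_id, timeout=10):
    """Send the request and decode the JSON classification result."""
    request = build_classify_request(response_text, source_data, context, tenant_id)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```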

Classification Result

{
  "classification_id": "cls-uuid-123",
  "hallucination_detected": true,
  "confidence": 0.92,
  "claims": [
    {
      "claim": "Revenue grew 25% in Q3",
      "supported": false,
      "evidence": "Source data shows 18% growth rate",
      "severity": "high"
    },
    {
      "claim": "reaching $4.2M total",
      "supported": false,
      "evidence": "Source data shows $3.8M total revenue",
      "severity": "high"
    }
  ],
  "overall_score": 0.15,
  "recommendation": "Response contains factual errors. Regenerate with source data verification."
}
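
A caller will typically act on the result by listing the unsupported claims and deciding whether to regenerate the response. The sketch below shows one way to do that with the fields in the result above. The function names and the 0.8 confidence threshold are assumptions for this example, not documented defaults.

```python
def unsupported_claims(result: dict) -> list:
    """Return one readable line per claim the classifier marked unsupported."""
    lines = []
    for claim in result.get("claims", []):
        if not claim.get("supported", True):
            lines.append(
                f"[{claim.get('severity', 'unknown')}] "
                f"{claim['claim']}: {claim.get('evidence', 'no evidence given')}"
            )
    return lines


def should_regenerate(result: dict, min_confidence: float = 0.8) -> bool:
    """Regenerate only when a hallucination is flagged with enough confidence.

    The threshold is an assumed value; tune it per tenant or use case.
    """
    return (
        bool(result.get("hallucination_detected"))
        and result.get("confidence", 0.0) >= min_confidence
    )
```

Applied to the result above, `unsupported_claims` yields two high-severity lines and `should_regenerate` returns `True`, since the reported confidence of 0.92 exceeds the assumed threshold.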
