Feedback and RLHF

The Feedback integration lets users rate AI responses, correct them, and submit detailed feedback for a reinforcement learning from human feedback (RLHF) pipeline. Feedback events flow through Kafka to a processing pipeline that generates learning signals for model improvement and quality tracking.

Feedback Architecture

The feedback system is organized into four layers:

Layer	Component	Location
Collection	Multi-source collectors	`src/feedback/collectors/`
Transport	Kafka event streaming	`src/feedback/integration/kafka/`
Processing	Feedback pipeline	`src/feedback/pipeline/`
Learning	RLHF signal generation	`src/feedback/learning/`

Feedback Types

Type	Trigger	Data Collected
`thumbs_up`	User clicks approve	Response ID, session context
`thumbs_down`	User clicks reject	Response ID, session context
`correction`	User edits SQL or response	Original, corrected version, diff
`rating`	User assigns 1-5 stars	Numeric score, optional comment
`implicit_accept`	User uses the generated SQL	Query execution event
`implicit_reject`	User reformulates question	Follow-up message analysis
`escalation`	User requests human help	Session context, frustration signals

Feedback Event Schema

{
  "event_id": "fb-abc123",
  "event_type": "feedback.submitted",
  "tenant_id": "acme-corp",
  "user_id": "user-456",
  "session_id": "sess-xyz789",
  "response_id": "resp-001",
  "feedback_type": "correction",
  "feedback_source": "explicit",
  "data": {
    "original_sql": "SELECT * FROM sales",
    "corrected_sql": "SELECT * FROM sales WHERE region = 'US'",
    "comment": "Need to filter by US region"
  },
  "timestamp": "2025-03-15T10:05:00Z"
}
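
As a rough illustration, the event above could be modeled and validated on the producer side as follows. This is a minimal sketch, not the project's actual code: the `FeedbackEvent` class and its `to_json` method are hypothetical names, and `"implicit"` as a second `feedback_source` value is an assumption based on the implicit feedback types listed earlier.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Feedback types taken from the "Feedback Types" table.
FEEDBACK_TYPES = {
    "thumbs_up", "thumbs_down", "correction", "rating",
    "implicit_accept", "implicit_reject", "escalation",
}


@dataclass
class FeedbackEvent:  # hypothetical class name, for illustration only
    event_id: str
    tenant_id: str
    user_id: str
    session_id: str
    response_id: str
    feedback_type: str
    feedback_source: str  # "explicit" in the example; "implicit" is assumed
    data: dict = field(default_factory=dict)
    event_type: str = "feedback.submitted"
    timestamp: str = ""

    def __post_init__(self):
        # Reject types that are not in the documented table.
        if self.feedback_type not in FEEDBACK_TYPES:
            raise ValueError(f"unknown feedback_type: {self.feedback_type!r}")
        # Default to the current UTC time in the format used by the example.
        if not self.timestamp:
            now = datetime.now(timezone.utc)
            self.timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")

    def to_json(self) -> str:
        # Serialize to the flat JSON shape shown in the schema example.
        return json.dumps(asdict(self))
```

Validating `feedback_type` at construction time keeps malformed events from reaching the transport layer, which is one possible design choice among several.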

Collection API

Feedback is collected through REST endpoints in `src/feedback/api/`:

POST /api/v1/feedback/submit
POST /api/v1/feedback/correction
POST /api/v1/feedback/rating
GET  /api/v1/feedback/summary?session_id=...
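
A client might build calls to these endpoints along the following lines. This sketch only constructs the requests and does not send them. The host in `BASE_URL`, the helper names, and the request body fields (`response_id`, `session_id`, `score`, `comment`) are assumptions, since the request payloads are not specified here.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://example.invalid"  # hypothetical host, replace with yours


def build_rating_request(response_id, session_id, score, comment=None):
    """Build a POST to the rating endpoint (body shape is an assumption)."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    body = {"response_id": response_id, "session_id": session_id, "score": score}
    if comment:
        body["comment"] = comment
    return urllib.request.Request(
        BASE_URL + "/api/v1/feedback/rating",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_summary_request(session_id):
    """Build a GET to the summary endpoint for one session."""
    query = urlencode({"session_id": session_id})
    return urllib.request.Request(
        BASE_URL + "/api/v1/feedback/summary?" + query, method="GET"
    )
```

Passing a returned request to `urllib.request.urlopen` would send it. Authentication headers are omitted because they are not described in this section.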

Processing Pipeline

The feedback pipeline processes raw events into actionable learning signals:

1. Deduplication: Filters duplicate feedback events within a time window
2. Enrichment: Attaches session context, query metadata, and user profile
3. Classification: Categorizes feedback by type and severity
4. Signal Generation: Produces learning signals for model fine-tuning
5. Aggregation: Computes quality scores per agent, query type, and schema
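
The deduplication stage can be sketched as a time-window filter. The code below is an illustration under stated assumptions: the duplicate key (`user_id`, `response_id`, `feedback_type`) and the 60-second default window are guesses, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_ts(ts):
    # Parse timestamps in the "2025-03-15T10:05:00Z" form used by the event schema.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def deduplicate(events, window=timedelta(seconds=60)):
    """Drop events that repeat an already-kept event within `window`.

    Two events count as duplicates when they share user, response, and
    feedback type (an assumed key) and arrive no more than `window` apart.
    """
    last_kept = {}  # duplicate key -> timestamp of the last kept event
    kept = []
    for ev in sorted(events, key=lambda e: parse_ts(e["timestamp"])):
        key = (ev["user_id"], ev["response_id"], ev["feedback_type"])
        ts = parse_ts(ev["timestamp"])
        prev = last_kept.get(key)
        if prev is not None and ts - prev <= window:
            continue  # duplicate inside the window, skip it
        last_kept[key] = ts
        kept.append(ev)
    return kept
```

A production pipeline consuming from Kafka would more likely keep this state in a bounded cache or a stream-processing state store than in an in-memory dict. The dict keeps the sketch short.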

Learning Signals

Learning signals are published to the `ai-learning-signals` Kafka topic:

{
  "signal_type": "sql_correction",
  "input": "What were sales last month?",
  "expected_output": "SELECT * FROM sales WHERE date >= ...",
  "actual_output": "SELECT * FROM sales",
  "reward": -0.5,
  "context": {
    "schema_tables": ["sales"],
    "tenant_id": "acme-corp"
  }
}
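
To make the mapping from feedback event to learning signal concrete, here is a minimal sketch that derives a `sql_correction` signal from a `correction` event. It assumes the user's question and the schema tables come from the enrichment stage, and it reuses the `-0.5` reward from the example as a default. How rewards are actually computed is not specified here, and the function names are hypothetical.

```python
import json

TOPIC = "ai-learning-signals"  # topic name from the text above


def correction_to_signal(event, question, schema_tables, reward=-0.5):
    """Map a correction feedback event to a learning-signal dict."""
    if event["feedback_type"] != "correction":
        raise ValueError("expected a correction event")
    data = event["data"]
    return {
        "signal_type": "sql_correction",
        "input": question,                        # assumed to come from enrichment
        "expected_output": data["corrected_sql"],  # what the user wanted
        "actual_output": data["original_sql"],     # what the model produced
        "reward": reward,                          # placeholder value from the example
        "context": {
            "schema_tables": list(schema_tables),
            "tenant_id": event["tenant_id"],
        },
    }


def encode_for_topic(signal):
    # Kafka message values are bytes, so serialize the signal as UTF-8 JSON.
    return json.dumps(signal, sort_keys=True).encode("utf-8")
```

The encoded bytes would then be handed to whichever Kafka producer client the deployment uses, with `TOPIC` as the destination.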

Quality Tracking

Feedback metrics are aggregated and exposed for monitoring:

Metric	Description
`feedback_positive_rate`	Percentage of positive feedback
`feedback_correction_rate`	Percentage of responses requiring correction
`feedback_response_time`	Average time between response and feedback
`feedback_escalation_rate`	Percentage of sessions escalated to human
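
The rate metrics could be computed from a batch of feedback events roughly as follows. The exact definitions are assumptions: positive feedback is taken to mean `thumbs_up` or `implicit_accept`, correction rate is measured against a caller-supplied total response count, and escalation rate against a total session count. `feedback_response_time` is left out because the event schema shown does not carry the time the response was sent.

```python
# Assumption: these two types count as "positive" feedback.
POSITIVE_TYPES = {"thumbs_up", "implicit_accept"}


def percentage(part, whole):
    # Guard against division by zero when there is no data yet.
    return 100.0 * part / whole if whole else 0.0


def compute_metrics(events, total_responses, total_sessions):
    """Compute rate metrics (in percent) from a list of feedback events."""
    positive = sum(1 for e in events if e["feedback_type"] in POSITIVE_TYPES)
    # Count each corrected response and each escalated session once.
    corrected = {e["response_id"] for e in events if e["feedback_type"] == "correction"}
    escalated = {e["session_id"] for e in events if e["feedback_type"] == "escalation"}
    return {
        "feedback_positive_rate": percentage(positive, len(events)),
        "feedback_correction_rate": percentage(len(corrected), total_responses),
        "feedback_escalation_rate": percentage(len(escalated), total_sessions),
    }
```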

Insights

The feedback insights module in `src/feedback/insights/` generates periodic reports identifying:

Most common correction patterns by schema table
Agent types with lowest satisfaction scores
Query categories with highest escalation rates
Trending feedback themes across tenants
