Feedback Scoring
The FeedbackScoringService provides multi-factor quality assessment for agent traces using both explicit user feedback and implicit behavioral signals. It computes composite scores for correctness, completeness, and efficiency to drive reinforcement learning, pattern quality ranking, and performance monitoring.
Overview
Feedback scoring closes the loop between agent execution and continuous improvement. By capturing both explicit signals (user ratings, corrections) and implicit signals (retries, downstream usage), the service builds a comprehensive quality picture for each trace.
Source: data-plane/ai-service/src/context_graph/services/feedback_scoring_service.py
Feedback Types
| Type | Description | Example |
|---|---|---|
| EXPLICIT | Direct user feedback | Thumbs up/down rating |
| IMPLICIT | Inferred from user behavior | User retried the query |
| COMPUTED | Calculated from metrics | Efficiency score from step count |
| AUTOMATED | System-generated | Pattern match confirmation |
Feedback Sources
| Source | Signal | Score Range |
|---|---|---|
| USER_RATING | User gave a rating | -1 to 1 |
| USER_CORRECTION | User corrected the result | -0.5 to 0 |
| USER_RETRY | User retried the task | -0.3 to 0 |
| USER_ABANDON | User abandoned the task | -1 to -0.5 |
| DOWNSTREAM_USE | Result was used downstream | 0.5 to 1 |
| OUTCOME_SUCCESS | Task achieved its goal | 0.5 to 1 |
| OUTCOME_FAILURE | Task failed | -1 to -0.5 |
| PATTERN_MATCH | Matched an expected pattern | 0 to 0.5 |
| EFFICIENCY | Efficiency computation | 0 to 1 |
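The score ranges above can be enforced with a small lookup table. The following is a minimal sketch, not the service's actual code: the `FeedbackSource` members mirror the table, but the `SCORE_RANGES` mapping and the `clamp_score` helper are hypothetical names introduced here for illustration.

```python
from enum import Enum, auto


class FeedbackSource(Enum):
    # Member names follow the table; the enum values themselves are assumed.
    USER_RATING = auto()
    USER_CORRECTION = auto()
    USER_RETRY = auto()
    USER_ABANDON = auto()
    DOWNSTREAM_USE = auto()
    OUTCOME_SUCCESS = auto()
    OUTCOME_FAILURE = auto()
    PATTERN_MATCH = auto()
    EFFICIENCY = auto()


# (low, high) bounds copied from the Feedback Sources table.
SCORE_RANGES = {
    FeedbackSource.USER_RATING: (-1.0, 1.0),
    FeedbackSource.USER_CORRECTION: (-0.5, 0.0),
    FeedbackSource.USER_RETRY: (-0.3, 0.0),
    FeedbackSource.USER_ABANDON: (-1.0, -0.5),
    FeedbackSource.DOWNSTREAM_USE: (0.5, 1.0),
    FeedbackSource.OUTCOME_SUCCESS: (0.5, 1.0),
    FeedbackSource.OUTCOME_FAILURE: (-1.0, -0.5),
    FeedbackSource.PATTERN_MATCH: (0.0, 0.5),
    FeedbackSource.EFFICIENCY: (0.0, 1.0),
}


def clamp_score(source: FeedbackSource, score: float) -> float:
    """Hypothetical helper: pull a raw score into the range allowed for its source."""
    low, high = SCORE_RANGES[source]
    return max(low, min(high, score))
```

Clamping is only one possible policy; a stricter implementation might reject out-of-range scores with an error instead of silently adjusting them.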
Composite Trace Score
Each trace receives a composite score with three dimensions:
| Dimension | Description | How Computed |
|---|---|---|
| Correctness | Did the trace achieve its goal? | Weighted average of outcome and user feedback signals |
| Completeness | Did it follow the expected pattern? | Pattern match score from pattern mining |
| Efficiency | Was it optimal? | Inverse of step count relative to pattern average |
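As a rough illustration of how the three dimensions might be computed, the sketch below makes several assumptions that the table does not confirm: `TraceScore`, `weighted_average`, `efficiency_score`, and `score_trace` are hypothetical names, and efficiency is taken to be the pattern's average step count divided by the trace's step count, capped at 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceScore:
    correctness: float
    completeness: float
    efficiency: float


def weighted_average(signals):
    """Average (score, weight) pairs; returns 0.0 when there is no weight."""
    total_weight = sum(weight for _, weight in signals)
    if total_weight == 0:
        return 0.0
    return sum(score * weight for score, weight in signals) / total_weight


def efficiency_score(step_count, pattern_avg_steps):
    """Assumed formula: fewer steps than the pattern average scores higher, capped at 1."""
    if step_count <= 0 or pattern_avg_steps <= 0:
        return 0.0
    return min(1.0, pattern_avg_steps / step_count)


def score_trace(correctness_signals, pattern_match_score, step_count, pattern_avg_steps):
    """Combine the three dimensions into one composite record."""
    return TraceScore(
        correctness=weighted_average(correctness_signals),
        completeness=pattern_match_score,
        efficiency=efficiency_score(step_count, pattern_avg_steps),
    )


# A trace that took twice as many steps as the pattern average.
example = score_trace([(1.0, 1.0), (0.5, 1.0)], 0.4, step_count=10, pattern_avg_steps=5)
```

In this example, `example.efficiency` is 0.5 because the trace used 10 steps against a pattern average of 5.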
Feedback Records
    record = FeedbackRecord(
        trace_urn="urn:matih:trace:acme:trace-123",
        tenant_id="acme",
        feedback_type=FeedbackType.EXPLICIT,
        feedback_source=FeedbackSource.USER_RATING,
        score=0.8,
        confidence=1.0,
        actor_urn="urn:matih:user:acme:analyst-1",
        comment="Accurate results, good visualization",
    )

Score Aggregation
When multiple feedback signals exist for a trace, they are aggregated using confidence-weighted averaging:
    aggregate_score = sum(score_i * confidence_i) / sum(confidence_i)

More recent feedback is weighted higher using a temporal decay function.
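The aggregation can be sketched as follows. The text does not specify the decay function, so this example assumes exponential decay with a configurable half-life, applied as a multiplier on each signal's confidence. The `Signal` class, the `age_seconds` field, and the seven-day default are all illustrative assumptions rather than the service's actual behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    score: float
    confidence: float
    age_seconds: float  # time since the feedback was recorded (assumed field)


def aggregate_score(signals, half_life_seconds=7 * 24 * 3600):
    """Confidence-weighted average with an assumed exponential temporal decay.

    Each signal's weight is confidence * 0.5 ** (age / half_life), so a signal
    one half-life old counts half as much as a fresh one of equal confidence.
    Returns None when there is nothing to aggregate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for signal in signals:
        decay = 0.5 ** (signal.age_seconds / half_life_seconds)
        weight = signal.confidence * decay
        weighted_sum += signal.score * weight
        weight_total += weight
    if weight_total == 0:
        return None
    return weighted_sum / weight_total
```

With no decay (all ages equal to zero), this reduces to the confidence-weighted formula above.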
Use Cases
| Use Case | Description |
|---|---|
| Reinforcement learning | Feedback scores drive agent improvement signals |
| Pattern quality ranking | Higher-scored patterns are preferred for future routing |
| Agent performance dashboards | Track agent quality trends over time |
| User satisfaction tracking | Monitor explicit user satisfaction per tenant |
| A/B testing | Compare feedback scores between agent configurations |
Integration Points
| Component | Integration |
|---|---|
| Agent Orchestrator | Receives outcome signals after execution |
| Pattern Mining | Pattern match scores from discovered patterns |
| Decision Ranking | Outcome confidence from feedback signals |
| Analytics API | Feedback data accessible via analytics endpoints |