Feedback Scoring

The FeedbackScoringService provides multi-factor quality assessment for agent traces using both explicit user feedback and implicit behavioral signals. It computes composite scores for correctness, completeness, and efficiency to drive reinforcement learning, pattern quality ranking, and performance monitoring.

Overview

Feedback scoring closes the loop between agent execution and continuous improvement. By capturing both explicit signals (user ratings, corrections) and implicit signals (retries, downstream usage), the service builds a comprehensive quality picture for each trace.

Source: data-plane/ai-service/src/context_graph/services/feedback_scoring_service.py

Feedback Types

Type	Description	Example
`EXPLICIT`	Direct user feedback	Thumbs up/down rating
`IMPLICIT`	Inferred from user behavior	User retried the query
`COMPUTED`	Calculated from metrics	Efficiency score from step count
`AUTOMATED`	System-generated	Pattern match confirmation

Feedback Sources

Source	Signal	Score Range
`USER_RATING`	User gave a rating	-1 to 1
`USER_CORRECTION`	User corrected the result	-0.5 to 0
`USER_RETRY`	User retried the task	-0.3 to 0
`USER_ABANDON`	User abandoned the task	-1 to -0.5
`DOWNSTREAM_USE`	Result was used downstream	0.5 to 1
`OUTCOME_SUCCESS`	Task achieved its goal	0.5 to 1
`OUTCOME_FAILURE`	Task failed	-1 to -0.5
`PATTERN_MATCH`	Matched an expected pattern	0 to 0.5
`EFFICIENCY`	Efficiency computation	0 to 1
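
The score ranges above can be enforced with a small lookup table. The following is a minimal sketch, not the service's actual code: the enum string values, the `SCORE_RANGES` mapping, and the `clamp_score` helper are hypothetical names introduced for illustration, and only the member names and ranges come from the table.

```python
from enum import Enum


class FeedbackSource(Enum):
    # Member names follow the table; the string values are assumed.
    USER_RATING = "user_rating"
    USER_CORRECTION = "user_correction"
    USER_RETRY = "user_retry"
    USER_ABANDON = "user_abandon"
    DOWNSTREAM_USE = "downstream_use"
    OUTCOME_SUCCESS = "outcome_success"
    OUTCOME_FAILURE = "outcome_failure"
    PATTERN_MATCH = "pattern_match"
    EFFICIENCY = "efficiency"


# (minimum, maximum) score per source, copied from the table above.
SCORE_RANGES = {
    FeedbackSource.USER_RATING: (-1.0, 1.0),
    FeedbackSource.USER_CORRECTION: (-0.5, 0.0),
    FeedbackSource.USER_RETRY: (-0.3, 0.0),
    FeedbackSource.USER_ABANDON: (-1.0, -0.5),
    FeedbackSource.DOWNSTREAM_USE: (0.5, 1.0),
    FeedbackSource.OUTCOME_SUCCESS: (0.5, 1.0),
    FeedbackSource.OUTCOME_FAILURE: (-1.0, -0.5),
    FeedbackSource.PATTERN_MATCH: (0.0, 0.5),
    FeedbackSource.EFFICIENCY: (0.0, 1.0),
}


def clamp_score(source: FeedbackSource, score: float) -> float:
    """Clamp a raw score into the documented range for its source (hypothetical helper)."""
    low, high = SCORE_RANGES[source]
    return max(low, min(high, score))
```

Clamping is one possible policy; a stricter implementation might reject out-of-range scores instead of silently adjusting them.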

Composite Trace Score

Each trace receives a composite score with three dimensions:

Dimension	Description	How Computed
Correctness	Did the trace achieve its goal?	Weighted average of outcome and user feedback signals
Completeness	Did it follow the expected pattern?	Pattern match score from pattern mining
Efficiency	Was it optimal?	Inverse of step count relative to pattern average
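
One way to read the three dimensions is sketched below. This is an illustration under stated assumptions, not the service's implementation: `TraceScore`, `weighted_average`, and `efficiency_score` are hypothetical names, the per-signal weights are invented inputs, and "inverse of step count relative to pattern average" is interpreted here as the pattern's average step count divided by the trace's step count, capped at 1.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class TraceScore:
    """Hypothetical container for the three dimensions of a trace score."""
    correctness: float
    completeness: float
    efficiency: float


def weighted_average(signals: Iterable[Tuple[float, float]]) -> float:
    """Weighted average of (score, weight) pairs; returns 0.0 when no weight is present."""
    pairs = list(signals)
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        return 0.0
    return sum(score * weight for score, weight in pairs) / total_weight


def efficiency_score(step_count: int, pattern_avg_steps: float) -> float:
    """Assumed reading: pattern average / actual steps, capped to the 0 to 1 range."""
    if step_count <= 0 or pattern_avg_steps <= 0:
        return 0.0
    return min(1.0, pattern_avg_steps / step_count)


# Example: two correctness signals, a pattern match score of 0.4,
# and a trace that took 8 steps where the pattern averages 6.
example = TraceScore(
    correctness=weighted_average([(1.0, 1.0), (0.8, 1.0)]),
    completeness=0.4,
    efficiency=efficiency_score(8, 6.0),
)
```

Under this reading, a trace that takes 8 steps against a pattern average of 6 gets an efficiency of 0.75, and any trace at or below the average gets 1.0.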

Feedback Records

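Each feedback signal is stored as a FeedbackRecord that ties a score and a confidence to a trace, a tenant, and an actor:

# Import path inferred from the source location above; adjust to your package layout.
from context_graph.services.feedback_scoring_service import (
    FeedbackRecord,
    FeedbackSource,
    FeedbackType,
)

record = FeedbackRecord(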
    trace_urn="urn:matih:trace:acme:trace-123",
    tenant_id="acme",
    feedback_type=FeedbackType.EXPLICIT,
    feedback_source=FeedbackSource.USER_RATING,
    score=0.8,
    confidence=1.0,
    actor_urn="urn:matih:user:acme:analyst-1",
    comment="Accurate results, good visualization",
)
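
For readers without access to the service module, a stand-in record type with the same fields might look like the sketch below. It is an assumption-laden illustration: the real class may differ, the enum values are invented, the `created_at` field is hypothetical, and the 0 to 1 confidence range is assumed (the -1 to 1 score bound matches the widest range in the Feedback Sources table).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class FeedbackType(Enum):
    EXPLICIT = "explicit"      # string values are assumed
    IMPLICIT = "implicit"
    COMPUTED = "computed"
    AUTOMATED = "automated"


class FeedbackSource(Enum):
    USER_RATING = "user_rating"  # remaining sources omitted for brevity
    USER_RETRY = "user_retry"


@dataclass
class FeedbackRecord:
    """Stand-in for the service's record type; field names follow the example above."""
    trace_urn: str
    tenant_id: str
    feedback_type: FeedbackType
    feedback_source: FeedbackSource
    score: float
    confidence: float = 1.0
    actor_urn: Optional[str] = None
    comment: Optional[str] = None
    # Hypothetical field: a timestamp is needed if recency is to affect aggregation.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if not -1.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")
        if not 0.0 <= self.confidence <= 1.0:  # assumed range
            raise ValueError(f"confidence out of range: {self.confidence}")


record = FeedbackRecord(
    trace_urn="urn:matih:trace:acme:trace-123",
    tenant_id="acme",
    feedback_type=FeedbackType.EXPLICIT,
    feedback_source=FeedbackSource.USER_RATING,
    score=0.8,
)
```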

Score Aggregation

When multiple feedback signals exist for a trace, they are aggregated using confidence-weighted averaging:

aggregate_score = sum(score_i * confidence_i) / sum(confidence_i)

Temporal decay is applied on top of the confidence weighting, so recent feedback counts more than older feedback; the formula above shows only the confidence term.
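
The aggregation can be sketched as follows. The document does not specify the decay function, so this example assumes exponential decay with a configurable half-life; the function name, the tuple layout, and the 30-day default are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple

Signal = Tuple[float, float, datetime]  # (score, confidence, created_at)


def aggregate_score(
    signals: Iterable[Signal],
    now: datetime,
    half_life_days: float = 30.0,  # assumed default, not taken from the service
) -> Optional[float]:
    """Confidence-weighted average in which each weight halves every half_life_days."""
    weighted_sum = 0.0
    total_weight = 0.0
    for score, confidence, created_at in signals:
        age_days = max(0.0, (now - created_at).total_seconds() / 86400.0)
        decay = 0.5 ** (age_days / half_life_days)
        weight = confidence * decay
        weighted_sum += score * weight
        total_weight += weight
    if total_weight == 0.0:
        return None  # no usable signals
    return weighted_sum / total_weight


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
signals = [
    (1.0, 1.0, now),                        # fresh positive rating
    (-1.0, 1.0, now - timedelta(days=30)),  # negative rating, one half-life old
]
result = aggregate_score(signals, now)
```

With equal confidence and no decay these two signals would cancel to 0; with a 30-day half-life the older signal carries half the weight, so the aggregate is (1.0 - 0.5) / 1.5, roughly 0.33.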

Use Cases

Use Case	Description
Reinforcement learning	Feedback scores drive agent improvement signals
Pattern quality ranking	Higher-scored patterns are preferred for future routing
Agent performance dashboards	Track agent quality trends over time
User satisfaction tracking	Monitor explicit user satisfaction per tenant
A/B testing	Compare feedback scores between agent configurations
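
As an illustration of the A/B testing use case, aggregate scores can be grouped by agent configuration and compared. This is a minimal sketch; the function name and the (configuration, score) input shape are assumptions, and a real comparison would also need a significance test.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_score_by_config(scored_traces: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the aggregate trace scores for each agent configuration."""
    buckets = defaultdict(list)
    for config_name, score in scored_traces:
        buckets[config_name].append(score)
    return {name: mean(scores) for name, scores in buckets.items()}


comparison = mean_score_by_config(
    [("config-a", 0.5), ("config-a", 1.0), ("config-b", 0.25), ("config-b", 0.25)]
)
```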

Integration Points

Component	Integration
Agent Orchestrator	Receives outcome signals after execution
Pattern Mining	Pattern match scores from discovered patterns
Decision Ranking	Outcome confidence from feedback signals
Analytics API	Feedback data accessible via analytics endpoints
