Metrics Bridge
The MetricsBridge links existing observability metrics (OpenTelemetry spans, Prometheus counters, LLMOps cost records) to context graph thinking traces. It maps LLM call spans, latency measurements, and cost data to the active thinking trace for the current session.
Overview
The metrics bridge ensures that observability data and context graph data are correlated. When an LLM call generates a span in the tracing system, the metrics bridge also records it as an API call in the thinking trace, enabling unified analysis across both systems.
Source: data-plane/ai-service/src/context_graph/integration/metrics_bridge.py
How It Works
+------------------+ +---------------+ +------------------+
| Observability | --> | Metrics | --> | Thinking Trace |
| (spans, metrics) | | Bridge | | (API call record)|
+------------------+ +---------------+ +------------------+

- When a thinking trace is started, the bridge registers the trace ID for the session
- As observability spans complete, the bridge maps them to API call records
- When the thinking trace completes, the bridge unregisters the session
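The registration flow above can be sketched with a small shim. This is a sketch, not the real implementation: the actual MetricsBridge in metrics_bridge.py also forwards records to the thinking service, and the `trace_for` lookup method is an assumption added here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class MetricsBridgeSketch:
    # session_id -> trace_id for every active thinking trace
    _active_traces: Dict[str, str] = field(default_factory=dict)

    def register_trace(self, session_id: str, trace_id: str) -> None:
        # Called when a thinking trace starts for a session
        self._active_traces[session_id] = trace_id

    def unregister_trace(self, session_id: str) -> None:
        # Called when the thinking trace completes; unknown sessions are a no-op
        self._active_traces.pop(session_id, None)

    def trace_for(self, session_id: str) -> Optional[str]:
        # Span handlers use this lookup to correlate a span with its trace
        return self._active_traces.get(session_id)
```

Keeping the registry as a plain session-to-trace dictionary makes span correlation a constant-time lookup while spans are completing.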
Session Registration
bridge = MetricsBridge(thinking_service=service)
# Register when a trace starts
bridge.register_trace(session_id="sess-123", trace_id="trace-456")
# Unregister when a trace completes
bridge.unregister_trace(session_id="sess-123")

Span Mapping
When an observability span ends, the bridge captures it:
await bridge.on_span_end(
span_name="llm_call",
span_kind="client",
session_id="sess-123",
duration_ms=450.0,
attributes={
"model_id": "gpt-4",
"input_tokens": 1500,
"output_tokens": 500,
"endpoint": "https://api.openai.com/v1/chat/completions",
},
)

This creates an APICallRecord linked to the active thinking trace for the session.
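The conversion from span attributes to a record might look roughly like the sketch below. The field names mirror the attributes shown above, but the actual APICallRecord definition in the context graph package is an assumption here and likely carries more fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class APICallRecord:
    # Hypothetical record shape, mirroring the span attributes above
    api_type: str
    span_name: str
    duration_ms: float
    model_id: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    endpoint: Optional[str] = None

def record_from_span(api_type: str, span_name: str, duration_ms: float,
                     attributes: dict) -> APICallRecord:
    # Copy the well-known attributes from the span; unrecognized keys
    # are simply ignored.
    return APICallRecord(
        api_type=api_type,
        span_name=span_name,
        duration_ms=duration_ms,
        model_id=attributes.get("model_id"),
        input_tokens=attributes.get("input_tokens"),
        output_tokens=attributes.get("output_tokens"),
        endpoint=attributes.get("endpoint"),
    )
```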
Mapped Span Types
| Span Name | API Type | What Is Captured |
|---|---|---|
| llm_call | llm | Model ID, tokens, cost, latency |
| tool_execution | tool | Tool name, input/output, duration |
| database_query | database | Query type, result count, duration |
| vector_search | vector | Namespace, top-k, result count |
| api_call | external | Endpoint, method, status code |
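The table above is essentially a lookup from span name to API type. A plausible form (the constant name and fallback behavior are assumptions, not taken from the source):

```python
# Hypothetical lookup mirroring the span-type table above
SPAN_TO_API_TYPE = {
    "llm_call": "llm",
    "tool_execution": "tool",
    "database_query": "database",
    "vector_search": "vector",
    "api_call": "external",
}

def api_type_for(span_name: str, default: str = "external") -> str:
    # Unrecognized span names fall back to the generic external type
    return SPAN_TO_API_TYPE.get(span_name, default)
```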
Cost Correlation
The bridge maps LLMOps cost records to thinking traces:
await bridge.on_cost_record(
session_id="sess-123",
model_id="gpt-4",
input_tokens=1500,
output_tokens=500,
cost_usd=0.045,
)

Integration with Existing Observability
The metrics bridge does not replace existing observability infrastructure. It adds a correlation layer:
| System | What It Tracks | Metrics Bridge Adds |
|---|---|---|
| Prometheus | Request rates, latencies | Correlation to trace IDs |
| OpenTelemetry | Distributed spans | API call records in thinking traces |
| LLMOps | Token usage, costs | Cost attribution to agent reasoning paths |
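The "cost attribution" row above can be illustrated with a minimal per-session accumulator. This is a sketch under assumptions: the real bridge attaches each cost record to the session's thinking trace through the thinking service rather than keeping its own totals, and the CostLedger name is hypothetical.

```python
from collections import defaultdict

class CostLedger:
    def __init__(self):
        # session_id -> running token and dollar totals
        self._totals = defaultdict(
            lambda: {"input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}
        )

    def on_cost_record(self, session_id, model_id, input_tokens,
                       output_tokens, cost_usd):
        # Accumulate the LLMOps cost record under the session so that
        # spend can be attributed to that session's reasoning path
        t = self._totals[session_id]
        t["input_tokens"] += input_tokens
        t["output_tokens"] += output_tokens
        t["cost_usd"] += cost_usd
        return t
```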