Trace Correlation
Trace correlation links distributed traces with logs and metrics to give a complete picture of how a request was processed. MATIH uses the trace ID as the common identifier across all three observability pillars, so an engineer can navigate from a metric spike to the trace behind it and on to the relevant log entries.
Correlation Architecture
Metric Alert --> Exemplar Trace ID --> Trace View --> Log Lines
                                           |
                                    Span Attributes --> Metric Labels
Trace-to-Log Correlation
Every log line includes the trace ID and span ID for correlation:
Python (structlog)
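import structlog
from opentelemetry import trace

def add_trace_context(logger, method_name, event_dict):
    """structlog processor that adds the active trace and span IDs to each event."""
    span = trace.get_current_span()
    # Skip events emitted outside a recording span (there is no trace to link to).
    if span.is_recording():
        ctx = span.get_span_context()
        # Render the IDs as zero-padded lowercase hex (32 and 16 characters).
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict

# The processor runs before the renderer so the IDs appear in the output.
structlog.configure(processors=[
    add_trace_context,
    structlog.dev.ConsoleRenderer(),
])
Java (Spring Boot)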
Trace context is automatically included in Spring Boot logs via the Micrometer Tracing integration:
2025-06-15 10:30:00 [trace_id=abc123, span_id=def456] INFO c.m.i.TenantService - Provisioning tenant acme
Log-to-Trace Linking in Grafana
Loki is configured with derived fields that extract trace IDs from log lines:
datasources:
  - name: Loki
    type: loki
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: "trace_id=(\\w+)"
          name: TraceID
          url: "$${__value.raw}"
Clicking a trace ID in a log line opens the corresponding trace in the Tempo data source.
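To illustrate what the derived field captures, the sketch below applies the same pattern to the sample Spring Boot log line using Python's `re` module. Grafana evaluates the expression with its own regex engine, so this only approximates its behavior; the function name `extract_trace_id` is hypothetical.

```python
import re
from typing import Optional

# Same pattern as matcherRegex once YAML unescapes "\\w" to "\w".
TRACE_ID_PATTERN = re.compile(r"trace_id=(\w+)")


def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the first captured trace ID, or None if the line has none."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


line = (
    "2025-06-15 10:30:00 [trace_id=abc123, span_id=def456] "
    "INFO c.m.i.TenantService - Provisioning tenant acme"
)
print(extract_trace_id(line))  # abc123
```

Because `\w+` stops at the first non-word character, the comma after the trace ID is not captured, and lines without a `trace_id=` field produce no link.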
Metric-to-Trace Linking
Prometheus histogram metrics include exemplars that contain trace IDs:
from prometheus_client import Histogram

request_duration = Histogram(
    "matih_http_request_duration_seconds",
    "Request duration",
    ["method", "endpoint"],
)

# Record with exemplar. current_trace_id is assumed to hold the active
# span's trace ID as a 32-character hex string.
request_duration.labels(method="POST", endpoint="/search").observe(
    0.45,
    exemplar={"traceID": current_trace_id},
)
In Grafana, enabling exemplars on a histogram panel displays individual trace links on the histogram bars.
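The snippet above does not show where `current_trace_id` comes from. A minimal sketch, assuming the trace ID is available as an integer (as OpenTelemetry's `SpanContext.trace_id` exposes it), is a small helper that builds the exemplar label set. The helper name `build_exemplar` is hypothetical and not part of any library.

```python
from typing import Dict, Optional


def build_exemplar(trace_id: int) -> Optional[Dict[str, str]]:
    """Return an exemplar label set for a trace ID, or None when there is no trace."""
    # An all-zero trace ID is invalid in W3C trace context, so attach nothing.
    if trace_id == 0:
        return None
    # Zero-padded lowercase hex, matching the trace_id format used in the logs.
    return {"traceID": format(trace_id, "032x")}


print(build_exemplar(0x4BF92F3577B34DA6A3CE929D0E0E4736))
# {'traceID': '4bf92f3577b34da6a3ce929d0e0e4736'}
```

The result could then be passed as `exemplar=build_exemplar(ctx.trace_id)`. Since `observe` accepts `exemplar=None`, observations made outside a trace are still recorded, just without a link.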
Tenant Context Propagation
The tenant ID is propagated alongside trace context:
| Header | Purpose |
|---|---|
| traceparent | W3C trace context (trace_id, span_id) |
| X-Tenant-Id | Tenant identifier |
| X-Request-Id | Request correlation ID |
These headers are set by the API gateway and propagated through all downstream services.
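As a rough illustration of the propagation described above, the following sketch parses a version-00 `traceparent` value and copies the three correlation headers onto an outbound request. The function names are hypothetical. Note the simplification: a real tracing library normally rewrites the span ID portion of `traceparent` for each outgoing call, whereas this sketch forwards the value unchanged.

```python
import re
from typing import Dict, Mapping

# W3C traceparent, version 00: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

# Header names taken from the table above.
PROPAGATED = ("traceparent", "X-Tenant-Id", "X-Request-Id")


def parse_traceparent(value: str) -> Dict[str, str]:
    """Split a traceparent value into trace ID, span ID, and flags."""
    match = TRACEPARENT_RE.match(value)
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    trace_id, span_id, flags = match.groups()
    return {"trace_id": trace_id, "span_id": span_id, "flags": flags}


def outbound_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Copy the correlation headers, matching header names case-insensitively."""
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {
        name: lowered[name.lower()]
        for name in PROPAGATED
        if name.lower() in lowered
    }


incoming = {
    "Traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "x-tenant-id": "acme",
    "Accept": "application/json",
}
forwarded = outbound_headers(incoming)
print(parse_traceparent(forwarded["traceparent"])["trace_id"])
# 4bf92f3577b34da6a3ce929d0e0e4736
```

Headers that are absent from the incoming request, such as `X-Request-Id` in this example, are simply omitted rather than invented.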
Correlation Workflow
- Alert fires -- Prometheus alert triggers on high error rate
- View dashboard -- Grafana dashboard shows the error spike with exemplars
- Click exemplar -- Navigate to the specific trace in Tempo
- View trace -- See the full span tree with error annotations
- Jump to logs -- Click "Logs for this span" to see related log entries in Loki
- Root cause -- Identify the failing service and specific error from logs
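The "jump to logs" step amounts to a Loki query filtered by trace ID. A minimal sketch, assuming log lines carry `trace_id=<hex>` as in the examples above and that log streams are selected by an `app` label (a hypothetical label; the actual MATIH labels may differ):

```python
import re


def logql_for_trace(trace_id: str, selector: str = '{app="tenant-service"}') -> str:
    """Build a LogQL query that keeps only lines mentioning the given trace ID."""
    # Accept only a 32-character lowercase hex ID so nothing else reaches the query.
    if not re.fullmatch(r"[0-9a-f]{32}", trace_id):
        raise ValueError(f"not a 32-character hex trace ID: {trace_id!r}")
    # |= is the LogQL line filter for "line contains this string".
    return f'{selector} |= "trace_id={trace_id}"'


print(logql_for_trace("4bf92f3577b34da6a3ce929d0e0e4736"))
# {app="tenant-service"} |= "trace_id=4bf92f3577b34da6a3ce929d0e0e4736"
```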