MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Log Querying (LogQL)

LogQL is Loki's query language, inspired by PromQL. It supports label-based filtering, text pattern matching, JSON parsing, and metric aggregations over log data. MATIH operators use LogQL through Grafana's Explore view to investigate incidents, debug issues, and analyze patterns.


Query Types

Type            Description             Example
Log queries     Return log lines        Label filter + text match
Metric queries  Return computed values  Rate computation over matching logs

Basic Queries

Filter by Service

{service="ai-service"}

Filter by Log Level

{service="ai-service", level="error"}

Text Search

{service="ai-service"} |= "timeout"

Regex Match

{service="ai-service"} |~ "tenant_id=acme.*error"

Exclude Pattern

{service="ai-service"} != "health_check"
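
To make the line filter operators above concrete, the following Python sketch mimics what each one keeps or drops. It is an illustration only: `line_filter` and the sample lines are hypothetical, and Loki evaluates regexes with RE2 syntax rather than Python's `re` module, so edge cases can differ.

```python
import re

def line_filter(lines, op, pattern):
    """Emulate a LogQL line filter over a list of log lines (illustrative only)."""
    if op == "|=":   # keep lines that contain the substring
        return [line for line in lines if pattern in line]
    if op == "!=":   # drop lines that contain the substring
        return [line for line in lines if pattern not in line]
    if op == "|~":   # keep lines where the regex matches anywhere in the line
        return [line for line in lines if re.search(pattern, line)]
    if op == "!~":   # drop lines where the regex matches anywhere in the line
        return [line for line in lines if not re.search(pattern, line)]
    raise ValueError(f"unknown operator: {op}")

# Hypothetical sample lines, not real MATIH output
sample = [
    "GET /health_check 200",
    "tenant_id=acme upstream timeout error",
    "tenant_id=beta request ok",
]

print(line_filter(sample, "|=", "timeout"))
print(line_filter(sample, "!=", "health_check"))
print(line_filter(sample, "|~", "tenant_id=acme.*error"))
```

Note that the regex in a line filter is unanchored, so it can match anywhere in the line.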

JSON Parsing

Parse structured JSON logs and filter on extracted fields:

{service="ai-service"} | json | tenant_id="acme" | duration_ms > 1000

Extract Specific Fields

{service="ai-service"} | json | line_format "{{.event}} [{{.tenant_id}}] {{.duration_ms}}ms"

Metric Queries

Error Rate per Service

sum(rate({level="error"}[5m])) by (service)
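
As a rough illustration of the arithmetic behind this query: rate() counts the matching entries in the range and divides by the range length in seconds, and sum ... by (service) groups the result per service. The Python sketch below reproduces that calculation on in-memory tuples. The function name and sample data are hypothetical, and this is not how Loki evaluates queries internally.

```python
from collections import Counter

def error_rate_by_service(entries, window_seconds=300):
    """Per-second error rate per service for entries inside one window.

    entries: iterable of (service, level) tuples, one per log line.
    window_seconds: range length; 300 corresponds to [5m].
    """
    counts = Counter(service for service, level in entries if level == "error")
    return {service: n / window_seconds for service, n in counts.items()}

# Hypothetical sample: 30 errors from ai-service, 3 from tenant-service
entries = (
    [("ai-service", "error")] * 30
    + [("tenant-service", "error")] * 3
    + [("ai-service", "info")] * 100
)
print(error_rate_by_service(entries))
```

With this sample, 30 errors in a 5-minute window give 0.1 errors per second for ai-service.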

Request Count by Tenant

sum(count_over_time({service="ai-service"} | json | event="request_completed" [1h])) by (tenant_id)

p95 Duration from Logs

quantile_over_time(0.95, {service="ai-service"} | json | unwrap duration_ms [5m]) by (service)
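
In this query, unwrap turns each line's duration_ms field into a numeric sample, and quantile_over_time computes the 0.95 quantile of those samples per window. The Python sketch below shows one common way to compute such a quantile, using linear interpolation between neighboring ranks in the style of Prometheus. Treat it as an assumption about the general approach, not a statement of Loki's exact internals; the `quantile` helper is hypothetical.

```python
import math

def quantile(q, values):
    """Quantile with linear interpolation between the two nearest ranks."""
    if not values:
        return math.nan
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)          # fractional index into the sorted list
    lower = math.floor(rank)
    upper = min(len(ordered) - 1, lower + 1)
    weight = rank - lower                  # how far rank sits between the two indexes
    return ordered[lower] * (1 - weight) + ordered[upper] * weight

# Hypothetical duration_ms samples from one 5-minute window
durations_ms = list(range(1, 101))
print(quantile(0.95, durations_ms))
```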

Common Investigation Queries

All Errors for a Tenant

{service="ai-service", level="error"} | json | tenant_id="acme"

Trace a Specific Request

{service=~".+"} |= "trace_id=abc123"

Slow LLM Calls

{service="ai-service"} | json | event="llm_call_completed" | duration_ms > 5000

Provisioning Failures

{service="tenant-service", level="error"} | json | event=~"provisioning.*"

Recent Deployments

{namespace="matih-control-plane"} |= "deployment"

Performance Tips

  • Always start with a label matcher (e.g., a service selector like service="ai-service") -- never query all logs
  • Use |= (contains) rather than |~ (regex) when possible -- it is faster
  • Limit time ranges to the minimum needed for investigation
  • Use | json only when you need to filter or extract JSON fields
  • Avoid queries that scan more than 24 hours of data without label filters
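
For scripts that assemble queries, the tips above can be sketched as a small Python helper that always emits label matchers first, then a substring filter, and only then the JSON stage. Placing the line filter before | json follows general Loki guidance, since lines are discarded before they are parsed. `build_query` and its parameters are hypothetical, and the helper does not escape quotes in its inputs.

```python
def build_query(labels, contains=None, json_filters=None):
    """Assemble a LogQL log query: label matchers, line filter, then JSON stage.

    labels: dict of label name to exact value (at least one is required).
    contains: optional substring for a |= line filter.
    json_filters: optional list of label filter expressions applied after | json.
    Note: values are inserted as-is; no quote escaping is performed.
    """
    if not labels:
        raise ValueError("at least one label matcher is required")
    selector = ", ".join(f'{name}="{value}"' for name, value in labels.items())
    query = "{" + selector + "}"
    if contains:
        query += f' |= "{contains}"'
    if json_filters:
        query += " | json | " + " | ".join(json_filters)
    return query

print(build_query({"service": "ai-service"}, contains="timeout",
                  json_filters=["duration_ms > 1000"]))
```

The printed query can be pasted into Grafana's Explore view with a narrow time range selected.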