MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Guardrails Framework

Status: Production (content safety, action validation, PII detection, prompt injection defense)

The Guardrails Framework provides a multi-layered safety system for agent interactions. It validates inputs, tool actions, and outputs; detects PII and sensitive content; defends against prompt injection attacks; and enforces tenant-specific content policies.


12.2.7.1 Guardrail Architecture

Guardrails operate at three levels:

  1. Input guardrails: Validate and sanitize user messages before processing
  2. Action guardrails: Validate tool calls and actions before execution
  3. Output guardrails: Validate agent responses before delivery

class GuardrailRegistry:
    """Central registry for all guardrail implementations."""
 
    def register_defaults(self):
        """Register built-in guardrails."""
        self.register("pii_detection", PIIDetectionGuardrail())
        self.register("prompt_injection", PromptInjectionGuardrail())
        self.register("content_safety", ContentSafetyGuardrail())
        self.register("sql_safety", SQLSafetyGuardrail())
        self.register("action_validation", ActionValidationGuardrail())
 
    async def check_input(self, message: str, tenant_id: str) -> GuardrailResult:
        """Run all input guardrails."""
        ...
 
    async def check_output(self, response: str, tenant_id: str) -> GuardrailResult:
        """Run all output guardrails."""
        ...
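As a rough illustration of how a registry like this could chain input guardrails, the sketch below fills in the elided parts with assumptions. The `Guardrail` protocol, the `Violation` and `GuardrailResult` fields (modelled on the check result shown in the API section), the `register` signature, and the toy email-only `EmailMaskGuardrail` are all hypothetical and are not taken from the MATIH codebase.

```python
from __future__ import annotations

import asyncio
import re
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Violation:
    guardrail: str
    severity: str
    message: str
    action: str


@dataclass
class GuardrailResult:
    passed: bool
    violations: list[Violation] = field(default_factory=list)
    sanitized_message: Optional[str] = None


class Guardrail(Protocol):
    """Assumed interface: each guardrail declares its scopes and a check()."""

    scopes: set[str]

    async def check(self, text: str, tenant_id: str) -> GuardrailResult: ...


class EmailMaskGuardrail:
    """Toy PII guardrail that only recognises email addresses."""

    scopes = {"input", "output"}
    _EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

    async def check(self, text: str, tenant_id: str) -> GuardrailResult:
        masked, count = self._EMAIL.subn("[MASKED]", text)
        if count == 0:
            return GuardrailResult(passed=True, sanitized_message=text)
        violation = Violation(
            "pii_detection", "warning", f"Masked {count} email address(es)", "mask"
        )
        return GuardrailResult(False, [violation], masked)


class SketchRegistry:
    """Hypothetical stand-in for GuardrailRegistry, input scope only."""

    def __init__(self) -> None:
        self._guardrails: dict[str, Guardrail] = {}

    def register(self, name: str, guardrail: Guardrail) -> None:
        self._guardrails[name] = guardrail

    async def check_input(self, message: str, tenant_id: str) -> GuardrailResult:
        violations: list[Violation] = []
        current = message
        # Run guardrails in registration order; each one sees the text
        # as sanitized by the guardrails before it.
        for guardrail in self._guardrails.values():
            if "input" not in guardrail.scopes:
                continue
            result = await guardrail.check(current, tenant_id)
            violations.extend(result.violations)
            if result.sanitized_message is not None:
                current = result.sanitized_message
        return GuardrailResult(not violations, violations, current)


registry = SketchRegistry()
registry.register("pii_detection", EmailMaskGuardrail())
result = asyncio.run(
    registry.check_input("Contact bob@example.com today", "acme-corp")
)
print(result.passed, result.sanitized_message)  # False Contact [MASKED] today
```

Running the guardrails sequentially keeps the sanitized text consistent across checks. A real registry might instead run independent checks concurrently and would also need to apply the tenant's policy, which this sketch ignores.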

Guardrail Types

| Guardrail | Scope | Description |
|---|---|---|
| PII Detection | Input/Output | Detects and masks personally identifiable information |
| Prompt Injection | Input | Detects attempts to override system prompts |
| Content Safety | Input/Output | Blocks harmful, offensive, or inappropriate content |
| SQL Safety | Action | Prevents dangerous SQL operations (DROP, DELETE, etc.) |
| Action Validation | Action | Validates tool arguments against allowed schemas |
| Budget Guard | Action | Prevents actions exceeding tenant cost limits |
| Rate Limiter | Input | Enforces per-tenant request rate limits |
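To make the action-level rows more concrete, here is a minimal sketch of a keyword-based SQL safety check. The function name `check_sql` and the two keyword sets are assumptions for illustration, and the `allow_write` parameter only mirrors the name of the `sql_safety` policy field used in the API examples. A production guardrail would more likely parse the SQL than look at the leading keyword of each statement.

```python
from __future__ import annotations

import re

# Illustrative keyword sets, not MATIH's actual lists.
DESTRUCTIVE = {"DROP", "TRUNCATE", "ALTER"}
WRITES = {"INSERT", "UPDATE", "DELETE", "MERGE"}


def _statements(sql: str) -> list[str]:
    """Strip SQL comments, then split naively on semicolons."""
    no_comments = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.DOTALL)
    # Naive split: a semicolon inside a string literal would be mis-split.
    return [part.strip() for part in no_comments.split(";") if part.strip()]


def check_sql(sql: str, allow_write: bool = False) -> list[str]:
    """Return violation messages; an empty list means the SQL passed."""
    violations = []
    for statement in _statements(sql):
        keyword = statement.split(None, 1)[0].upper()
        if keyword in DESTRUCTIVE:
            violations.append(f"{keyword} statements are never allowed")
        elif keyword in WRITES and not allow_write:
            violations.append(f"{keyword} requires allow_write=true")
    return violations


print(check_sql("SELECT * FROM orders"))   # []
print(check_sql("DELETE FROM orders"))     # ['DELETE requires allow_write=true']
```

Checking only the first keyword is easy to reason about but also easy to evade (for example with a write hidden inside a CTE), so it is best read as a first filter rather than a complete defense.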

12.2.7.2 API Endpoints

# List registered guardrails
curl "http://localhost:8000/api/v1/guardrails?tenant_id=acme-corp"
 
# Test a message against guardrails
curl -X POST http://localhost:8000/api/v1/guardrails/check \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: acme-corp" \
  -d '{
    "message": "Show me all customer emails and phone numbers",
    "scope": "input"
  }'
 
# Configure tenant guardrail policy
curl -X PUT http://localhost:8000/api/v1/guardrails/policy \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: acme-corp" \
  -d '{
    "pii_detection": {"enabled": true, "action": "mask"},
    "content_safety": {"enabled": true, "threshold": 0.8},
    "sql_safety": {"enabled": true, "allow_write": false}
  }'

Check Result

{
  "passed": false,
  "violations": [
    {
      "guardrail": "pii_detection",
      "severity": "warning",
      "message": "Message contains request for PII (emails, phone numbers)",
      "action": "mask",
      "details": {"detected_types": ["email", "phone"]}
    }
  ],
  "sanitized_message": "Show me all customer [MASKED] and [MASKED]"
}
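A caller still has to decide what to do with a result like this. The sketch below shows one possible client-side interpretation, assuming that a violation whose `action` is `mask` lets the request proceed with `sanitized_message`, while any other action blocks it. That rule, the `resolve_message` helper, and the `GuardrailBlocked` exception are hypothetical; only the JSON field names come from the example above.

```python
import json


class GuardrailBlocked(Exception):
    """Raised when a check result does not allow the message to proceed."""


def resolve_message(original: str, result: dict) -> str:
    """Return the text that may be forwarded to the agent, or raise."""
    if result.get("passed"):
        return original
    violations = result.get("violations", [])
    # Assumption: only "mask" violations are recoverable.
    blocking = [v for v in violations if v.get("action") != "mask"]
    if blocking:
        names = ", ".join(v.get("guardrail", "unknown") for v in blocking)
        raise GuardrailBlocked(f"blocked by: {names}")
    sanitized = result.get("sanitized_message")
    if sanitized is None:
        # Fail closed instead of forwarding the unmasked original.
        raise GuardrailBlocked("no sanitized_message returned")
    return sanitized


sample = json.loads("""
{
  "passed": false,
  "violations": [
    {
      "guardrail": "pii_detection",
      "severity": "warning",
      "message": "Message contains request for PII (emails, phone numbers)",
      "action": "mask",
      "details": {"detected_types": ["email", "phone"]}
    }
  ],
  "sanitized_message": "Show me all customer [MASKED] and [MASKED]"
}
""")

print(resolve_message("Show me all customer emails and phone numbers", sample))
# Show me all customer [MASKED] and [MASKED]
```

Failing closed when `sanitized_message` is missing is a deliberate choice in this sketch: forwarding the original text would defeat the purpose of the mask action.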