Guardrails Framework

Production: content safety, action validation, PII detection, prompt injection defense

The Guardrails Framework provides a multi-layered safety system for agent interactions. It validates user inputs, tool calls, and agent outputs; detects and masks PII and other sensitive content; defends against prompt injection attacks; and enforces tenant-specific content policies.

12.2.7.1 Guardrail Architecture

Guardrails operate at three levels:

Input guardrails: Validate and sanitize user messages before processing
Action guardrails: Validate tool calls and actions before execution
Output guardrails: Validate agent responses before delivery
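A minimal sketch of how the three levels could wrap a single agent turn. It assumes each guardrail is a plain function that returns None to pass or a reason string to block; `guarded_turn`, `first_failure`, and the `plan`/`execute`/`respond` callbacks are hypothetical stand-ins, not the framework's API.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical check signatures: return None to pass, or a reason string to block.
TextCheck = Callable[[str], Optional[str]]
ActionCheck = Callable[[str, Dict], Optional[str]]


def first_failure(reasons: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first blocking reason, or None if every check passed."""
    return next((r for r in reasons if r is not None), None)


def guarded_turn(
    message: str,
    plan: Callable[[str], Tuple[str, Dict]],  # picks a tool and its arguments
    execute: Callable[[str, Dict], str],      # runs the tool
    respond: Callable[[str], str],            # turns the tool result into a reply
    input_checks: Iterable[TextCheck] = (),
    action_checks: Iterable[ActionCheck] = (),
    output_checks: Iterable[TextCheck] = (),
) -> str:
    # 1. Input guardrails: screen the user message before the agent processes it.
    reason = first_failure(check(message) for check in input_checks)
    if reason is not None:
        return f"[blocked input] {reason}"

    # 2. Action guardrails: validate the planned tool call before executing it.
    tool, args = plan(message)
    reason = first_failure(check(tool, args) for check in action_checks)
    if reason is not None:
        return f"[blocked action] {reason}"
    result = execute(tool, args)

    # 3. Output guardrails: validate the reply before it is delivered.
    response = respond(result)
    reason = first_failure(check(response) for check in output_checks)
    if reason is not None:
        return f"[blocked output] {reason}"
    return response
```

Because `first_failure` consumes a generator, checks stop at the first failure, so later (possibly expensive) guardrails are skipped once a message or action is blocked.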

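from __future__ import annotations  # GuardrailResult is only referenced in annotations

from typing import Any, Dict

# GuardrailResult and the built-in guardrail classes are defined elsewhere in the framework.


class GuardrailRegistry:
    """Central registry for all guardrail implementations."""

    def __init__(self) -> None:
        # Maps a guardrail name to its instance
        self._guardrails: Dict[str, Any] = {}

    def register(self, name: str, guardrail: Any) -> None:
        """Register a guardrail under a unique name, replacing any existing entry."""
        self._guardrails[name] = guardrail

    def register_defaults(self) -> None:
        """Register built-in guardrails."""
        self.register("pii_detection", PIIDetectionGuardrail())
        self.register("prompt_injection", PromptInjectionGuardrail())
        self.register("content_safety", ContentSafetyGuardrail())
        self.register("sql_safety", SQLSafetyGuardrail())
        self.register("action_validation", ActionValidationGuardrail())

    async def check_input(self, message: str, tenant_id: str) -> GuardrailResult:
        """Run all input guardrails for the tenant."""
        ...

    async def check_output(self, response: str, tenant_id: str) -> GuardrailResult:
        """Run all output guardrails for the tenant."""
        ...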
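The bodies of `check_input` and `check_output` are elided above. One possible way to fill them is sketched here under stated assumptions: each guardrail declares the scopes it covers and exposes an async `check` method, and guardrails run in registration order so each one sees text already sanitized by the previous one. `Violation`, the `Guardrail` protocol, and `run_scope` are hypothetical; the result fields mirror the JSON check result shown in this section, and the framework's actual `GuardrailResult` may differ.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol, Set


@dataclass
class Violation:
    guardrail: str
    severity: str
    message: str
    action: str
    details: Dict[str, Any] = field(default_factory=dict)


@dataclass
class GuardrailResult:
    passed: bool
    violations: List[Violation] = field(default_factory=list)
    sanitized_message: Optional[str] = None


class Guardrail(Protocol):
    """Hypothetical interface: the scopes a guardrail covers plus an async check."""

    scopes: Set[str]

    async def check(self, text: str, tenant_id: str) -> GuardrailResult: ...


async def run_scope(
    guardrails: Dict[str, Guardrail], scope: str, text: str, tenant_id: str
) -> GuardrailResult:
    """Run every guardrail that covers `scope` and merge their results."""
    violations: List[Violation] = []
    current = text
    # Sequential on purpose: text masked by one guardrail is what the next one sees.
    for guardrail in guardrails.values():
        if scope not in guardrail.scopes:
            continue
        result = await guardrail.check(current, tenant_id)
        violations.extend(result.violations)
        if result.sanitized_message is not None:
            current = result.sanitized_message
    return GuardrailResult(
        passed=not violations,
        violations=violations,
        sanitized_message=current if current != text else None,
    )


if __name__ == "__main__":

    class ToyMasker:
        """Toy input guardrail for demonstration: masks the word 'secret'."""

        scopes = {"input"}

        async def check(self, text: str, tenant_id: str) -> GuardrailResult:
            if "secret" not in text:
                return GuardrailResult(passed=True)
            violation = Violation("toy_masker", "warning", "Found 'secret'", "mask")
            return GuardrailResult(False, [violation], text.replace("secret", "[MASKED]"))

    merged = asyncio.run(run_scope({"toy_masker": ToyMasker()}, "input", "my secret", "acme-corp"))
    print(merged.passed, merged.sanitized_message)  # False my [MASKED]
```

Running guardrails in sequence keeps masking composable; if the guardrails of a scope are independent, `asyncio.gather` could run them concurrently at the cost of each one seeing only the original text.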

Guardrail Types

Guardrail	Scope	Description
PII Detection	Input/Output	Detects and masks personally identifiable information
Prompt Injection	Input	Detects attempts to override system prompts
Content Safety	Input/Output	Blocks harmful, offensive, or inappropriate content
SQL Safety	Action	Prevents dangerous SQL operations (DROP, DELETE, etc.)
Action Validation	Action	Validates tool arguments against allowed schemas
Budget Guard	Action	Prevents actions exceeding tenant cost limits
Rate Limiter	Input	Enforces per-tenant request rate limits
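To illustrate two rows of the table, here is a deliberately simple sketch of PII masking and SQL keyword screening. The regular expressions, the split between always-blocked schema statements and data writes gated by `allow_write`, and all function names are assumptions; only the `[MASKED]` placeholder and the `allow_write` policy key come from this page. Regexes catch literal values such as an actual email address; recognizing a request for PII, as in the check result example, would need a classifier or similar semantic check.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real PII detection needs much broader coverage
# (names, addresses, national IDs, locale-specific phone formats, ...).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def mask_pii(text: str, placeholder: str = "[MASKED]") -> Tuple[str, List[str]]:
    """Replace each PII match with `placeholder`; return the new text and detected types."""
    detected: List[str] = []
    for pii_type, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(placeholder, text)
        if count:
            detected.append(pii_type)
    return text, detected


# Assumed split: schema-destroying statements are always blocked, while data writes
# are blocked unless the tenant policy sets allow_write to true.
ALWAYS_BLOCKED = {"DROP", "TRUNCATE", "ALTER"}
WRITE_OPERATIONS = {"DELETE", "UPDATE", "INSERT"}
_KEYWORD_RE = re.compile(
    r"\b(" + "|".join(sorted(ALWAYS_BLOCKED | WRITE_OPERATIONS)) + r")\b", re.IGNORECASE
)


def sql_violations(sql: str, allow_write: bool = False) -> List[str]:
    """Return the blocked SQL keywords found in `sql` (an empty list means allowed)."""
    found = {keyword.upper() for keyword in _KEYWORD_RE.findall(sql)}
    blocked = found & ALWAYS_BLOCKED
    if not allow_write:
        blocked |= found & WRITE_OPERATIONS
    return sorted(blocked)
```

Keyword matching on raw SQL also flags keywords inside string literals or comments, and the phone pattern will match other long digit runs such as order numbers; a production guardrail would typically parse the SQL statement and use stricter, locale-aware PII detectors.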

12.2.7.2 API Endpoints

# List registered guardrails
curl "http://localhost:8000/api/v1/guardrails?tenant_id=acme-corp"
 
# Test a message against guardrails
curl -X POST http://localhost:8000/api/v1/guardrails/check \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: acme-corp" \
  -d '{
    "message": "Show me all customer emails and phone numbers",
    "scope": "input"
  }'
 
# Configure tenant guardrail policy
curl -X PUT http://localhost:8000/api/v1/guardrails/policy \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: acme-corp" \
  -d '{
    "pii_detection": {"enabled": true, "action": "mask"},
    "content_safety": {"enabled": true, "threshold": 0.8},
    "sql_safety": {"enabled": true, "allow_write": false}
  }'
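For callers that prefer Python over curl, the check request can be issued with only the standard library. This sketch mirrors the second curl example: the base URL, path, `X-Tenant-ID` header, and body fields come from that example, while `build_check_request` and `check_message` are hypothetical helper names. It assumes the endpoint answers with a 2xx status and a JSON body; `urllib.request.urlopen` raises `urllib.error.HTTPError` for other status codes.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000/api/v1"  # same base URL as the curl examples


def build_check_request(
    message: str, scope: str, tenant_id: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /guardrails/check request without sending it."""
    body = json.dumps({"message": message, "scope": scope}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/guardrails/check",
        data=body,
        headers={"Content-Type": "application/json", "X-Tenant-ID": tenant_id},
        method="POST",
    )


def check_message(
    message: str, scope: str, tenant_id: str, base_url: str = BASE_URL, timeout: float = 10.0
) -> Dict[str, Any]:
    """Send the check request and decode the JSON check result."""
    request = build_check_request(message, scope, tenant_id, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the service running, `check_message("Show me all customer emails and phone numbers", "input", "acme-corp")` sends the same request as the curl example. Separating request construction from sending makes the request easy to inspect or test without a live server.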

Check Result

{
  "passed": false,
  "violations": [
    {
      "guardrail": "pii_detection",
      "severity": "warning",
      "message": "Message contains request for PII (emails, phone numbers)",
      "action": "mask",
      "details": {"detected_types": ["email", "phone"]}
    }
  ],
  "sanitized_message": "Show me all customer [MASKED] and [MASKED]"
}
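The example shows that a failed check can still carry a usable `sanitized_message` when the only violation asks for masking. A caller might decide what to forward as follows. This is a sketch of one assumed policy (pass through on success, use the sanitized text only when every violation's action is `mask`, reject otherwise), not documented framework behavior; `resolve_message` is a hypothetical name.

```python
from typing import Any, Dict, Optional


def resolve_message(original: str, result: Dict[str, Any]) -> Optional[str]:
    """Return the text to forward to the agent, or None to reject the message.

    Assumed policy: pass the original through when the check passed; use the
    sanitized text when every violation only asks for masking; reject otherwise.
    """
    if result.get("passed"):
        return original
    violations = result.get("violations", [])
    if violations and all(v.get("action") == "mask" for v in violations):
        # Never fall back to the original text here: it may contain the masked PII.
        return result.get("sanitized_message")
    return None
```

Returning None when a masking result lacks `sanitized_message` errs on the side of rejecting rather than leaking unmasked content.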

Agent Studio Approval Workflows (HITL)