Improve your AI guardrails through our feedback loop system.
The feedback system allows you to continuously improve your AI guardrails by providing feedback on detection results. This human-in-the-loop approach helps fine-tune your detectors, reduce false positives, and enhance overall detection accuracy.
You can review all feedback provided for each policy:
The feedback history shows all previous inputs and allows you to modify them if needed. Any changes take effect immediately in the Guard API inside the feedback system.
The Guard API follows a structured evaluation process for each request:
Policy apply engine: Each request is evaluated against your configured policies, which specify both the target (user or assistant messages) and the detection thresholds.
Multi-layered detection pipeline: The system processes each request through a cascading detection pipeline:
Feedback database check: First, we check against your policy’s feedback database using exact string matching and semantic similarity algorithms
Policy-specific models: If not flagged by feedback, the request is evaluated by models specifically trained on your policy’s feedback data
Base detection models: Finally, Guardion’s foundation detection models evaluates the request to provide comprehensive protection
Threshold evaluation: Detection confidence scores are compared against your configured thresholds (L1-L4)
If the confidence score exceeds the threshold, the request is flagged
The API returns the flagging decision to the client with breakdown data
This layered approach ensures both performance and accuracy, with your feedback continuously improving detection quality.