Skip to main contentThe feedback system allows you to continuously improve your AI guardrails by providing feedback on detection results. This human-in-the-loop approach helps fine-tune your detectors, reduce false positives, and enhance overall detection accuracy.
How feedback works
When you provide feedback on detection results, Guardion uses this information to:
- Build a dataset of edge cases specific to your use case
- Immediately apply adjustments to incoming prompts and responses in the Guard API (using similarity and string match)
- Incorporate your feedback during for the policy’s models retraining
This process helps your guardrails become more accurate over time, reducing false positives while maintaining strong protection.
Providing feedback
The feedback interface is integrated directly into the Logs section, making it easy to review and provide input as you investigate detections.
Individual feedback
When reviewing a specific log entry:
- Click on the log to view all detection details
- For any incorrect classification, click Mark as Misclassification
- Your feedback is immediately applied to the relevant policies
Bulk feedback
To efficiently review multiple logs at once:
- Select multiple log entries using the checkboxes
- Choose one of the available actions:
- Mark as Misclassification - For incorrectly classified content
- Confirm as Flagged - To validate correct threat detections (available when Threat filter is enabled)
Note: Feedback is processed individually for each policy that triggered a detection, allowing for precise improvement of specific guardrails.
Reviewing feedback history
You can review all feedback provided for each policy:
The feedback history shows all previous inputs and allows you to modify them if needed. Any changes take effect immediately in the Guard API inside the feedback system.
Feedback Integration with Guard API
The following diagram illustrates how the Guard API processes requests and incorporates feedback throughout the workflow:
Detection workflow
The Guard API follows a structured evaluation process for each request:
- Input validation: The API validates incoming requests and extracts messages for evaluation
- Policy application: Your configured policies determine:
- Which messages to evaluate (user, assistant, or both)
- Detection thresholds (L1-L4)
- Active detection layers
- Detection pipeline: Each message passes through up to three detection layers
- Threshold evaluation: Detection scores are compared against policy thresholds
- Response: The API returns the evaluation result with detailed scoring breakdown
This layered approach ensures both performance and accuracy, with your feedback continuously improving detection quality.
Detection layers
The Guard API uses a cascading detection pipeline that prioritizes speed and accuracy:
Layer 1: Feedback database
- Method: Exact string matching and semantic similarity
- Speed: < 10ms
- High-confidence detection of previously flagged content
Layer 2: Policy-specific models
- Method: Models fine-tuned on your policy’s feedback data
- Speed: < 10ms
- Domain-specific threat detection
Layer 3: Threat Intelligence & Foundation models
- Comprehensive protection against zero-day threats
- Method: Guardion’s state-of-the-art LLMs
- Speed: ~200ms
- Broad coverage and emerging threat detection
Best practices
- Start with L2 threshold and adjust based on your false positive tolerance
- Provide feedback regularly to improve policy-specific model accuracy
- Monitor detection metrics to optimize threshold configuration
- Use policy targeting to evaluate only relevant message types
Next steps