Feedbacks

The feedback system allows you to continuously improve your AI guardrails by providing feedback on detection results. This human-in-the-loop approach helps fine-tune your detectors, reduce false positives, and enhance overall detection accuracy.

How feedback works

When you provide feedback on detection results, Guardion uses this information to:

Build a dataset of edge cases specific to your use case
Immediately apply adjustments to incoming prompts and responses in the Guard API (using similarity and string match)
Incorporate your feedback during for the policy’s models retraining

This process helps your guardrails become more accurate over time, reducing false positives while maintaining strong protection.

Providing feedback

The feedback interface is integrated directly into the Logs section, making it easy to review and provide input as you investigate detections.

Individual feedback

When reviewing a specific log entry:

Click on the log to view all detection details
For any incorrect classification, click Mark as Misclassification
Your feedback is immediately applied to the relevant policies

Bulk feedback

To efficiently review multiple logs at once:

Select multiple log entries using the checkboxes
Choose one of the available actions:
- Mark as Misclassification - For incorrectly classified content
- Confirm as Flagged - To validate correct threat detections (available when Threat filter is enabled)

Note: Feedback is processed individually for each policy that triggered a detection, allowing for precise improvement of specific guardrails.

Reviewing feedback history

You can review all feedback provided for each policy:

The feedback history shows all previous inputs and allows you to modify them if needed. Any changes take effect immediately in the Guard API inside the feedback system.

Feedback Integration with Guard API

The following diagram illustrates how the Guard API processes requests and incorporates feedback throughout the workflow:

Detection workflow

The Guard API follows a structured evaluation process for each request:

Policy apply engine: Each request is evaluated against your configured policies, which specify both the target (user or assistant messages) and the detection thresholds.
Multi-layered detection pipeline: The system processes each request through a cascading detection pipeline:
- Feedback database check: First, we check against your policy’s feedback database using exact string matching and semantic similarity algorithms
- Policy-specific models: If not flagged by feedback, the request is evaluated by models specifically trained on your policy’s feedback data
- Base detection models: Finally, Guardion’s foundation detection models evaluates the request to provide comprehensive protection
Threshold evaluation: Detection confidence scores are compared against your configured thresholds (L1-L4)
- If the confidence score exceeds the threshold, the request is flagged
- The API returns the flagging decision to the client with breakdown data

This layered approach ensures both performance and accuracy, with your feedback continuously improving detection quality.

Best practices

Provide feedback regularly to continuously improve detection accuracy
Focus on edge cases that are specific to your use case
Review feedback history periodically to identify patterns in misclassifications
Use bulk actions for efficient review of similar detection types

Getting Started

Platform

Detectors

Integrations

Guardion API endpoints

How feedback works

Providing feedback

Individual feedback

Bulk feedback

Reviewing feedback history

Feedback Integration with Guard API

Detection workflow

Best practices

Getting Started

Platform

Detectors

Integrations

Guardion API endpoints

​How feedback works

​Providing feedback

​Individual feedback

​Bulk feedback

​Reviewing feedback history

​Feedback Integration with Guard API

​Detection workflow

​Best practices

How feedback works

Providing feedback

Individual feedback

Bulk feedback

Reviewing feedback history

Feedback Integration with Guard API

Detection workflow

Best practices