
How policies work
A policy describes what to detect, where to look, and how to respond:- Detector model: The engine used to detect a risk
- Target: Which part(s) of the conversation to inspect, e.g. Assistant or User
- Sensibility: The minimum confidence required to flag
- Expected behavior (coming soon): Defines the action taken when a detector is triggered (e.g., block or alert)
- Override response: Optional default message to return on violation

Detector types
Guardion supports several types of detectors, each designed to identify specific risks or policy violations in AI interactions. You can mix and match detector types within your policies to cover a wide range of threats.Supported types and models:
Prompt Defense- modern-guard-v1: Fast, general-purpose prompt security and jailbreak detection
- modern-guard-v1.5: Advanced agentic prompt security and jailbreak detection
- moderation-v0: Safety moderation for harmful content
Targets
Choose where the policy evaluates:- user
- assistant
- system
- developer
- context
Assign policies to applications
Policies are assigned per application. The relationship is:- One policy can be used by many applications
- One application can enable one policies
- Go to the application
- Click Assign policy
- Select the policy and confirm
Using policies via API
When calling the Guard API, Guardion will evaluate the policies assigned to the specified application. Provide your application ID in the request body.- See the full schema in Guard API
- Return shape includes
flagged
,breakdown
(per-policy results), and optionalcorrection
Reviews and datasets (optional)
Use policy-focused reviews to iterate on performance, and build a dataset from your feedback to continuously improve detection.