Detect and filter unsafe content across multiple safety categories in both user inputs and assistant outputs. Use this detector to enforce community standards and regulatory policies.
What it detects
- Hate and harassment
- Self-harm and dangerous activities
- Sexual and adult content
- Criminal activity and weapons
- Privacy, IP, elections, and safety-sensitive topics
Available models (versions)
- moderation-v0 — general-purpose moderation across core categories
Detection Categories
The current model, moderation-v0, covers the safety-sensitive categories below, helping your AI application remain compliant and secure.
| Category | Description |
|---|---|
| HATE & HARASSMENT | Content that promotes violence, incites hatred, or targets individuals/groups based on protected attributes. |
| SEXUAL_CONTENT | Explicit sexual descriptions, adult content, and non-consensual sexual content. |
| SELF_HARM | Content that encourages, provides instructions for, or promotes self-injury or suicide. |
| CRIMES & WEAPONS | Instructions for illegal acts, criminal activities, or the creation and use of weapons. |
| SPECIALIZED_ADVICE | Unlicensed or dangerous advice in sensitive fields such as medical, legal, or financial services. |
| PRIVACY & IP | Attempts to solicit private information or content that violates Intellectual Property rights. |
| ELECTIONS | Highly sensitive political content, election misinformation, or prohibited political campaigning. |
| DEFAMATION | Content intended to damage the reputation of individuals or organizations through false statements. |
Using the Moderation detector
Threshold levels
- L1 (0.9): Confident
- L2 (0.8): Very Likely
- L3 (0.7): Likely
- L4 (0.6): Less Likely
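The level names map to score cutoffs, which can be sketched as a small lookup. This is a minimal illustration, not the detector's API: the names `LEVEL_CUTOFFS` and `is_flagged` are hypothetical, and it assumes that content is flagged when its score is at or above the cutoff for the chosen level.

```python
# Cutoffs copied from the threshold list above.
LEVEL_CUTOFFS = {"L1": 0.9, "L2": 0.8, "L3": 0.7, "L4": 0.6}


def is_flagged(score: float, level: str = "L1") -> bool:
    """Return True when the score meets the cutoff for the given level.

    Assumption: a score at or above the cutoff counts as a detection.
    """
    return score >= LEVEL_CUTOFFS[level]


# The same score can pass at one level and be flagged at another.
print(is_flagged(0.85, "L1"))  # below the 0.9 cutoff
print(is_flagged(0.85, "L2"))  # at or above the 0.8 cutoff
```

Under this assumption, L1 flags only high-confidence detections, while L4 also flags lower-confidence ones.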
Notes
- For stricter environments, use a stricter level (one that flags at a lower score, such as L3 or L4) on sensitive categories.
- Combine with Injection and PII detectors for comprehensive runtime safety.
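Per-category strictness, as the first note suggests, could be expressed as a policy table. The sketch below is hypothetical: the `scores` dictionary shape, the `CATEGORY_LEVELS` policy, and the function name are assumptions for illustration rather than the detector's actual request or response format. It also assumes a category is flagged when its score is at or above the cutoff for its level.

```python
# Cutoffs copied from the threshold levels listed in this page.
LEVEL_CUTOFFS = {"L1": 0.9, "L2": 0.8, "L3": 0.7, "L4": 0.6}

# Hypothetical policy: categories not listed fall back to DEFAULT_LEVEL.
DEFAULT_LEVEL = "L1"
CATEGORY_LEVELS = {
    "SELF_HARM": "L4",       # flag even lower-confidence detections
    "SEXUAL_CONTENT": "L3",
}


def flagged_categories(scores: dict[str, float]) -> list[str]:
    """Return the categories whose score meets their configured cutoff."""
    flagged = []
    for category, score in scores.items():
        level = CATEGORY_LEVELS.get(category, DEFAULT_LEVEL)
        if score >= LEVEL_CUTOFFS[level]:
            flagged.append(category)
    return sorted(flagged)


# Hypothetical per-category scores for a single message.
example_scores = {"SELF_HARM": 0.65, "ELECTIONS": 0.65, "SEXUAL_CONTENT": 0.69}
print(flagged_categories(example_scores))
```

Keeping the policy in a dictionary makes it easy to review which categories are treated more strictly than the default.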
Related
- Moderation Model Card (v0): /moderation-model