# Content Moderation

> Classify and filter unsafe or policy-violating content

Detect and filter unsafe content across multiple safety categories in both user inputs and assistant outputs. Use this detector to enforce community standards and regulatory policies.

## What it detects

* Hate and harassment
* Self-harm and dangerous activities
* Sexual and adult content
* Criminal activity and weapons
* Privacy, IP, elections, and safety-sensitive topics

## Available models (versions)

* `moderation-v0`: general-purpose moderation across core categories

## Detection categories

The current model, `moderation-v0`, classifies content into the eight categories below.

| Category                | Description                                                                                                  |
| :---------------------- | :----------------------------------------------------------------------------------------------------------- |
| **HATE & HARASSMENT**   | Content that promotes violence, incites hatred, or targets individuals/groups based on protected attributes. |
| **SEXUAL\_CONTENT**     | Explicit sexual descriptions, adult content, and non-consensual sexual content.                              |
| **SELF\_HARM**          | Content that encourages, provides instructions for, or promotes self-injury or suicide.                      |
| **CRIMES & WEAPONS**    | Instructions for illegal acts, criminal activities, or the creation and use of weapons.                      |
| **SPECIALIZED\_ADVICE** | Unlicensed or dangerous advice in sensitive fields such as medical, legal, or financial services.            |
| **PRIVACY & IP**        | Attempts to solicit private information or content that violates Intellectual Property rights.               |
| **ELECTIONS**           | Highly sensitive political content, election misinformation, or prohibited political campaigning.            |
| **DEFAMATION**          | Content intended to damage the reputation of individuals or organizations through false statements.          |

## Using the Moderation detector

```js
// 1) Create or update a Content Moderation policy (this example targets user messages)
await fetch("https://api.guardion.ai/v1/policies", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    id: "content-moderation",
    definition: "Classify and filter unsafe content",
    threshold: 0.9, // L1 (Confident). Use 0.8 for L2, 0.7 for L3, 0.6 for L4
    detector: {
      model: "moderation-v0",
      target: "user",
    }
  })
});

// 2) Evaluate using that policy
const response = await fetch("https://api.guardion.ai/v1/guard", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    messages: [{ role: "user", content: "..." }],
    policy: "content-moderation"
  })
});
```
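The evaluation call can be wrapped in a small helper that checks the HTTP status before parsing the body. This is a minimal sketch, not an official SDK: the function names `buildGuardRequest` and `evaluateWithPolicy` are hypothetical, and because this page does not describe the response schema, the parsed JSON is returned as-is without assuming any field names.

```js
// Sketch only: the URL comes from the example above; the helper names
// are hypothetical and not part of any Guardion SDK.
const GUARD_URL = "https://api.guardion.ai/v1/guard";

// Build the fetch options for a guard call without sending anything.
function buildGuardRequest(messages, policyId, apiKey) {
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new TypeError("messages must be a non-empty array");
  }
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`
    },
    body: JSON.stringify({ messages, policy: policyId })
  };
}

// Send the request and return the parsed JSON body.
// The response schema is not documented here, so nothing is read from it.
async function evaluateWithPolicy(messages, policyId, apiKey, fetchImpl = fetch) {
  const response = await fetchImpl(
    GUARD_URL,
    buildGuardRequest(messages, policyId, apiKey)
  );
  if (!response.ok) {
    throw new Error(`Guard request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Keeping request construction separate from the network call makes the payload easy to inspect in unit tests, and the optional `fetchImpl` parameter lets you substitute a stub for `fetch`.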

### Threshold levels

* L1 (0.9): Confident
* L2 (0.8): Very Likely
* L3 (0.7): Likely
* L4 (0.6): Less Likely
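
To avoid scattering numeric literals across policy definitions, the levels above can be kept in one lookup table. The following is a sketch under stated assumptions: `THRESHOLD_LEVELS`, `thresholdForLevel`, and `buildModerationPolicy` are hypothetical names, and the policy fields simply mirror the request body shown earlier on this page.

```js
// Level-to-threshold mapping copied from the list above.
const THRESHOLD_LEVELS = Object.freeze({ L1: 0.9, L2: 0.8, L3: 0.7, L4: 0.6 });

// Hypothetical helper: resolve a level name ("L1".."L4") to its threshold.
function thresholdForLevel(level) {
  if (!Object.prototype.hasOwnProperty.call(THRESHOLD_LEVELS, level)) {
    throw new RangeError(`Unknown threshold level: ${level}`);
  }
  return THRESHOLD_LEVELS[level];
}

// Hypothetical helper: build a policy body shaped like the example above.
function buildModerationPolicy(level, target = "user") {
  return {
    id: "content-moderation",
    definition: "Classify and filter unsafe content",
    threshold: thresholdForLevel(level),
    detector: { model: "moderation-v0", target }
  };
}
```

For example, `buildModerationPolicy("L3")` produces a body with `threshold: 0.7` that could be passed to `JSON.stringify` in the policy request.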

## Notes

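* For stricter environments, use a lower threshold such as L3 (0.7) or L4 (0.6) so that less certain detections are also flagged, and expect more false positives. L1 (0.9) flags only confident detections.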
* Combine with Injection and PII detectors for comprehensive runtime safety.

## Related

* Moderation Model Card (v0): [/moderation-model](/moderation-model)