Moderation Detector

Detect and filter unsafe content across multiple safety categories in both user inputs and assistant outputs. Use this detector to enforce community standards and regulatory policies.

What it detects

  • Hate and harassment
  • Self-harm and dangerous activities
  • Sexual and adult content
  • Criminal activity and weapons
  • Privacy, IP, elections, and safety-sensitive topics

Available models (versions)

  • moderation-v0 — general-purpose moderation across core categories

Categories

  • CRIMES
  • DEFAMATION
  • SPECIALIZED_ADVICE
  • PRIVACY
  • INTELLECTUAL_PROPERTY
  • WEAPONS
  • HATE
  • SELF_HARM
  • SEXUAL_CONTENT
  • ELECTIONS (Political Content)
  • CODE_INTERPRETER_ABUSE
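
If an application needs to reference these categories programmatically (for example to label flagged content in logs or a review UI), the identifiers above can be kept as a constant. A minimal JavaScript sketch; the constant name is ours, and the API's own response format for categories is not shown here:

// Category identifiers supported by the moderation detector (from the list above).
const MODERATION_CATEGORIES = Object.freeze([
  "CRIMES",
  "DEFAMATION",
  "SPECIALIZED_ADVICE",
  "PRIVACY",
  "INTELLECTUAL_PROPERTY",
  "WEAPONS",
  "HATE",
  "SELF_HARM",
  "SEXUAL_CONTENT",
  "ELECTIONS",
  "CODE_INTERPRETER_ABUSE"
]);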

Using the Moderation detector

// 1) Create or update a Content Moderation policy (check user + assistant)
await fetch("https://api.guardion.ai/v1/policies", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    id: "content-moderation",
    definition: "Classify and filter unsafe content",
    target: "user",
    detector: {
      model: "moderation",
      expected: "block",
      threshold: 0.9,
      reference: ["user", "assistant"]
    }
  })
});

// 2) Evaluate using that policy
const response = await fetch("https://api.guardion.ai/v1/guard", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    messages: [{ role: "user", content: "..." }],
    override_enabled_policies: ["content-moderation"]
  })
});
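
After the guard call returns, inspect the result before forwarding the conversation to your model. The sketch below is illustrative only; the response fields flagged and breakdown are assumptions, not part of the documented contract above:

// Hypothetical response handling -- the field names "flagged" and "breakdown" are assumptions.
const result = await response.json();

if (result.flagged) {
  // One or more enabled policies (e.g. content-moderation) matched at or above its threshold.
  console.warn("Request blocked by moderation policy:", result.breakdown);
} else {
  // No policy matched; safe to pass the messages on to the model.
}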

Threshold levels

  • L1 (0.9): Confident
  • L2 (0.8): Very Likely
  • L3 (0.7): Likely
  • L4 (0.6): Less Likely
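
The threshold value passed in the detector config (0.9 in the policy example above) corresponds to these levels. If you prefer to reference levels by name, a small lookup keeps the mapping in one place (a sketch; the constant name is ours):

// Documented threshold levels mapped to their numeric values.
const THRESHOLD_LEVELS = Object.freeze({
  L1: 0.9, // Confident
  L2: 0.8, // Very Likely
  L3: 0.7, // Likely
  L4: 0.6  // Less Likely
});

// Example: build the detector config at the "Likely" level instead of hard-coding 0.7.
const detector = {
  model: "moderation",
  expected: "block",
  threshold: THRESHOLD_LEVELS.L3,
  reference: ["user", "assistant"]
};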

Notes

  • For stricter environments, use a more sensitive threshold level (e.g. L3 or L4, i.e. a lower numeric threshold) on sensitive categories so that lower-confidence detections are also blocked.
  • Combine this detector with the Injection and PII detectors for comprehensive runtime safety, as sketched below.
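
A single /v1/guard call can evaluate several policies at once via override_enabled_policies. A minimal sketch, assuming policies named prompt-injection and pii-detection were created the same way as content-moderation (those two IDs are illustrative):

// Evaluate one request against moderation, injection, and PII policies together.
// The policy IDs "prompt-injection" and "pii-detection" are illustrative.
const guarded = await fetch("https://api.guardion.ai/v1/guard", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    messages: [{ role: "user", content: "..." }],
    override_enabled_policies: [
      "content-moderation",
      "prompt-injection",
      "pii-detection"
    ]
  })
});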