Moderation Model Card

This page describes the model used by the Content Moderation detector.

Available Versions

  • moderation-v0 — current stable version

Overview

The model performs general-purpose moderation across safety categories, including hate, self-harm, sexual content, crimes, weapons, privacy, intellectual property (IP), and elections, among others.

Benchmarks

Coming soon — evaluation across multi-category datasets with per-category metrics.

Detailed Information

Coming soon — taxonomy alignment, labeling methodology, and error analysis.

  • Content Moderation detector — configure policies and thresholds: /moderation
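
To illustrate what per-category thresholds might look like in practice, here is a minimal sketch. It is not the detector's actual API: the function name `flag_categories`, the category keys, the score range of 0 to 1, and the default threshold of 0.5 are all assumptions made for illustration. The detector's real request and response format is documented on the /moderation page.

```python
from typing import Dict, List

# Assumed fallback threshold for categories without an explicit policy.
DEFAULT_THRESHOLD = 0.5


def flag_categories(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    default: float = DEFAULT_THRESHOLD,
) -> List[str]:
    """Return the categories whose score meets or exceeds its threshold.

    `scores` maps a hypothetical category name to a confidence in [0, 1].
    `thresholds` holds per-category overrides; other categories use `default`.
    """
    return sorted(
        category
        for category, score in scores.items()
        if score >= thresholds.get(category, default)
    )


if __name__ == "__main__":
    # Hypothetical scores for a single piece of content.
    scores = {"hate": 0.82, "self_harm": 0.10, "privacy": 0.45}
    # A stricter (lower) threshold for privacy only.
    thresholds = {"privacy": 0.40}
    print(flag_categories(scores, thresholds))  # ['hate', 'privacy']
```

Lowering a category's threshold makes the policy stricter for that category only, which is why "privacy" is flagged at 0.45 here while it would pass under the assumed default of 0.5.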