Moderation Model Card
This page describes the model used by the Content Moderation detector.
Available Versions
- moderation-v0 — current stable version
Overview
The model provides general-purpose moderation across safety categories, including hate, self-harm, sexual content, crimes, weapons, privacy, intellectual property (IP), elections, and more.
Benchmarks
Coming soon — evaluation across multi-category datasets with per-category metrics.
Coming soon — taxonomy alignment, labeling methodology, and error analysis.
- Content Moderation detector — configure policies and thresholds: /moderation