Moderation Model Card
This page describes the model used by the Content Moderation detector.
Available Versions
- moderation-v0 — current stable version
Overview
A general-purpose moderation model covering safety categories including hate, self-harm, sexual content, crimes, weapons, privacy, intellectual property (IP), elections, and more.
Benchmarks
Coming soon: evaluation on multi-category datasets with per-category metrics.
Detailed Information
Coming soon: taxonomy alignment, labeling methodology, and error analysis.
Related
- Content Moderation detector — configure policies and thresholds: /moderation