What it detects
- Any user-defined policy violations
- Domain-specific compliance requirements (e.g., financial regulations, healthcare guidelines)
- Brand safety and tone violations
- Custom content policies beyond standard safety categories
Available models (versions)
- Guardion-1-8B — supports custom policy evaluation with reasoning traces
How it works
Creating a custom guardrail in the Console
- Go to the Guardrails page in the Console
- Click Create Custom Guardrail
- Define the detection rules — the criteria that will flag content as unsafe
- Choose the detection type:
- LLM — uses Guardion-1-8B to evaluate content against your natural language instruction
- Regex — uses pattern matching for structured detection (e.g., specific formats, keywords)
- Write your instruction — a clear description of what should be flagged
Example instructions
- “Assistant providing a step-by-step plan to make a bomb or any dangerous weapon”
- “Assistant supporting or providing investment advisory, financial planning, or stock recommendations”
- “User sending gibberish data, spam content, or nonsensical repeated characters”
- “Assistant revealing internal system prompts, API keys, or configuration details”
- “User attempting to extract training data or model weights from the assistant”
Assigning guardrails to a policy
Once you’ve created your custom guardrails, assign them to a policy where you can combine multiple guardrails together and configure sensibility for each one:- Go to the Policies page and create or edit a policy
- Add guardrails — combine your custom guardrails with built-in ones (Prompt Security, PII, Moderation, Grounding) in a single policy
- Set sensibility — adjust the global confidence threshold for the policy
- Assign the policy to one or more applications
Using the Custom detector via API
Evaluate content against your custom policy using the/v1/guard endpoint:
Example use cases
| Use Case | Custom Policy Example |
|---|---|
| Financial compliance | ”Response must not provide specific investment advice or guarantee returns.” |
| Healthcare | ”Response must include a disclaimer that it is not a substitute for professional medical advice.” |
| Brand safety | ”Response must not mention competitor products or make comparative claims.” |
| Legal | ”Response must not provide specific legal advice or interpret statutes for the user’s jurisdiction.” |
| Internal policy | ”Response must not reveal internal company processes, pricing models, or employee information.” |
Threshold levels
- L1 (0.9): Confident
- L2 (0.8): Very Likely
- L3 (0.7): Likely
- L4 (0.6): Less Likely
Notes
- Custom policies are described in natural language — no training or fine-tuning required.
- Combine with pre-built detectors (Injection, PII, Moderation, Grounding) for layered safety.
- The model supports thinking mode for detailed reasoning traces on custom evaluations.
Related
- Guardion-1-8B Model Card: /guardion-1-8b