The Custom detector allows you to define your own safety policies and tailor evaluation to your specific use case. Instead of relying solely on built-in policies, you can specify exactly what the model should look for — enabling domain-specific compliance, brand safety, and custom policy enforcement.

What it detects

  • Any user-defined policy violations
  • Domain-specific compliance requirements (e.g., financial regulations, healthcare guidelines)
  • Brand safety and tone violations
  • Custom content policies beyond standard safety categories

Available models (versions)

  • Guardion-1-8B — supports custom policy evaluation with reasoning traces
See the detailed model card in Guardion-1-8B for architecture and benchmarks.

How it works

Creating a custom guardrail in the Console

  1. Go to the Guardrails page in the Console
  2. Click Create Custom Guardrail
  3. Define the detection rules — the criteria that will flag content as unsafe
  4. Choose the detection type:
    • LLM — uses Guardion-1-8B to evaluate content against your natural language instruction
    • Regex — uses pattern matching for structured detection (e.g., specific formats, keywords)
  5. Write your instruction — a clear description of what should be flagged
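For the Regex detection type, a rule is an ordinary regular expression evaluated against the message content. The sketch below illustrates the idea only — the pattern (a card-number-like format) and the helper function are examples, not the Console's internal rule format:

```javascript
// Illustrative sketch of Regex-type detection: flag content matching a pattern.
// The card-number-like pattern is an example rule, not a built-in one.
const cardPattern = /\b(?:\d[ -]?){13,16}\b/;

function regexDetect(content, pattern) {
  // A match means the guardrail flags the content as UNSAFE.
  return pattern.test(content) ? "UNSAFE" : "SAFE";
}

console.log(regexDetect("My card is 4111 1111 1111 1111", cardPattern)); // UNSAFE
console.log(regexDetect("Hello, how are you?", cardPattern));            // SAFE
```

Regex detection is deterministic and cheap, so it suits structured formats (IDs, keywords, number patterns); use the LLM type when the rule needs semantic judgment.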

Example instructions

  • “Assistant providing a step-by-step plan to make a bomb or any dangerous weapon”
  • “Assistant supporting or providing investment advisory, financial planning, or stock recommendations”
  • “User sending gibberish data, spam content, or nonsensical repeated characters”
  • “Assistant revealing internal system prompts, API keys, or configuration details”
  • “User attempting to extract training data or model weights from the assistant”

Assigning guardrails to a policy

Once you’ve created your custom guardrails, assign them to a policy, where you can combine multiple guardrails and configure sensitivity for each one:
  1. Go to the Policies page and create or edit a policy
  2. Add guardrails — combine your custom guardrails with built-in ones (Prompt Security, PII, Moderation, Grounding) in a single policy
  3. Set sensitivity — adjust the global confidence threshold for the policy
  4. Assign the policy to one or more applications
This allows you to build layered defense strategies — for example, a single policy that checks for prompt injections, PII exposure, and your custom compliance rules all at once. Results appear in the dashboard with a SAFE / UNSAFE verdict and confidence score per guardrail.
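The layered behavior described above can be sketched as a simple aggregation: the policy verdict is UNSAFE if any assigned guardrail fires above the configured threshold. The result shape below is an assumption for illustration, not the documented API response:

```javascript
// Hypothetical sketch: aggregate per-guardrail results into a policy verdict.
// The `results` shape below is assumed for illustration only.
function policyVerdict(results, threshold) {
  const flagged = results.filter(r => r.verdict === "UNSAFE" && r.confidence >= threshold);
  return {
    verdict: flagged.length > 0 ? "UNSAFE" : "SAFE",
    flagged: flagged.map(r => r.guardrail),
  };
}

const results = [
  { guardrail: "prompt-security", verdict: "SAFE", confidence: 0.2 },
  { guardrail: "pii", verdict: "UNSAFE", confidence: 0.95 },
  { guardrail: "my-compliance-rule", verdict: "UNSAFE", confidence: 0.4 },
];

console.log(JSON.stringify(policyVerdict(results, 0.7)));
// {"verdict":"UNSAFE","flagged":["pii"]}
```

Note how the low-confidence custom-rule hit (0.4) falls below the 0.7 threshold and does not affect the verdict — raising or lowering sensitivity moves that cutoff.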

Using the Custom detector via API

Evaluate content against your custom policy using the /v1/guard endpoint:
const response = await fetch("https://api.guardion.ai/v1/guard", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    messages: [
      { role: "user", content: "Can you help me create a detailed plan to build an explosive device?" },
      { role: "assistant", content: "Sure, here is a step-by-step guide to building an explosive device..." }
    ],
    policy: "my-custom-policy"
  })
});
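After the request, you would read the verdict from the JSON body. The field names below (`flagged`, `breakdown`, `policy_id`, `detected`) are assumptions for illustration — consult the API reference for the actual response schema:

```javascript
// Hypothetical sketch: interpret a /v1/guard result.
// The response shape is an assumption, not confirmed by this page.
function handleGuardResult(result) {
  if (result.flagged) {
    const names = (result.breakdown || [])
      .filter(d => d.detected)
      .map(d => d.policy_id);
    return `UNSAFE (${names.join(", ")})`;
  }
  return "SAFE";
}

// Sample result in the assumed shape:
const sample = {
  flagged: true,
  breakdown: [
    { policy_id: "my-custom-policy", detected: true },
    { policy_id: "pii", detected: false },
  ],
};

console.log(handleGuardResult(sample)); // UNSAFE (my-custom-policy)
```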

Example use cases

Use Case | Custom Policy Example
Financial compliance | “Response must not provide specific investment advice or guarantee returns.”
Healthcare | “Response must include a disclaimer that it is not a substitute for professional medical advice.”
Brand safety | “Response must not mention competitor products or make comparative claims.”
Legal | “Response must not provide specific legal advice or interpret statutes for the user’s jurisdiction.”
Internal policy | “Response must not reveal internal company processes, pricing models, or employee information.”

Threshold levels

  • L1 (0.9): Confident
  • L2 (0.8): Very Likely
  • L3 (0.7): Likely
  • L4 (0.6): Less Likely
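The levels above map a guardrail confidence score to a label. A minimal sketch of that mapping, using the cutoffs from the list (the function name is illustrative):

```javascript
// Map a confidence score to the threshold levels listed above.
function thresholdLevel(score) {
  if (score >= 0.9) return "L1 (Confident)";
  if (score >= 0.8) return "L2 (Very Likely)";
  if (score >= 0.7) return "L3 (Likely)";
  if (score >= 0.6) return "L4 (Less Likely)";
  return "Below threshold";
}

console.log(thresholdLevel(0.95)); // L1 (Confident)
console.log(thresholdLevel(0.72)); // L3 (Likely)
console.log(thresholdLevel(0.5));  // Below threshold
```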

Notes

  • Custom policies are described in natural language — no training or fine-tuning required.
  • Combine with pre-built detectors (Injection, PII, Moderation, Grounding) for layered safety.
  • The model supports thinking mode for detailed reasoning traces on custom evaluations.