
Detector

The Injection detector protects your agents and applications from prompt injections, jailbreaks, context hijacking, and data exfiltration attempts. It is powered by the ModernGuard model family and is designed for low-latency, multilingual runtime use.

What it detects

  • Prompt injections and jailbreaks (e.g., DAN, Goodside)
  • Context hijacking and instruction overrides
  • Evasion and obfuscation (e.g., Unicode/ANSI/ASCII tricks)
  • Data exfiltration and leakage attempts
  • Code/command injection patterns (shell, SQL, tool abuse)

Available models (versions)

See the ModernGuard model card for architecture details and benchmarks.

Categories

The detector classifies threats into the following categories, which you can target in policies:
  • DIRECT_OVERRIDE
  • OBFUSCATION
  • CONTEXT_HIJACK
  • DATA_EXFILTRATION
  • ROLE_IMPERSONATION
  • MULTISTEP_INSTRUCTION_HIDING
  • TASK_MISUSE
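If your own configuration or tooling refers to these categories by name, a small check can catch typos before a policy is deployed. This is a minimal sketch. It makes no assumption about how categories are attached to a policy, which this page does not show. INJECTION_CATEGORIES and unknownCategories are hypothetical names, not part of the Guardion API.

```javascript
// Category names as listed in the Categories section.
const INJECTION_CATEGORIES = Object.freeze([
  "DIRECT_OVERRIDE",
  "OBFUSCATION",
  "CONTEXT_HIJACK",
  "DATA_EXFILTRATION",
  "ROLE_IMPERSONATION",
  "MULTISTEP_INSTRUCTION_HIDING",
  "TASK_MISUSE",
]);

// Hypothetical helper: returns every name that is not a known category,
// e.g. a typo such as "CONTEXT_HIJACKING", so it can be fixed before deploy.
function unknownCategories(names) {
  const known = new Set(INJECTION_CATEGORIES);
  return names.filter((name) => !known.has(name));
}

console.log(unknownCategories(["OBFUSCATION", "CONTEXT_HIJACKING"]));
// → [ 'CONTEXT_HIJACKING' ]
```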

Using the Injection detector

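You configure the detector through guard policies. First create a policy that uses the modern-guard model, then call the guard endpoint with that policy's ID in override_enabled_policies so the request is evaluated against that policy instead of the currently enabled ones.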
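```javascript
// Top-level await requires an ES module (e.g. a .mjs file) or an async function.
const API_KEY = "YOUR_API_KEY"; // replace with your Guardion API key

// 1) Create or update a Prompt Defense policy
const policyRes = await fetch("https://api.guardion.ai/v1/policies", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${API_KEY}`
  },
  body: JSON.stringify({
    id: "prompt-defense",
    definition: "Prevent prompt injections and jailbreaks",
    target: "user",
    detector: {
      model: "modern-guard",
      expected: "block",
      threshold: 0.9 // L1 (Confident); see Threshold levels
    }
  })
});
if (!policyRes.ok) throw new Error(`Policy request failed: HTTP ${policyRes.status}`);

// 2) Evaluate a conversation using only that policy
const response = await fetch("https://api.guardion.ai/v1/guard", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${API_KEY}`
  },
  body: JSON.stringify({
    messages: [{ role: "user", content: "..." }],
    // Evaluate with this policy instead of the currently enabled policies
    override_enabled_policies: ["prompt-defense"]
  })
});
if (!response.ok) throw new Error(`Guard request failed: HTTP ${response.status}`);
const result = await response.json();
console.log(result);
```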
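When the same two calls are made from several places, it can help to build the request bodies in one spot. The sketch below mirrors the field names from the example above. buildPolicyBody, buildGuardBody, and postJson are hypothetical helpers, not part of a Guardion SDK, and the threshold range check assumes detector scores are probabilities between 0 and 1.

```javascript
// Hypothetical helpers that build the request bodies from the example above.
// Field names mirror that example; the defaults are illustrative only.

// Body for POST https://api.guardion.ai/v1/policies
function buildPolicyBody({ id, definition, target = "user", threshold = 0.9 }) {
  // Assumption: detector scores are probabilities, so thresholds lie in (0, 1].
  if (!(threshold > 0 && threshold <= 1)) {
    throw new RangeError(`threshold must be in (0, 1], got ${threshold}`);
  }
  return {
    id,
    definition,
    target,
    detector: { model: "modern-guard", expected: "block", threshold },
  };
}

// Body for POST https://api.guardion.ai/v1/guard
function buildGuardBody(messages, policyIds) {
  return { messages, override_enabled_policies: policyIds };
}

// Hypothetical wrapper: sends one JSON POST and throws on a non-2xx status.
async function postJson(url, body, apiKey) {
  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${url} returned HTTP ${res.status}`);
  return res.json();
}
```

For example, buildPolicyBody({ id: "prompt-defense", definition: "Prevent prompt injections and jailbreaks", threshold: 0.8 }) yields the step 1 body at level L2, which can then be sent with postJson("https://api.guardion.ai/v1/policies", body, "YOUR_API_KEY"). Keeping the builders free of network calls makes them easy to unit-test.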

Threshold levels

  • L1 (0.9): Confident
  • L2 (0.8): Very Likely
  • L3 (0.7): Likely
  • L4 (0.6): Less Likely
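Adjust the threshold per use case to balance false positives against coverage: a higher threshold such as L1 (0.9) flags only high-confidence detections and produces fewer false positives, while a lower threshold such as L4 (0.6) catches more attacks at the cost of more false positives.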
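To see how the level you pick changes what gets flagged, the sketch below maps each level to its threshold and counts how many of a handful of scores would be flagged. It assumes an input is flagged when its score is greater than or equal to the threshold. The scores are made up for illustration and are not real ModernGuard output, and thresholdFor and isFlagged are hypothetical helpers.

```javascript
// Threshold levels as listed in Threshold levels.
const THRESHOLD_LEVELS = Object.freeze({ L1: 0.9, L2: 0.8, L3: 0.7, L4: 0.6 });

// Hypothetical helper: look up a level's threshold, rejecting unknown levels.
function thresholdFor(level) {
  if (!Object.prototype.hasOwnProperty.call(THRESHOLD_LEVELS, level)) {
    throw new Error(`Unknown threshold level: ${level}`);
  }
  return THRESHOLD_LEVELS[level];
}

// Assumption: an input is flagged when the detector's score is >= threshold.
function isFlagged(score, level) {
  return score >= thresholdFor(level);
}

// Illustrative, made-up scores (not real ModernGuard output).
const scores = [0.95, 0.85, 0.72, 0.65, 0.4];
for (const level of Object.keys(THRESHOLD_LEVELS)) {
  const flagged = scores.filter((s) => isFlagged(s, level)).length;
  console.log(`${level} (${thresholdFor(level)}): ${flagged} of ${scores.length} flagged`);
}
// → L1 (0.9): 1 of 5 flagged
// → L2 (0.8): 2 of 5 flagged
// → L3 (0.7): 3 of 5 flagged
// → L4 (0.6): 4 of 5 flagged
```

Each step down in level admits one more of these sample scores, which is the coverage-versus-false-positive trade-off in miniature.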
Related
  • ModernGuard: model card, benchmarks, and performance details
  • Policies: how to compose and deploy guardrails