The Injection detector protects your agents and applications from prompt injections, jailbreaks, context hijacking, and data exfiltration attempts. It is powered by the ModernGuard model family and is designed for low-latency, multilingual runtime use.

What it detects

  • Prompt injections and jailbreaks (e.g., DAN, Goodside)
  • Context hijacking and instruction overrides
  • Evasion and obfuscation (e.g., Unicode/ANSI/ASCII tricks)
  • Data exfiltration and leakage attempts
  • Code/command injection patterns (shell, SQL, tool abuse)

Available models (versions)

See the detailed model card in ModernGuard for architecture and benchmarks.

Categories

The current model modern-guard-v1.5 is trained to identify complex adversarial maneuvers. The detector classifies threats across several specialized categories:
  • DIRECT_OVERRIDE: Explicit attempts to ignore previous system instructions (e.g., “Ignore all previous directions”).
  • OBFUSCATION: Using base64 encoding, leetspeak, or translated text to hide malicious intent from standard filters.
  • CONTEXT_HIJACK: Attempts to redirect the model’s focus away from its intended task toward a new, unauthorized context.
  • DATA_EXFILTRATION: Instructions designed to make the model reveal its system prompt, training data, or session secrets.
  • ROLE_IMPERSONATION: Forcing the model to act as a different persona (e.g., “DAN” or “Developer Mode”) to bypass safety filters.
  • MULTISTEP_HIDING: Complex, multi-turn strategies where the malicious payload is hidden within seemingly innocent steps.
  • TASK_MISUSE: Coercing the model into performing tasks it wasn’t designed for, such as generating code for exploits.

Using the Injection detector

You configure the detector via guard policies. First create (or update) a policy, then evaluate with that policy by overriding the enabled policies in the request.
// 1) Create or update a Prompt Defense policy
await fetch("https://api.guardion.ai/v1/policies", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    id: "prompt-defense",
    definition: "Prevent prompt injections and jailbreaks",
    threshold: 0.9,
    detector: {
      model: "modern-guard",
      target: "user",
    }
  })
});

// 2) Evaluate using that policy
const response = await fetch("https://api.guardion.ai/v1/guard", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    messages: [{ role: "user", content: "..." }],
    policy: "prompt-defense"
  })
});
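
The response can then be parsed to gate the request before it reaches your model. The field name below (flagged) is an assumption for illustration; consult the guard endpoint's API reference for the actual response schema.

// Parse the evaluation result (field names are assumptions, not the documented schema)
const result = await response.json();

if (result.flagged) {
  // Hypothetical handling: reject or sanitize the input before forwarding it
  throw new Error("Input blocked by prompt-defense policy");
}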

Threshold levels

  • L1 (0.9): Confident
  • L2 (0.8): Very Likely
  • L3 (0.7): Likely
  • L4 (0.6): Less Likely
Adjust the threshold per use case to balance false positives against coverage: a higher threshold (e.g., L1) flags only high-confidence detections, while a lower threshold (e.g., L4) casts a wider net.
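
For example, relaxing the policy from L1 to L3 only requires re-posting the policy with a lower threshold; the endpoint and payload are the same as in the creation example above, so the comment "Create or update" applies here as well.

// Re-create/update the policy with a more permissive L3 (0.7) threshold
await fetch("https://api.guardion.ai/v1/policies", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    id: "prompt-defense",
    definition: "Prevent prompt injections and jailbreaks",
    threshold: 0.7,
    detector: {
      model: "modern-guard",
      target: "user"
    }
  })
});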