Prompt Defense

The Injection detector protects your agents and applications from prompt injections, jailbreaks, context hijacking, and data exfiltration attempts. It is powered by the ModernGuard model family and is designed for low-latency, multilingual runtime use.

What it detects

Prompt injections and jailbreaks (e.g., DAN, Goodside)
Context hijacking and instruction overrides
Evasion and obfuscation (e.g., Unicode/ANSI/ASCII tricks)
Data exfiltration and leakage attempts
Code/command injection patterns (shell, SQL, tool abuse)

Available models (versions)

ModernGuard v1.5 — latest, recommended for production
ModernGuard v1 — stable, production-proven
ModernGuard v0 — initial release

See the detailed model card in ModernGuard for architecture and benchmarks.

Categories

The current model modern-guard-v1.5 is trained to identify complex adversarial maneuvers. The detector classifies threats across several specialized categories:

Category	Description
DIRECT_OVERRIDE	Explicit attempts to ignore previous system instructions (e.g., “Ignore all previous directions”).
OBFUSCATION	Using base64 encoding, leetspeak, or translated text to hide malicious intent from standard filters.
CONTEXT_HIJACK	Attempts to redirect the model’s focus away from its intended task toward a new, unauthorized context.
DATA_EXFILTRATION	Instructions designed to make the model reveal its system prompt, training data, or session secrets.
ROLE_IMPERSONATION	Forcing the model to act as a different persona (e.g., “DAN” or “Developer Mode”) to bypass safety filters.
MULTISTEP_HIDING	Complex, multi-turn strategies where the malicious payload is hidden within seemingly innocent steps.
TASK_MISUSE	Coercing the model into performing tasks it wasn’t designed for, such as generating code for exploits.
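Client code that logs or routes detections may find it handy to keep the category names in one place. The following is a minimal sketch, not part of the Guardion API: the names and summaries are copied from the table above, while `CATEGORY_DESCRIPTIONS` and `describeCategory` are hypothetical helpers, and how the API reports a category in its response is not covered here.

```javascript
// Category names and summaries taken from the table above.
const CATEGORY_DESCRIPTIONS = Object.freeze({
  DIRECT_OVERRIDE: "Explicit attempts to ignore previous system instructions",
  OBFUSCATION: "Encoding, leetspeak, or translation used to hide intent",
  CONTEXT_HIJACK: "Redirecting the model toward an unauthorized context",
  DATA_EXFILTRATION: "Attempts to reveal the system prompt, training data, or session secrets",
  ROLE_IMPERSONATION: "Forcing a different persona to bypass safety filters",
  MULTISTEP_HIDING: "Multi-turn strategies that hide the payload in innocent steps",
  TASK_MISUSE: "Coercing the model into tasks it was not designed for",
});

// Hypothetical helper: returns the summary for a known category,
// or null for any label that is not in the table.
function describeCategory(name) {
  const known = Object.prototype.hasOwnProperty.call(CATEGORY_DESCRIPTIONS, name);
  return known ? CATEGORY_DESCRIPTIONS[name] : null;
}
```

Returning `null` for unknown labels keeps the helper safe if a later model version adds categories that this table does not list.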

Using the Injection detector

You configure the detector through guard policies. First create a policy, then evaluate messages against it by passing the policy's id in the guard request, which overrides the enabled policies.

// 1) Create or update a Prompt Defense policy
await fetch("https://api.guardion.ai/v1/policies", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    id: "prompt-defense",
    definition: "Prevent prompt injections and jailbreaks",
    threshold: 0.9,
    detector: {
      model: "modern-guard",
      target: "user",
    }
  })
});

// 2) Evaluate using that policy
const response = await fetch("https://api.guardion.ai/v1/guard", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    messages: [{ role: "user", content: "..." }],
    policy: "prompt-defense"
  })
});
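The evaluation call above can be wrapped so that the request is built in one place and HTTP failures surface as errors. This is a sketch under stated assumptions: the endpoint, headers, and body fields come from the example above, but `buildGuardRequest` and `evaluate` are hypothetical helper names, and the shape of the JSON the API returns is not documented here, so the sketch returns it unchanged.

```javascript
// Endpoint taken from the example above.
const GUARD_URL = "https://api.guardion.ai/v1/guard";

// Hypothetical helper: builds the fetch options for a guard evaluation.
function buildGuardRequest(apiKey, policyId, messages) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ messages, policy: policyId }),
  };
}

// Hypothetical helper: sends the request and throws on non-2xx statuses.
// The parsed body is returned as-is because its fields are not specified here.
async function evaluate(apiKey, policyId, messages) {
  const response = await fetch(GUARD_URL, buildGuardRequest(apiKey, policyId, messages));
  if (!response.ok) {
    throw new Error(`Guard request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Checking `response.ok` matters because `fetch` only rejects on network failures, so a 4xx or 5xx reply would otherwise be parsed as if it were a normal result.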

Threshold levels

L1 (0.9): Confident
L2 (0.8): Very Likely
L3 (0.7): Likely
L4 (0.6): Less Likely

Adjust thresholds per use case to balance false positives and coverage.
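To make the levels concrete, the sketch below maps a numeric score to the strictest level it satisfies. It assumes the detector yields a confidence score between 0 and 1 and that a policy flags scores at or above its threshold; both are assumptions rather than documented behavior, and `THRESHOLD_LEVELS` and `levelForScore` are hypothetical names.

```javascript
// Levels and cut-offs taken from the list above, strictest first.
const THRESHOLD_LEVELS = [
  { level: "L1", min: 0.9, label: "Confident" },
  { level: "L2", min: 0.8, label: "Very Likely" },
  { level: "L3", min: 0.7, label: "Likely" },
  { level: "L4", min: 0.6, label: "Less Likely" },
];

// Hypothetical helper: returns the strictest level whose cut-off the score
// meets, or null when the score is below every cut-off.
function levelForScore(score) {
  const match = THRESHOLD_LEVELS.find((entry) => score >= entry.min);
  return match ? match.level : null;
}
```

Under these assumptions, a policy with `threshold: 0.9` would act only on L1 scores, while lowering it to 0.6 would also act on L2 through L4, trading more coverage for more false positives.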
