The Injection detector protects your agents and applications from prompt injections, jailbreaks, context hijacking, and data exfiltration attempts. It is powered by the ModernGuard model family and is designed for low-latency, multilingual runtime use.
What it detects
- Prompt injections and jailbreaks (e.g., DAN, Goodside)
- Context hijacking and instruction overrides
- Evasion and obfuscation (e.g., Unicode/ANSI/ASCII tricks)
- Data exfiltration and leakage attempts
- Code/command injection patterns (shell, SQL, tool abuse)
Available models (versions)
- ModernGuard v1.5 — latest, recommended for production
- ModernGuard v1 — stable, production-proven
- ModernGuard v0 — initial release
Categories
The current model, modern-guard-v1.5, is trained to identify complex adversarial maneuvers. The detector classifies threats into the following specialized categories:
| Category | Description |
|---|---|
| DIRECT_OVERRIDE | Explicit attempts to ignore previous system instructions (e.g., “Ignore all previous directions”). |
| OBFUSCATION | Using base64 encoding, leetspeak, or translated text to hide malicious intent from standard filters. |
| CONTEXT_HIJACK | Attempts to redirect the model’s focus away from its intended task toward a new, unauthorized context. |
| DATA_EXFILTRATION | Instructions designed to make the model reveal its system prompt, training data, or session secrets. |
| ROLE_IMPERSONATION | Forcing the model to act as a different persona (e.g., “DAN” or “Developer Mode”) to bypass safety filters. |
| MULTISTEP_HIDING | Complex, multi-turn strategies where the malicious payload is hidden within seemingly innocent steps. |
| TASK_MISUSE | Coercing the model into performing tasks it wasn’t designed for, such as generating code for exploits. |
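If your application needs to act on these categories, one option is to mirror the table in a small enum. The following is a minimal sketch, not part of any official SDK: the category names come from the table above, while `InjectionCategory`, `parse_category`, and `count_categories` are hypothetical helper names, and it assumes the detector reports each category as a plain string label.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Optional


class InjectionCategory(str, Enum):
    """Category labels copied from the table above."""

    DIRECT_OVERRIDE = "DIRECT_OVERRIDE"
    OBFUSCATION = "OBFUSCATION"
    CONTEXT_HIJACK = "CONTEXT_HIJACK"
    DATA_EXFILTRATION = "DATA_EXFILTRATION"
    ROLE_IMPERSONATION = "ROLE_IMPERSONATION"
    MULTISTEP_HIDING = "MULTISTEP_HIDING"
    TASK_MISUSE = "TASK_MISUSE"


def parse_category(label: str) -> Optional[InjectionCategory]:
    """Map a raw label to a known category, or None if it is not in the table.

    Assumption: labels arrive as strings; whitespace and case are normalized.
    """
    try:
        return InjectionCategory(label.strip().upper())
    except ValueError:
        return None


def count_categories(labels: Iterable[str]) -> Counter:
    """Count how often each known category appears, skipping unknown labels."""
    parsed = (parse_category(label) for label in labels)
    return Counter(category for category in parsed if category is not None)
```

Returning `None` for unknown labels, rather than raising, keeps the helper tolerant of categories that a later model version might add.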
Using the Injection detector
You configure the detector via guard policies. First create a policy, then evaluate requests against that policy by overriding the enabled policies.
Threshold levels
- L1 (0.9): Confident
- L2 (0.8): Very Likely
- L3 (0.7): Likely
- L4 (0.6): Less Likely
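The threshold levels above can be read as cutoffs on a detection score. The sketch below illustrates that reading under two assumptions that the text does not state: the detector produces a score between 0 and 1, and a level flags any score at or above its cutoff. The names `THRESHOLDS`, `is_flagged`, and `strictest_level` are hypothetical and exist only for illustration.

```python
from typing import Optional

# Threshold levels as listed above: (level, cutoff, label),
# ordered from the strictest cutoff to the most permissive.
THRESHOLDS = [
    ("L1", 0.9, "Confident"),
    ("L2", 0.8, "Very Likely"),
    ("L3", 0.7, "Likely"),
    ("L4", 0.6, "Less Likely"),
]


def is_flagged(score: float, level: str) -> bool:
    """Return True if the score meets the cutoff for the given level.

    Assumption: cutoffs are inclusive lower bounds on a 0-1 score.
    """
    cutoffs = {name: cutoff for name, cutoff, _ in THRESHOLDS}
    return score >= cutoffs[level]


def strictest_level(score: float) -> Optional[str]:
    """Return the strictest level whose cutoff the score meets, or None."""
    for name, cutoff, _ in THRESHOLDS:
        if score >= cutoff:
            return name
    return None
```

Under this reading, a score of 0.85 is flagged at L2, L3, and L4 but not at L1, so choosing L1 trades recall for fewer false positives, and L4 does the opposite.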