What it detects
- Prompt injections and jailbreaks (e.g., DAN, Goodside)
- Context hijacking and instruction overrides
- Evasion and obfuscation (e.g., Unicode/ANSI/ASCII tricks)
- Data exfiltration and leakage attempts
- Code/command injection patterns (shell, SQL, tool abuse)
Available models (versions)
- ModernGuard v1.5 — latest, recommended for production
- ModernGuard v1 — stable, production-proven
- ModernGuard v0 — initial release
Categories
The current model, `modern-guard-v1.5`, is trained to identify complex adversarial maneuvers. The detector classifies threats across several specialized categories:
| Category | Description |
|---|---|
| DIRECT_OVERRIDE | Explicit attempts to ignore previous system instructions (e.g., “Ignore all previous directions”). |
| OBFUSCATION | Using base64 encoding, leetspeak, or translated text to hide malicious intent from standard filters. |
| CONTEXT_HIJACK | Attempts to redirect the model’s focus away from its intended task toward a new, unauthorized context. |
| DATA_EXFILTRATION | Instructions designed to make the model reveal its system prompt, training data, or session secrets. |
| ROLE_IMPERSONATION | Forcing the model to act as a different persona (e.g., “DAN” or “Developer Mode”) to bypass safety filters. |
| MULTISTEP_HIDING | Complex, multi-turn strategies where the malicious payload is hidden within seemingly innocent steps. |
| TASK_MISUSE | Coercing the model into performing tasks it wasn’t designed for, such as generating code for exploits. |
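Detection categories like these are typically used to drive per-category handling downstream. The sketch below is a hypothetical example of that pattern: the category names come from the table above, but the `Detection` shape, the block/flag grouping, and the `handle` function are illustrative assumptions, not part of the product's real API.

```python
from dataclasses import dataclass

# Illustrative grouping: which categories to hard-block vs. flag for review.
# This split is an assumption for the example, not a product recommendation.
BLOCK = {"DIRECT_OVERRIDE", "ROLE_IMPERSONATION", "DATA_EXFILTRATION"}
FLAG = {"OBFUSCATION", "CONTEXT_HIJACK", "MULTISTEP_HIDING", "TASK_MISUSE"}

@dataclass
class Detection:
    category: str  # one of the categories from the table above
    score: float   # detector confidence in [0, 1]

def handle(detection: Detection, threshold: float = 0.7) -> str:
    """Return 'block', 'flag', or 'allow' based on category and score."""
    if detection.score < threshold:
        return "allow"
    if detection.category in BLOCK:
        return "block"
    if detection.category in FLAG:
        return "flag"
    return "allow"
```

Grouping categories this way keeps the routing decision in one place, so tightening or loosening the response to a given category is a one-line change.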
Using the Injection detector
You configure the detector via guard policies: first create a policy, then evaluate with that policy by overriding the enabled policies.
Threshold levels
- L1 (0.9): Confident
- L2 (0.8): Very Likely
- L3 (0.7): Likely
- L4 (0.6): Less Likely
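The create-then-evaluate flow and the threshold levels above can be sketched together. This is a minimal, self-contained illustration only: the policy shape and the `create_policy`/`evaluate` function names are assumptions for the example, not the product's real API; the level cutoffs are the ones listed above.

```python
from typing import Optional

# Threshold levels from the list above, strictest first.
LEVELS = [("L1", 0.9), ("L2", 0.8), ("L3", 0.7), ("L4", 0.6)]

def score_to_level(score: float) -> Optional[str]:
    """Map a raw detector score to the strictest level it meets, or None."""
    for level, cutoff in LEVELS:
        if score >= cutoff:
            return level
    return None

def create_policy(name: str, level: str = "L3") -> dict:
    """Build a policy enabling the injection detector at a threshold level.

    The dict layout here is a hypothetical stand-in for a real guard policy.
    """
    cutoff = dict(LEVELS)[level]
    return {"name": name, "detector": "modern-guard-v1.5", "threshold": cutoff}

def evaluate(score: float, policy: dict) -> dict:
    """Evaluate a detector score against the policy's threshold."""
    return {"flagged": score >= policy["threshold"],
            "level": score_to_level(score)}
```

Note that a lower level (L4) casts a wider net: it flags anything scoring 0.6 or above, trading more false positives for fewer misses, while L1 flags only high-confidence detections.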