The Advanced Prompt Attack Detection
Developed by industry experts with experience building enterprise-grade AI guardrails at Siri Apple, Nubank and other leading companies, ModernGuard is a specialized and modern transformer-encoder model designed to detect and prevent prompt attacks in real-time. This enterprise-grade solution offers multilingual support and ultra-fast inference capabilities to protect GenAI systems across various domains.
This is the model page for the Prompt Defense detector. See the detector overview in Prompt Defense.
Model Card
Modern Transformer-Encoder Architecture
- Built on ModernBERT, a high-efficiency encoder
- Features Rotary Positional Embeddings, Flash Attention, and memory optimizations
- Supports 8K token context with low latency
⚡ Ultra-Fast Inference
- Optimized for real-time streaming and in-line LLM applications
- Achieves sub ~50ms latency in production environments
Multilingual and Domain-Aware
- Trained on data in 8+ languages
- Covers banking, fintech, ecommerce, healthcare, and other verticals
🔐 Threat Intelligence Training + Continuous Updates
- Pretrained on 1 trillion tokens
- Fine-tuned on millions of simulated and real-world prompt attacks
- Proprietary red teaming data generated by AI attackers + red team partners
- AI threat databases & state-of-the-art prompt attack vectors
- Diverse synthetic data generation for safe examples
- Continuous updates with emerging threat patterns
Available Versions
- modern-guard-v1.5 — latest, recommended for production
- modern-guard-v1 — stable, production-proven
- modern-guard-v0 — initial release
Benchmark Results
This is the result for the benchmark, collecting public and private threats from red teaming partners and with set of updated threats database used from NVIDIA Garak and PromptFoo libraries. Our comprehensive evaluation demonstrates ModernGuard’s superior performance across diverse attack vectors. The benchmark methodology includes:- Evaluation against 40+ attack classes
- Cross-validation across multiple domains and languages
Overall F1-Scores
| Model | Overall F1-Score |
|---|---|
| modern-guard-500M-modernBERT-v1 | 0.9718 |
| modern-guard-120M-modernBERT-v1 | 0.9301 |
| Lakera Guard | 0.8600 |
| protectai/deberta-v3-base-prompt-injection-v2 | 0.6008 |
| deepset/deberta-v3-base-injection | 0.5725 |
| meta-llama/Prompt-Guard-86M | 0.4555 |
| jackhhao/jailbreak-classifier | 0.5000 |
Threat Category Coverage
| Threat Category | guardion/Modern-Guard-1 | meta-llama/Prompt-Guard-86M | protectai/deberta-v3-base-prompt-injection-v2 | deepset/deberta-v3-base-injection | jackhhao/jailbreak-classifier | lakera-guard |
|---|---|---|---|---|---|---|
| Encoding | 0.972667 | 0.567333 | 0.530222 | 0.889556 | 0.000000 | 0.677778 |
| Prompt Injection | 0.968602 | 0.308043 | 0.755299 | 0.899980 | 0.142857 | 0.878889 |
| Jailbreaking | 0.981274 | 0.621297 | 0.360996 | 0.764824 | 0.000000 | 0.738333 |
| Exfiltration & Leakage | 0.999667 | 0.284000 | 0.587730 | 0.981667 | 0.000000 | 0.850000 |
| Evasion & Obfuscation | 0.994659 | 0.583764 | 0.453216 | 0.794332 | 0.000000 | 0.728889 |
| Code and Command Injection | 0.990200 | 0.474000 | 0.455200 | 0.796400 | 0.000000 | 0.808000 |
| Hard Negatives | 0.958000 | 0.754000 | 0.756000 | 0.014000 | 1.000000 | 0.840000 |
| Regular Content | 0.968000 | 0.379000 | 0.786000 | 0.222000 | 1.000000 | 0.940000 |

How to Use ModernGuard
Combine ModernGuard with a guardrail policy, then evaluate with that policy.💡 Example integration
Related
- Injection — how to use ModernGuard models as a detector in policies