Multilingual and Ultra-Fast Prompt Attack Detector for AI Agent Security
Developed by industry experts with experience building enterprise-grade AI guardrails at Siri Apple, Nubank and other leading companies,
ModernGuard is a specialized and modern transformer-encoder model designed to detect and prevent prompt attacks in real-time. This enterprise-grade solution offers multilingual support and ultra-fast inference capabilities to protect GenAI systems across various domains.
This is the result for the benchmark, collecting public and private threats from red teaming partners and with set of updated threats database used from NVIDIA Garak and PromptFoo libraries. Our comprehensive evaluation demonstrates ModernGuard’s superior performance across diverse attack vectors.
The benchmark methodology includes:
Model | Overall F1-Score |
---|---|
modern-guard-500M-modernBERT-v1 | 0.9718 |
modern-guard-120M-modernBERT-v1 | 0.9301 |
Lakera Guard | 0.8600 |
protectai/deberta-v3-base-prompt-injection-v2 | 0.6008 |
deepset/deberta-v3-base-injection | 0.5725 |
meta-llama/Prompt-Guard-86M | 0.4555 |
jackhhao/jailbreak-classifier | 0.5000 |
We missed any other prompt injection detector model or solution? Please, let us know, and we can add the evaluation as well.
Threat Category | guardion/Modern-Guard-1 | meta-llama/Prompt-Guard-86M | protectai/deberta-v3-base-prompt-injection-v2 | deepset/deberta-v3-base-injection | jackhhao/jailbreak-classifier | lakera-guard |
---|---|---|---|---|---|---|
Encoding | 0.972667 | 0.567333 | 0.530222 | 0.889556 | 0.000000 | 0.677778 |
Prompt Injection | 0.968602 | 0.308043 | 0.755299 | 0.899980 | 0.142857 | 0.878889 |
Jailbreaking | 0.981274 | 0.621297 | 0.360996 | 0.764824 | 0.000000 | 0.738333 |
Exfiltration & Leakage | 0.999667 | 0.284000 | 0.587730 | 0.981667 | 0.000000 | 0.850000 |
Evasion & Obfuscation | 0.994659 | 0.583764 | 0.453216 | 0.794332 | 0.000000 | 0.728889 |
Code and Command Injection | 0.990200 | 0.474000 | 0.455200 | 0.796400 | 0.000000 | 0.808000 |
Hard Negatives | 0.958000 | 0.754000 | 0.756000 | 0.014000 | 1.000000 | 0.840000 |
Regular Content | 0.968000 | 0.379000 | 0.786000 | 0.222000 | 1.000000 | 0.940000 |
Benchmarks span 40+ attack classes including obfuscation (e.g. ANSI, ASCII), jailbreaks (e.g. DAN, Goodside), injections (e.g. SQL, shell), and real-world attacks observed in LLM deployments.
A comprehensive research paper detailing ModernGuard’s architecture, training methodology, and benchmark results will be published soon.
You need to combine the ModernGuard detector with a guardrail policy, so you can have control and fine-tune it for the specific risk level you want to manage (threshold levels).
Multilingual and Ultra-Fast Prompt Attack Detector for AI Agent Security
Developed by industry experts with experience building enterprise-grade AI guardrails at Siri Apple, Nubank and other leading companies,
ModernGuard is a specialized and modern transformer-encoder model designed to detect and prevent prompt attacks in real-time. This enterprise-grade solution offers multilingual support and ultra-fast inference capabilities to protect GenAI systems across various domains.
This is the result for the benchmark, collecting public and private threats from red teaming partners and with set of updated threats database used from NVIDIA Garak and PromptFoo libraries. Our comprehensive evaluation demonstrates ModernGuard’s superior performance across diverse attack vectors.
The benchmark methodology includes:
Model | Overall F1-Score |
---|---|
modern-guard-500M-modernBERT-v1 | 0.9718 |
modern-guard-120M-modernBERT-v1 | 0.9301 |
Lakera Guard | 0.8600 |
protectai/deberta-v3-base-prompt-injection-v2 | 0.6008 |
deepset/deberta-v3-base-injection | 0.5725 |
meta-llama/Prompt-Guard-86M | 0.4555 |
jackhhao/jailbreak-classifier | 0.5000 |
We missed any other prompt injection detector model or solution? Please, let us know, and we can add the evaluation as well.
Threat Category | guardion/Modern-Guard-1 | meta-llama/Prompt-Guard-86M | protectai/deberta-v3-base-prompt-injection-v2 | deepset/deberta-v3-base-injection | jackhhao/jailbreak-classifier | lakera-guard |
---|---|---|---|---|---|---|
Encoding | 0.972667 | 0.567333 | 0.530222 | 0.889556 | 0.000000 | 0.677778 |
Prompt Injection | 0.968602 | 0.308043 | 0.755299 | 0.899980 | 0.142857 | 0.878889 |
Jailbreaking | 0.981274 | 0.621297 | 0.360996 | 0.764824 | 0.000000 | 0.738333 |
Exfiltration & Leakage | 0.999667 | 0.284000 | 0.587730 | 0.981667 | 0.000000 | 0.850000 |
Evasion & Obfuscation | 0.994659 | 0.583764 | 0.453216 | 0.794332 | 0.000000 | 0.728889 |
Code and Command Injection | 0.990200 | 0.474000 | 0.455200 | 0.796400 | 0.000000 | 0.808000 |
Hard Negatives | 0.958000 | 0.754000 | 0.756000 | 0.014000 | 1.000000 | 0.840000 |
Regular Content | 0.968000 | 0.379000 | 0.786000 | 0.222000 | 1.000000 | 0.940000 |
Benchmarks span 40+ attack classes including obfuscation (e.g. ANSI, ASCII), jailbreaks (e.g. DAN, Goodside), injections (e.g. SQL, shell), and real-world attacks observed in LLM deployments.
A comprehensive research paper detailing ModernGuard’s architecture, training methodology, and benchmark results will be published soon.
You need to combine the ModernGuard detector with a guardrail policy, so you can have control and fine-tune it for the specific risk level you want to manage (threshold levels).