Multilingual and Ultra-Fast Prompt Attack Detector for AI Agent Security
Model | Overall F1-Score |
---|---|
modern-guard-500M-modernBERT-v1 | 0.9718 |
modern-guard-120M-modernBERT-v1 | 0.9301 |
Lakera Guard | 0.8600 |
protectai/deberta-v3-base-prompt-injection-v2 | 0.6008 |
deepset/deberta-v3-base-injection | 0.5725 |
meta-llama/Prompt-Guard-86M | 0.4555 |
jackhhao/jailbreak-classifier | 0.5000 |
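
For readers unfamiliar with the metric, the sketch below shows how a binary F1 score is computed from predictions. It is only an illustration: the labels are made up, the helper name `f1_score` is hypothetical, and the table above does not state how its "Overall F1-Score" was aggregated (binary, macro, or weighted), so this may not match the exact evaluation procedure.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class (here: 1 = attack, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, not taken from the benchmark.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

print(f"F1 = {f1_score(y_true, y_pred):.4f}")

# A detector that flags nothing scores 0 on the attack class,
# however well it handles benign inputs.
print(f"F1 (always benign) = {f1_score(y_true, [0] * len(y_true)):.4f}")
```
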

Threat Category | guardion/Modern-Guard-1 | meta-llama/Prompt-Guard-86M | protectai/deberta-v3-base-prompt-injection-v2 | deepset/deberta-v3-base-injection | jackhhao/jailbreak-classifier | Lakera Guard |
---|---|---|---|---|---|---|
Encoding | 0.972667 | 0.567333 | 0.530222 | 0.889556 | 0.000000 | 0.677778 |
Prompt Injection | 0.968602 | 0.308043 | 0.755299 | 0.899980 | 0.142857 | 0.878889 |
Jailbreaking | 0.981274 | 0.621297 | 0.360996 | 0.764824 | 0.000000 | 0.738333 |
Exfiltration & Leakage | 0.999667 | 0.284000 | 0.587730 | 0.981667 | 0.000000 | 0.850000 |
Evasion & Obfuscation | 0.994659 | 0.583764 | 0.453216 | 0.794332 | 0.000000 | 0.728889 |
Code and Command Injection | 0.990200 | 0.474000 | 0.455200 | 0.796400 | 0.000000 | 0.808000 |
Hard Negatives | 0.958000 | 0.754000 | 0.756000 | 0.014000 | 1.000000 | 0.840000 |
Regular Content | 0.968000 | 0.379000 | 0.786000 | 0.222000 | 1.000000 | 0.940000 |
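
To compare the models at a glance, the per-category scores above can be averaged per model. The sketch below copies the table values and ranks the models by their unweighted mean. This is an assumption-laden summary: the source does not say how categories were weighted or whether every row uses the same metric, so the means will not necessarily reproduce the overall F1 scores reported in the first table.

```python
from statistics import mean

# Column order matches the per-category table above.
MODELS = [
    "guardion/Modern-Guard-1",
    "meta-llama/Prompt-Guard-86M",
    "protectai/deberta-v3-base-prompt-injection-v2",
    "deepset/deberta-v3-base-injection",
    "jackhhao/jailbreak-classifier",
    "Lakera Guard",
]

# Scores copied verbatim from the table, one list per threat category.
SCORES = {
    "Encoding": [0.972667, 0.567333, 0.530222, 0.889556, 0.000000, 0.677778],
    "Prompt Injection": [0.968602, 0.308043, 0.755299, 0.899980, 0.142857, 0.878889],
    "Jailbreaking": [0.981274, 0.621297, 0.360996, 0.764824, 0.000000, 0.738333],
    "Exfiltration & Leakage": [0.999667, 0.284000, 0.587730, 0.981667, 0.000000, 0.850000],
    "Evasion & Obfuscation": [0.994659, 0.583764, 0.453216, 0.794332, 0.000000, 0.728889],
    "Code and Command Injection": [0.990200, 0.474000, 0.455200, 0.796400, 0.000000, 0.808000],
    "Hard Negatives": [0.958000, 0.754000, 0.756000, 0.014000, 1.000000, 0.840000],
    "Regular Content": [0.968000, 0.379000, 0.786000, 0.222000, 1.000000, 0.940000],
}


def mean_per_model(scores, models):
    """Return {model: unweighted mean of its scores across all categories}."""
    columns = zip(*scores.values())  # transpose rows into per-model columns
    return {model: mean(column) for model, column in zip(models, columns)}


# Sort models from highest to lowest mean score.
ranking = sorted(
    mean_per_model(SCORES, MODELS).items(),
    key=lambda item: item[1],
    reverse=True,
)

for name, score in ranking:
    print(f"{name:<48} {score:.4f}")
```

An unweighted mean treats the two benign categories (Hard Negatives, Regular Content) the same as the six attack categories, so a model that never flags anything still earns credit on those two rows. Reading the per-category columns side by side remains more informative than any single average.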