Guardion-1-8B Model Card
Guardion-1-8B is a pruned and quantized version of OpenAI gpt-oss-safeguard-20b, optimized for lower latency and local or specialized deployment while retaining 96% of the original model’s quality. The base modelgpt-oss-safeguard-20b is a 21B-parameter safety judge with 3.6B active parameters, designed for evaluating LLM outputs across safety, grounding, and policy compliance tasks. Guardion-1-8B distills this into a compact 8B-parameter model with 2.7B active parameters — making it suitable for on-premise, edge, and latency-sensitive use cases.
This is the model page for the Grounding and Custom guardrails. See the guardrail overviews in Grounding and Custom.
Overview
- Base model: OpenAI gpt-oss-safeguard-20b (21B params, 3.6B active)
- Parameters: 8 billion (2.7B active)
- Optimization: Pruned and quantized, retaining 96% quality
- Architecture: Decoder-only transformer
- License: Apache 2.0
- Developed by: Guardion AI
Key Features
- Multilingual — cross-lingual safety evaluation across diverse languages
- Custom policies — bring your own judging policies without additional training
- LoRA-compatible — suitable for LoRA adapters for task-specific fine-tuning
- Built-in safety policies — Aegis 2.0 taxonomy, RAG hallucination, function calling validation
Built-in Safety Policies
The model ships with built-in support for the Aegis 2.0 safety taxonomy: Core safety policies:| Policy | Description |
|---|---|
| Hate / Identity Hate | Content targeting individuals or groups based on protected attributes |
| Sexual Content | Explicit or suggestive material of a sexual nature |
| Sexual Minor | Any sexual content involving minors |
| Suicide & Self-Harm | Content encouraging or providing instructions for self-injury |
| Violence | Content promoting physical, mental, or sexual harm |
| Guns / Illegal Weapons | Content related to illegal weapon creation or use |
| Threat | Direct or implied threats against individuals or groups |
| Harassment | Targeted harassment, bullying, or intimidation |
| Criminal Planning | Instructions or confessions related to criminal activities |
| Controlled Substances | Content promoting illegal drug use or regulated substances |
| PII / Privacy | Attempts to solicit or expose private personal information |
| Profanity | Use of offensive language or slurs |
| Policy | Description |
|---|---|
| Illegal Activity | General illegal activities not covered by other categories |
| Immoral / Unethical | Actions violating moral or ethical standards |
| Unauthorized Advice | Unlicensed advice in medical, legal, or financial domains |
| Political / Misinformation | Political misinformation, conspiracy theories |
| Fraud / Deception | Scams, phishing, social engineering |
| Copyright / Trademark | IP violations, plagiarism |
| High Risk Gov. Decision Making | Sensitive government or institutional decisions |
| Malware | Code or instructions for malicious software |
| Manipulation | Psychological manipulation or coercion |
| Policy | Description |
|---|---|
| Context Relevance | Retrieved context is not pertinent to answering the user’s question |
| Groundedness | Response includes claims not supported by or contradicted by the provided context |
| Answer Relevance | Response fails to address or properly respond to the user’s input |
| Policy | Description |
|---|---|
| Function Calling Hallucination | Function calls with syntax or semantic errors based on the user query and available tools |
| Action Chain Safety | Detects malicious, dangerous, or harmful combinations of actions within agent traces and tool-call sequences |
Available Versions
- guardion-1-8b — current stable version
Benchmark Results
RAG Hallucination — TRUE Benchmark (Balanced Accuracy)
Measures faithfulness of LLM responses to provided context.| Model | AVG | frank | paws | qags_cnndm |
|---|---|---|---|---|
| Guardion-1-8B | 0.777 | 0.886 | 0.825 | 0.814 |
Function Calling Hallucination (Balanced Accuracy)
Evaluated on the FC Reward Bench dataset for detecting hallucinations in agentic tool-calling workflows.| Model | AVG |
|---|---|
| Guardion-1-8B | 0.73 |
How to Use Guardion-1-8B
Combine Guardion-1-8B with a guard policy, then evaluate with that policy.Example: Grounding check
Related
- Grounding guardrail — configure policies for hallucination detection: /grounding
- Custom guardrail — define your own policies: /custom