> ## Documentation Index
> Fetch the complete documentation index at: https://docs.guardion.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Guardion-1-8B

> Multilingual AI safety judge for grounding, hallucination detection, and custom policy evaluation

# Guardion-1-8B Model Card

Guardion-1-8B is a pruned and quantized version of [OpenAI gpt-oss-safeguard-20b](https://huggingface.co/openai/gpt-oss-safeguard-20b), optimized for lower latency and local or specialized deployment while retaining **96% of the original model's quality**.

The base model `gpt-oss-safeguard-20b` is a 21B-parameter safety judge with 3.6B active parameters, designed for evaluating LLM outputs across safety, grounding, and policy compliance tasks. Guardion-1-8B distills this into a compact 8B-parameter model with 2.7B active parameters — making it suitable for on-premise, edge, and latency-sensitive use cases.

> This is the model page for the **Grounding** and **Custom** guardrails. See the guardrail overviews in [Grounding](/grounding) and [Custom](/custom).

***

## Overview

* **Base model:** OpenAI gpt-oss-safeguard-20b (21B params, 3.6B active)
* **Parameters:** 8 billion (2.7B active)
* **Optimization:** Pruned and quantized, retaining 96% quality
* **Architecture:** Decoder-only transformer
* **License:** Apache 2.0
* **Developed by:** Guardion AI

## Key Features

* **Multilingual** — cross-lingual safety evaluation across diverse languages
* **Custom policies** — bring your own judging policies without additional training
* **LoRA-compatible** — suitable for LoRA adapters for task-specific fine-tuning
* **Built-in safety policies** — Aegis 2.0 taxonomy, RAG hallucination, function calling validation

## Built-in Safety Policies

The model ships with built-in support for the [Aegis 2.0 safety taxonomy](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0):

**Core safety policies:**

| Policy                     | Description                                                           |
| :------------------------- | :-------------------------------------------------------------------- |
| **Hate / Identity Hate**   | Content targeting individuals or groups based on protected attributes |
| **Sexual Content**         | Explicit or suggestive material of a sexual nature                    |
| **Sexual Minor**           | Any sexual content involving minors                                   |
| **Suicide & Self-Harm**    | Content encouraging or providing instructions for self-injury         |
| **Violence**               | Content promoting physical, mental, or sexual harm                    |
| **Guns / Illegal Weapons** | Content related to illegal weapon creation or use                     |
| **Threat**                 | Direct or implied threats against individuals or groups               |
| **Harassment**             | Targeted harassment, bullying, or intimidation                        |
| **Criminal Planning**      | Instructions or confessions related to criminal activities            |
| **Controlled Substances**  | Content promoting illegal drug use or regulated substances            |
| **PII / Privacy**          | Attempts to solicit or expose private personal information            |
| **Profanity**              | Use of offensive language or slurs                                    |

**Fine-grained policies:**

| Policy                             | Description                                                |
| :--------------------------------- | :--------------------------------------------------------- |
| **Illegal Activity**               | General illegal activities not covered by other categories |
| **Immoral / Unethical**            | Actions violating moral or ethical standards               |
| **Unauthorized Advice**            | Unlicensed advice in medical, legal, or financial domains  |
| **Political / Misinformation**     | Political misinformation, conspiracy theories              |
| **Fraud / Deception**              | Scams, phishing, social engineering                        |
| **Copyright / Trademark**          | IP violations, plagiarism                                  |
| **High Risk Gov. Decision Making** | Sensitive government or institutional decisions            |
| **Malware**                        | Code or instructions for malicious software                |
| **Manipulation**                   | Psychological manipulation or coercion                     |

**RAG policies:**

| Policy                | Description                                                                       |
| :-------------------- | :-------------------------------------------------------------------------------- |
| **Context Relevance** | Retrieved context is not pertinent to answering the user's question               |
| **Groundedness**      | Response includes claims not supported by or contradicted by the provided context |
| **Answer Relevance**  | Response fails to address or properly respond to the user's input                 |

**Agentic workflow policies:**

| Policy                             | Description                                                                                                  |
| :--------------------------------- | :----------------------------------------------------------------------------------------------------------- |
| **Function Calling Hallucination** | Function calls with syntax or semantic errors based on the user query and available tools                    |
| **Action Chain Safety**            | Detects malicious, dangerous, or harmful combinations of actions within agent traces and tool-call sequences |

## Available Versions

* guardion-1-8b — current stable version

***

## Benchmark Results

### RAG Hallucination — TRUE Benchmark (Balanced Accuracy)

Measures faithfulness of LLM responses to provided context.

| Model             |    AVG    | frank |  paws | qags\_cnndm |
| :---------------- | :-------: | :---: | :---: | :---------: |
| **Guardion-1-8B** | **0.777** | 0.886 | 0.825 |    0.814    |

### Function Calling Hallucination (Balanced Accuracy)

Evaluated on the FC Reward Bench dataset for detecting hallucinations in agentic tool-calling workflows.

| Model             |    AVG   |
| :---------------- | :------: |
| **Guardion-1-8B** | **0.73** |

***

## How to Use Guardion-1-8B

Combine Guardion-1-8B with a guard policy, then evaluate with that policy.

### Example: Grounding check

```js theme={null}
// 1) Create a Grounding policy powered by Guardion-1-8B
await fetch("https://api.guardion.ai/v1/policies", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    id: "grounding-check",
    definition: "Detect hallucinations and ungrounded claims",
    target: "assistant",
    detector: {
      model: "guardion-1-8b",
      expected: "block",
      threshold: 0.9 // L1 (Confident). Use 0.8 for L2, 0.7 for L3, 0.6 for L4
    }
  })
});

// 2) Evaluate using that policy
const response = await fetch("https://api.guardion.ai/v1/guard", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    messages: [
      { role: "context", content: "Our return policy allows returns within 14 days of purchase." }, // or "system" role
      { role: "user", content: "What is the return policy?" },
      { role: "assistant", content: "Returns are accepted within 30 days." }
    ],
    override_enabled_policies: ["grounding-check"]
  })
});

const result = await response.json();

if (result.flagged) {
  console.log("Hallucination detected:", result.reason);
} else {
  console.log("Response is grounded");
}
```

## Related

* Grounding guardrail — configure policies for hallucination detection: [/grounding](/grounding)
* Custom guardrail — define your own policies: [/custom](/custom)
