
Guardion-1-8B Model Card

Guardion-1-8B is a pruned and quantized version of OpenAI gpt-oss-safeguard-20b, optimized for lower latency and local or specialized deployment while retaining 96% of the original model’s quality. The base model gpt-oss-safeguard-20b is a 21B-parameter safety judge with 3.6B active parameters, designed for evaluating LLM outputs across safety, grounding, and policy compliance tasks. Guardion-1-8B compresses this into a compact 8B-parameter model with 2.7B active parameters — making it suitable for on-premise, edge, and latency-sensitive use cases.
This is the model page for the Grounding and Custom guardrails; see the guardrail overviews in Grounding and Custom.

Overview

  • Base model: OpenAI gpt-oss-safeguard-20b (21B params, 3.6B active)
  • Parameters: 8 billion (2.7B active)
  • Optimization: Pruned and quantized, retaining 96% quality
  • Architecture: Decoder-only transformer
  • License: Apache 2.0
  • Developed by: Guardion AI

Key Features

  • Multilingual — cross-lingual safety evaluation across diverse languages
  • Custom policies — bring your own judging policies without additional training
  • LoRA-compatible — suitable for LoRA adapters for task-specific fine-tuning
  • Built-in safety policies — Aegis 2.0 taxonomy, RAG hallucination, function calling validation
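
Custom policies are supplied at evaluation time rather than trained in. As an illustration, a custom policy could be registered through the same policies endpoint used in the grounding example on this page; the policy id and definition below are our own examples, not part of the built-in set:

```javascript
// Hypothetical custom policy payload; the id and definition are illustrative,
// while the detector fields mirror the grounding example on this page.
const customPolicy = {
  id: "no-medical-advice",
  definition: "Flag assistant responses that give specific medical advice",
  target: "assistant",
  detector: {
    model: "guardion-1-8b",
    expected: "block",
    threshold: 0.8 // L2 confidence level
  }
};

// Registering it uses the same call shape as the grounding example:
// await fetch("https://api.guardion.ai/v1/policies", {
//   method: "POST",
//   headers: { "Content-Type": "application/json", "Authorization": "Bearer YOUR_API_KEY" },
//   body: JSON.stringify(customPolicy)
// });
```

Because the policy definition is plain text interpreted by the judge model, changing what the guardrail enforces is a matter of editing the `definition` string, with no retraining step.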

Built-in Safety Policies

The model ships with built-in support for the Aegis 2.0 safety taxonomy.

Core safety policies:

  • Hate / Identity Hate — Content targeting individuals or groups based on protected attributes
  • Sexual Content — Explicit or suggestive material of a sexual nature
  • Sexual Minor — Any sexual content involving minors
  • Suicide & Self-Harm — Content encouraging or providing instructions for self-injury
  • Violence — Content promoting physical, mental, or sexual harm
  • Guns / Illegal Weapons — Content related to illegal weapon creation or use
  • Threat — Direct or implied threats against individuals or groups
  • Harassment — Targeted harassment, bullying, or intimidation
  • Criminal Planning — Instructions or confessions related to criminal activities
  • Controlled Substances — Content promoting illegal drug use or regulated substances
  • PII / Privacy — Attempts to solicit or expose private personal information
  • Profanity — Use of offensive language or slurs
Fine-grained policies:

  • Illegal Activity — General illegal activities not covered by other categories
  • Immoral / Unethical — Actions violating moral or ethical standards
  • Unauthorized Advice — Unlicensed advice in medical, legal, or financial domains
  • Political / Misinformation — Political misinformation, conspiracy theories
  • Fraud / Deception — Scams, phishing, social engineering
  • Copyright / Trademark — IP violations, plagiarism
  • High Risk Gov. Decision Making — Sensitive government or institutional decisions
  • Malware — Code or instructions for malicious software
  • Manipulation — Psychological manipulation or coercion
RAG policies:

  • Context Relevance — Retrieved context is not pertinent to answering the user’s question
  • Groundedness — Response includes claims not supported by or contradicted by the provided context
  • Answer Relevance — Response fails to address or properly respond to the user’s input
Agentic workflow policies:

  • Function Calling Hallucination — Function calls with syntax or semantic errors based on the user query and available tools
  • Action Chain Safety — Detects malicious, dangerous, or harmful combinations of actions within agent traces and tool-call sequences
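
The agentic policies are exercised through the same /v1/guard endpoint shown in the grounding example below. A minimal sketch, assuming a hypothetical policy id ("function-calling-check") and representing the tool call as an assistant message — the exact trace format is an assumption, not documented API:

```javascript
// Hypothetical request body for checking an agent's tool call against the
// Function Calling Hallucination policy. The policy id and the encoding of
// the tool call inside an assistant message are assumptions for illustration.
const guardRequest = {
  messages: [
    { role: "user", content: "Cancel my order #1234" },
    {
      role: "assistant",
      // Tool call references an order id the user never mentioned:
      content: JSON.stringify({ tool: "cancel_order", arguments: { order_id: "9999" } })
    }
  ],
  override_enabled_policies: ["function-calling-check"]
};

// POST to the guard endpoint exactly as in the grounding example:
// const response = await fetch("https://api.guardion.ai/v1/guard", {
//   method: "POST",
//   headers: { "Content-Type": "application/json", "Authorization": "Bearer YOUR_API_KEY" },
//   body: JSON.stringify(guardRequest)
// });
```

A mismatch like the one above (arguments inconsistent with the user query) is the kind of semantic error the Function Calling Hallucination policy is described as detecting.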

Available Versions

  • guardion-1-8b — current stable version

Benchmark Results

RAG Hallucination — TRUE Benchmark (Balanced Accuracy)

Measures faithfulness of LLM responses to provided context.
Model           AVG     frank   paws    qags_cnndm
Guardion-1-8B   0.777   0.886   0.825   0.814

Function Calling Hallucination (Balanced Accuracy)

Evaluated on the FC Reward Bench dataset for detecting hallucinations in agentic tool-calling workflows.
Model           AVG
Guardion-1-8B   0.73

How to Use Guardion-1-8B

Combine Guardion-1-8B with a guard policy, then evaluate with that policy.

Example: Grounding check

// 1) Create a Grounding policy powered by Guardion-1-8B
await fetch("https://api.guardion.ai/v1/policies", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    id: "grounding-check",
    definition: "Detect hallucinations and ungrounded claims",
    target: "assistant",
    detector: {
      model: "guardion-1-8b",
      expected: "block",
      threshold: 0.9 // L1 (Confident). Use 0.8 for L2, 0.7 for L3, 0.6 for L4
    }
  })
});

// 2) Evaluate using that policy
const response = await fetch("https://api.guardion.ai/v1/guard", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    messages: [
      { role: "context", content: "Our return policy allows returns within 14 days of purchase." }, // or "system" role
      { role: "user", content: "What is the return policy?" },
      { role: "assistant", content: "Returns are accepted within 30 days." }
    ],
    override_enabled_policies: ["grounding-check"]
  })
});

const result = await response.json();

if (result.flagged) {
  console.log("Hallucination detected:", result.reason);
} else {
  console.log("Response is grounded");
}
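
The threshold comment in the policy-creation step maps confidence levels (L1, most confident, through L4, most permissive) to score cutoffs. A small helper, with names of our own choosing rather than anything from the API, makes that mapping explicit:

```javascript
// Confidence level to detector threshold, per the comment in the
// policy-creation example above (L1 = 0.9 down to L4 = 0.6).
const LEVEL_THRESHOLDS = { L1: 0.9, L2: 0.8, L3: 0.7, L4: 0.6 };

function thresholdFor(level) {
  const threshold = LEVEL_THRESHOLDS[level];
  if (threshold === undefined) {
    throw new Error(`Unknown confidence level: ${level}`);
  }
  return threshold;
}

console.log(thresholdFor("L1")); // 0.9
```

Lower thresholds flag more responses: a policy built with `thresholdFor("L4")` (0.6) will block borderline cases that an L1 policy (0.9) would let through.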

Learn more:

  • Grounding guardrail — configure policies for hallucination detection: /grounding
  • Custom guardrail — define your own policies: /custom