
Guardion-1-8B Model Card

Guardion-1-8B is a pruned and quantized version of OpenAI gpt-oss-safeguard-20b, optimized for lower latency and local or specialized deployment while retaining 96% of the original model’s quality. The base model gpt-oss-safeguard-20b is a 21B-parameter safety judge with 3.6B active parameters, designed for evaluating LLM outputs across safety, grounding, and policy compliance tasks. Guardion-1-8B compresses this into a compact 8B-parameter model with 2.7B active parameters — making it suitable for on-premise, edge, and latency-sensitive use cases.
This is the model page for the Grounding and Custom guardrails; see the guardrail overviews in Grounding and Custom.

Overview

  • Base model: OpenAI gpt-oss-safeguard-20b (21B params, 3.6B active)
  • Parameters: 8 billion (2.7B active)
  • Optimization: Pruned and quantized, retaining 96% quality
  • Architecture: Decoder-only transformer
  • License: Apache 2.0
  • Developed by: Guardion AI

Key Features

  • Multilingual — cross-lingual safety evaluation across diverse languages
  • Custom policies — bring your own judging policies without additional training
  • LoRA-compatible — suitable for LoRA adapters for task-specific fine-tuning
  • Built-in safety policies — Aegis 2.0 taxonomy, RAG hallucination, function calling validation
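
Custom policies are supplied at evaluation time rather than trained in. As an illustration, a custom policy could be registered through the same policies endpoint used in the grounding example on this page; the policy id and definition below are our own examples, not part of the built-in set:

```javascript
// Hypothetical custom policy payload; the id and definition are illustrative,
// while the detector fields mirror the grounding example on this page.
const customPolicy = {
  id: "no-medical-advice",
  definition: "Flag assistant responses that give specific medical advice",
  target: "assistant",
  detector: {
    model: "guardion-1-8b",
    expected: "block",
    threshold: 0.8 // L2 confidence level
  }
};

// Registering it uses the same call shape as the grounding example:
// await fetch("https://api.guardion.ai/v1/policies", {
//   method: "POST",
//   headers: { "Content-Type": "application/json", "Authorization": "Bearer YOUR_API_KEY" },
//   body: JSON.stringify(customPolicy)
// });
```

Because the policy definition is plain text interpreted by the judge model, changing what the guardrail enforces is a matter of editing the `definition` string, with no retraining step.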

Built-in Safety Policies

The model ships with built-in support for the Aegis 2.0 safety taxonomy.

Core safety policies:

  • Hate / Identity Hate — Content targeting individuals or groups based on protected attributes
  • Sexual Content — Explicit or suggestive material of a sexual nature
  • Sexual Minor — Any sexual content involving minors
  • Suicide & Self-Harm — Content encouraging or providing instructions for self-injury
  • Violence — Content promoting physical, mental, or sexual harm
  • Guns / Illegal Weapons — Content related to illegal weapon creation or use
  • Threat — Direct or implied threats against individuals or groups
  • Harassment — Targeted harassment, bullying, or intimidation
  • Criminal Planning — Instructions or confessions related to criminal activities
  • Controlled Substances — Content promoting illegal drug use or regulated substances
  • PII / Privacy — Attempts to solicit or expose private personal information
  • Profanity — Use of offensive language or slurs
Fine-grained policies:

  • Illegal Activity — General illegal activities not covered by other categories
  • Immoral / Unethical — Actions violating moral or ethical standards
  • Unauthorized Advice — Unlicensed advice in medical, legal, or financial domains
  • Political / Misinformation — Political misinformation, conspiracy theories
  • Fraud / Deception — Scams, phishing, social engineering
  • Copyright / Trademark — IP violations, plagiarism
  • High Risk Gov. Decision Making — Sensitive government or institutional decisions
  • Malware — Code or instructions for malicious software
  • Manipulation — Psychological manipulation or coercion
RAG policies:

  • Context Relevance — Retrieved context is not pertinent to answering the user’s question
  • Groundedness — Response includes claims not supported by or contradicted by the provided context
  • Answer Relevance — Response fails to address or properly respond to the user’s input
Agentic workflow policies:

  • Function Calling Hallucination — Function calls with syntax or semantic errors based on the user query and available tools
  • Action Chain Safety — Detects malicious, dangerous, or harmful combinations of actions within agent traces and tool-call sequences
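
The agentic policies are exercised through the same /v1/guard endpoint shown in the grounding example below. A minimal sketch, assuming a hypothetical policy id ("function-calling-check") and representing the tool call as an assistant message — the exact trace format is an assumption, not documented API:

```javascript
// Hypothetical request body for checking an agent's tool call against the
// Function Calling Hallucination policy. The policy id and the encoding of
// the tool call inside an assistant message are assumptions for illustration.
const guardRequest = {
  messages: [
    { role: "user", content: "Cancel my order #1234" },
    {
      role: "assistant",
      // Tool call references an order id the user never mentioned:
      content: JSON.stringify({ tool: "cancel_order", arguments: { order_id: "9999" } })
    }
  ],
  override_enabled_policies: ["function-calling-check"]
};

// POST to the guard endpoint exactly as in the grounding example:
// const response = await fetch("https://api.guardion.ai/v1/guard", {
//   method: "POST",
//   headers: { "Content-Type": "application/json", "Authorization": "Bearer YOUR_API_KEY" },
//   body: JSON.stringify(guardRequest)
// });
```

A mismatch like the one above (arguments inconsistent with the user query) is the kind of semantic error the Function Calling Hallucination policy is described as detecting.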

Available Versions

  • guardion-1-8b — current stable version

Benchmark Results

RAG Hallucination — TRUE Benchmark (Balanced Accuracy)

Measures faithfulness of LLM responses to provided context.
Model           AVG     frank   paws    qags_cnndm
Guardion-1-8B   0.777   0.886   0.825   0.814

Function Calling Hallucination (Balanced Accuracy)

Evaluated on the FC Reward Bench dataset for detecting hallucinations in agentic tool-calling workflows.
Model           AVG
Guardion-1-8B   0.73

How to Use Guardion-1-8B

Combine Guardion-1-8B with a guard policy, then evaluate with that policy.

Example: Grounding check

// 1) Create a Grounding policy powered by Guardion-1-8B
await fetch("https://api.guardion.ai/v1/policies", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    id: "grounding-check",
    definition: "Detect hallucinations and ungrounded claims",
    target: "assistant",
    detector: {
      model: "guardion-1-8b",
      expected: "block",
      threshold: 0.9 // L1 (Confident). Use 0.8 for L2, 0.7 for L3, 0.6 for L4
    }
  })
});

// 2) Evaluate using that policy
const response = await fetch("https://api.guardion.ai/v1/guard", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
  },
  body: JSON.stringify({
    messages: [
      { role: "context", content: "Our return policy allows returns within 14 days of purchase." }, // or "system" role
      { role: "user", content: "What is the return policy?" },
      { role: "assistant", content: "Returns are accepted within 30 days." }
    ],
    override_enabled_policies: ["grounding-check"]
  })
});

const result = await response.json();

if (result.flagged) {
  console.log("Hallucination detected:", result.reason);
} else {
  console.log("Response is grounded");
}
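
The threshold comment in the policy-creation step maps confidence levels (L1, most confident, through L4, most permissive) to score cutoffs. A small helper, with names of our own choosing rather than anything from the API, makes that mapping explicit:

```javascript
// Confidence level to detector threshold, per the comment in the
// policy-creation example above (L1 = 0.9 down to L4 = 0.6).
const LEVEL_THRESHOLDS = { L1: 0.9, L2: 0.8, L3: 0.7, L4: 0.6 };

function thresholdFor(level) {
  const threshold = LEVEL_THRESHOLDS[level];
  if (threshold === undefined) {
    throw new Error(`Unknown confidence level: ${level}`);
  }
  return threshold;
}

console.log(thresholdFor("L1")); // 0.9
```

Lower thresholds flag more responses: a policy built with `thresholdFor("L4")` (0.6) will block borderline cases that an L1 policy (0.9) would let through.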

Learn more:

  • Grounding guardrail — configure policies for hallucination detection: /grounding
  • Custom guardrail — define your own policies: /custom