The Data Protection detector identifies, classifies, and manages the exposure of Personally Identifiable Information (PII) in LLM inputs and outputs. It is designed to prevent data leakage and ensure compliance with privacy regulations (GDPR, LGPD, CCPA) by detecting sensitive entities like names, documents, and contact details.

Capabilities

The current model pii-v0 is optimized for multilingual inputs and supports the following entity categories:
| Category | Label | Coverage Examples |
| --- | --- | --- |
| Contact | CONTACT | Email addresses, phone numbers (mobile/landline), social media handles. |
| Document | DOCUMENT | National IDs (CPF, CNPJ, SSN), Passports, Driver’s Licenses (CNH/RG), Tax IDs. |
| Location | LOCATION | Street addresses, cities, states, zip/postal codes. |
| Personal | NAME | Full names, first names, and family names. |

Threshold Configuration

You can adjust the sensitivity of the detector using threshold levels. A lower threshold increases recall: the detector catches more entities but may produce more false positives. A higher threshold increases precision, at the cost of missing some lower-confidence entities.
| Level | Threshold | Confidence |
| --- | --- | --- |
| L1 | 0.9 | Confident (Recommended for automation) |
| L2 | 0.8 | Very Likely |
| L3 | 0.7 | Likely |
| L4 | 0.6 | Less Likely |
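
To make the recall/precision trade-off concrete, the following sketch maps each level to its threshold from the table above and keeps only the entities whose score meets it. The `LEVELS` dict and `filter_entities` helper are hypothetical names used for illustration, and treating a score equal to the threshold as a match is an assumption; the detector itself applies the threshold on the server side.

```python
# Threshold values come from the table above; the names are illustrative only.
LEVELS = {"L1": 0.9, "L2": 0.8, "L3": 0.7, "L4": 0.6}


def filter_entities(entities, level="L1"):
    """Keep entities whose confidence score meets the level's threshold."""
    threshold = LEVELS[level]
    # Assumption: a score equal to the threshold counts as a detection.
    return [e for e in entities if e["score"] >= threshold]


# Made-up scores, chosen only to show how the levels differ.
entities = [
    {"label": "CONTACT", "score": 0.95},
    {"label": "NAME", "score": 0.72},
    {"label": "LOCATION", "score": 0.61},
]

print([e["label"] for e in filter_entities(entities, "L1")])
print([e["label"] for e in filter_entities(entities, "L4")])
```

With these sample scores, L1 keeps only the CONTACT entity, while L4 keeps all three.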

Usage

1. Define a Policy

First, configure a guard policy to specify the behavior. You can choose to simply monitor, or to block/redact.
// POST /v1/policies
{
  "id": "pii-policy",
  "definition": "Detect and mask PII exposure",
  "threshold": 0.9, // Flag only confident detections
  "detector": {
    "model": "pii",
    "target": "assistant" // Monitors LLM output
  }
}
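
As a minimal sketch of how this policy could be submitted from Python, the snippet below builds the `POST /v1/policies` request with the standard library. The base URL `https://guard.example.com` and the `build_policy_request` helper are placeholders, not part of the documented API, and authentication headers are omitted because this page does not describe them. The request is constructed but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with your deployment's endpoint.
BASE_URL = "https://guard.example.com"


def build_policy_request(policy_id, threshold, target="assistant"):
    """Build (but do not send) a POST /v1/policies request."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    body = {
        "id": policy_id,
        "definition": "Detect and mask PII exposure",
        "threshold": threshold,
        "detector": {"model": "pii", "target": target},
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/policies",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_policy_request("pii-policy", 0.9)
print(req.get_method(), req.full_url)
# To actually submit it: urllib.request.urlopen(req)
```

Serializing with `json.dumps` also avoids the comma mistakes that are easy to make when writing the payload by hand.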

2. Evaluate Content

Send the content to the Guard API. If PII is detected, the response contains both a diagnostic breakdown and a correction object with the redacted text.

Redaction format: detected entities are replaced with a vaulted token of the form [CATEGORY_HASH], for example [CONTACT_8F3A21].
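
If downstream code needs to locate these tokens in redacted text, a regular expression is one option. The sketch below assumes that the category part is one of the four labels listed under Capabilities and that the hash is a run of uppercase hexadecimal characters, as in the example response on this page; the exact hash alphabet and length are not specified here, so treat `TOKEN_RE` and `find_tokens` as illustrative.

```python
import re

# Assumption: tokens look like [LABEL_HEX], inferred from this page's examples.
TOKEN_RE = re.compile(r"\[(CONTACT|DOCUMENT|LOCATION|NAME)_([0-9A-F]+)\]")


def find_tokens(text):
    """Return (label, hash) pairs for every redaction token found in text."""
    return TOKEN_RE.findall(text)


sample = "My new email is [CONTACT_8F3A21] and my CPF is [DOCUMENT_9C4B52]."
print(find_tokens(sample))
# [('CONTACT', '8F3A21'), ('DOCUMENT', '9C4B52')]
```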

API Example

In this scenario, a user submits a prompt containing an email address, a government document number (CPF), and a street address. The API detects these entities and returns a safe, redacted version.

Request Payload

{
  "session": "sess_001",
  "messages": [
    {
      "role": "user",
      "content": "Please update my billing record. My new email is [email protected] and my CPF is 123.456.789-00. Also change my address to Av. Paulista, 1000."
    }
  ],
  "override_enabled_policies": ["pii-policy"],
  "fail_fast": false,
  "breakdown_all": true
}

Response

The response flags the content as unsafe (flagged: true) and provides the sanitized text in the correction object.
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "object": "eval",
  "time": 145,
  "created": 1768200100,
  "flagged": true,
  "breakdown": [
    {
      "policy_id": "pii-policy",
      "detector": "pii-v0",
      "detected": true,
      "threshold": 0.9,
      "score": 0.998,
      "result": [
        {
          "label": "CONTACT",
          "score": 0.999,
          "text": "[email protected]",
          "spans": { "start": 44, "end": 69 }
        },
        {
          "label": "DOCUMENT",
          "score": 0.998,
          "text": "123.456.789-00",
          "spans": { "start": 84, "end": 98 }
        },
        {
          "label": "LOCATION",
          "score": 0.950,
          "text": "Av. Paulista, 1000",
          "spans": { "start": 125, "end": 143 }
        }
      ]
    }
  ],
  "correction": {
    "choices": [
      {
        "role": "user",
        "index": 0,
        "flagged": true,
        "redacted": true,
        "content": "Please update my billing record. My new email is [CONTACT_8F3A21] and my CPF is [DOCUMENT_9C4B52]. Also change my address to [LOCATION_7D1E99]."
      }
    ]
  }
}
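
A client will typically pull two things out of this response: the detected entities from breakdown and the sanitized text from correction. The following sketch shows one way to do that, using only the field names visible in the example above. The `summarize` helper is hypothetical, and the sample dict is a shortened copy of the response, not a complete schema.

```python
def summarize(response):
    """Collect detected entities and redacted messages from an eval response."""
    entities = [
        (item["policy_id"], hit["label"], hit["score"])
        for item in response.get("breakdown", [])
        if item.get("detected")
        for hit in item.get("result", [])
    ]
    redacted = [
        choice["content"]
        for choice in response.get("correction", {}).get("choices", [])
        if choice.get("redacted")
    ]
    return {
        "flagged": response.get("flagged", False),
        "entities": entities,
        "redacted": redacted,
    }


# Shortened copy of the example response; the content string is abbreviated.
response = {
    "flagged": True,
    "breakdown": [
        {
            "policy_id": "pii-policy",
            "detected": True,
            "result": [
                {"label": "DOCUMENT", "score": 0.998, "text": "123.456.789-00"},
                {"label": "LOCATION", "score": 0.950, "text": "Av. Paulista, 1000"},
            ],
        }
    ],
    "correction": {
        "choices": [
            {"role": "user", "redacted": True,
             "content": "My CPF is [DOCUMENT_9C4B52]."}
        ]
    },
}

summary = summarize(response)
print(summary["flagged"], [label for _, label, _ in summary["entities"]])
print(summary["redacted"][0])
```

Using `.get` with defaults keeps the helper from raising when a response has no correction object, which is presumably the case when nothing is detected.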

Reveal (Detokenize) API: Learn how to restore original data from tokens.