Logs & Investigation
Monitor, analyze, and respond to AI system activity with comprehensive logging and feedback tools.
The Logs interface is your central hub for monitoring and investigating AI interactions. Each log entry contains detailed information about the request, its timestamp, the messages exchanged, and any flags triggered during evaluation.
Filtering and search
Quickly find relevant requests with these filtering tools:
- Threat filters - Focus on specific threat categories or view all requests
- Full-text search - Find specific content within prompts or responses
- Time range - Narrow results to specific time periods (day, week, or custom)
- Confidence threshold - Adjust minimum confidence score to focus on high-probability threats (coming soon)
- Session ID - Track specific user sessions across multiple interactions
Use the search function to look for specific keywords that might indicate security concerns, such as “password,” “API key,” or “database access.”
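Filters like these typically map onto query parameters when retrieving logs programmatically. The helper below is a sketch only: the parameter names are illustrative assumptions, not Guardion's actual API.

```python
# Hypothetical helper that assembles query parameters for a log search.
# Parameter names ("threat", "q", etc.) are illustrative assumptions.

def build_log_query(threat=None, search=None, start=None, end=None, session_id=None):
    """Return a params dict containing only the filters that were set."""
    params = {
        "threat": threat,          # e.g. a specific threat category
        "q": search,               # full-text search over prompts/responses
        "start": start,            # ISO-8601 start of the time range
        "end": end,                # ISO-8601 end of the time range
        "session_id": session_id,  # track one user session across requests
    }
    return {k: v for k, v in params.items() if v is not None}

# Find logs mentioning "API key" within a single session:
query = build_log_query(search="API key", session_id="sess_123")
```

Omitting a filter simply leaves it out of the query, so broad and narrow searches use the same helper.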
Log details
Click on any log entry to view comprehensive information about the interaction:
Each log detail view includes:
- Complete conversation history - See the full context of the interaction
- Detection metadata - View confidence scores and specific policies triggered
- Source information - Identify the origin with session ID and client metadata
- Feedback controls - Provide input to improve detection accuracy
For interactions that passed all security checks, you’ll see a simplified view:
Providing feedback
When reviewing logs, you can help improve Guardion’s detection accuracy by providing feedback on any misclassifications you find:
- Open the log detail view for the interaction
- Review the detection results and conversation context
- If you identify a false positive or false negative, click Mark as Misclassification
- Your feedback is immediately incorporated into the detection system
Your feedback helps build a dataset specific to your policy, making Guardion’s runtime control more accurate over time. Learn more in our Feedbacks documentation.
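Conceptually, a misclassification report pairs the log entry with a corrected label. The sketch below illustrates that shape; the field names are assumptions, so consult the Feedbacks documentation for the actual schema.

```python
# Hypothetical feedback payload for reporting a misclassification.
# Field names ("log_id", "label") are illustrative assumptions.

def misclassification_feedback(log_id, kind):
    """Build a feedback record correcting a detection result."""
    if kind not in ("false_positive", "false_negative"):
        raise ValueError("kind must be 'false_positive' or 'false_negative'")
    return {"log_id": log_id, "label": kind}

# A benign request that was incorrectly flagged:
payload = misclassification_feedback("log_abc123", "false_positive")
```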
Video walkthrough
Watch our detailed walkthrough to see the investigation tools in action:
Key terms
Understanding the following terms will help you use our platform effectively:
Threats
Threats are specific types of risks that Guardion detects, such as prompt injections, jailbreaks, or harmful content. Each threat type has its own detector and can be configured as part of your policies. The logs interface shows which threats were detected in each interaction.
Flagged
When content triggers one of your policies, it gets “flagged” in the system, meaning a risk has been identified. Flagged content appears in your logs with detailed information about which policies were triggered and why. This visibility gives you a clear audit trail to quickly identify, investigate, and remediate potential security and compliance issues across your AI interactions.
Confidence score
For each detected threat, Guardion provides a confidence score (0 to 1) indicating how certain the system is about the classification. Higher scores represent greater certainty that a real threat exists.
Threshold
Thresholds are configurable values that determine when a detection triggers a flag. You can adjust thresholds for each policy to balance control and usability.
- L1 (Lenient): Provides basic protection with minimal false positives, offering a balance that favors user experience over strict control
- L2 (Moderate): Balanced approach with moderate protection and acceptable false positive rates
- L3 (Enhanced): Stronger protection with potentially more false positives, prioritizing security over perfect accuracy
- L4 (Strict): Maximum protection with the highest potential false positive rates, ensuring comprehensive coverage against threats
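To make the relationship between confidence scores and threshold levels concrete, here is a minimal sketch of how the two combine into a flag decision. The numeric cutoffs are assumptions for demonstration only; actual values are configured per policy.

```python
# Illustrative sketch: a stricter level flags detections at lower confidence.
# These numeric cutoffs are assumed for demonstration, not Guardion's values.

THRESHOLDS = {
    "L1": 0.9,  # Lenient: flag only very high-confidence detections
    "L2": 0.7,  # Moderate
    "L3": 0.5,  # Enhanced
    "L4": 0.3,  # Strict: flag even lower-confidence detections
}

def is_flagged(confidence, level):
    """Flag when the detector's confidence meets the level's cutoff."""
    return confidence >= THRESHOLDS[level]

is_flagged(0.6, "L2")  # False: below the Moderate cutoff
is_flagged(0.6, "L3")  # True: meets the Enhanced cutoff
```

The same detection can therefore pass at one level and be flagged at another, which is the trade-off between usability and control described above.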
Guardrails vs. Policies vs. Detectors
- Guardrails are the protective boundaries you establish around your AI systems. They’re implemented through policies and help ensure your AI behaves according to your requirements and security standards.
- Policies are the rules you configure that determine how Guardion should handle different types of content. A policy defines which detectors to use, what thresholds to apply, and where (which target) to apply them.
- Detectors are the specific mechanisms that identify particular types of threats. Each detector is specialized for a certain category of risk (e.g., prompt injection detector, harmful content detector, PII detector, code generation detector). Policies use one or more detectors with configured thresholds.
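The hierarchy above can be sketched as a simple data structure: a policy names its target and lists the detectors it uses, each with its own threshold. The field names and detector types below are illustrative assumptions, not a real configuration format.

```python
# Illustrative policy structure: a policy combines a target with one or
# more detectors, each at its own threshold level. Names are assumptions.

policy = {
    "name": "chat-input-protection",
    "target": "user_messages",  # where the policy applies
    "detectors": [
        {"type": "prompt_injection", "threshold": "L3"},
        {"type": "pii", "threshold": "L2"},
    ],
}

def detector_types(policy):
    """List the threat categories this policy screens for."""
    return [d["type"] for d in policy["detectors"]]
```

Guardrails, in this picture, are the overall protection that emerges from the set of policies you deploy.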
Best practices
- Regular monitoring: Schedule time to review logs daily or weekly depending on your traffic volume
- Investigate patterns: Look for repeated attempts that might indicate targeted attacks
- Provide feedback: Mark misclassifications to continuously improve detection accuracy
- Set up alerts: Configure notifications for high-confidence threats (coming soon)
- Export data: Use the API to integrate log data with your existing security tools (coming soon)