Logs & Investigation
Monitor, analyze, and respond to AI system activity with comprehensive logging and feedback tools.
The Logs interface is your central hub for monitoring and investigating AI interactions. Each log entry contains detailed information about the request, its timestamp, the messages exchanged, and any flags triggered during evaluation.
Filtering and search
Quickly find relevant requests with these filtering tools:
- Threat filters - Focus on specific threat categories or view all requests
- Full-text search - Find specific content within prompts or responses
- Time range - Narrow results to specific time periods (day, week, or custom)
- Confidence threshold - Adjust minimum confidence score to focus on high-probability threats (coming soon)
- Session ID - Track specific user sessions across multiple interactions
Use the search function to look for specific keywords that might indicate security concerns, such as “password,” “API key,” or “database access.”
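Filters like these typically map onto query parameters when retrieving logs programmatically. The helper below is a sketch only: the parameter names are illustrative assumptions, not Guardion's actual API.

```python
# Hypothetical helper that assembles query parameters for a log search.
# Parameter names ("threat", "q", etc.) are illustrative assumptions.

def build_log_query(threat=None, search=None, start=None, end=None, session_id=None):
    """Return a params dict containing only the filters that were set."""
    params = {
        "threat": threat,          # e.g. a specific threat category
        "q": search,               # full-text search over prompts/responses
        "start": start,            # ISO-8601 start of the time range
        "end": end,                # ISO-8601 end of the time range
        "session_id": session_id,  # track one user session across requests
    }
    return {k: v for k, v in params.items() if v is not None}

# Find logs mentioning "API key" within a single session:
query = build_log_query(search="API key", session_id="sess_123")
```

Omitting a filter simply leaves it out of the query, so broad and narrow searches use the same helper.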
Log details
Click on any log entry to view comprehensive information about the interaction:
Each log detail view includes:
- Complete conversation history - See the full context of the interaction
- Detection metadata - View confidence scores and specific policies triggered
- Source information - Identify the origin with session ID and client metadata
- Feedback controls - Provide input to improve detection accuracy
For interactions that passed all security checks, you’ll see a simplified view:
Providing feedback
When reviewing logs, you can help improve Guardion’s detection accuracy by providing feedback on any misclassifications you find:
- Open the log detail view for the interaction
- Review the detection results and conversation context
- If you identify a false positive or false negative, click Mark as Misclassification
- Your feedback is immediately incorporated into the detection system
Your feedback helps build a dataset specific to your policy, making Guardion’s runtime control more accurate over time. Learn more in our Feedbacks documentation.
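Conceptually, a misclassification report pairs the log entry with a corrected label. The sketch below illustrates that shape; the field names are assumptions, so consult the Feedbacks documentation for the actual schema.

```python
# Hypothetical feedback payload for reporting a misclassification.
# Field names ("log_id", "label") are illustrative assumptions.

def misclassification_feedback(log_id, kind):
    """Build a feedback record correcting a detection result."""
    if kind not in ("false_positive", "false_negative"):
        raise ValueError("kind must be 'false_positive' or 'false_negative'")
    return {"log_id": log_id, "label": kind}

# A benign request that was incorrectly flagged:
payload = misclassification_feedback("log_abc123", "false_positive")
```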
Video walkthrough
Watch our detailed walkthrough to see the investigation tools in action:
Key terms
Understanding the following terms will help you use our platform effectively:
Threats
Threats are specific types of risks that Guardion detects, such as prompt injections, jailbreaks, or harmful content. Each threat type has its own detector and can be configured as part of your policies. The logs interface shows which threats were detected in each interaction.
Flagged
When content triggers one of your policies, it gets “flagged” in the system, meaning a risk has been identified. Flagged content appears in your logs with detailed information about which policies were triggered and why. This visibility gives you a clear audit trail to quickly identify, investigate, and remediate potential security and compliance issues across your AI interactions.
Confidence score
For each detected threat, Guardion provides a confidence score (0 to 1) indicating how certain the system is about the classification. Higher scores represent greater certainty that a real threat exists.
Threshold
Thresholds are configurable values that determine when a detection triggers a flag. You can adjust thresholds for each policy to balance control and usability.
- L1 (Lenient): Provides basic protection with minimal false positives, offering a balance that favors user experience over strict control
- L2 (Moderate): Balanced approach with moderate protection and acceptable false positive rates
- L3 (Enhanced): Stronger protection with potentially more false positives, prioritizing security over perfect accuracy
- L4 (Strict): Maximum protection with the highest potential false positive rates, ensuring comprehensive coverage against threats
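To make the relationship between confidence scores and threshold levels concrete, here is a minimal sketch of how the two combine into a flag decision. The numeric cutoffs are assumptions for demonstration only; actual values are configured per policy.

```python
# Illustrative sketch: a stricter level flags detections at lower confidence.
# These numeric cutoffs are assumed for demonstration, not Guardion's values.

THRESHOLDS = {
    "L1": 0.9,  # Lenient: flag only very high-confidence detections
    "L2": 0.7,  # Moderate
    "L3": 0.5,  # Enhanced
    "L4": 0.3,  # Strict: flag even lower-confidence detections
}

def is_flagged(confidence, level):
    """Flag when the detector's confidence meets the level's cutoff."""
    return confidence >= THRESHOLDS[level]

is_flagged(0.6, "L2")  # False: below the Moderate cutoff
is_flagged(0.6, "L3")  # True: meets the Enhanced cutoff
```

The same detection can therefore pass at one level and be flagged at another, which is the trade-off between usability and control described above.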
Guardrails vs. Policies vs. Detectors
- Guardrails are the protective boundaries you establish around your AI systems. They’re implemented through policies and help ensure your AI behaves according to your requirements and security standards.
- Policies are the rules you configure that determine how Guardion should handle different types of content. A policy defines which detectors to use, what thresholds to apply, and where (which target) to apply them.
- Detectors are the specific mechanisms that identify particular types of threats. Each detector is specialized for a certain category of risk (e.g., prompt injection detector, harmful content detector, PII detector, code generation detector). Policies use one or more detectors with configured thresholds.
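The hierarchy above can be sketched as a simple data structure: a policy names its target and lists the detectors it uses, each with its own threshold. The field names and detector types below are illustrative assumptions, not a real configuration format.

```python
# Illustrative policy structure: a policy combines a target with one or
# more detectors, each at its own threshold level. Names are assumptions.

policy = {
    "name": "chat-input-protection",
    "target": "user_messages",  # where the policy applies
    "detectors": [
        {"type": "prompt_injection", "threshold": "L3"},
        {"type": "pii", "threshold": "L2"},
    ],
}

def detector_types(policy):
    """List the threat categories this policy screens for."""
    return [d["type"] for d in policy["detectors"]]
```

Guardrails, in this picture, are the overall protection that emerges from the set of policies you deploy.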
Best practices
- Regular monitoring: Schedule time to review logs daily or weekly depending on your traffic volume
- Investigate patterns: Look for repeated attempts that might indicate targeted attacks
- Provide feedback: Mark misclassifications to continuously improve detection accuracy
- Set up alerts: Configure notifications for high-confidence threats (coming soon)
- Export data: Use the API to integrate log data with your existing security tools (coming soon)