New State-of-the-Art Guardrails: Introducing Advanced PII Detection and Jailbreak Prevention on Guardrails Hub
December 11, 2024
At Guardrails AI, we are committed to providing developers with the tools they need to build robust, reliable, and trustworthy AI applications. That's why we are thrilled to announce the launch of two powerful new open-source validators on the Guardrails Hub: Advanced PII Detection and Jailbreak Prevention.
As AI systems become more complex and integrated into our daily lives, protecting sensitive personal information and preventing malicious attacks has become a top priority for organizations worldwide. Our new validators address these critical challenges head-on, empowering developers to create AI applications that prioritize user privacy and security.
Introducing the Guardrails Advanced PII Detection Validator
The advanced PII Detection validator is a game-changer for organizations handling sensitive user data. By combining rules-based PII detection with ML-based classifiers, this validator accurately identifies and redacts personally identifiable information (PII) and protected health information (PHI) in real-time, ensuring compliance with data protection regulations and safeguarding user privacy.
Imagine a healthcare chatbot that must handle patient inquiries while protecting their sensitive medical information. With the Guardrails PII Detection validator, the chatbot can automatically identify and redact any PII, such as names, addresses, or medical record numbers, before processing the user's request. This ensures that sensitive data remains secure and confidential.
Benchmark Results
We've rigorously tested our PII Detection validator against industry-leading solutions, and the results speak for themselves. On a diverse dataset, our validator achieved an impressive F1 score of 0.6519, nearly twice as good as Microsoft Presidio's 0.3697. Additionally, our validator comes with a competitive latency of just 0.0695 seconds on GPU, making it suitable for real-time applications.
F1 | Latency (GPU) | Latency (CPU) | |
---|---|---|---|
Guardrails Advanced PII Detection | 0.652 | 0.070 | 0.653 |
Microsoft Presidio | 0.370 | 0.016 | 0.016 |
Core Functionality and PII Protection Scope
The Guardrails PII Detection validator supports a wide range of PII and PHI entity types, including but not limited to:
- Email addresses - Phone numbers - Names - Addresses - Social security numbers - Credit card numbers
Moreover, our validator features automatic anonymization capabilities, replacing detected PII with generic placeholders to maintain data utility while protecting user privacy. The full list of entities supported can be found here.
Introducing the Guardrails Jailbreak Prevention Validator
Jailbreaking, the act of bypassing AI system safeguards to generate harmful or biased content, poses a significant threat to the integrity and reliability of AI applications. The new Guardrails Jailbreak Prevention validator empowers developers to proactively identify and prevent sophisticated jailbreak attempts, ensuring the safe and responsible deployment of AI systems.
Consider a content moderation AI tasked with filtering out offensive or inappropriate user-generated content. Malicious actors may attempt to jailbreak the AI by crafting persuasive prompts that bypass its safety checks. Jailbreaking attempts typically leverage several key techniques such as using a specific trigger name (“DAN”, “Omega”, etc…), attempting to establish a roleplay scenario, trying to reframe requests as educational or using multiple chained prompts to gradually push boundaries. The Guardrails Jailbreak Prevention validator can identify these attempts in real-time, preventing the generation of harmful content and maintaining the platform's integrity.
Here’s a simple example of a jailbreaking attempt dialogue:
Benchmark Results
Our Jailbreak Prevention validator has been extensively tested against leading solutions, demonstrating superior performance across multiple metrics. With an accuracy of 0.8147 and an F1 score of 0.8152, our validator outperforms the top open source and proprietary jailbreak detectors.
Accuracy | F1 | Latency (GPU) | Latency (CPU) | |
---|---|---|---|---|
Guardrails Jailbreak Prevention | 0.815 | 0.815 | 0.054 | 0.283 |
zhx123 | 0.719 | 0.750 | 0.032 | - |
jackhhao | 0.620 | 0.707 | 0.010 | - |
Llama Prompt Guard | 0.500 | 0.666 | 0.052 | - |
Microsoft Shield Prompt | 0.755 | 0.733 | N/A | 0.097 |
Anthropic | 0.784 | 0.810 | N/A | 1.83 |
Key Features and Detection Capabilities
The Guardrails Jailbreak Prevention validator offers a comprehensive set of features to safeguard your AI systems:
- Real-time prevention of sophisticated jailbreak attempts
- Detection of subtle patterns and techniques used in jailbreaking
- Customizable sensitivity settings to balance security and usability
- Multiple deployment options, including on-premises and cloud-based solutions
Technical Implementation
Integrating the Guardrails PII Detection and Jailbreak Prevention validators into your AI pipeline is a breeze with Guardrails Pro. By leveraging our guardrailing platform, you not only gain access to state-of-the-art safety validators but also benefit from enhanced security, flexible deployments, and centralized guardrailing management.
Guardrails Pro offers a host of additional features that make it the ideal choice for organizations seeking to deploy AI systems with confidence:
- Secure and scalable infrastructure
- Comprehensive observability and monitoring capabilities
- Seamless integration with existing development workflows
- Collaborative features for team-based development
Getting Started
To start using the Guardrails PII Detection and Jailbreak Prevention validators, simply sign up for Guardrails AI and follow our quick-start guide. Our comprehensive documentation provides step-by-step instructions for integrating the validators into your AI pipeline, along with best practices for configuration and deployment.
At Guardrails AI, we are committed to continuously pushing the boundaries of AI safety and security. Our roadmap includes the development of additional cutting-edge validators to address emerging challenges in the AI landscape. We also welcome contributions from the community, as we believe that collaboration is key to building a safer and more trustworthy AI ecosystem.
For more information about Guardrails Pro, book some time with one of our founders. We're excited to partner with you on your AI safety journey.
Similar ones you might find interesting
Meet Guardrails Pro: Responsible AI for the Enterprise
Guardrails Pro is a managed service built on top of our industry leading open source guardrails platform.
Handling fix results for streaming
How we handle fix results for streaming in Guardrails.
How we rewrote LLM Streaming to deal with validation failures
The new pipeline for LLM Streaming now includes ways to merge fixes across chunks after validating.