Guardrails Pro for enterprises is here. Learn how it can protect your AI infrastructure. Watch on-demand webinar.

New State-of-the-Art Guardrails: Introducing Advanced PII Detection and Jailbreak Prevention on Guardrails Hub

Shreya RajpalShreya Rajpal

December 11, 2024

At Guardrails AI, we are committed to providing developers with the tools they need to build robust, reliable, and trustworthy AI applications. That's why we are thrilled to announce the launch of two powerful new open-source validators on the Guardrails Hub: Advanced PII Detection and Jailbreak Prevention.

As AI systems become more complex and integrated into our daily lives, protecting sensitive personal information and preventing malicious attacks has become a top priority for organizations worldwide. Our new validators address these critical challenges head-on, empowering developers to create AI applications that prioritize user privacy and security.

Introducing the Guardrails Advanced PII Detection Validator

The advanced PII Detection validator is a game-changer for organizations handling sensitive user data. By combining rules-based PII detection with ML-based classifiers, this validator accurately identifies and redacts personally identifiable information (PII) and protected health information (PHI) in real-time, ensuring compliance with data protection regulations and safeguarding user privacy.

Imagine a healthcare chatbot that must handle patient inquiries while protecting their sensitive medical information. With the Guardrails PII Detection validator, the chatbot can automatically identify and redact any PII, such as names, addresses, or medical record numbers, before processing the user's request. This ensures that sensitive data remains secure and confidential.


Benchmark Results

We've rigorously tested our PII Detection validator against industry-leading solutions, and the results speak for themselves. On a diverse dataset, our validator achieved an impressive F1 score of 0.6519, nearly twice as good as Microsoft Presidio's 0.3697. Additionally, our validator comes with a competitive latency of just 0.0695 seconds on GPU, making it suitable for real-time applications.

F1Latency (GPU)Latency (CPU)
Guardrails Advanced PII Detection0.6520.0700.653
Microsoft Presidio0.3700.0160.016

Core Functionality and PII Protection Scope

The Guardrails PII Detection validator supports a wide range of PII and PHI entity types, including but not limited to:

- Email addresses
- Phone numbers
- Names
- Addresses
- Social security numbers
- Credit card numbers

Moreover, our validator features automatic anonymization capabilities, replacing detected PII with generic placeholders to maintain data utility while protecting user privacy. The full list of entities supported can be found here.

Introducing the Guardrails Jailbreak Prevention Validator

Jailbreaking, the act of bypassing AI system safeguards to generate harmful or biased content, poses a significant threat to the integrity and reliability of AI applications. The new Guardrails Jailbreak Prevention validator empowers developers to proactively identify and prevent sophisticated jailbreak attempts, ensuring the safe and responsible deployment of AI systems.

Consider a content moderation AI tasked with filtering out offensive or inappropriate user-generated content. Malicious actors may attempt to jailbreak the AI by crafting persuasive prompts that bypass its safety checks. Jailbreaking attempts typically leverage several key techniques such as using a specific trigger name (“DAN”, “Omega”, etc…), attempting to establish a roleplay scenario, trying to reframe requests as educational or using multiple chained prompts to gradually push boundaries. The Guardrails Jailbreak Prevention validator can identify these attempts in real-time, preventing the generation of harmful content and maintaining the platform's integrity.

Here’s a simple example of a jailbreaking attempt dialogue:

Benchmark Results

Our Jailbreak Prevention validator has been extensively tested against leading solutions, demonstrating superior performance across multiple metrics. With an accuracy of 0.8147 and an F1 score of 0.8152, our validator outperforms the top open source and proprietary jailbreak detectors.

AccuracyF1Latency (GPU)Latency (CPU)
Guardrails Jailbreak Prevention0.8150.8150.0540.283
zhx1230.7190.7500.032-
jackhhao0.6200.7070.010-
Llama Prompt Guard0.5000.6660.052-
Microsoft Shield Prompt0.7550.733N/A0.097
Anthropic0.7840.810N/A1.83

Key Features and Detection Capabilities

The Guardrails Jailbreak Prevention validator offers a comprehensive set of features to safeguard your AI systems:

  • Real-time prevention of sophisticated jailbreak attempts
  • Detection of subtle patterns and techniques used in jailbreaking
  • Customizable sensitivity settings to balance security and usability
  • Multiple deployment options, including on-premises and cloud-based solutions

Technical Implementation

Integrating the Guardrails PII Detection and Jailbreak Prevention validators into your AI pipeline is a breeze with Guardrails Pro. By leveraging our guardrailing platform, you not only gain access to state-of-the-art safety validators but also benefit from enhanced security, flexible deployments, and centralized guardrailing management.

Guardrails Pro offers a host of additional features that make it the ideal choice for organizations seeking to deploy AI systems with confidence:

  • Secure and scalable infrastructure
  • Comprehensive observability and monitoring capabilities
  • Seamless integration with existing development workflows
  • Collaborative features for team-based development

Getting Started

To start using the Guardrails PII Detection and Jailbreak Prevention validators, simply sign up for Guardrails AI and follow our quick-start guide. Our comprehensive documentation provides step-by-step instructions for integrating the validators into your AI pipeline, along with best practices for configuration and deployment.

At Guardrails AI, we are committed to continuously pushing the boundaries of AI safety and security. Our roadmap includes the development of additional cutting-edge validators to address emerging challenges in the AI landscape. We also welcome contributions from the community, as we believe that collaboration is key to building a safer and more trustworthy AI ecosystem.

For more information about Guardrails Pro, book some time with one of our founders. We're excited to partner with you on your AI safety journey.

Similar ones you might find interesting

Meet Guardrails Pro: Responsible AI for the Enterprise

Guardrails Pro is a managed service built on top of our industry leading open source guardrails platform.

Read more

Handling fix results for streaming

How we handle fix results for streaming in Guardrails.

Read more

How we rewrote LLM Streaming to deal with validation failures

The new pipeline for LLM Streaming now includes ways to merge fixes across chunks after validating.

Read more