Introducing the Guardrails Advanced PII Detection ValidatorBenchmark Results Core Functionality and PII Protection ScopeIntroducing the Guardrails Jailbreak Prevention ValidatorBenchmark Results Key Features and Detection CapabilitiesTechnical ImplementationGetting Started

New State-of-the-Art Guardrails: Introducing Advanced PII Detection and Jailbreak Prevention on Guardrails Hub

Shreya Rajpal

December 11, 2024

At Guardrails AI, we are committed to providing developers with the tools they need to build robust, reliable, and trustworthy AI applications. That's why we are thrilled to announce the launch of two powerful new open-source validators on the Guardrails Hub: Advanced PII Detection and Jailbreak Prevention.

As AI systems become more complex and integrated into our daily lives, protecting sensitive personal information and preventing malicious attacks has become a top priority for organizations worldwide. Our new validators address these critical challenges head-on, empowering developers to create AI applications that prioritize user privacy and security.

Introducing the Guardrails Advanced PII Detection Validator

The advanced PII Detection validator is a game-changer for organizations handling sensitive user data. By combining rules-based PII detection with ML-based classifiers, this validator accurately identifies and redacts personally identifiable information (PII) and protected health information (PHI) in real-time, ensuring compliance with data protection regulations and safeguarding user privacy.

Imagine a healthcare chatbot that must handle patient inquiries while protecting their sensitive medical information. With the Guardrails PII Detection validator, the chatbot can automatically identify and redact any PII, such as names, addresses, or medical record numbers, before processing the user's request. This ensures that sensitive data remains secure and confidential.

Benchmark Results

We've rigorously tested our PII Detection validator against industry-leading solutions, and the results speak for themselves. On a diverse dataset, our validator achieved an impressive F1 score of 0.6519, nearly twice as good as Microsoft Presidio's 0.3697. Additionally, our validator comes with a competitive latency of just 0.0695 seconds on GPU, making it suitable for real-time applications.

	F1	Latency (GPU)	Latency (CPU)
Guardrails Advanced PII Detection	0.652	0.070	0.653
Microsoft Presidio	0.370	0.016	0.016

Core Functionality and PII Protection Scope

The Guardrails PII Detection validator supports a wide range of PII and PHI entity types, including but not limited to:

- Email addresses
- Phone numbers
- Names
- Addresses
- Social security numbers
- Credit card numbers

Moreover, our validator features automatic anonymization capabilities, replacing detected PII with generic placeholders to maintain data utility while protecting user privacy. The full list of entities supported can be found here.

Introducing the Guardrails Jailbreak Prevention Validator

Jailbreaking, the act of bypassing AI system safeguards to generate harmful or biased content, poses a significant threat to the integrity and reliability of AI applications. The new Guardrails Jailbreak Prevention validator empowers developers to proactively identify and prevent sophisticated jailbreak attempts, ensuring the safe and responsible deployment of AI systems.

Consider a content moderation AI tasked with filtering out offensive or inappropriate user-generated content. Malicious actors may attempt to jailbreak the AI by crafting persuasive prompts that bypass its safety checks. Jailbreaking attempts typically leverage several key techniques such as using a specific trigger name (“DAN”, “Omega”, etc…), attempting to establish a roleplay scenario, trying to reframe requests as educational or using multiple chained prompts to gradually push boundaries. The Guardrails Jailbreak Prevention validator can identify these attempts in real-time, preventing the generation of harmful content and maintaining the platform's integrity.

Here’s a simple example of a jailbreaking attempt dialogue:

Benchmark Results

Our Jailbreak Prevention validator has been extensively tested against leading solutions, demonstrating superior performance across multiple metrics. With an accuracy of 0.8147 and an F1 score of 0.8152, our validator outperforms the top open source and proprietary jailbreak detectors.

	Accuracy	F1	Latency (GPU)	Latency (CPU)
Guardrails Jailbreak Prevention	0.815	0.815	0.054	0.283
zhx123	0.719	0.750	0.032	-
jackhhao	0.620	0.707	0.010	-
Llama Prompt Guard	0.500	0.666	0.052	-
Microsoft Shield Prompt	0.755	0.733	N/A	0.097
Anthropic	0.784	0.810	N/A	1.83

Key Features and Detection Capabilities

The Guardrails Jailbreak Prevention validator offers a comprehensive set of features to safeguard your AI systems:

Real-time prevention of sophisticated jailbreak attempts
Detection of subtle patterns and techniques used in jailbreaking
Customizable sensitivity settings to balance security and usability
Multiple deployment options, including on-premises and cloud-based solutions

Technical Implementation

Integrating the Guardrails PII Detection and Jailbreak Prevention validators into your AI pipeline is a breeze with Guardrails Pro. By leveraging our guardrailing platform, you not only gain access to state-of-the-art safety validators but also benefit from enhanced security, flexible deployments, and centralized guardrailing management.

Guardrails Pro offers a host of additional features that make it the ideal choice for organizations seeking to deploy AI systems with confidence:

Secure and scalable infrastructure
Comprehensive observability and monitoring capabilities
Seamless integration with existing development workflows
Collaborative features for team-based development

Getting Started

To start using the Guardrails PII Detection and Jailbreak Prevention validators, simply sign up for Guardrails AI and follow our quick-start guide. Our comprehensive documentation provides step-by-step instructions for integrating the validators into your AI pipeline, along with best practices for configuration and deployment.

At Guardrails AI, we are committed to continuously pushing the boundaries of AI safety and security. Our roadmap includes the development of additional cutting-edge validators to address emerging challenges in the AI landscape. We also welcome contributions from the community, as we believe that collaboration is key to building a safer and more trustworthy AI ecosystem.

For more information about Guardrails Pro, book some time with one of our founders. We're excited to partner with you on your AI safety journey.

Similar ones you might find interesting

Introducing the AI Guardrails Index

The Quest for Responsible AI: Navigating Enterprise Safety Guardrails

Meet Guardrails Pro: Responsible AI for the Enterprise

Guardrails Pro is a managed service built on top of our industry leading open source guardrails platform.

Handling fix results for streaming

How we handle fix results for streaming in Guardrails.