Guardrails integrates easily into chatbot flows to help protect against common unwanted output such as profanity and toxic language.

Setup

As a prerequisite, install the necessary validators from the Guardrails Hub:
guardrails hub install hub://guardrails/profanity_free --quiet
guardrails hub install hub://guardrails/toxic_language --quiet

Step 1: Initialize Guard

The guard wraps LLM calls and ensures that each response passes the configured validators.
from guardrails.hub import ProfanityFree, ToxicLanguage
from guardrails import Guard

guard = Guard()
guard.name = "ChatBotGuard"
guard.use(
    # on_fail="exception" makes a failing validator raise a ValidationError,
    # which the chat handler below catches and turns into a safe reply
    ProfanityFree(on_fail="exception"),
    ToxicLanguage(on_fail="exception"),
)
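Conceptually, a guard passes the LLM output through each validator in turn and fails fast if any of them rejects it. The stdlib sketch below illustrates that chaining idea only; it is not the real Guardrails implementation, and the class and check names are invented for illustration:

```python
class ToySanityGuard:
    """Illustrative stand-in for a Guard: chains simple text checks."""

    def __init__(self, name):
        self.name = name
        self.validators = []

    def use(self, *validators):
        # Mirrors Guard.use(): register validators, allow chaining
        self.validators.extend(validators)
        return self

    def validate(self, text):
        # Run every check in order; raise on the first failure
        for check in self.validators:
            ok, reason = check(text)
            if not ok:
                raise ValueError(f"{self.name}: validation failed ({reason})")
        return text

# Toy check standing in for ProfanityFree / ToxicLanguage
def no_banned_words(text):
    banned = {"darn"}  # placeholder word list
    hit = next((w for w in banned if w in text.lower()), None)
    return (hit is None, f"banned word: {hit}")

guard = ToySanityGuard("ChatBotGuard").use(no_banned_words)
print(guard.validate("Hello there!"))  # passes through unchanged
```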

Step 2: Initialize base message to LLM

Next, we create a system message that guides the LLM's behavior and injects the document it should use to answer questions. The `${document}` placeholder is filled in via `prompt_params` when the guard is called.
base_message = {
    "role": "system",
    "content": """You are a helpful assistant.
Use the document provided to answer the user's question.
${document}
""",
}
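The substitution behaves much like Python's built-in `string.Template`; the snippet below is a simplified illustration of how `${document}` gets filled at call time (the sample document text is made up), not the Guardrails internals:

```python
from string import Template

base_content = """You are a helpful assistant.
Use the document provided to answer the user's question.
${document}
"""

# Substitute the placeholder the way prompt_params does when the guard runs
filled = Template(base_content).safe_substitute(
    document="Acme Corp was founded in 1999."
)
print(filled)
```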

Step 3: Integrate guard into UX

Here we use Gradio to implement a simple chat interface:
# Add your OPENAI_API_KEY as an environment variable if it's not already set
# import os
# os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"

import gradio as gr
from guardrails.errors import ValidationError

# `content` holds the document text that fills ${document} in the system
# message; load it from your own source (the path here is a placeholder)
with open("document.txt") as f:
    content = f.read()

def history_to_messages(history):
    messages = [base_message]
    for message in history:
        messages.append({"role": "user", "content": message[0]})
        messages.append({"role": "assistant", "content": message[1]})
    return messages

def chat_response(message, history):
    messages = history_to_messages(history)
    messages.append({"role": "user", "content": message})

    try:
        response = guard(
            model="gpt-4o",
            messages=messages,
            prompt_params={"document": content[:6000]},
            temperature=0,
        )
    except ValidationError:
        # A validator rejected the output; return a safe fallback
        return "I'm sorry, I can't answer that question."
    except Exception:
        return "I'm sorry, there was a problem. I can't answer that question."

    return response.validated_output

gr.ChatInterface(chat_response).launch()
The code above launches a chat interface where users can ask questions about the document.
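To see the message plumbing concretely, the history conversion can be exercised standalone. The helper and system message are redefined here so the snippet is self-contained; Gradio's default chat history is a list of `[user, assistant]` pairs:

```python
base_message = {
    "role": "system",
    "content": "You are a helpful assistant.",
}

def history_to_messages(history):
    # Flatten Gradio's [user, assistant] pairs into role-tagged messages
    messages = [base_message]
    for user_msg, assistant_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    return messages

history = [["Hi", "Hello! How can I help?"]]
messages = history_to_messages(history)
print(messages)
```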

Step 4: Test guard validation

Let's see what happens with more malicious input, where a user tries to get the chatbot to generate inappropriate content. When a user prompts the chatbot to produce profanity or toxic language, the guard catches the failing output and returns a safe response instead.
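The fallback behavior can also be checked in isolation: wrap a callable that raises on validation failure and confirm the user sees the safe reply instead. This is a sketch with a stand-in exception class (the real one is `guardrails.errors.ValidationError`) and an invented `safe_reply` helper mirroring the handler's try/except:

```python
class FakeValidationError(Exception):
    """Stand-in for guardrails.errors.ValidationError."""

def safe_reply(call_llm):
    # Mirror the chat handler: map validation failures to a safe message
    try:
        return call_llm()
    except FakeValidationError:
        return "I'm sorry, I can't answer that question."
    except Exception:
        return "I'm sorry, there was a problem. I can't answer that question."

def toxic_attempt():
    # Simulates a guarded call whose output failed validation
    raise FakeValidationError("Validation failed: toxic language detected")

print(safe_reply(toxic_attempt))  # → I'm sorry, I can't answer that question.
```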

Benefits

Using Guardrails in a chatbot provides:
  1. Content safety - Automatically filters profanity and toxic language
  2. User protection - Prevents harmful content from reaching users
  3. Brand safety - Maintains appropriate tone and language
  4. Compliance - Helps meet content moderation requirements
  5. Flexibility - Easy to add or modify validators as needs change

Next steps

You can extend this example by:
  • Adding more validators from the Guardrails Hub
  • Implementing custom validators for domain-specific content
  • Adding streaming support for real-time validation
  • Integrating with your existing chat infrastructure