toxic_language
guardrails hub install hub://guardrails/toxic_language --quiet
Installing hub://guardrails/toxic_language...
✅Successfully installed guardrails/toxic_language!
Check whether an LLM-generated response contains toxic language
Using the ToxicLanguage validator
This is a simple walkthrough of the ToxicLanguage validator. This validator checks whether an LLM-generated response contains toxic language. It uses the pre-trained multi-label model unitary/unbiased-toxic-roberta from Hugging Face to check whether the generated text is toxic, and it supports both full-text-level and sentence-level validation.
# Import the guardrails package
# and the ToxicLanguage validator
# from Guardrails Hub
import guardrails as gd
from guardrails.hub import ToxicLanguage
from rich import print
# Create a Guard object with this validator
# Here, we'll use the default validation method of "sentence"
# and the default threshold of 0.25
guard = gd.Guard().use(
ToxicLanguage(on_fail="fix")
)
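The call above relies on the validator's defaults. For reference, here is the same guard with those defaults spelled out explicitly as a sketch; the threshold, validation_method, and on_fail keyword names are assumed to match the version of the validator you have installed.
# Equivalent sketch with the defaults written out explicitly
# (keyword names assumed to match your installed validator version)
explicit_guard = gd.Guard().use(
    ToxicLanguage(
        threshold=0.25,                # toxicity score above which text fails
        validation_method="sentence",  # validate sentence by sentence
        on_fail="fix",                 # drop the offending sentences
    )
)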
# Parse the raw response
raw_response = """
What a lovely day it has been to roam around the city. I love it here!
I hate how pathetic and needy you can be sometimes.
"""
raw_llm_output, validated_output, *rest = guard.parse(
llm_output=raw_response,
)
# Print the output
print(validated_output)
What a lovely day it has been to roam around the city.
I love it here!
As you can see, the sentence "I hate how pathetic and needy you can be sometimes." has been detected as toxic and hence removed from the validated output.
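If you need to check programmatically whether anything was flagged, the object returned by guard.parse carries more than the two fields unpacked above. A minimal sketch, assuming the outcome object exposes a validation_passed flag (field names can differ slightly between guardrails versions):
# Inspect the outcome object instead of unpacking it positionally.
# NOTE: `validation_passed` is an assumed field name; check your version.
outcome = guard.parse(llm_output=raw_response)
print(outcome.validated_output)   # text with the toxic sentence removed
print(outcome.validation_passed)  # False when any sentence was flagged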
# Parse the raw response
raw_response = """
I am a Generative AI model that is trained on a large corpus of text.
I am shocked by how disgusting and vile you are.
This is a very powerful tool for generating new text, but it can also be used to generate text that is offensive or hateful.
"""
raw_llm_output, validated_output, *rest = guard.parse(
llm_output=raw_response,
)
# Print the output
print(validated_output)
I am a Generative AI model that is trained on a large corpus of text.
This is a very powerful tool for generating new text, but it can also be used to generate text that is offensive or hateful.
Similarly, here the sentence "I am shocked by how disgusting and vile you are." has been detected as toxic and hence removed from the validated output.
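Removing toxic sentences is only one possible policy. If you would rather reject the whole response, you can build the guard with a different on_fail action. The sketch below assumes the "exception" action is available in your installed guardrails version, and catches a generic Exception since the exact error class can vary between releases; it reuses the imports and raw_response from above.
# Sketch: fail hard instead of silently fixing the output.
# Assumes on_fail="exception" is supported by your guardrails version.
strict_guard = gd.Guard().use(ToxicLanguage(on_fail="exception"))
try:
    strict_guard.parse(llm_output=raw_response)
except Exception as e:  # the exact exception class varies by version
    print(f"Validation failed: {e}")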
# Test with validation method 'full'
full_guard = gd.Guard().use(
ToxicLanguage(validation_method="full", on_fail="fix")
)
# Parse the raw response
raw_response = "Stop being such a dumb piece of shit. Why can't you comprehend this?"
raw_llm_output, validated_output, *rest = full_guard.parse(
llm_output=raw_response,
)
# Print the output
print(validated_output)
Why can't you comprehend this?
Here, we're validating the entire text at once. Toxic language was detected, so the offending content has been removed from the validated output.
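Both validation methods share the toxicity threshold mentioned earlier (0.25 by default). As a final sketch, assuming the threshold keyword argument, you can tighten or loosen that cut-off and compare the results on the same input:
# Sketch: the same text validated with a stricter and a looser threshold.
# Assumes the `threshold` keyword; 0.25 is the default noted above.
stricter_guard = gd.Guard().use(ToxicLanguage(threshold=0.1, on_fail="fix"))
looser_guard = gd.Guard().use(ToxicLanguage(threshold=0.5, on_fail="fix"))
for name, g in [("stricter", stricter_guard), ("looser", looser_guard)]:
    _, validated, *_ = g.parse(llm_output=raw_response)
    print(name, "->", validated)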