Translate text without profanities

!!! note To download this example as a Jupyter notebook, click here.

In this example, we use Guardrails while translating a statement from another language into English, and check whether the translated statement passes a profanity check.

Objective

We want to translate a statement from another language into English and ensure that the translated statement is free of profanity.

Step 0: Setup

To run this example, you will need to install the alt-profanity-check package. You can do so by running the following command:

pip install alt-profanity-check --quiet

Step 1: Create the RAIL Spec

Ordinarily, we would create a RAIL spec in a separate file. For the purposes of this example, we will create the spec in this notebook as a string following the RAIL syntax. For more information on RAIL, see the RAIL documentation. We will also show the same RAIL spec in a code-first format using a Pydantic model.

In this RAIL spec, we:

  1. Create an output schema that returns a single key-value pair. The key should be 'translated_statement', and the value should be the English translation of the given statement. The translated statement should not have any profanity.

First we create our custom Validator:

from typing import Any, Dict

from profanity_check import predict
from guardrails.validators import (
    Validator,
    register_validator,
    PassResult,
    FailResult,
    ValidationResult,
)


@register_validator(name="is-profanity-free", data_type="string")
class IsProfanityFree(Validator):
    def validate(self, value: Any, metadata: Dict) -> ValidationResult:
        # predict() returns one label per input string; 1 means profanity was detected.
        prediction = predict([value])
        if prediction[0] == 1:
            # With on_fail="fix", fix_value replaces the offending value.
            return FailResult(
                error_message=f"Value {value} contains profanity language",
                fix_value="",
            )
        return PassResult()
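
The control flow of this validator can be illustrated without installing either package. The sketch below is a stand-in, not the Guardrails implementation: `toy_predict`, `ToyPass`, `ToyFail`, `apply_fix`, and the word list are hypothetical names invented for illustration, and the word list replaces the trained classifier that `profanity_check.predict` provides. It only mirrors the call shape used above (a list of strings in, one 0/1 label per string out) and shows how a "fix" policy might swap in the fix value.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical word list; the real profanity_check package uses a trained model.
BLOCKED_WORDS = {"darn", "heck"}


def toy_predict(texts: List[str]) -> List[int]:
    """Return 1 for each text containing a blocked word, else 0."""
    return [
        1 if any(word in BLOCKED_WORDS for word in text.lower().split()) else 0
        for text in texts
    ]


@dataclass
class ToyPass:
    """Stand-in for a passing validation result."""


@dataclass
class ToyFail:
    """Stand-in for a failing validation result carrying a replacement value."""

    error_message: str
    fix_value: str


def validate(value: str) -> Union[ToyPass, ToyFail]:
    """Same branching as the validator above, using the toy classifier."""
    if toy_predict([value])[0] == 1:
        return ToyFail(
            error_message=f"Value {value} contains profanity language",
            fix_value="",
        )
    return ToyPass()


def apply_fix(value: str) -> str:
    """Mimic an on-fail "fix" policy: use fix_value when validation fails."""
    result = validate(value)
    return result.fix_value if isinstance(result, ToyFail) else value


print(repr(apply_fix("chicken quesadilla")))
print(repr(apply_fix("what the heck")))
```

A clean string passes through unchanged, while a flagged string is replaced by the empty fix value.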

Next we define our RAIL spec either as XML:

rail_str = """
<rail version="0.1">

<output>
    <string
        name="translated_statement"
        description="Translate the given statement into english language"
        format="is-profanity-free"
        on-fail-is-profanity-free="fix"
    />
</output>

<messages>
<message role="user">
Translate the given statement into english language:

${statement_to_be_translated}

${gr.complete_xml_suffix}
</message>
</messages>
</rail>
"""

Or as a Pydantic model:

from pydantic import BaseModel, Field

prompt = """
Translate the given statement into english language:

${statement_to_be_translated}

${gr.complete_xml_suffix}
"""


class Translation(BaseModel):
    translated_statement: str = Field(
        description="Translate the given statement into english language",
        validators=[IsProfanityFree(on_fail="fix")],
    )

!!! note

To ensure that the translated statement is free of profanity, we use is-profanity-free as the validator. This validator uses the profanity_check package, which is installed by alt-profanity-check.

Step 2: Create a Guard object with the RAIL Spec

We create a gd.Guard object that will check, validate and correct the output of the LLM. This object:

  1. Enforces the quality criteria specified in the RAIL spec.
  2. Takes corrective action when the quality criteria are not met.
  3. Compiles the schema and type info from the RAIL spec and adds it to the prompt.

import guardrails as gd

from rich import print

From XML:

guard = gd.Guard.for_rail_string(rail_str)

Or from our Pydantic model:

guard = gd.Guard.for_pydantic(output_class=Translation)

Here, statement_to_be_translated is the statement to translate; the user provides it at runtime.
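
To make the runtime substitution concrete, here is a minimal sketch using Python's standard-library `string.Template`, which happens to share the `${name}` placeholder syntax. This is only an analogy: how Guardrails performs the substitution internally is not shown in this example, and `prompt_template` and `rendered` are names chosen for illustration.

```python
from string import Template

# A trimmed-down copy of the prompt with a single placeholder.
prompt_template = Template(
    "Translate the given statement into english language:\n\n"
    "${statement_to_be_translated}\n"
)

# The caller supplies the value at runtime, much like prompt_params does.
rendered = prompt_template.substitute(
    statement_to_be_translated="quesadilla de pollo"
)

print(rendered)
```

The placeholder is replaced by the supplied statement, so the rendered text no longer contains `${statement_to_be_translated}`.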

Step 3: Wrap the LLM API call with Guard

First, let's try translating a statement that doesn't have any profanity in it.

# Set your OPENAI_API_KEY as an environment variable
# import os
# os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"

raw_llm_response, validated_response, *rest = guard(
    messages=[{"role": "user", "content": prompt}],
    prompt_params={"statement_to_be_translated": "quesadilla de pollo"},
    model="gpt-4o",
    max_tokens=2048,
    temperature=0,
)

print(f"Validated Output: {validated_response}")

Validated Output: {'translated_statement': 'chicken quesadilla'}

We can see the prompt that was sent to the LLM:

print(guard.history.last.iterations.last.inputs.messages[0]["content"])

Translate the given statement into english language:

quesadilla de pollo


Given below is XML that describes the information to extract from this document and the tags to extract it into.

<output>
<string description="Translate the given statement into english language" format="is-profanity-free"
name="translated_statement" required="true"></string>
</output>

ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name`
attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON
MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and
specific types. Be correct and concise. If you are unsure anywhere, enter `null`.

Here are examples of simple (XML, JSON) pairs that show the expected behavior:
- `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}`
- `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', 'STRING TWO', etc.]}`
- `<object name='baz'><string name="foo" format="capitalize two-words" /><integer name="index" format="1-indexed"
/></object>` => `{'baz': {'foo': 'Some String', 'index': 1}}`


We can also take a look at the output of the LLM and the validated output using the Guard's internal logs:

print(guard.history.last.tree)
Logs
└── ╭────────────────────────────────────────────────── Step 0 ───────────────────────────────────────────────────╮
╭─────────────────────────────────────────────── Messages ────────────────────────────────────────────────╮
│ ┏━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Role Content ┃ │
│ ┡━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ user │ │ │
│ │ │ Translate the given statement into english language: │ │
│ │ │ │ │
│ │ │ quesadilla de pollo │ │
│ │ │ │ │
│ │ │ │ │
│ │ │ Given below is XML that describes the information to extract from this document and the tags │ │
│ │ │ to extract it into. │ │
│ │ │ │ │
│ │ │ <output> │ │
│ │ │ <string description="Translate the given statement into english language" │ │
│ │ │ format="is-profanity-free" name="translated_statement" required="true"></string> │ │
│ │ │ </output> │ │
│ │ │ │ │
│ │ │ ONLY return a valid JSON object (no other text is necessary), where the key of the field in │ │
│ │ │ JSON is the `name` attribute of the corresponding XML, and the value is of the type │ │
│ │ │ specified by the corresponding XML's tag. The JSON MUST conform to the XML format, including │ │
│ │ │ any types and format requests e.g. requests for lists, objects and specific types. Be │ │
│ │ │ correct and concise. If you are unsure anywhere, enter `null`. │ │
│ │ │ │ │
│ │ │ Here are examples of simple (XML, JSON) pairs that show the expected behavior: │ │
│ │ │ - `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}` │ │
│ │ │ - `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', │ │
│ │ │ 'STRING TWO', etc.]}` │ │
│ │ │ - `<object name='baz'><string name="foo" format="capitalize two-words" /><integer │ │
│ │ │ name="index" format="1-indexed" /></object>` => `{'baz': {'foo': 'Some String', 'index': │ │
│ │ │ 1}}` │ │
│ │ │ │ │
│ │ │ │ │
│ └──────┴──────────────────────────────────────────────────────────────────────────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────── Raw LLM Output ─────────────────────────────────────────────╮
│ ```json │
│ {"translated_statement": "chicken quesadilla"} │
│ ``` │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────── Validated Output ────────────────────────────────────────────╮
│ {'translated_statement': 'chicken quesadilla'} │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

The guard wrapper returns raw_llm_response (a plain string) and validated_response, the validated and corrected output (a dictionary). We can see that the output is a dictionary with the correct schema and types.
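
The difference between the raw string and the validated dictionary can be illustrated with a small standard-library sketch. It assumes the raw output is JSON wrapped in a Markdown code fence, as in the logs above; `parse_fenced_json` is a hypothetical helper written for this illustration and is not how Guardrails itself parses responses.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built programmatically

# The raw LLM output from the logs above, as a plain string.
raw_llm_response = (
    FENCE + "json\n"
    + '{"translated_statement": "chicken quesadilla"}\n'
    + FENCE
)


def parse_fenced_json(text: str) -> dict:
    """Strip an optional Markdown code fence, then parse the JSON payload."""
    pattern = rf"{FENCE}(?:json)?\s*(.*?)\s*{FENCE}"
    match = re.search(pattern, text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)


parsed = parse_fenced_json(raw_llm_response)
print(type(raw_llm_response).__name__, type(parsed).__name__)
print(parsed["translated_statement"])
```

The raw response stays a string that must be parsed before use, whereas the parsed result can be indexed by key like the validated output.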

Next, let's try translating a statement that the profanity check flags. Because the validator's fix value is an empty string, the validated output contains an empty string in place of the translation.

# Set your MISTRAL_API_KEY as an environment variable
# import os
# os.environ["MISTRAL_API_KEY"] = "YOUR_API_KEY"

raw_llm_response, validated_response, *rest = guard(
    messages=[{"role": "user", "content": prompt}],
    prompt_params={"statement_to_be_translated": "убей себя"},
    model="mistral/mistral-small-latest",
    max_tokens=2048,
    temperature=0,
)

print(f"Validated Output: {validated_response}")

Validated Output: {'translated_statement': ''}
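
Since a failed check yields an empty string rather than an error, calling code may want to treat an empty translation as a rejection. A minimal sketch, assuming the dictionary shape shown above; `translation_or_fallback` and the fallback text are hypothetical and not part of Guardrails:

```python
from typing import Dict, Optional


def translation_or_fallback(
    validated: Optional[Dict[str, str]],
    fallback: str = "[translation withheld]",
) -> str:
    """Return the translation, or a fallback when it is missing or empty."""
    if not validated:
        return fallback
    translation = validated.get("translated_statement", "")
    return translation if translation.strip() else fallback


print(translation_or_fallback({"translated_statement": "chicken quesadilla"}))
print(translation_or_fallback({"translated_statement": ""}))
```

Whether to show a placeholder, retry, or surface an error is an application-level choice; the sketch only shows where that decision could be made.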

This time around, when we look at the logs, we can see that the output of the LLM was filtered out because it did not pass the profanity check.

print(guard.history.last.tree)
Logs
└── ╭────────────────────────────────────────────────── Step 0 ───────────────────────────────────────────────────╮
╭─────────────────────────────────────────────── Messages ────────────────────────────────────────────────╮
│ ┏━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Role Content ┃ │
│ ┡━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ user │ │ │
│ │ │ Translate the given statement into english language: │ │
│ │ │ │ │
│ │ │ убей себя │ │
│ │ │ │ │
│ │ │ │ │
│ │ │ Given below is XML that describes the information to extract from this document and the tags │ │
│ │ │ to extract it into. │ │
│ │ │ │ │
│ │ │ <output> │ │
│ │ │ <string description="Translate the given statement into english language" │ │
│ │ │ format="is-profanity-free" name="translated_statement" required="true"></string> │ │
│ │ │ </output> │ │
│ │ │ │ │
│ │ │ ONLY return a valid JSON object (no other text is necessary), where the key of the field in │ │
│ │ │ JSON is the `name` attribute of the corresponding XML, and the value is of the type │ │
│ │ │ specified by the corresponding XML's tag. The JSON MUST conform to the XML format, including │ │
│ │ │ any types and format requests e.g. requests for lists, objects and specific types. Be │ │
│ │ │ correct and concise. If you are unsure anywhere, enter `null`. │ │
│ │ │ │ │
│ │ │ Here are examples of simple (XML, JSON) pairs that show the expected behavior: │ │
│ │ │ - `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}` │ │
│ │ │ - `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', │ │
│ │ │ 'STRING TWO', etc.]}` │ │
│ │ │ - `<object name='baz'><string name="foo" format="capitalize two-words" /><integer │ │
│ │ │ name="index" format="1-indexed" /></object>` => `{'baz': {'foo': 'Some String', 'index': │ │
│ │ │ 1}}` │ │
│ │ │ │ │
│ │ │ │ │
│ └──────┴──────────────────────────────────────────────────────────────────────────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────── Raw LLM Output ─────────────────────────────────────────────╮
│ ```json │
│ { │
│ "translated_statement": "kill yourself" │
│ } │
│ ``` │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────── Validated Output ────────────────────────────────────────────╮
│ {'translated_statement': ''} │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯