In this example, we will use Guardrails to validate an LLM-generated summary of a text document. We will check whether the summary has high semantic similarity to the original document, and we will ensure it falls within a target range of length and reading time.
To download this example as a Jupyter notebook, click here.

Setup

Checking semantic similarity requires the numpy package, which we install with the command below. We also install the three validators we intend to use from the Guardrails Hub; a short sketch of the similarity measure that numpy enables follows the install commands.
pip install numpy
guardrails hub install hub://guardrails/reading_time --quiet --install-local-models
guardrails hub install hub://guardrails/similar_to_document --quiet --install-local-models
guardrails hub install hub://guardrails/valid_length --quiet --install-local-models
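
Under the hood, SimilarToDocument embeds both texts and compares the embeddings; the comparison boils down to a cosine similarity, which is where numpy comes in. Here is a minimal sketch of that measure — the vectors below are toy stand-ins for real embeddings, not part of the validator's API:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real sentence embeddings.
summary_vec = np.array([0.21, 0.68, 0.11])
document_vec = np.array([0.25, 0.65, 0.05])

print(cosine_similarity(summary_vec, document_vec))  # near 1.0 => semantically close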

Step 1: Load data and create Pydantic model

Load our text with the code below:
with open("data/twain.txt", "r") as file:
    document = file.read()  # full text, original whitespace, used as the prompt input
    file.seek(0)  # rewind so the file can be read a second time
    content = "".join(line.strip() for line in file.readlines())  # single-line version for SimilarToDocument
Next, we define the output schema with a Pydantic model:
from pydantic import BaseModel, Field

from guardrails.hub import SimilarToDocument, ValidLength, ReadingTime

prompt = """
Summarize the following text faithfully:

${document}

${gr.complete_xml_suffix}
"""

THREE_MINUTES = 180 / 60  # ReadingTime expects minutes; 180 seconds = 3 minutes


class TextSummary(BaseModel):
    summary: str = Field(
        description="Faithful summary of the text",
        validators=[
            ReadingTime(reading_time=THREE_MINUTES, on_fail="exception"),
            ValidLength(min=100, max=1000, on_fail="exception"),
            SimilarToDocument(
                document=f"'{content}'", threshold=0.60, on_fail="filter"
            ),
        ],
    )

Step 2: Create Guard from Pydantic model

The guard we create will:
  1. Enforce that the summary can be read in under three minutes
  2. Enforce that the summary is between 100 and 1,000 characters long
  3. Enforce that the summary is semantically similar to the original document
import guardrails as gd

guard = gd.Guard.for_pydantic(TextSummary)
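
If you only need a validated string rather than the structured TextSummary schema, the same three validators can be attached directly with use_many. A sketch of that equivalent setup, reusing THREE_MINUTES and content from above:

from guardrails import Guard
from guardrails.hub import ReadingTime, SimilarToDocument, ValidLength

# Equivalent string guard: same validators, no Pydantic schema.
string_guard = Guard().use_many(
    ReadingTime(reading_time=THREE_MINUTES, on_fail="exception"),
    ValidLength(min=100, max=1000, on_fail="exception"),
    SimilarToDocument(document=f"'{content}'", threshold=0.60, on_fail="filter"),
)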

Step 3: Call LLM via guard()

We call the LLM through guard(), using the tools (function-calling) API to ensure our data is returned in a structured form.
import os

# TODO: Uncomment and replace OPENAI_API_KEY with your OpenAI API key
# os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"

response = guard(
    messages=[{"role": "user", "content": prompt}],
    prompt_params={"document": document},
    model="gpt-4o",
    tools=guard.json_function_calling_tool(),
    tool_choice="required",
)

print(f"Validated Output: {response.validated_output}")
We can see the step-wise history of the Guard object below:
guard.history.last.tree
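
Beyond the tree view, the returned outcome object exposes the same information as attributes. A quick sketch of inspecting it programmatically (attribute names per the ValidationOutcome object in current Guardrails releases):

# Inspect the outcome programmatically rather than via the tree view.
print(response.validation_passed)  # True when every validator passed
print(response.raw_llm_output)     # unvalidated string returned by the LLM
print(response.validated_output)   # dict matching the TextSummary schema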
The guard wrapper returns the raw LLM output (a simple string) alongside the validated and corrected output (a dictionary), and we can see that the output is a dictionary with the correct schema and types.

Now let's try a model that is less proficient at summarization. This time the output is filtered, validation fails, and the final validated output is None.
response = guard(
    messages=[{"role": "user", "content": prompt}],
    prompt_params={"document": document},
    model="babbage-002",
    max_tokens=512,
    temperature=0,
)

print(f"Validated Output: {response.validated_output}")
We can see the step-wise history of this second guard execution below:
guard.history.last.tree
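
In application code you will want to handle both failure modes: validators configured with on_fail="exception" raise when violated, while the on_fail="filter" similarity check silently yields a None validated output. A minimal sketch of defensive handling — the broad except and the fallback value are illustrative choices, not part of the Guardrails API:

# Handle both failure modes: raised exceptions and filtered (None) output.
try:
    response = guard(
        messages=[{"role": "user", "content": prompt}],
        prompt_params={"document": document},
        model="babbage-002",
        max_tokens=512,
        temperature=0,
    )
    if response.validation_passed:
        summary = response.validated_output["summary"]
    else:
        summary = None  # the similarity check filtered the output
except Exception as err:  # broad catch; Guardrails raises a validation error here
    summary = None
    print(f"Validation failed: {err}")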