Similar To Previous Values - Validator Details

Similar To Previous Values

Checks if a value is similar to a list of previously known correct values.

string

integer

Factuality

Structured data

Overview

updated 2 years

Developed by: Guardrails AI

Date of development: Feb 15, 2024

Validator type: Quality

License: Apache 2

Input/Output: Output

Description

Intended Use

This validator checks that a value is similar to a list of previously known correct values.

For example, suppose you are extracting structured data from a PDF document and obtain some value. If you have a golden dataset of previous values, this validator ensures that the extracted value is not too different from the known good values.

This validator works on numerical and string types in the following manner:

- For numbers, this validator checks that the extracted value is within k standard deviations of the mean of the previous values, where k is the standard_deviations parameter.
- For strings, this validator embeds the extracted value and all reference values, and checks that the average semantic similarity between them exceeds the threshold parameter.
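
The two checks described above can be sketched in plain Python. This is an illustrative approximation and not the validator's actual source: the function names `within_k_std`, `cosine_similarity`, and `avg_similarity` are hypothetical, the use of the population standard deviation is an assumption, and cosine similarity is assumed as the similarity measure.

```python
import math
import statistics
from typing import List, Sequence


def within_k_std(value: float, prev_values: Sequence[float], k: float = 3) -> bool:
    """Numeric check: is `value` within k standard deviations of the mean?"""
    mean = statistics.mean(prev_values)
    # Assumption: population standard deviation; the real validator may differ.
    std = statistics.pstdev(prev_values)
    return abs(value - mean) <= k * std


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def avg_similarity(value_vec: Sequence[float], prev_vecs: List[Sequence[float]]) -> float:
    """String check: average similarity between one embedding and the references."""
    return statistics.mean([cosine_similarity(value_vec, v) for v in prev_vecs])


# Numeric example: the mean is 10, so 10.5 passes and 20 does not.
print(within_k_std(10.5, [10, 11, 9, 10]))
print(within_k_std(20, [10, 11, 9, 10]))

# String example with toy 2-d "embeddings"; compare the result to a threshold.
print(avg_similarity([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]) >= 0.3)
```

In the string case the embeddings would come from the embed function supplied in the metadata; the toy vectors here only stand in for them.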

Requirements

Dependencies:
- guardrails-ai>=0.4.0

Installation

$ guardrails hub install hub://guardrails/similar_to_previous_values

Usage Examples

Validating string output via Python

In this example, we apply the validator to a string output generated by an LLM.

# Import Guard and Validator
from guardrails.hub import SimilarToPreviousValues
from guardrails import Guard
import numpy as np
import os
from typing import List, Union

try:
    import cohere
except ImportError:
    raise ImportError(
        "This example requires the `cohere` package. "
        "Install it with `pip install cohere`, and try again."
    )

# Create a cohere client
cohere_key = os.environ["COHERE_API_KEY"]
cohere_client = cohere.Client(api_key=cohere_key)


def embed_function(text: Union[str, List[str]]) -> np.ndarray:
    """Embed the text using cohere's small model."""
    # If text is a string, wrap it in a list
    if isinstance(text, str):
        text = [text]

    response = cohere_client.embed(
        model="embed-english-light-v2.0",
        texts=text,
    )
    embeddings_list = response.embeddings
    return np.array(embeddings_list)


# Use the Guard with the validator
guard = Guard().use(
    SimilarToPreviousValues,
    threshold=0.6,  # Increase the threshold to make the validator stricter
    on_fail="exception",
)


# Test passing response
guard.validate(
    """
    You are so amazing!
    """,
    metadata={
        "prev_values": ["You are amazing", "You are awesome.", "You are great!"],
        "embed_function": embed_function,
    },
)

try:
    # Test failing response
    guard.validate(
        """
        Why don't you go to hell?
        """,
        metadata={
            "prev_values": ["You are amazing", "You are awesome.", "You are great!"],
            "embed_function": embed_function,
        },
    )
except Exception as e:
    print(e)

Output:

Validation failed for field with errors: The value 
	- Why don't you go to hell?
is not semantically similar to the previous values. Avg. similarity: 0.24 < Threshold: 0.6.

API Reference

__init__(self, standard_deviations=3, threshold=0.3, on_fail="noop")

Initializes a new instance of the Validator class.

Parameters

- standard_deviations (int): Max number of standard deviations that the extracted value should be within. Required for numbers. Defaults to 3.
- threshold (float): Average similarity threshold below which the validator will fail. Required for strings. Defaults to 0.8.
- on_fail (str, Callable): The policy to enact when a validator fails. If str, must be one of reask, fix, filter, refrain, noop, exception or fix_reask. Otherwise, must be a function that is called when the validator fails.

__call__(self, value, metadata={}) -> ValidationResult

Validates the given value using the rules defined in this validator, relying on the metadata provided to customize the validation process. This method is automatically invoked by guard.parse(...), ensuring the validation logic is applied to the input data.

Note:

- This method should not be called directly by the user. Instead, invoke guard.parse(...), which calls this method internally for each associated Validator.
- When invoking guard.parse(...), be sure to pass a metadata dictionary that includes the keys and values required by this validator. If guard is associated with multiple validators, combine all necessary metadata into a single dictionary.
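
One way to combine metadata for several validators is a small merge helper that refuses silent key collisions. This is a minimal sketch: `merge_metadata` is a hypothetical helper and not part of Guardrails, and `other_validator_key` is a made-up key standing in for whatever a second validator might require.

```python
from typing import Any, Dict


def merge_metadata(*parts: Dict[str, Any]) -> Dict[str, Any]:
    """Merge per-validator metadata dicts into one, failing on conflicting keys."""
    merged: Dict[str, Any] = {}
    for part in parts:
        for key, val in part.items():
            if key in merged and merged[key] != val:
                raise ValueError(f"Conflicting metadata for key {key!r}")
            merged[key] = val
    return merged


# Metadata for this validator, as in the usage example.
similar_meta = {"prev_values": ["You are amazing", "You are awesome."]}
# Hypothetical metadata for some other validator on the same guard.
other_meta = {"other_validator_key": "some value"}

metadata = merge_metadata(similar_meta, other_meta)
print(sorted(metadata))
```

The resulting dictionary would then be passed as the metadata argument when invoking the guard.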

Parameters

- value (Any): The input value to validate.
- metadata (dict): A dictionary containing metadata required for validation. Keys and values must match the expectations of this validator.

Key	Type	Description	Default
`prev_values`	list	List of previous values to pass to the validator	N/A
`embed_function`	Callable	Function to embed the input text	sentence-transformer's `paraphrase-MiniLM-L6-v2`