
Remote Validation Inference

The Need

As a concept, guardrailing has a few areas that, when left unoptimized, can be extremely expensive in both latency and resources. The two main areas are guardrailing orchestration and the ML models used to validate a single guard. These are resource-heavy in slightly different ways: ML models need GPU-equipped machines to run with low latency, while guardrailing orchestration benefits from general memory and compute resources. Some ML models used for validation take tens of seconds on CPUs but only milliseconds on GPUs.

The Guardrails approach

The Guardrails library tackles this problem by providing an interface that lets users separate the execution of orchestration from the execution of ML-based validation.

The layout of this solution is a simple upgrade to the validator libraries themselves. Instead of always downloading and installing ML models, validators can be configured to call out to a remote endpoint. This remote endpoint hosts the ML model behind an API with a unified interface for all validator models. Guardrails hosts some of these endpoints for free as a preview feature, and users can also host their own models by following the same interface.

note

Remote validation inferencing is only available in Guardrails versions 0.5.0 and above.

Using Guardrails Inferencing Endpoints

To use a Guardrails endpoint, you simply need to find a validator that has implemented support. Validators with a Guardrails-hosted endpoint are labeled as such on the Validator Hub. One example is ToxicLanguage.

note

To use remote inferencing endpoints, you need to have a Guardrails API key. You can get one by signing up at the Guardrails Hub.

Then, run guardrails configure
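
The command is interactive; depending on your version, it will prompt for your Guardrails API key and ask whether to enable remote inferencing by default:

guardrails configure

Once configured, install the validator from the Hub: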

guardrails hub install hub://guardrails/toxic_language --quiet;

# This will not download local models if you opted into remote inferencing during guardrails configure
# If you did not opt in, you can skip the local model download for just this validator by passing the --no-install-local-models flag

From here, you can use the validator as you would normally.

from guardrails import Guard
from guardrails.hub import ToxicLanguage

guard = Guard().use(
    ToxicLanguage()
)
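
From there, you can sanity-check the guard by validating a string directly. The sample text below is only illustrative; with remote inferencing enabled, the toxicity model runs on the hosted endpoint rather than on your machine.

# Validate a sample string; the outcome reports whether validation passed
outcome = guard.validate("You are a great person. Have a wonderful day!")

print(outcome.validation_passed)
print(outcome.validated_output)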

The major benefit of hosting a validator inference endpoint is the increase in speed and throughput compared to running locally. This implementation makes use cases such as streaming much more viable!

from IPython.display import display, clear_output

fragment_generator = guard(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about the Apple Iphone."},
    ],
    max_tokens=1024,
    temperature=0,
    stream=True,
)



accumulated_output = ""
for op in fragment_generator:
    clear_output()
    accumulated_output += op.validated_output
    display(accumulated_output)

Hosting your own endpoint

Validators can point to any endpoint that implements the interface Guardrails validators expect. This interface is defined in the _inference_remote method of the validator.

After implementing this interface, you can host your own endpoint (for example, using gunicorn and Flask) and point your validator to it by setting the validation_endpoint constructor argument.

guard = Guard().use(
    ToxicLanguage(
        use_local=False,
        validation_endpoint="your_endpoint_ip_address",
    )
)
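
As a rough sketch, such an endpoint is just an HTTP service that accepts the validator's inference request and returns the model's output. The route name, payload shape, and scoring stub below are illustrative assumptions rather than the exact schema; match them to whatever your validator's _inference_remote method actually sends and expects.

# Minimal Flask sketch of a self-hosted inference endpoint.
# NOTE: the route, payload shape, and response fields are illustrative assumptions;
# mirror whatever your validator's _inference_remote implementation uses.
from flask import Flask, jsonify, request

app = Flask(__name__)

def score_toxicity(texts):
    # Placeholder for a real ML model call (e.g. a GPU-backed transformers pipeline).
    return [0.0 for _ in texts]

@app.route("/validate", methods=["POST"])
def validate():
    payload = request.get_json(force=True)
    texts = payload.get("text", [])
    return jsonify({"scores": score_toxicity(texts)})

if __name__ == "__main__":
    # In production, run this behind gunicorn, e.g.: gunicorn -b 0.0.0.0:8000 app:app
    app.run(host="0.0.0.0", port=8000)

Keeping the model behind a plain HTTP interface lets you scale the GPU-backed service independently of the application that runs the guard.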

To learn more about hosting your own validators, check out the Host Remote Validator Models doc.

To learn more about writing your own validators, check out the Custom validators doc.