# Remote Validation Inference
## The problem
As a concept, guardrailing has a few areas that, when left unoptimized, can introduce latency and consume significant resources. The two main areas are:
- Guardrailing orchestration; and
- ML models that validate a single guard
These are resource-heavy in different ways. ML models benefit most from GPUs: some validation models take tens of seconds per inference on a CPU but only milliseconds on a GPU. Guardrailing orchestration, meanwhile, needs only general-purpose memory and compute.
## The Guardrails approach
The Guardrails library tackles this problem by providing an interface that allows users to separate the execution of orchestration from the execution of ML-based validation.
This solution is a simple upgrade to the validator libraries themselves. Instead of always downloading and installing ML models, you can configure validators to call a remote endpoint. This remote endpoint hosts the ML model behind an API that presents a unified interface for all validator models.
Guardrails hosts some of these for free as a preview feature. Users can host their own models by following the same interface.
**Note:** Remote validation inferencing is only available in Guardrails versions 0.5.0 and above.
## Using Guardrails inferencing endpoints
To use a Guardrails endpoint, find a validator that supports remote inferencing. Validators with a Guardrails-hosted endpoint are labeled as such on the Validator Hub; one example is Toxic Language.
To use remote inferencing endpoints, you need a Guardrails API key. You can get one by signing up at the Guardrails Hub.
Then, run `guardrails configure`:
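```bash
# Prompts for your Guardrails API key and whether to opt in to remote inferencing
guardrails configure
```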
Next, install the validator from the Hub:

```bash
# Skips downloading local models if you opted into remote inferencing
# during `guardrails configure`. If you did not opt in, you can skip local
# models for just this validator with the --no-install-local-models flag.
guardrails hub install hub://guardrails/toxic_language --quiet
```
From here, you can use the validator as you would normally.
```python
from guardrails import Guard
from guardrails.hub import ToxicLanguage

guard = Guard().use(ToxicLanguage())
```
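From there, a quick way to exercise the guard is to validate a string directly; `Guard.validate` returns an outcome whose `validation_passed` field you can check. A minimal sketch (the sample sentence is arbitrary):

```python
# Validate a string; with remote inferencing enabled, the toxicity
# model runs on the Guardrails-hosted endpoint rather than locally.
result = guard.validate("Remote inference keeps this check fast.")
print(result.validation_passed)
```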
The benefit of hosting a validator inference endpoint is the increase in speed and throughput compared to running locally. This implementation makes use cases such as streaming much more viable in production.
```python
from IPython.display import display, clear_output

fragment_generator = guard(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about the Apple iPhone."},
    ],
    max_tokens=1024,
    temperature=0,
    stream=True,
)

accumulated_output = ""
for op in fragment_generator:
    clear_output()
    accumulated_output += op.validated_output
    display(accumulated_output)
```
## Toggling remote inferencing
To enable or disable remote inferencing, run the CLI command `guardrails configure` or modify your `~/.guardrailsrc`.
```bash
# To disable
guardrails configure --disable-remote-inferencing

# To enable
guardrails configure --enable-remote-inferencing
```
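You can also flip the same setting by editing `~/.guardrailsrc` directly. A sketch of the relevant line, assuming the file uses a `use_remote_inferencing` key (check the file that `guardrails configure` generates for the exact field name):

```
# ~/.guardrailsrc (sketch; the field name below is an assumption)
use_remote_inferencing=false
```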
To disable remote inferencing for a specific validator, pass a `use_local` kwarg to the validator's initializer.
When running locally, you may need to reinstall the validator with the `--install-local-models` flag.
```python
from guardrails import Guard, install

try:
    from guardrails.hub import ToxicLanguage
except ImportError:
    # Install with local models so the validator can run without a remote endpoint
    install("hub://guardrails/toxic_language", install_local_models=True)
    from guardrails.hub import ToxicLanguage

# use_local=True runs the validator's ML model locally
guard = Guard().use(ToxicLanguage(use_local=True))
```
## Hosting your own endpoint
Validators can point to any endpoint that implements the interface that Guardrails validators expect. This interface can be found in the `_inference_remote` method of the validator.
After implementing this interface, you can host your own endpoint (for example, using gunicorn and Flask) and point your validator to it by setting the `validation_endpoint` constructor argument.
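As a starting point, here is a minimal Flask sketch of such an endpoint. The route name, payload shape, and response fields are assumptions for illustration; the real contract is whatever your validator's `_inference_remote` method sends and parses:

```python
# Minimal sketch of a self-hosted inference endpoint.
# The /validate route and the JSON schema below are assumptions;
# match them to your validator's _inference_remote method.
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_model(texts):
    # Placeholder: load and call your actual ML model here.
    return [{"text": t, "score": 0.0} for t in texts]

@app.route("/validate", methods=["POST"])
def validate():
    # Assumed request shape: {"inputs": ["some text", ...]}
    payload = request.get_json()
    outputs = run_model(payload["inputs"])
    return jsonify({"outputs": outputs})

if __name__ == "__main__":
    # In production, serve with gunicorn instead, e.g.:
    #   gunicorn -b 0.0.0.0:8000 app:app
    app.run(host="0.0.0.0", port=8000)
```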
```python
guard = Guard().use(
    ToxicLanguage(
        use_local=False,
        validation_endpoint="your_endpoint_ip_address",
    )
)
```
Contact us to host validators in your own VPC with managed hardware.
## Learn more
- To learn more about hosting your own validators, check out the Host Remote Validator Models doc.
- To learn more about writing your own validators, check out the Custom validators doc.