
Remote Validation Inference

The Need

As a concept, guardrailing has a few areas that, when left unoptimized, can be extremely expensive in both latency and resources. The two main areas are guardrailing orchestration and the ML models used to validate a single guard. These are resource-heavy in slightly different ways: ML models need GPU-equipped machines to run with low latency, while guardrailing orchestration benefits from general memory and compute resources. Some ML models used for validation take tens of seconds on CPUs but only milliseconds on GPUs.

The Guardrails approach

The Guardrails library tackles this problem by providing an interface that lets users separate the execution of orchestration from the execution of ML-based validation.

The layout of this solution is a simple upgrade to the validator libraries themselves. Instead of always downloading and installing ML models, validators can be configured to call out to a remote endpoint. This remote endpoint hosts the ML model behind an API with a unified interface for all validator models. Guardrails hosts some of these endpoints for free as a preview feature, and users can also host their own models by following the same interface.

note

Remote validation inferencing is only available in Guardrails versions 0.5.0 and above.

Using Guardrails Inferencing Endpoints

To use a Guardrails endpoint, you simply need to find a validator that has implemented support. Validators with a Guardrails-hosted endpoint are labeled as such on the Validator Hub. One example is ToxicLanguage.

note

To use remote inferencing endpoints, you need to have a Guardrails API key. You can get one by signing up at the Guardrails Hub.

Then, run guardrails configure
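
The command is interactive; depending on your version, it will prompt for your Guardrails API key and ask whether to enable remote inferencing by default:

guardrails configure

Once configured, install the validator from the Hub: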

guardrails hub install hub://guardrails/toxic_language --quiet;

# This will not download local models if you opted into remote inferencing during guardrails configure
# If you did not opt in, you can skip the local model download for just this validator by passing the --no-install-local-models flag

From here, you can use the validator as you would normally.

from guardrails import Guard
from guardrails.hub import ToxicLanguage

guard = Guard().use(
    ToxicLanguage()
)
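
From there, you can sanity-check the guard by validating a string directly. The sample text below is only illustrative; with remote inferencing enabled, the toxicity model runs on the hosted endpoint rather than on your machine.

# Validate a sample string; the outcome reports whether validation passed
outcome = guard.validate("You are a great person. Have a wonderful day!")

print(outcome.validation_passed)
print(outcome.validated_output)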

The major benefit of hosting a validator inference endpoint is the increase in speed and throughput compared to running locally. This implementation makes use cases such as streaming much more viable!

from IPython.display import display, clear_output

fragment_generator = guard(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about the Apple Iphone."},
    ],
    max_tokens=1024,
    temperature=0,
    stream=True,
)



accumulated_output = ""
for op in fragment_generator:
    clear_output()
    accumulated_output += op.validated_output
    display(accumulated_output)

Hosting your own endpoint

Validators can point to any endpoint that implements the interface Guardrails validators expect. This interface is defined in the _inference_remote method of the validator.

After implementing this interface, you can host your own endpoint (for example, using gunicorn and Flask) and point your validator to it by setting the validation_endpoint constructor argument.

guard = Guard().use(
    ToxicLanguage(
        use_local=False,
        validation_endpoint="your_endpoint_ip_address",
    )
)
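
As a rough sketch, such an endpoint is just an HTTP service that accepts the validator's inference request and returns the model's output. The route name, payload shape, and scoring stub below are illustrative assumptions rather than the exact schema; match them to whatever your validator's _inference_remote method actually sends and expects.

# Minimal Flask sketch of a self-hosted inference endpoint.
# NOTE: the route, payload shape, and response fields are illustrative assumptions;
# mirror whatever your validator's _inference_remote implementation uses.
from flask import Flask, jsonify, request

app = Flask(__name__)

def score_toxicity(texts):
    # Placeholder for a real ML model call (e.g. a GPU-backed transformers pipeline).
    return [0.0 for _ in texts]

@app.route("/validate", methods=["POST"])
def validate():
    payload = request.get_json(force=True)
    texts = payload.get("text", [])
    return jsonify({"scores": score_toxicity(texts)})

if __name__ == "__main__":
    # In production, run this behind gunicorn, e.g.: gunicorn -b 0.0.0.0:8000 app:app
    app.run(host="0.0.0.0", port=8000)

Keeping the model behind a plain HTTP interface lets you scale the GPU-backed service independently of the application that runs the guard.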

To learn more about hosting your own validators, check out the Host Remote Validator Models doc.

To learn more about writing your own validators, check out the Custom validators doc.