Guardrails allows you to deploy a dedicated server to run guard executions while continuing to use the Guardrails SDK as you do today. In this guide we show an example of deploying a containerized version of the Guardrails API to AWS using ECS (Elastic Container Service).
  • Read the quick start guide on using Guardrails on the server here
  • Find generalized information on deploying Guardrails here

Step 1: Containerizing Guardrails API

Updating Guardrails config + guard/validator definitions

Guardrails supports creating Guards from templates, which can come from the Guardrails Hub.
# Use Guardrails pre-defined template
mkdir guardrails

cd guardrails

guardrails create --template hub:template://guardrails/chatbot
Running the command above creates two local artifacts: a config.py and a chatbot.json that is referenced from the config. chatbot.json contains definitions for guards, validators, and validator configurations. Each entry in guards is equivalent to a JSON serialization of guard.to_dict(). A simple example is below:
{
  "name": "chatbot",
  "description": "guard to validate chatbot output",
  "template_version": "0.0.1",
  "namespace": "guardrails",
  "guards": [
    {
      "id": "chatbot",
      "name": "chatbot",
      "validators": [
        {
          "id": "guardrails/detect_pii",
          "on": "$",
          "onFail": "exception",
          "kwargs": {
            "pii_entities": ["PERSON"]
          }
        }
      ]
    }
  ]
}
A template can also be a local json file with the format above. A config for it can be generated via the command below:
guardrails create --template chatbot.json
The validator arguments and entries can be updated manually or programmatically (a programmatic version of this edit is sketched after the snippet below). For example, we could update kwargs so that the guard checks for LOCATION entities in addition to PERSON:
"kwargs": {
  "pii_entities":["PERSON", "LOCATION"]
}
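Because the template is plain JSON, a programmatic update can be a short script. Below is a minimal sketch; the file name chatbot.json and the guard/validator indices are assumptions based on the example template above.
import json

# Load the template, extend the detect_pii kwargs, and write it back
with open("chatbot.json") as f:
    template = json.load(f)

# First guard, first validator in the example template
validator = template["guards"][0]["validators"][0]
validator["kwargs"]["pii_entities"] = ["PERSON", "LOCATION"]

with open("chatbot.json", "w") as f:
    json.dump(template, f, indent=2)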
It is recommended to keep a requirements.txt (or equivalent project dependencies file) and the template JSON in source control to allow CI/CD, targeted deployments, and rollback. config.py is generated automatically, but in some cases it may need to be customized. A customized config.py should also be kept in source control and COPY'ed in during the Docker build (skipping the guardrails create step that would otherwise overwrite it).
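For reference, a customized config.py defines guards at module level so the server can load them. The snippet below is an illustrative sketch rather than the exact generated file; it assumes the hub validator guardrails/detect_pii has been installed and is importable as DetectPII.
# config.py: illustrative sketch of a customized server config.
# Assumes `guardrails hub install hub://guardrails/detect_pii` has been run.
from guardrails import Guard
from guardrails.hub import DetectPII

guard = Guard(name="chatbot", description="guard to validate chatbot output")
guard.use(DetectPII(pii_entities=["PERSON", "LOCATION"], on_fail="exception"))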

Container build

The Guardrails API container can be built with Docker. An example build file is below; it is recommended to keep the Dockerfile in source control as well. Create the Dockerfile in the directory that contains the guardrails folder created above (or adjust the COPY paths accordingly).
FROM python:3.12-slim

ARG GUARDRAILS_TOKEN
ARG GUARDRAILS_SDK_VERSION="guardrails-ai[api]>=0.5.0,<6"

# Set environment variables to avoid writing .pyc files and to unbuffer Python output
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# These are purposefully low for initialization and development
# set to WARN or higher in production
ENV LOGLEVEL="DEBUG"
ENV GUARDRAILS_LOG_LEVEL="DEBUG"
ENV APP_ENVIRONMENT="production"

WORKDIR /app

# Install Git and necessary dependencies
RUN apt-get update && \
    apt-get install -y git curl gcc jq && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip

# Install guardrails, the guardrails API, and gunicorn
RUN pip install "$GUARDRAILS_SDK_VERSION" "gunicorn"

RUN guardrails configure --enable-metrics --enable-remote-inferencing --token $GUARDRAILS_TOKEN

# Bring in base template
COPY guardrails/chatbot.json /app/chatbot.json

# Uncomment this and comment out the RUN guardrails create if using a customized config.py
# COPY guardrails/config.py /app/config.py

# Install Hub Deps and create config.py
RUN guardrails create --template /app/chatbot.json

# Expose port 8000 for the application
EXPOSE 8000

# Command to start the Gunicorn server with specified settings
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "--timeout=5", "--threads=3", "guardrails_api.app:create_app()"]
The container above can be built and run locally with the following commands:
# If running into issues on M-series Apple Macs, try forcing the build platform with --platform linux/amd64
docker build -t gr-backend-images:latest --no-cache --progress=plain --build-arg GUARDRAILS_TOKEN=[YOUR GUARDRAILS TOKEN] .

docker run -d -p 8000:8000 -e OPENAI_API_KEY=[YOUR OPENAI KEY] gr-backend-images:latest
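Before moving on to the full test suite, a quick smoke test against the running container can confirm the OpenAI-compatible endpoint is reachable. This sketch assumes an openai 1.x client and the chatbot guard from the template above.
# Quick smoke test against the local container; the api_key value is a
# placeholder since the server was started with the real OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/guards/chatbot/openai/v1/",
    api_key="unused",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a sentence using the word banana."}],
)
print(response.choices[0].message.content)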

Step 2: Verification

Verification of guards should be as programmatic as possible. Below is an example pytest module that exercises server-based guards in a variety of ways. It is configured to run against the container above and performs basic tests of validation and of integration with an LLM.
import openai
import os
import pytest
from guardrails import Guard, settings

# OpenAI compatible Guardrails API Guard
openai.base_url = "http://127.0.0.1:8000/guards/chatbot/openai/v1/"

# The client requires an API key, but we will use the key the server is already
# seeded with; it does not need to be real since all LLM interaction is proxied
# through the Guardrails server
openai.api_key = os.getenv("OPENAI_API_KEY") or 'some key'

@pytest.mark.parametrize(
    "mock_llm_output, validation_output, validation_passed, error", [
        ("Paris is wonderful in the spring", "Paris is wonderful in the spring", False, True),
        ("Here is some info. You can find the answers there.","Here is some info. You can find the answers there.", True, False)
    ]
)
def test_guard_validation(mock_llm_output, validation_output, validation_passed, error):
    settings.use_server = True
    guard = Guard(name="chatbot")
    if error:
        with pytest.raises(Exception) as e:
            validation_outcome = guard.validate(mock_llm_output)
    else:
        validation_outcome = guard.validate(mock_llm_output)
        assert validation_outcome.validation_passed == validation_passed
        assert validation_outcome.validated_output == validation_output

@pytest.mark.parametrize(
    "message_content, output, validation_passed, error",[
        ("Tell me about Paris in a 10 word or less sentence", "Romantic, historic",False, True),
        ("Write a sentence using the word banana.", "banana", True, False)
    ]
)
def test_server_guard_llm_integration(message_content, output, validation_passed, error):
    settings.use_server = True
    guard = Guard(name="chatbot")
    messages = [
        {
            "role": "user",
            "content": message_content
        }
    ]
    if error:
        with pytest.raises(Exception):
            validation_outcome = guard(
                model="gpt-4o-mini",
                messages=messages,
                temperature=0.0,
            )
    else:
        validation_outcome = guard(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0.0,
        )
        assert output in validation_outcome.validated_output
        assert validation_outcome.validation_passed is validation_passed

Step 3: Deploying infrastructure

By leveraging AWS ECS we can handle increasing workloads by scaling the number of containers, and we get a streamlined deployment process with rolling updates. We can now deploy the infrastructure needed for ECS, which includes:
  • Networking Resources (VPC, Load Balancer, Security Groups, Subnets, etc)
  • IAM Roles & Policies (ECS Task & Execution Role)
  • ECS Cluster (ECS Service, Task, Task Definition)
We start by initializing Terraform with:
terraform init
You can then copy the provided Terraform code (or use your own) into the working directory and run:
terraform apply -var="aws_region=us-east-1" -var="backend_memory=2048" -var="backend_cpu=1024" -var="desired_count=0"
Each variable can be configured based on your requirements. desired_count is the number of containers that should always be running; alternatively, you can configure a minimum and maximum count with an autoscaling policy. It is initially set to 0 because we have not yet pushed the container image to the AWS container registry (ECR).
Once the deployment has succeeded you should see some output values (which will be required if you wish to set up CI).

Step 4: Deploying Guardrails API

Manual

First, create a Guardrails token (or reuse your existing one) and export it to your current shell:
export GUARDRAILS_TOKEN="..."

# Optionally use the command below to use your existing token
export GUARDRAILS_TOKEN=$(cat ~/.guardrailsrc | awk -F 'token=' '{print $2}' | awk '{print $1}' | tr -d '\n')
Run the following to build your container and push up to ECR:
# Build Container
docker build --platform linux/amd64 --build-arg GUARDRAILS_TOKEN=$GUARDRAILS_TOKEN -t gr-backend-images:latest .

# Push to ECR
aws ecr get-login-password --region ${YOUR_AWS_REGION} | docker login --username AWS --password-stdin ${YOUR_AWS_ACCOUNT_ID}.dkr.ecr.${YOUR_AWS_REGION}.amazonaws.com
docker tag gr-backend-images:latest ${YOUR_AWS_ACCOUNT_ID}.dkr.ecr.${YOUR_AWS_REGION}.amazonaws.com/gr-backend-images:latest
docker push ${YOUR_AWS_ACCOUNT_ID}.dkr.ecr.${YOUR_AWS_REGION}.amazonaws.com/gr-backend-images:latest
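Once the image is in ECR, bump the service's desired count (recall it was created with desired_count=0) and force a new deployment so ECS pulls the image. The boto3 sketch below uses placeholder cluster and service names; substitute the values from your Terraform outputs.
# Sketch: scale the ECS service up and roll out the newly pushed image.
# Cluster and service names are placeholders; take them from your Terraform outputs.
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")
ecs.update_service(
    cluster="gr-backend-cluster",
    service="gr-backend-service",
    desiredCount=1,
    forceNewDeployment=True,
)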

GitHub Actions

Deployment can vary depending on hosting infrastructure and environment. For AWS we recommend using a service like ECS and triggering zero-downtime rolling deployments via something like GitHub Actions. See the full GitHub Actions workflow example in the Guardrails repository.

Deployment/Update frequency

Generally, the Guardrails core library and validators are updated on a regular (roughly weekly) basis with bug fixes, security fixes, and non-breaking feature updates. Every release is accompanied by release notes here. Large releases with breaking changes happen at a slower cadence and are accompanied by migration guides. It is recommended to update on a semi-regular basis using a CI/CD flow like the one outlined in this document, following the steps below:
  1. Update the guardrails version tag
  2. Follow any migration guides that need to be applied
  3. Run build locally and verify tests pass
  4. Commit updates to source control
  5. Source control changes are approved and merged to main
  6. Github action triggers and updates are deployed

Remote inference

Validators that use LLMs and other models can often gain a large performance boost by running their inference in batches on dedicated hardware with accelerators, and it is often advantageous to scale that infrastructure independently of the core guards and validators. Many Guardrails validators support remote inference free of charge for development purposes; it can be toggled locally by running guardrails configure and answering Y when asked about enabling remote inference. See more general information about remote inference here.

Using with SDK

You should be able to get the URL for your Guardrails API using:
export GUARDRAILS_BASE_URL=$(terraform output -raw backend_service_url)
echo "http://$GUARDRAILS_BASE_URL"
With the GUARDRAILS_BASE_URL environment variable set, the SDK can use the deployed server as a backend for running validations.
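From the SDK side, usage then mirrors the tests above: enable server mode and reference the guard by name. A minimal sketch, assuming the chatbot guard deployed earlier:
# Minimal sketch of calling the deployed server from the SDK,
# assuming GUARDRAILS_BASE_URL points at the load balancer above.
from guardrails import Guard, settings

settings.use_server = True  # route guard execution to the Guardrails server
guard = Guard(name="chatbot")

outcome = guard.validate("Here is some info. You can find the answers there.")
print(outcome.validation_passed, outcome.validated_output)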

Quick start repository template

We've conveniently packaged all the artifacts from this document in a GitHub repository that can be used as a template for your own verification and deployment here.