Our approach to telemetry
Telemetry is a crucial part of the LLM toolchain in production. LLMs are nondeterministic and have relatively high latency compared to more traditional APIs, which makes it important to monitor both the production systems that use LLMs and the LLMs themselves. Basic LLM telemetry like token counts and latency matters, but it's just as important to monitor the performance of the Guards that protect your LLMs. By integrating telemetry, we can measure how effective our guards are and how much latency they add to the system.

Metrics you can capture using OTEL
This package is instrumented using the OpenTelemetry Python SDK. By viewing the captured traces and the metrics derived from them, we can get useful insights into how our Guards, and our LLM apps in general, perform. Among other things, we can find the following (illustrated in the sketch after this list):

- Latency of Guards
- Latency of LLMs
- Success rates of Guards
- The rate at which validators pass and fail, within a Guard and across Guards
- Deep dives into individual guard and validator calls
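
As a rough illustration of how these numbers fall out of trace data, here is a minimal, self-contained sketch that captures spans with the OpenTelemetry SDK's in-memory exporter and derives latency from them. The span names and the `validator.outcome` attribute are illustrative stand-ins, not the package's actual instrumentation.

```python
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# Capture spans in memory so we can inspect them directly;
# in production these would flow to an OTLP collector instead.
exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("guard-metrics-demo")

# Stand-in for a guarded LLM call; span names and attributes here
# are illustrative, not the package's real instrumentation.
with tracer.start_as_current_span("guard") as span:
    with tracer.start_as_current_span("validator"):
        time.sleep(0.05)  # simulate validation work
    span.set_attribute("validator.outcome", "pass")

# Derive per-span latency (nanoseconds -> milliseconds) from the trace.
for s in exporter.get_finished_spans():
    latency_ms = (s.end_time - s.start_time) / 1e6
    print(f"{s.name}: {latency_ms:.1f} ms, attributes={dict(s.attributes)}")
```

In production you would export these spans to an OTLP endpoint instead, as configured in the next section, and compute pass/fail rates and latencies in your observability backend.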
Configure Guardrails to talk to an OTLP collector
To talk to an OTLP collector, you only need to set a few environment variables and create a TracerProvider. To streamline development, Guardrails provides a default OTLP tracer and provider. First, set the standard OTLP exporter environment variables.
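
As a minimal sketch, the setup might look like the following. The three environment variables are the standard OpenTelemetry OTLP exporter settings; the endpoint and headers are placeholders for your backend's values, and the import path and `tracer=` parameter for `default_otlp_tracer` are assumptions that may differ across Guardrails versions.

```python
import os

# Standard OTLP exporter settings (equivalent to shell `export` lines);
# replace the endpoint and headers with your collector's values.
os.environ["OTEL_EXPORTER_OTLP_PROTOCOL"] = "grpc"
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://your-otlp-endpoint:4317"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = "x-api-key=your-key"

from guardrails import Guard
# Assumed import path; check your installed version for the exact module.
from guardrails.utils.telemetry_utils import default_otlp_tracer

guard = Guard.from_string(
    validators=[],  # add your validators here
    # Passing the tracer when configuring the Guard is assumed wiring;
    # consult the package docs for the exact parameter in your version.
    tracer=default_otlp_tracer("my_guard"),
)
```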
Configure OTEL for a self-hosted OpenTelemetry Collector

For advanced use cases (for example, if your metrics provider runs inside a VPC), you can use a self-hosted OpenTelemetry Collector to receive traces and metrics from your Guard. The standard OpenTelemetry environment variables are used to configure the collector. Use the default_otel_collector_tracer when configuring your guard.
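
Concretely, under the same assumptions about the import path and Guard wiring as above, pointing a guard at a local collector might look like this (4317 is the conventional OTLP gRPC port):

```python
import os

# Point the standard OTLP variables at the self-hosted collector.
os.environ["OTEL_EXPORTER_OTLP_PROTOCOL"] = "grpc"
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4317"

from guardrails import Guard
# Assumed import path; check your installed version for the exact module.
from guardrails.utils.telemetry_utils import default_otel_collector_tracer

guard = Guard.from_string(
    validators=[],  # add your validators here
    tracer=default_otel_collector_tracer("my_guard"),  # assumed wiring
)
```

The collector then fans traces and metrics out to whatever backends its own configuration defines, so your application code never needs to know about the final destination.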