Auto-generating an LLM-as-a-judge

Galileo’s Autogen feature lets you create an LLM-as-a-judge metric from a plain-language description. Enter what you want to measure or detect, and Galileo generates the metric for you.

How it works

When you enter a description of your metric (e.g. “detect any toxic language in the inputs”), Galileo converts it into a prompt and a set of few-shot examples. Together, these power an LLM-as-a-judge that uses chain-of-thought reasoning and majority voting (see the ChainPoll paper) to calculate the metric. You can customize which model is used and how many judges are polled.

Currently, auto-generated metrics are restricted to binary (yes/no) measurements. Multiple-choice and numerical ratings are coming soon.
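As a rough illustration of the flow described above, the Python sketch below builds a chain-of-thought prompt from a description and few-shot examples, polls a judge several times, and takes a majority vote over the yes/no answers. It is not Galileo's implementation or API: `build_prompt`, `parse_verdict`, `majority_vote`, and `fake_judge` are hypothetical names, the prompt layout is an assumption, and `fake_judge` is a keyword-matching stand-in for a real LLM call.

```python
from typing import Callable, List, Tuple

# Assumed layout of a few-shot example: (input text, reasoning, verdict).
FewShot = Tuple[str, str, bool]


def build_prompt(description: str, examples: List[FewShot], text: str) -> str:
    """Turn a metric description and few-shot examples into a chain-of-thought prompt."""
    lines = [
        f"Task: {description}",
        "Think step by step, then finish with 'Answer: yes' or 'Answer: no'.",
        "",
    ]
    for ex_text, reasoning, verdict in examples:
        answer = "yes" if verdict else "no"
        lines += [f"Input: {ex_text}", f"Reasoning: {reasoning}", f"Answer: {answer}", ""]
    lines += [f"Input: {text}", "Reasoning:"]
    return "\n".join(lines)


def parse_verdict(completion: str) -> bool:
    """Read the binary verdict from the last line of a judge's completion."""
    lines = completion.strip().splitlines()
    if not lines:
        return False
    return lines[-1].strip().lower().rstrip(".").endswith("yes")


def majority_vote(prompt: str, judge: Callable[[str], str], num_judges: int = 3) -> bool:
    """Poll the judge num_judges times; return True only on a strict 'yes' majority."""
    votes = [parse_verdict(judge(prompt)) for _ in range(num_judges)]
    return sum(votes) * 2 > len(votes)


def fake_judge(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: flags the final input if it contains 'idiot'."""
    target = prompt.rsplit("Input: ", 1)[-1]
    answer = "yes" if "idiot" in target.lower() else "no"
    return f"Checked the text for insults.\nAnswer: {answer}"


if __name__ == "__main__":
    examples = [("You are an idiot.", "The text insults the reader.", True)]
    prompt = build_prompt(
        "detect any toxic language in the inputs", examples, "Have a nice day."
    )
    print(majority_vote(prompt, fake_judge, num_judges=3))
```

In this sketch a tie counts as "no", so an odd number of judges avoids ambiguous results. A real judge would sample the model with some randomness so that the votes can differ between calls.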

How to use it

Editing and Iterating on your auto-generated LLM-as-a-judge

You can always go back and edit your prompt or examples. Additionally, you can use Continuous Learning via Human Feedback (CLHF) to improve and adapt your metric.
