Auto-generating an LLM-as-a-judge
Learn how to use Galileo’s Autogen feature to generate LLM-as-a-judge metrics.
Galileo’s Autogen feature lets you create an LLM-as-a-judge metric from a plain-language description. Enter a description of what you want to measure or detect, and Galileo auto-generates a metric for you.
How it works
When you enter a description of your metric (e.g. “detect any toxic language in the inputs”), Galileo converts it into a prompt and a set of few-shot examples. Together, these power an LLM-as-a-judge that uses chain-of-thought reasoning and majority voting (see the ChainPoll paper) to calculate the metric.
You can customize which model is used and how many judges are used to calculate your metric.
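The flow described above can be illustrated with a minimal sketch. This is not Galileo’s implementation or API: the function names (`build_prompt`, `parse_verdict`, `judge`), the prompt layout, and the `ask_model` callable are all hypothetical, chosen only to show how a description plus few-shot examples can drive several chain-of-thought judges whose yes/no answers are then combined by voting.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical few-shot example shape: (input text, reasoning, verdict).
FewShot = Tuple[str, str, bool]


def build_prompt(description: str, examples: List[FewShot], text: str) -> str:
    """Assemble a judge prompt from a metric description and few-shot examples."""
    lines = [f"Task: {description}",
             "Think step by step, then finish with 'Answer: yes' or 'Answer: no'.",
             ""]
    for sample, reasoning, verdict in examples:
        lines += [f"Input: {sample}",
                  f"Reasoning: {reasoning}",
                  f"Answer: {'yes' if verdict else 'no'}",
                  ""]
    # Leave the reasoning open so the model produces its own chain of thought.
    lines += [f"Input: {text}", "Reasoning:"]
    return "\n".join(lines)


def parse_verdict(completion: str) -> bool:
    """Read the yes/no verdict from the last line of a completion."""
    lines = completion.strip().splitlines()
    return bool(lines) and re.search(r"\byes\b", lines[-1].lower()) is not None


def judge(prompt: str,
          ask_model: Callable[[str], str],
          num_judges: int = 3) -> Tuple[float, bool]:
    """Poll the model num_judges times; return (fraction of yes votes, majority verdict)."""
    votes = [parse_verdict(ask_model(prompt)) for _ in range(num_judges)]
    yes_votes = sum(votes)
    return yes_votes / num_judges, yes_votes * 2 > num_judges
```

Here `ask_model` stands in for whichever LLM you select, and `num_judges` for the number of judges. Using more judges makes the vote less sensitive to a single inconsistent completion, at the cost of more model calls.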
How to use it
Editing and iterating on your auto-generated LLM-as-a-judge
You can always go back and edit your prompt or examples. Additionally, you can use Continuous Learning via Human Feedback (CLHF) to improve and adapt your metric.