Auto-generating an LLM-as-a-judge
Learn how to use Galileo’s Autogen feature to generate LLM-as-a-judge metrics.
Galileo’s Autogen feature makes creating an LLM-as-a-judge metric straightforward: enter a description of what you want to measure or detect, and Galileo auto-generates a metric for you.
How it works
When you enter a description of your metric (e.g., “detect any toxic language in the inputs”), Galileo converts it into a prompt and a set of few-shot examples. These power an LLM-as-a-judge that uses chain-of-thought reasoning and majority voting (see the ChainPoll paper) to calculate the metric.
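As a rough illustration of the first step, the Python sketch below shows how a metric description and a few labeled examples might be assembled into a chain-of-thought judge prompt. The `FewShotExample` class, the `build_judge_prompt` function, the prompt wording, and the `VERDICT:` output convention are all hypothetical; the prompts Galileo actually generates are not shown in this guide and may look quite different.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FewShotExample:
    """One labeled example shown to the judge (hypothetical structure)."""
    text: str
    reasoning: str
    verdict: bool


def build_judge_prompt(description: str, examples: List[FewShotExample], text: str) -> str:
    """Assemble a chain-of-thought judge prompt from a metric description.

    The template wording is an illustrative assumption, not Galileo's prompt.
    """
    lines = [
        "You are evaluating text for the following criterion:",
        description,
        "",
        "Think step by step, then end with a final line 'VERDICT: yes' or 'VERDICT: no'.",
        "",
    ]
    # Each few-shot example demonstrates the reasoning-then-verdict format.
    for i, ex in enumerate(examples, start=1):
        lines += [
            f"Example {i}:",
            f"Text: {ex.text}",
            f"Reasoning: {ex.reasoning}",
            f"VERDICT: {'yes' if ex.verdict else 'no'}",
            "",
        ]
    # The judge continues from "Reasoning:" for the input being scored.
    lines += ["Now evaluate:", f"Text: {text}", "Reasoning:"]
    return "\n".join(lines)


examples = [
    FewShotExample("Have a great day!", "The message is friendly.", False),
    FewShotExample("You are an idiot.", "The message insults the reader.", True),
]
prompt = build_judge_prompt(
    "Detect any toxic language in the inputs.", examples, "Thanks for your help."
)
print(prompt)
```

Pairing each example with its reasoning, not just a label, is what lets the few-shot examples teach the judge the chain-of-thought format as well as the criterion.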
You can customize which model acts as the judge and how many judges are used to calculate your metric.
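The scoring step can be sketched in the same spirit: several judges each reason step by step and end with a verdict, and the metric is the fraction of judges voting "yes", so a value above 0.5 reflects a majority. `score_with_judges`, `parse_verdict`, the `judge` callable, and the `VERDICT:` convention are hypothetical stand-ins, not Galileo's API. The `model` and `num_judges` parameters mirror the two settings described above, and a real judge would call an LLM instead of returning canned text.

```python
import re
from typing import Callable, Optional

# A judge maps (model name, prompt) to a chain-of-thought response.
# In practice this would call an LLM; here it is a hypothetical stand-in.
JudgeFn = Callable[[str, str], str]

VERDICT_RE = re.compile(r"VERDICT:\s*(yes|no)\s*$", re.IGNORECASE)


def parse_verdict(response: str) -> Optional[bool]:
    """Read the final 'VERDICT: yes/no' line of a chain-of-thought response."""
    stripped = response.strip()
    last_line = stripped.splitlines()[-1] if stripped else ""
    match = VERDICT_RE.search(last_line)
    if match is None:
        return None  # unparseable responses are treated as abstentions
    return match.group(1).lower() == "yes"


def score_with_judges(prompt: str, judge: JudgeFn, model: str, num_judges: int = 3) -> float:
    """Query num_judges judges and return the fraction voting 'yes'.

    A score above 0.5 means a majority of parseable judges flagged the input.
    """
    votes = [parse_verdict(judge(model, prompt)) for _ in range(num_judges)]
    counted = [v for v in votes if v is not None]
    if not counted:
        raise ValueError("no judge returned a parseable verdict")
    return sum(counted) / len(counted)


# Fake judge for demonstration: returns canned responses in order.
canned = iter([
    "The text contains an insult.\nVERDICT: yes",
    "The tone is hostile.\nVERDICT: yes",
    "This could be read as a joke.\nVERDICT: no",
])


def fake_judge(model: str, prompt: str) -> str:
    return next(canned)


score = score_with_judges("...", fake_judge, model="judge-model", num_judges=3)
print(round(score, 3))  # 0.667
```

Reporting the fraction of votes rather than a single yes/no keeps a signal of how strongly the judges agree; thresholding it at 0.5 recovers a plain majority vote.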
How to use it
Editing and iterating on your auto-generated LLM-as-a-judge
You can always go back and edit your prompt or examples. Additionally, you can use Continuous Learning via Human Feedback (CLHF) to improve and adapt your metric.