Learn how to use Galileo’s Autogen feature to generate LLM-as-a-judge metrics.
Galileo’s Autogen feature makes it easy to create an LLM-as-a-judge metric. Enter a description of what you
want to measure or detect, and Galileo generates the metric for you.
When you enter a description of your metric (e.g. “detect any toxic language in the inputs”), Galileo converts
it into a prompt and a set of few-shot examples. The prompt and examples power an LLM-as-a-judge that uses
chain-of-thought reasoning and majority voting (see the ChainPoll paper) to calculate the metric. You can customize which model is used and how many judges vote on each result.
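To make the mechanism above concrete, here is a minimal sketch of a binary LLM-as-a-judge with chain-of-thought and majority voting. It is not Galileo’s implementation or API: the prompt template and the functions `build_judge_prompt`, `parse_answer`, and `poll_judges` are hypothetical, and `fake_judge` is a stub that returns canned responses where a real setup would call an LLM.

```python
from itertools import cycle
from typing import Callable, List, Tuple

# Hypothetical judge type: takes a prompt, returns chain-of-thought text
# that ends with a final "Answer: yes" or "Answer: no" line.
Judge = Callable[[str], str]


def build_judge_prompt(description: str, examples: List[Tuple[str, str]], text: str) -> str:
    """Assemble a judge prompt from a metric description and few-shot examples (hypothetical template)."""
    lines = [
        f"Task: {description}",
        "Think step by step, then end with a final line 'Answer: yes' or 'Answer: no'.",
        "",
    ]
    for example_input, example_answer in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Answer: {example_answer}")
        lines.append("")
    lines.append(f"Input: {text}")
    return "\n".join(lines)


def parse_answer(response: str) -> bool:
    """Return True if the last line of the judge's output is a 'yes' answer."""
    last_line = response.strip().splitlines()[-1].strip().lower()
    return last_line.endswith("yes")


def poll_judges(prompt: str, judge: Judge, num_judges: int = 3) -> Tuple[bool, float]:
    """Ask the judge num_judges times; return the majority verdict and the fraction of 'yes' votes."""
    votes = [parse_answer(judge(prompt)) for _ in range(num_judges)]
    yes_fraction = sum(votes) / num_judges
    # A tie (possible with an even number of judges) resolves to "no".
    majority = yes_fraction > 0.5
    return majority, yes_fraction


# Stub judge standing in for an LLM call (hypothetical canned responses).
canned = cycle([
    "The input contains an insult.\nAnswer: yes",
    "The tone is harsh but not toxic.\nAnswer: no",
    "Calling someone an idiot is abusive.\nAnswer: yes",
])


def fake_judge(prompt: str) -> str:
    return next(canned)


prompt = build_judge_prompt(
    "Detect any toxic language in the inputs",
    [("Have a great day!", "no"), ("You are worthless.", "yes")],
    "Only an idiot would write this code.",
)
print(poll_judges(prompt, fake_judge, num_judges=3))  # (True, 0.6666666666666666)
```

Here two of the three stubbed judges answer “yes”, so the majority verdict is “yes”. Using an odd number of judges avoids ties; the returned fraction of “yes” votes is an extra signal this sketch exposes, not something the description above promises.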
Currently, auto-generated metrics are restricted to binary (yes/no) measurements. Multiple-choice and numerical ratings are coming soon.