How it works
As you identify mistakes in your metrics, you can provide ‘feedback’ to ‘auto-improve’ your metrics. Your feedback is translated (by LLMs) into few-shot examples that are appended to the Metric’s prompt. Few-shot examples help your LLM-as-a-judge in a few ways:
- Examples built from your domain data teach it what to expect from your domain.
- Concrete examples of edge cases teach your LLM-as-a-judge how to handle outlier scenarios.
CLHF-ed metrics are scoped to the project: different teams can customize the same metric in different ways without impacting each other’s projects.
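To make the mechanism concrete, here is a minimal, hypothetical sketch (in Python) of how one piece of feedback could become a few-shot example appended to a judge prompt. The names `FewShotExample` and `build_judge_prompt` are illustrative assumptions, not Galileo’s actual API; the platform performs this translation for you.

```python
# Conceptual sketch only (not Galileo's API): turning a human critique into a
# few-shot example that is appended to an LLM-as-a-judge prompt.
from dataclasses import dataclass


@dataclass
class FewShotExample:
    response: str            # the model output that was judged
    wrong_explanation: str   # the judge's original (incorrect) explanation
    human_critique: str      # the feedback you entered
    corrected_score: float   # the metric value you believe is correct


def build_judge_prompt(base_prompt: str, examples: list[FewShotExample]) -> str:
    """Append project-scoped few-shot examples to the base judge prompt."""
    sections = [base_prompt]
    for i, ex in enumerate(examples, start=1):
        sections.append(
            f"Example {i}:\n"
            f"Response: {ex.response}\n"
            f"Original explanation: {ex.wrong_explanation}\n"
            f"Reviewer critique: {ex.human_critique}\n"
            f"Correct score: {ex.corrected_score}"
        )
    return "\n\n".join(sections)


# Usage: one piece of feedback becomes one appended example.
example = FewShotExample(
    response="The capital of Australia is Sydney.",
    wrong_explanation="The response is fully supported by the context.",
    human_critique="The context names Canberra as the capital, so the response "
                   "is not adherent and should score low.",
    corrected_score=0.0,
)
print(build_judge_prompt("You are a judge for Context Adherence...", [example]))
```

Because the appended examples live with the project, two teams can maintain different example sets for the same base metric without affecting one another.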
What to enter as feedback
When entering feedback, enter a critique of the explanation generated by the erroneous metric. Be as precise as possible in your critique, outlining the exact reason behind the desired metric value. For example: “The context states the refund window is 30 days, but the response says 60 days, so the score should be low.”
How to use it
See this video on how to use Continuous Learning via Human Feedback to improve your metric accuracy.
Which metrics is this supported on?
- Context Adherence
- Instruction Adherence
- Correctness
- Any LLM-as-a-judge generated via Galileo’s Autogen feature