Customizing your LLM-powered metrics via CLHF

How it works
What to enter as feedback
How to use it
Which metrics is this supported on?

As you start using Galileo Preset LLM-powered metrics (e.g. Context Adherence or Instruction Adherence), or start creating your own LLM-powered metrics via Autogen, you might not always agree with the results. False positives or false negatives in metric values are often due to domain edge cases that the metric's prompt doesn't handle. Galileo helps you address this problem by adapting and continuously improving metrics through Continuous Learning via Human Feedback (CLHF).

How it works

As you identify mistakes in your metrics, you can provide feedback to auto-improve them. LLMs translate your feedback into few-shot examples, which are appended to the metric's prompt. Few-shot examples help your LLM-as-a-judge in two ways:

Examples with your domain data teach it what to expect from your domain.
Concrete examples on edge cases teach your LLM-as-a-judge how to deal with outlier scenarios.
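To make the mechanism concrete, here is a minimal Python sketch of appending few-shot examples to a judge prompt. It is not Galileo's implementation: the FewShotExample structure, the prompt layout, and the function names are hypothetical. Galileo uses LLMs to turn your feedback into examples; the sketch skips that step and assumes the corrected explanation and metric value are already available.

```python
from dataclasses import dataclass


# Hypothetical structure: one human-corrected judgment stored as a few-shot example.
@dataclass
class FewShotExample:
    context: str
    response: str
    corrected_explanation: str  # explanation rewritten to reflect the human critique
    corrected_value: bool       # the metric value the human says is correct


def format_example(example: FewShotExample) -> str:
    """Render one example in an assumed plain-text layout."""
    verdict = "yes" if example.corrected_value else "no"
    return (
        f"Context: {example.context}\n"
        f"Response: {example.response}\n"
        f"Explanation: {example.corrected_explanation}\n"
        f"Verdict: {verdict}"
    )


def build_judge_prompt(base_prompt: str, examples: list[FewShotExample]) -> str:
    """Append numbered few-shot examples after the metric's base prompt."""
    if not examples:
        return base_prompt
    rendered = "\n\n".join(
        f"Example {i}:\n{format_example(ex)}" for i, ex in enumerate(examples, start=1)
    )
    return f"{base_prompt}\n\nHere are examples of correct judgments:\n\n{rendered}"


# Usage: one domain-specific correction for a context-adherence style judge.
base = "Decide whether the response is supported by the provided context."
ex = FewShotExample(
    context="Refunds are accepted within 30 days of purchase.",
    response="You can request a refund within 30 days.",
    corrected_explanation="The context states a 30-day refund window, so the response is supported.",
    corrected_value=True,
)
prompt = build_judge_prompt(base, [ex])
print(prompt)
```

Keeping the base prompt unchanged and only appending examples mirrors the idea that feedback refines the judge without rewriting its core instructions.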

This process has been shown to increase metric accuracy by 20-30%.

Metrics customized via CLHF are scoped to the project. This means different teams can customize the same metric in different ways without affecting each other's projects.
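Project scoping can be pictured as storing customizations under a (project, metric) key, so feedback in one project never reaches another. The sketch below is a hypothetical illustration, not Galileo's data model; ProjectMetricStore and its methods are invented names.

```python
from collections import defaultdict


class ProjectMetricStore:
    """Hypothetical store: few-shot examples keyed by (project, metric)."""

    def __init__(self) -> None:
        self._examples: dict[tuple[str, str], list[str]] = defaultdict(list)

    def add_feedback_example(self, project: str, metric: str, example: str) -> None:
        # Feedback is recorded only under the project where it was given.
        self._examples[(project, metric)].append(example)

    def examples_for(self, project: str, metric: str) -> list[str]:
        # .get avoids creating empty entries; returning a copy keeps callers
        # from mutating the stored list.
        return list(self._examples.get((project, metric), []))


store = ProjectMetricStore()
store.add_feedback_example("team-a", "context_adherence", "Example from team A")
team_a = store.examples_for("team-a", "context_adherence")
team_b = store.examples_for("team-b", "context_adherence")
print(team_a)  # ['Example from team A']
print(team_b)  # []
```

Both teams use the same metric name, but team B's judge prompt would receive none of team A's examples.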

What to enter as feedback

When entering feedback, write a critique of the explanation the metric generated for the incorrect result. Be as precise as possible, stating the exact reason the metric should have returned the desired value.

How to use it

See this video on how to use Continuous Learning via Human Feedback to improve your metric accuracy:

Which metrics is this supported on?

Context Adherence
Instruction Adherence
Correctness
Any LLM-as-a-judge generated via Galileo’s Autogen feature
