A/B Compare Prompts

Galileo lets you compare multiple evaluation runs side by side. You can see how different configurations of your system (e.g. different parameters, prompt templates, or retriever strategies) handled the same set of queries, so you can quickly evaluate, analyze, and annotate your experiments. This works for both single-step workflows and multi-step (chain) workflows.

How do I get started?

To enter the Compare Runs mode, select the runs you want to compare and click “Compare Runs” on the Action Bar.

For two runs to be comparable, the same evaluation dataset must be used to create them.
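
The comparability rule above can be illustrated with a minimal sketch. The `Run` class and `are_comparable` helper below are hypothetical stand-ins written for this example; they are not part of Galileo's API, and the way Galileo actually checks dataset identity may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for an evaluation run; not a Galileo type."""
    name: str
    inputs: tuple[str, ...]  # the eval-set queries the run was created from


def are_comparable(a: Run, b: Run) -> bool:
    """Treat two runs as comparable only if they share the same inputs, in order."""
    return a.inputs == b.inputs


# Illustrative data only.
eval_set = ("What is RAG?", "Summarize the refund policy.")
baseline = Run("prompt-v1", eval_set)
candidate = Run("prompt-v2", eval_set)
other = Run("prompt-v3", ("A different query",))

print(are_comparable(baseline, candidate))  # True
print(are_comparable(baseline, other))      # False
```

Here the two prompt variants built from the same eval set can be compared side by side, while the run built from a different set of queries cannot.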

Once you’re in Compare Runs you can:

Compare how your different configurations responded to the same input.
Compare Metrics.
Expand to see the full Trace of a multi-step workflow and identify which steps went wrong.
Review and add Human Feedback.
Toggle back and forth between the inputs in your eval set.
