Expected outputs are a key element for evaluating LLM applications. They provide benchmarks to measure model accuracy, identify errors, and ensure consistent assessments.
If you're evaluating with `pq.run()` or creating runs through the Playground UI, simply include your expected answers in a column called `output` in your evaluation set.
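For example, here's a minimal sketch of an evaluation set with an `output` column passed to `pq.run()`. The console URL, project name, and template are illustrative, and the exact `pq.run()` argument names should be verified against the promptquality reference for your installed version:

```python
import promptquality as pq

# Log in to your Galileo console (URL is illustrative).
pq.login("https://console.your-galileo-instance.com")

# evaluation_set.csv -- the "output" column holds the expected answers:
# input,output
# "What is the capital of France?","Paris"
# "Who wrote Hamlet?","William Shakespeare"

# Run an evaluation over the dataset; argument names shown here are an
# assumption -- consult the promptquality reference for exact parameters.
pq.run(
    project_name="my-eval-project",
    template="Answer the question: {input}",
    dataset="evaluation_set.csv",
)
```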
If you're logging workflows, pass your expected answers through the `ground_truth` parameter in the workflow creation methods.
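As a rough sketch of what that looks like, assuming the promptquality workflow-logging API (the `EvaluateRun` and `add_workflow` names here should be checked against your installed version):

```python
import promptquality as pq

pq.login("https://console.your-galileo-instance.com")  # illustrative URL

# Create an evaluation run and log a workflow along with its expected answer.
# Class and method names are assumptions based on the workflow-logging API.
evaluate_run = pq.EvaluateRun(run_name="qa-run", project_name="my-eval-project")

evaluate_run.add_workflow(
    input="What is the capital of France?",
    output="The capital of France is Paris.",
    ground_truth="Paris",  # expected output used by ground-truth metrics
)

evaluate_run.finish()  # upload the logged workflows to Galileo
```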
If you're logging runs through a callback handler, you'd start with the same typical flow of logging into Galileo, then call `add_expected_outputs` on your callback handler.
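Here's a minimal sketch assuming a LangChain chain and the promptquality `GalileoPromptCallback`; the scorer name, the shape of the `add_expected_outputs` argument, and the chat model are assumptions to verify against the reference docs:

```python
import promptquality as pq
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

pq.login("https://console.your-galileo-instance.com")  # illustrative URL

# Create the Galileo callback handler for your project.
galileo_handler = pq.GalileoPromptCallback(
    project_name="my-eval-project",
    scorers=[pq.Scorers.correctness],  # scorer choice is illustrative
)

inputs = ["What is the capital of France?", "Who wrote Hamlet?"]
expected_outputs = ["Paris", "William Shakespeare"]

# Register the expected answers on the handler, one per input row.
# The exact argument shape is an assumption -- check the reference docs.
galileo_handler.add_expected_outputs(expected_outputs)

prompt = ChatPromptTemplate.from_template("Answer briefly: {question}")
chain = prompt | ChatOpenAI()
chain.batch(
    [{"question": q} for q in inputs],
    config={"callbacks": [galileo_handler]},
)

galileo_handler.finish()  # upload the run to Galileo
```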