BLEU and ROUGE

Definition: Metrics used heavily in sequence-to-sequence tasks that measure the n-gram overlap between a generated response and a target output. Higher BLEU and ROUGE-1 scores indicate greater overlap between the generated and target outputs.

Calculation: Both metrics are measures of n-gram overlap. A lengthier explanation of BLEU is provided here. A lengthier explanation of ROUGE-1 is provided here. These metrics require a target output column in your dataset.

Usefulness: Evaluate the accuracy of model outputs against target outputs. The scores give you a metric to guide improvement and help you examine areas where a model has trouble adhering to the expected output.
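To make "n-gram overlap" concrete, here is a minimal sketch in plain Python. It is not Galileo's implementation: the function names (`ngrams`, `clipped_ngram_precision`, `rouge1_recall`) are hypothetical, tokenization is simple whitespace splitting, and only the core overlap counts are shown. Full BLEU additionally combines precisions over several n-gram sizes and applies a brevity penalty, and ROUGE-1 is often reported as precision, recall, and F-measure.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_ngram_precision(generated, target, n=1):
    """BLEU-style building block: the fraction of generated n-grams that
    also appear in the target, with each n-gram's count clipped to the
    number of times it occurs in the target."""
    gen_counts = Counter(ngrams(generated.split(), n))
    tgt_counts = Counter(ngrams(target.split(), n))
    total = sum(gen_counts.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((gen_counts & tgt_counts).values())
    return overlap / total


def rouge1_recall(generated, target):
    """ROUGE-1 recall: the fraction of target unigrams that are covered
    by the generated text."""
    gen_counts = Counter(generated.split())
    tgt_counts = Counter(target.split())
    total = sum(tgt_counts.values())
    if total == 0:
        return 0.0
    overlap = sum((gen_counts & tgt_counts).values())
    return overlap / total


# Illustrative inputs only; these strings are made up for the example.
generated = "the cat sat on the mat"
target = "the cat is on the mat"

unigram_precision = clipped_ngram_precision(generated, target, n=1)
bigram_precision = clipped_ngram_precision(generated, target, n=2)
recall = rouge1_recall(generated, target)
```

In this example five of the six generated unigrams match the target, while only three of the five generated bigrams do, so the bigram score is lower. This is why higher-order n-grams make the metric more sensitive to word order.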

Note: These metrics require a Ground Truth to be set. Check out this page to learn how to add a Ground Truth to your runs.
