BLEU and ROUGE
Understand BLEU & ROUGE-1 scores
Definition: Metrics widely used in sequence-to-sequence tasks that measure n-gram overlap between a generated response and a target output. Higher BLEU and ROUGE-1 scores indicate greater overlap between the generated and target outputs.
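Calculation: Both metrics measure n-gram overlap. BLEU combines clipped n-gram precisions (typically up to 4-grams) with a brevity penalty that penalizes outputs shorter than the target, while ROUGE-1 measures unigram (single-word) overlap. A more detailed explanation of BLEU is provided here. A more detailed explanation of ROUGE-1 is provided here. These metrics require a target (ground truth) column in your dataset.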
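To make the calculation concrete, here is a minimal sketch of both metrics in plain Python. It is an illustration under stated assumptions, not necessarily how the scores on this page are computed: tokenization is a simple whitespace split, there is a single target per output, BLEU is unsmoothed, and the function names (ngrams, rouge_1, bleu) are hypothetical helpers introduced for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_1(generated, target):
    """Unigram overlap, returned as (precision, recall, F1).

    Assumption: whitespace tokenization, one target string.
    """
    gen, ref = generated.split(), target.split()
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(gen) & Counter(ref)).values())
    precision = overlap / len(gen) if gen else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def bleu(generated, target, max_n=4):
    """Sentence-level BLEU against one target, without smoothing.

    Geometric mean of clipped n-gram precisions for n = 1..max_n,
    multiplied by the brevity penalty.
    """
    gen, ref = generated.split(), target.split()
    if not gen:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        gen_counts, ref_counts = ngrams(gen, n), ngrams(ref, n)
        clipped = sum((gen_counts & ref_counts).values())
        total = sum(gen_counts.values())
        # Without smoothing, a single zero precision makes the score zero.
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Penalize outputs that are shorter than the target.
    if len(gen) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(gen))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


generated = "the cat sat on the mat"
target = "the cat is on the mat"
print(rouge_1(generated, target))
print(bleu(generated, target, max_n=2))
print(bleu(generated, target))
```

For this pair, five of the six unigrams match, so ROUGE-1 precision and recall are both 5/6. BLEU with max_n=2 is about 0.707 (the square root of 5/6 × 3/5), while the default 4-gram BLEU is 0 because the two sentences share no 4-gram. Production implementations usually apply smoothing to avoid that hard zero on short texts.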
Usefulness: These metrics evaluate how closely model outputs match the target outputs. They give you a number to guide improvement and help you find areas where the model has trouble adhering to the expected output.
Note: These metrics require a Ground Truth to be set. Check out this page to learn how to add a Ground Truth to your runs.