input and an output.
Dataset schema
Typical columns:
- input: the source prompt, instruction, or user message
- output: the model response or candidate answer
- label: the ground-truth class for the metric
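As a rough illustration of the three columns, the Python sketch below builds two example rows and checks that each one carries input, output, and label. The row contents, the boolean label values, and the helper name missing_columns are assumptions made for this example; they are not taken from a real dataset or library.

```python
REQUIRED_COLUMNS = ("input", "output", "label")

# Hypothetical example rows; the text and labels are made up for illustration.
rows = [
    {"input": "What is 2 + 2?", "output": "4", "label": True},
    {"input": "Name the capital of France.", "output": "Berlin", "label": False},
]


def missing_columns(row):
    """Return the required column names that are absent from a row."""
    return [name for name in REQUIRED_COLUMNS if name not in row]


# Fail early if any row lacks one of the required columns.
for index, row in enumerate(rows):
    missing = missing_columns(row)
    if missing:
        raise ValueError(f"row {index} is missing columns: {missing}")
```

A check like this is cheap to run before data generation and catches a misnamed column before it surfaces as a harder-to-read metric error.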
Config shape
Use a span-style config with:
- data_generation.metric.input_format: "tuple"
- data_generation.source_data.dataset.columns.features: ["input", "output"]
- training.metric.type: "boolean" for binary metrics
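To show how these settings might sit together, the Python sketch below expands the dotted keys into a nested mapping. It assumes each dot marks one level of nesting in the config, which is not confirmed here; the helper name nest is hypothetical, and the actual config file format may differ.

```python
import json


def nest(dotted):
    """Expand {"a.b.c": value} into {"a": {"b": {"c": value}}}."""
    nested = {}
    for path, value in dotted.items():
        node = nested
        *parents, leaf = path.split(".")
        for key in parents:
            # Create intermediate levels on demand, reusing existing ones.
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


# The three settings named above, written as dotted paths.
settings = {
    "data_generation.metric.input_format": "tuple",
    "data_generation.source_data.dataset.columns.features": ["input", "output"],
    "training.metric.type": "boolean",  # for binary metrics
}

config = nest(settings)
print(json.dumps(config, indent=2))
```

Both data_generation settings end up under a single data_generation key, which is the usual reading of dotted paths like these.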