Overview

Metric type: binary or multi-class (<a class="link" href="/luna-studio/sdk/how-to-train-your-luna-metric/data-generation/config/metric-output-types">Read More</a>)
Input format: single, tuple, or rag (<a class="link" href="/luna-studio/sdk/how-to-train-your-luna-metric/data-generation/config/metric-input-types">Read More</a>)

Data generation produces and labels the training dataset for your Luna metric. Run it with:

from galileo_luna_ft.data_generation import run_data_generation

training_dataset_path = run_data_generation(config_path="./config.yaml")

Inputs Required

Reads your data_generation config section
Loads the source dataset (CSV or Hugging Face)
Consumes a small portion of your test set to create synthetic training data (defaults to 20%, can be configured with source_data.sampling.enhancement_fraction)
Generates synthetic examples using your configured LLM
Writes a dataset artifact locally and/or pushes to Hugging Face (depending on output.push_to_hub)
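
The dotted option names above (source_data.sampling.enhancement_fraction, output.push_to_hub) read like paths into a nested config. The sketch below shows one way such paths could be resolved with the stated 20% default. The get_option helper is hypothetical and not part of galileo_luna_ft, and the dict layout is an assumption built only from the keys named in this section; check the Config file page for the real schema.

```python
from typing import Any


def get_option(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Hypothetical helper: walk a nested dict along a dotted path."""
    node: Any = config
    for part in dotted_key.split("."):
        # Fall back to the default as soon as a path segment is missing.
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# Assumed shape of a parsed data_generation section (illustrative only).
data_generation = {
    "source_data": {"sampling": {}},  # enhancement_fraction left unset
    "output": {"push_to_hub": False},
}

fraction = get_option(
    data_generation, "source_data.sampling.enhancement_fraction", 0.2
)
push = get_option(data_generation, "output.push_to_hub", False)
print(fraction, push)
```

With enhancement_fraction unset, the lookup falls back to 0.2, matching the default described above.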

The output dataset has two splits:

Train: The training data for the Luna metric
Test: Your original test set minus the consumed portion (defaults to 80% of your original test set)
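
As a rough illustration of the split arithmetic, the sketch below estimates how many test rows are consumed and how many remain for a given enhancement_fraction. The split_sizes function is hypothetical, and rounding to the nearest row is an assumption; the library may round differently. The size of the Train split is not computed here, since it depends on how many synthetic examples the LLM generates.

```python
def split_sizes(test_rows: int, enhancement_fraction: float = 0.2) -> tuple[int, int]:
    """Hypothetical helper: return (consumed_rows, remaining_test_rows)."""
    if not 0.0 <= enhancement_fraction <= 1.0:
        raise ValueError("enhancement_fraction must be between 0 and 1")
    # Assumption: the consumed portion is rounded to the nearest whole row.
    consumed = round(test_rows * enhancement_fraction)
    return consumed, test_rows - consumed


consumed, remaining = split_sizes(1000)
print(consumed, remaining)
```

For a 1,000-row test set at the default fraction, this gives 200 consumed rows and 800 rows left in the Test split.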

The output dataset is saved in Hugging Face dataset format. Next: see the detailed Config file.
