A training set is the dataset that fine-tunes your base model during a training run. Training sets are typically much larger than test sets; generated training sets contain 2,000 labelled examples.
Sources
In the run creation flow, you choose one of three top-level paths:
- Generate from test set: Luna uses 20% of your test set as seeds and synthesizes 2,000 labelled rows via an LLM-as-judge prompt.
- Add training logs: Upload or import production logs. Luna Studio labels unlabelled logs before training.
- Use existing training set: Reuse a generated, labelled, or uploaded training dataset from your workspace.
To add data, upload a .csv or .jsonl file, fetch a file from a URL, or import a dataset from Galileo. The same import methods are also available from the Add training set button on the Datasets page.
Schema
Training sets need at least one column:

| Column | Required | Notes |
|---|---|---|
| `input` | Yes | The text the metric will be trained on. |
| `label` | Sometimes | Required for labelled training. Unlabelled logs must be labelled before training. |
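
The schema above can be checked locally before uploading. The following is a minimal sketch, not Luna's own validation: `check_columns` is a hypothetical helper, and the only facts taken from this page are the column names `input` and `label`.

```python
import csv
import io

# Column names come from the schema table; everything else is illustrative.
REQUIRED_COLUMNS = {"input"}


def check_columns(header):
    """Return (missing_required_columns, has_label) for a list of column names."""
    columns = set(header)
    missing = sorted(REQUIRED_COLUMNS - columns)
    has_label = "label" in columns
    return missing, has_label


# Read only the header row of an in-memory CSV sample.
sample = io.StringIO("input,label\nWhere is my order?,example_label\n")
header = next(csv.reader(sample))
missing, has_label = check_columns(header)
print(missing, has_label)  # prints: [] True
```

An empty `missing` list means the file has the required `input` column; `has_label` tells you whether the file can be used for labelled training as-is.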
Labelled vs. unlabelled
- Labelled: every row has a `label` column matching the metric’s output type. Required if you want supervised fine-tuning.
- Unlabelled: rows have an `input` only. Luna Studio labels the logs with your LLM-as-judge prompt first, saves a labelled training dataset, and then uses that labelled dataset for training.

Luna Studio indicates which case applies:
- A green check when the `label` column is present.
- A label-only flow when the `label` column is missing.
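
To tell in advance which case a file falls into, you can inspect its rows yourself. This is a rough sketch under stated assumptions: `has_label` and `dataset_kind` are hypothetical helpers, and treating a partially labelled dataset as unlabelled is an assumption of this example, not documented Luna Studio behaviour.

```python
def has_label(row):
    """True if the row carries a non-empty label (0 and False count as labels)."""
    value = row.get("label")
    return value is not None and str(value).strip() != ""


def dataset_kind(rows):
    """Classify rows as 'labelled' only when every row has a label.

    Assumption: anything short of fully labelled is reported as 'unlabelled'.
    """
    if rows and all(has_label(row) for row in rows):
        return "labelled"
    return "unlabelled"


rows = [
    {"input": "Where is my order?", "label": "example_label"},
    {"input": "Cancel my subscription."},  # no label on this row
]
print(dataset_kind(rows))  # prints: unlabelled
```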
Generated training sets
The most common path for a first run is Generate from test set. The flow:
- Luna uses 20% of your test set as seed examples.
- The model you configure synthesizes 50 sample rows following the metric prompt.
- You review the sample rows and optionally regenerate with feedback.
- Luna generates the full 2,000-example training set.
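
To get a feel for the numbers in this flow, the seed step can be sketched as below. Only the 20%, 50, and 2,000 figures come from this page; `pick_seeds` is a hypothetical helper, and random sampling is an assumption, since the page does not say how Luna chooses its seeds.

```python
import random

SEED_FRACTION = 0.20  # share of the test set used as seeds (from the flow above)
SAMPLE_ROWS = 50      # sample rows synthesized for review
FULL_ROWS = 2000      # rows in the full generated training set


def pick_seeds(test_rows, fraction=SEED_FRACTION, rng=None):
    """Randomly pick roughly `fraction` of the test rows as seeds.

    Assumption: the selection strategy is random; Luna's may differ.
    """
    if not test_rows:
        return []
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    count = max(1, round(len(test_rows) * fraction))
    return rng.sample(test_rows, count)


test_rows = [{"input": f"example {i}"} for i in range(100)]
seeds = pick_seeds(test_rows)
print(len(seeds), SAMPLE_ROWS, FULL_ROWS)  # prints: 20 50 2000
```

With a 100-row test set, 20 rows act as seeds for the 50 review samples and, after review, for the full 2,000-row set.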
File formats
For uploads and URL fetches:
- CSV: standard comma-separated. Headers required.
- JSONL: one JSON object per line, with `input` and (optionally) `label` keys.
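
As an illustration of both formats, the sketch below serializes the same rows to CSV and JSONL using only the Python standard library. The helper names and the label values are placeholders; real labels must match your metric's output type.

```python
import csv
import io
import json

# Placeholder rows: the label values here are illustrative only.
rows = [
    {"input": "Where is my order?", "label": "example_label_a"},
    {"input": "Tell me a joke.", "label": "example_label_b"},
]


def to_csv(rows):
    """Serialize rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["input", "label"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_jsonl(rows):
    """Serialize rows as JSONL text: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


print(to_csv(rows))
print(to_jsonl(rows))
```

For an unlabelled dataset, leave the `label` key out of each row; `csv.DictWriter` then writes an empty value in that column, and the JSONL objects contain only `input`.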
Used in metric
The Used in metric column shows every metric whose runs reference this training set. If the column is empty, no metric uses the training set and it is safe to delete.
Where to go next
- Generate from test set: The most common path for first runs.
- Add a dataset: Walk through the Upload / URL / Galileo flows.
- Test sets: The other dataset type, used to evaluate the metric.
- Validation: Schema and content checks Luna runs.