# Training sets

> The dataset used to fine-tune the base model during a run.

A **training set** is the dataset that fine-tunes your [base model](/luna-studio/ui/core-concepts#base-models) during a [training run](/luna-studio/ui/runs/lifecycle). Training sets are typically much larger than test sets; generated training sets contain 2,000 labelled examples.

## Sources

In the run creation flow, you choose one of three top-level paths:

<CardGroup cols={3}>
  <Card title="Generate from test set" icon="wand-magic-sparkles">
    Luna uses 20% of your test set as seeds and synthesizes 2,000 labelled rows via an LLM-as-judge prompt.
  </Card>

  <Card title="Add training logs" icon="upload">
    Upload or import production logs. Luna Studio labels unlabelled logs before training.
  </Card>

  <Card title="Use existing training set" icon="database">
    Reuse a generated, labelled, or uploaded training dataset from your workspace.
  </Card>
</CardGroup>

The **Add training logs** path lets you upload a `.csv` or `.jsonl` file, fetch a file from a URL, or import a dataset from Galileo. The same import methods are available through the **Add training set** button on the [Datasets page](/luna-studio/ui/datasets/overview).

## Schema

Training sets need at least one column:

| Column  | Required  | Notes                                                                             |
| ------- | --------- | --------------------------------------------------------------------------------- |
| `input` | Yes       | The text the metric will be trained on.                                           |
| `label` | Sometimes | Required for labelled training. Unlabelled logs must be labelled before training. |

### Labelled vs. unlabelled

* **Labelled** — every row has a `label` column matching the metric's output type. Required if you want supervised fine-tuning.
* **Unlabelled** — rows have an `input` only. Luna Studio labels the logs with your LLM-as-judge prompt first, saves a labelled training dataset, and then uses that labelled dataset for training.

When you add training logs, Luna Studio checks whether the dataset already has labels and shows one of two outcomes:

* A green check if the label column is present.
* A label-only flow if the label column is missing.
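
To see in advance which outcome a file is likely to get, you can run a similar check locally. The sketch below is illustrative only and is not Luna Studio's validation logic: `label_status` is a hypothetical helper, and treating a partially labelled file as unlabelled is an assumption.

```python
import csv
import io


def _present(value):
    """True when a cell holds something other than None or blank text."""
    return value is not None and str(value).strip() != ""


def label_status(rows):
    """Classify rows (dicts) as 'labelled' or 'unlabelled'.

    Assumption: a dataset counts as labelled only if every row has a label.
    Raises ValueError when the required `input` value is missing.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("dataset is empty")
    for index, row in enumerate(rows):
        if not _present(row.get("input")):
            raise ValueError(f"row {index} has no input")
    if all(_present(row.get("label")) for row in rows):
        return "labelled"
    return "unlabelled"


# Hypothetical example data; the label values are placeholders.
labelled_csv = "input,label\nWhat is the refund policy?,relevant\nTell me a joke,irrelevant\n"
unlabelled_csv = "input\nWhat is the refund policy?\n"

print(label_status(csv.DictReader(io.StringIO(labelled_csv))))    # labelled
print(label_status(csv.DictReader(io.StringIO(unlabelled_csv))))  # unlabelled
```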

## Generated training sets

The most common path for a first run is **Generate from test set**. The flow:

1. Luna uses 20% of your test set as seed examples.
2. The model you pick synthesizes 50 sample rows following the metric prompt.
3. You review the sample rows and optionally regenerate with feedback.
4. Luna generates the full 2,000-example training set.
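
To get a feel for the numbers in this flow, here is a rough sketch of the seeding arithmetic. It assumes seeds are drawn uniformly at random and that the 20% is rounded to the nearest whole row; how Luna actually selects and rounds is not documented here, and `pick_seeds` is a hypothetical helper.

```python
import random

SEED_FRACTION = 0.20   # share of the test set used as seeds
SAMPLE_ROWS = 50       # rows synthesized for review
FULL_SET_ROWS = 2000   # rows in the final generated training set


def pick_seeds(test_rows, fraction=SEED_FRACTION, rng_seed=0):
    """Pick a random subset of test rows to act as seed examples.

    Assumption: uniform sampling without replacement, at least one seed.
    """
    if not test_rows:
        raise ValueError("test set is empty")
    count = max(1, round(len(test_rows) * fraction))
    return random.Random(rng_seed).sample(test_rows, count)


# Hypothetical 150-row test set.
test_rows = [{"input": f"example {i}"} for i in range(150)]
seeds = pick_seeds(test_rows)
print(len(seeds))  # 30
```

Under these assumptions, a 150-row test set yields 30 seeds, which then drive the 50-row sample and the 2,000-row full set.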

See [Step 3 — Training set](/luna-studio/ui/runs/new-run/step-3-training-set#generate-from-test-set) for the full reference.

The resulting dataset shows up on the [Datasets page](/luna-studio/ui/datasets/overview) with source **Generated** and a subtitle like "Generated from rag-eval-v2". Each row carries an **Origin** marker so you can trace it back: rows synthesized by the generator render as "Generated", while rows seeded from a test set render as a chip with that test set's name.

## File formats

For uploads and URL fetches:

* **CSV** — standard comma-separated. Headers required.
* **JSONL** — one JSON object per line, with `input` and (optionally) `label` keys.
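
As an illustration, the following sketch writes the same rows in both formats using only the Python standard library. The row contents and label values are made up for the example, and `to_csv`, `to_jsonl`, and `from_jsonl` are hypothetical helper names, not part of any Galileo SDK.

```python
import csv
import io
import json

# Hypothetical rows; real labels must match your metric's output type.
rows = [
    {"input": "What is the refund policy?", "label": "relevant"},
    {"input": 'She said, "cancel my order"', "label": "relevant"},
]


def to_csv(rows):
    """Serialize rows as CSV with a header line; csv handles quoting."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["input", "label"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_jsonl(rows):
    """Serialize rows as JSONL: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def from_jsonl(text):
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


print(to_csv(rows))
print(to_jsonl(rows))
```

Using `csv.DictWriter` and `json.dumps` rather than string concatenation keeps commas, quotes, and newlines inside `input` text from corrupting the file.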

## Used in metric

The **Used in metric** column shows every metric whose runs reference this training set. If the column is empty, no run references the training set, so it is safe to delete.

## Where to go next

<CardGroup cols={2}>
  <Card title="Generate from test set" icon="wand-magic-sparkles" href="/luna-studio/ui/runs/new-run/step-3-training-set">
    The most common path for first runs.
  </Card>

  <Card title="Add a dataset" icon="upload" href="/luna-studio/ui/datasets/add-a-dataset">
    Walk through the Upload / URL / Galileo flows.
  </Card>

  <Card title="Test sets" icon="database" href="/luna-studio/ui/datasets/test-sets">
    The other dataset type — used to evaluate the metric.
  </Card>

  <Card title="Validation" icon="circle-check" href="/luna-studio/ui/datasets/validation">
    Schema and content checks Luna runs.
  </Card>
</CardGroup>
