A training set is the dataset that fine-tunes your Luna base model during a training run. Training sets are typically much larger than test sets.

Sources

In the run creation flow, you choose one of three top-level paths:

Generate from test set

Luna Studio uses 20% of your test set as seeds and synthetically generates training data.

Add unlabelled training logs

Upload or import production logs. Luna Studio labels unlabelled logs before training.

Use existing training set

Reuse a generated, labelled, or uploaded training dataset from your workspace.
The Add training logs path lets you upload a .csv or .jsonl file, fetch a file from a URL, or import a dataset from Galileo. The same import methods are also available from the Add training set button on the Datasets page.

Required schema

See the prerequisites section for details.

Labelled vs. unlabelled

When you add training logs, Luna Studio checks whether the dataset already has labels. If labels are missing, Luna Studio automatically labels the data using your LLM-as-judge prompt before training the model.
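Before uploading, it can help to check locally how many rows would need labelling. The following is a minimal sketch, not Luna Studio's own validation logic. It assumes a JSONL file with the input and label keys described under File formats, and it treats an absent, null, or empty label as missing. The function name count_missing_labels and the label value "pass" are hypothetical.

```python
import json


def count_missing_labels(jsonl_text: str, label_key: str = "label") -> int:
    """Count rows whose label is absent, null, or an empty string."""
    missing = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        row = json.loads(line)
        # Assumption: a missing key, null, or "" all count as "unlabelled".
        if row.get(label_key) in (None, ""):
            missing += 1
    return missing


# Two hypothetical rows: one labelled, one not.
sample = "\n".join([
    json.dumps({"input": "What is the refund window?", "label": "pass"}),
    json.dumps({"input": "Summarize the attached policy."}),
])

missing = count_missing_labels(sample)
print(f"{missing} row(s) would need labelling")
```

A non-zero count suggests the dataset would go through the automatic labelling step described above.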

Generated training sets

The most common path for a first run is Generate from test set. The flow:
  1. Luna Studio uses 20% of your test set as seed examples.
  2. Luna Studio first generates a sample training set synthetically for your review.
  3. You review the sample rows and optionally regenerate with feedback.
  4. Luna Studio generates the full training set.
See Step 3 — Training set for the full reference. The resulting dataset shows up on the Datasets page with source Generated and a subtitle like “Generated from rag-eval-v2”.
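To get a rough sense of how many seed examples step 1 draws on, the 20% figure can be applied to your test set size. This sketch is only an estimate: how Luna Studio actually selects and rounds the seeds is not documented here, so uniform random sampling, rounding to the nearest integer, and the helper name pick_seeds are all assumptions.

```python
import random


def pick_seeds(test_rows, fraction=0.2, seed=0):
    """Sample roughly `fraction` of the rows, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Assumption: round to the nearest integer, with at least one seed.
    k = max(1, round(len(test_rows) * fraction))
    return rng.sample(test_rows, k)


# Hypothetical test set of 50 rows.
test_rows = [f"example-{i}" for i in range(50)]
seeds = pick_seeds(test_rows)
print(f"{len(seeds)} of {len(test_rows)} test rows used as seeds")
```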

File formats

For uploads and URL fetches:
  • CSV: standard comma-separated values. A header row is required.
  • JSONL: one JSON object per line, with an input key and, optionally, a label key.
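As an illustration of the two formats, the sketch below serializes the same rows as JSONL and as CSV using only the Python standard library. The input and label keys come from the JSONL description above. Using the same names as CSV column headers is an assumption (check the prerequisites section for the required schema), and the row contents and label values are hypothetical.

```python
import csv
import io
import json

# Hypothetical rows; the label values are placeholders.
rows = [
    {"input": "What is the refund window?", "label": "pass"},
    {"input": "Summarize the attached policy.", "label": "fail"},
]


def to_jsonl(rows):
    """One JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows) + "\n"


def to_csv(rows, fieldnames=("input", "label")):
    """Comma-separated values with a header row.

    Assumption: the CSV columns use the same names as the JSONL keys.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fieldnames))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


jsonl_text = to_jsonl(rows)
csv_text = to_csv(rows)
print(jsonl_text)
print(csv_text)
```

Writing either string to a .jsonl or .csv file gives something that can be passed to the upload or URL-fetch flows.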

Where to go next

Generate from test set

The most common path for first runs.

Add a dataset

Walk through the Upload / URL / Galileo flows.

Test sets

The other dataset type — used to evaluate the metric.

Validation

Schema and content checks Luna runs.