Sources
In the run creation flow, you choose one of three top-level paths:
Generate from test set
Luna Studio uses 20% of your test set as seeds and synthetically generates training data.
Add unlabelled training logs
Upload or import production logs. Luna Studio labels unlabelled logs before training.
Use existing training set
Reuse a generated, labelled, or uploaded training dataset from your workspace.
You can upload a .csv or .jsonl file, fetch a file from a URL, or import a dataset from Galileo. The same import methods are also available from the Add training set button on the Datasets page.
Required schema
See the prerequisites section for details.
Labelled vs. unlabelled
When you add training logs, Luna Studio checks whether the dataset already has labels. If labels are missing, Luna Studio automatically labels the data using your LLM-as-judge prompt before training the model.
Generated training sets
The most common path for a first run is Generate from test set. The flow:
- Luna Studio uses 20% of your test set as seed examples.
- Luna Studio synthetically generates a sample training set for review.
- You review the sample rows and optionally regenerate with feedback.
- Luna Studio generates the full training set.
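To get a feel for how many seed examples the 20% figure implies for a given test set, the arithmetic can be sketched in Python. This is only an illustration: how Luna Studio actually picks its seeds is not described here, so the random sampling, the rounding rule, and the function name split_seeds are all assumptions.

```python
import random


def split_seeds(test_rows, seed_fraction=0.2, rng_seed=0):
    """Split rows into (seeds, remainder).

    Hypothetical helper: random sampling and rounding are assumptions,
    not Luna Studio's documented behaviour.
    """
    if not 0 < seed_fraction <= 1:
        raise ValueError("seed_fraction must be in (0, 1]")
    if not test_rows:
        return [], []
    rng = random.Random(rng_seed)  # fixed seed so the split is repeatable
    k = max(1, round(len(test_rows) * seed_fraction))
    seed_idx = set(rng.sample(range(len(test_rows)), k))
    seeds = [row for i, row in enumerate(test_rows) if i in seed_idx]
    remainder = [row for i, row in enumerate(test_rows) if i not in seed_idx]
    return seeds, remainder


# A 100-row test set yields 20 seed rows under these assumptions.
test_rows = [{"input": f"example {i}"} for i in range(100)]
seeds, remainder = split_seeds(test_rows)
print(len(seeds), len(remainder))
```

Under these assumptions, a small test set produces very few seeds (a 10-row test set gives 2), which is worth keeping in mind when reviewing the generated sample.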
File formats
For uploads and URL fetches:
- CSV: standard comma-separated. Headers required.
- JSONL: one JSON object per line, with "input" and (optionally) "label" keys.
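As a rough illustration of the two formats, the Python sketch below serialises the same rows as JSONL and as CSV, then checks whether every record carries a label. The example rows, the label values, and the helper names are hypothetical, and using input and label as the CSV column names is an assumption carried over from the JSONL keys; check the required schema before uploading.

```python
import csv
import io
import json

# Hypothetical rows: the second one is unlabelled.
rows = [
    {"input": "Where is my order?", "label": "shipping"},
    {"input": "Cancel my subscription."},
]


def to_jsonl(rows):
    """One JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"


def to_csv(rows, fieldnames=("input", "label")):
    """CSV with a header row; missing labels become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({key: row.get(key, "") for key in fieldnames})
    return buf.getvalue()


def has_all_labels(jsonl_text):
    """True only if every non-empty line has a non-empty label."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return all(rec.get("label") not in (None, "") for rec in records)


print(to_jsonl(rows), end="")
print(to_csv(rows), end="")
print(has_all_labels(to_jsonl(rows)))
```

A local check like has_all_labels is only a convenience for spotting unlabelled files before upload; it does not replace the validation Luna Studio runs.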
Where to go next
Generate from test set
The most common path for first runs.
Add a dataset
Walk through the Upload / URL / Galileo flows.
Test sets
The other dataset type — used to evaluate the metric.
Validation
Schema and content checks Luna runs.