A test set is the ground truth for a training run. After fine-tuning, Luna Studio scores the resulting metric against the test set and reports F1, AUC-ROC, and other diagnostics.
What makes a good test set
- Hand-labelled. Don’t auto-generate test labels — they’re the tape measure for evaluating the run.
- Representative. Sample inputs from the same distribution your application sees in production. Skewed test sets lead to misleading scores.
- Small but not tiny. Use at least 300 hand-labelled rows when possible. Beyond a few hundred rows, you start paying inference cost without much added signal for most metrics.
- Held out. Don’t reuse test set rows in your training set. Luna Studio respects this when you generate a training set from a test set.
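If you assemble a training set yourself rather than generating it from the test set, it can help to check for overlap before uploading. The following is a minimal sketch, not part of Luna Studio: the helper name `find_leaked_inputs` and the whitespace and case normalization are assumptions made for illustration, and rows are assumed to be dictionaries with an `input` key.

```python
def find_leaked_inputs(test_rows, train_rows):
    """Return test inputs that also appear in the training rows.

    Hypothetical helper: rows are dicts with an "input" key.
    """
    def norm(text):
        # Collapse whitespace and case so trivial variants count as duplicates.
        return " ".join(text.split()).lower()

    train_inputs = {norm(row["input"]) for row in train_rows}
    return [row["input"] for row in test_rows if norm(row["input"]) in train_inputs]


test_rows = [
    {"input": "Refund my order", "label": "true"},
    {"input": "What time is it?", "label": "false"},
]
train_rows = [
    {"input": "refund  my order", "label": "true"},
]

leaked = find_leaked_inputs(test_rows, train_rows)
print(leaked)  # ['Refund my order']
```

Any input reported here should be removed from one of the two sets so the test set stays held out.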
Required schema
Test sets need at least two columns:

| Column | Required | Notes |
|---|---|---|
| `input` | Yes | The text the metric scores. |
| `label` | Yes | The ground-truth value. Type depends on the metric’s output type. |

Additional columns (for example `id`, `timestamp`) are kept but ignored during evaluation.
Label format for each currently trainable output type:
| Output type | Acceptable label values |
|---|---|
| Boolean | true / false (or 1 / 0). |
| Categorical | One of the metric’s defined labels. |
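The schema and label rules above can be checked locally before upload. This is a rough sketch under stated assumptions, not Luna Studio's own validation: `validate_rows` is a hypothetical helper, and treating boolean labels case-insensitively is an assumption rather than documented behavior.

```python
BOOLEAN_LABELS = {"true", "false", "1", "0"}


def validate_rows(rows, output_type, categories=None):
    """Return (row_index, problem) tuples; an empty list means no problems found.

    Hypothetical pre-upload check. output_type is "boolean" or "categorical";
    categories is the set of labels defined on the metric (categorical only).
    """
    problems = []
    for i, row in enumerate(rows):
        # Both required columns must be present and non-empty.
        for column in ("input", "label"):
            if column not in row or str(row[column]).strip() == "":
                problems.append((i, f"missing {column}"))

        label = row.get("label")
        if label is None or str(label).strip() == "":
            continue  # already reported as missing

        value = str(label).strip()
        if output_type == "boolean" and value.lower() not in BOOLEAN_LABELS:
            problems.append((i, f"invalid boolean label: {value!r}"))
        elif output_type == "categorical" and value not in (categories or set()):
            problems.append((i, f"label not in metric categories: {value!r}"))
    return problems


rows = [
    {"input": "a", "label": "true", "id": 1},   # extra columns are fine
    {"input": "b", "label": "maybe"},           # not a boolean label
    {"input": "", "label": "0"},                # empty input
]
for index, problem in validate_rows(rows, "boolean"):
    print(index, problem)
```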
File formats
- CSV: standard comma-separated values. Headers required.
- JSONL: one JSON object per line, with `input` and `label` keys.
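As an illustration of the two formats, the sketch below serializes the same rows both ways using only the Python standard library. The sample rows and the helper names `to_csv` and `to_jsonl` are made up for this example, and representing boolean labels as the strings `true` / `false` is an assumption.

```python
import csv
import io
import json

# Illustrative rows; the content is invented for this example.
rows = [
    {"input": "Hello, can I get a refund?", "label": "true"},
    {"input": "What is the weather today?", "label": "false"},
]


def to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["input", "label"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # fields containing commas are quoted automatically
    return buf.getvalue()


def to_jsonl(rows):
    """Serialize rows as JSONL text: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


csv_text = to_csv(rows)
jsonl_text = to_jsonl(rows)
print(csv_text)
print(jsonl_text)
```

Using `csv.DictWriter` rather than joining strings by hand keeps inputs that contain commas or quotes intact.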
Add a test set
You can add a test set in three places:

- The Datasets page → Add test set primary button.
- Step 2 of the run creation flow → the dropdown’s Add new test set action.
- (Indirectly) by importing from Galileo. See Galileo integration.
Test/eval split
When a test set is selected for a training run, Luna Studio reserves 80% of its rows for evaluation. If you choose Generate from test set as the training source, the other 20% are used as seed examples. The Selected dataset card in Step 2 shows the split, e.g.:

320 rows · 3 columns · Uploaded · 80% → ~256 for eval
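The arithmetic behind the card can be sketched as follows. The function name `split_counts` is hypothetical, and rounding to the nearest row is an assumption: the `~` on the card suggests the count is approximate, and the exact rounding rule Luna Studio applies is not stated here.

```python
def split_counts(total_rows, eval_fraction=0.8):
    """Estimate how many rows go to evaluation and how many become seed examples.

    Hypothetical helper; rounding to the nearest row is an assumption.
    """
    eval_rows = round(total_rows * eval_fraction)
    seed_rows = total_rows - eval_rows
    return eval_rows, seed_rows


eval_rows, seed_rows = split_counts(320)
print(eval_rows, seed_rows)  # 256 64
```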
Used in metric
The Used in metric column on the Datasets table shows every metric whose runs reference this test set. If the column shows `—`, the test set is unused and safe to delete.
Renaming a test set
Open the test set’s details page and click the pencil icon next to its name in the breadcrumb. The Edit dataset name modal opens with the current name pre-filled.

Where to go next
Add a dataset
Walk through the Upload / URL / Galileo flows.
Validation
What Luna checks and what to do when validation fails.
Training sets
The other dataset type — used to fine-tune the base model.