A test set is the ground truth for a training run. After fine-tuning, Luna Studio scores the resulting metric against the test set and reports F1, AUC-ROC, and other diagnostics.

What makes a good test set

  • Hand-labelled. Don’t auto-generate test labels — they’re the tape measure for evaluating the run.
  • Representative. Sample inputs from the same distribution your application sees in production. Skewed test sets lead to misleading scores.
  • Small but not tiny. Use at least 300 hand-labelled rows when possible. Beyond a few hundred rows, you start paying inference cost without much added signal for most metrics.
  • Held out. Don’t reuse test set rows in your training set. Luna Studio respects this when you generate a training set from a test set.
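
If you assemble training data yourself, the held-out rule above can be checked before upload. The following is a minimal sketch, not part of Luna Studio: `find_leaked_rows` and its normalization rule (collapse whitespace, ignore case) are hypothetical choices made for illustration.

```python
def find_leaked_rows(test_inputs, train_inputs):
    """Return the test inputs that also appear in the training inputs."""

    def normalize(text):
        # Assumption: rows that differ only in whitespace or case count as duplicates.
        return " ".join(text.split()).lower()

    train_seen = {normalize(text) for text in train_inputs}
    return [text for text in test_inputs if normalize(text) in train_seen]


test_inputs = ["Is the sky blue?", "What is 2 + 2?"]
train_inputs = ["what is 2 + 2?  ", "Name a primary color."]

leaked = find_leaked_rows(test_inputs, train_inputs)
print(leaked)  # ['What is 2 + 2?']
```

Any row returned here should be dropped from one of the two sets before the training run starts.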

Required schema

Test sets need at least two columns:
| Column | Required | Notes |
| --- | --- | --- |
| input | Yes | The text the metric scores. |
| label | Yes | The ground-truth value. Type depends on the metric’s output type. |
Other columns (e.g. id, timestamp) are kept but ignored during evaluation. The accepted label values for each currently trainable output type are:
| Output type | Acceptable label values |
| --- | --- |
| Boolean | true / false (or 1 / 0). |
| Categorical | One of the metric’s defined labels. |
Floating-point, percentage, multilabel, and numeric labels are not trainable in Luna Studio yet.
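
The schema rules above can be checked locally before uploading. This is a rough sketch under stated assumptions, not Luna Studio's own validation: `validate_row`, the `output_type` strings, and the case-insensitive handling of boolean labels are all hypothetical.

```python
# Boolean label values named in the schema, compared as lowercase strings.
BOOLEAN_LABELS = {"true", "false", "1", "0"}


def validate_row(row, output_type, categories=()):
    """Return a list of problems for one row; an empty list means no problem was found."""
    problems = []
    for column in ("input", "label"):
        if column not in row or str(row[column]).strip() == "":
            problems.append(f"missing {column}")
    if problems:
        return problems

    label = str(row["label"]).strip()
    if output_type == "boolean":
        # Assumption: case is ignored, so a parsed JSON `true` (Python True) also passes.
        if label.lower() not in BOOLEAN_LABELS:
            problems.append(f"invalid boolean label: {label!r}")
    elif output_type == "categorical":
        if label not in categories:
            problems.append(f"label {label!r} is not one of the defined labels")
    else:
        raise ValueError(f"output type {output_type!r} is not trainable")
    return problems


print(validate_row({"input": "Was the answer grounded?", "label": "maybe"}, "boolean"))
# ["invalid boolean label: 'maybe'"]
```

Extra columns such as id or timestamp pass through untouched, matching the note that they are kept but ignored.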

File formats

  • CSV — standard comma-separated. Headers required.
  • JSONL — one JSON object per line, with input and label keys.
Both formats accept up to a few hundred MB. Larger uploads work but take longer to validate.
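
As an illustration of the two formats, the sketch below serializes the same rows as CSV with a header row and as JSONL with input and label keys. The sample rows and the helper names `to_csv` and `to_jsonl` are made up for this example.

```python
import csv
import io
import json

# Hypothetical sample rows for a Boolean metric.
rows = [
    {"input": "The refund was processed in two days.", "label": "true"},
    {"input": "I never received a reply.", "label": "false"},
]


def to_csv(rows):
    """Serialize rows as comma-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["input", "label"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_jsonl(rows):
    """Serialize rows as one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


csv_text = to_csv(rows)
jsonl_text = to_jsonl(rows)
print(csv_text)
print(jsonl_text)
```

Using the `csv` module rather than joining strings by hand means inputs that contain commas or quotes are escaped correctly.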

Add a test set

You can add a test set in three places. All three paths open the same Add test set modal; see Add a dataset for the modal reference.

Test/eval split

When a test set is selected for a training run, Luna Studio reserves 80% for evaluation and uses 20% as seed examples if you choose Generate from test set for the training source. The Selected dataset card in Step 2 shows the split, e.g.:
320 rows · 3 columns · Uploaded · 80% → ~256 for eval
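
The arithmetic behind the example card can be reproduced as follows. This is only a sketch: `split_counts` is a hypothetical helper, and rounding down is an assumption, since the card itself shows an approximate count (~256).

```python
def split_counts(total_rows, eval_percent=80):
    """Return (eval_rows, seed_rows) for a test set of total_rows rows."""
    # Integer arithmetic avoids floating-point surprises; flooring is an assumption.
    eval_rows = total_rows * eval_percent // 100
    seed_rows = total_rows - eval_rows
    return eval_rows, seed_rows


eval_rows, seed_rows = split_counts(320)
print(eval_rows, seed_rows)  # 256 64
```

For the 320-row example, 256 rows go to evaluation and the other 64 are the 20% available as seed examples.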

Used in metric

The Used in metric column on the Datasets table shows every metric whose runs reference this test set. If the column shows no metrics, the test set is unused and safe to delete.

Renaming a test set

Open the test set’s details page and click the pencil icon next to its name in the breadcrumb. The Edit dataset name modal opens with the current name pre-filled.

Where to go next

  • Add a dataset. Walk through the Upload / URL / Galileo flows.
  • Validation. What Luna checks and what to do when validation fails.
  • Training sets. The other dataset type, used to fine-tune the base model.