A test set is the ground truth for a training run. After fine-tuning, Luna Studio scores the resulting model against the test set and reports F1, AUC-ROC, and other performance metrics.
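For reference, here is a minimal sketch of how the two headline metrics are conventionally defined for a single positive label. It uses the textbook definitions: F1 is the harmonic mean of precision and recall, and AUC-ROC is the probability that a randomly chosen positive row scores higher than a randomly chosen negative one. This is not Luna Studio's implementation. The function names and the `positive` parameter are illustrative assumptions.

```python
def f1_score(y_true, y_pred, positive):
    """Binary F1 for one positive label: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_roc(y_true, scores, positive):
    """AUC-ROC as P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative row")
    # Compare every positive score with every negative score (O(n^2), fine for a sketch).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = ["spam", "ham", "spam", "ham"]
preds = ["spam", "spam", "spam", "ham"]
scores = [0.9, 0.85, 0.3, 0.1]  # model confidence that each row is "spam"
print(f1_score(labels, preds, "spam"))
print(auc_roc(labels, scores, "spam"))
```

In this toy example, F1 comes out at about 0.8 (two true positives, one false positive) and AUC-ROC at 0.75 (three of the four positive/negative pairs are ranked correctly). Both numbers are only meaningful if the labels in the test set are trustworthy, which is why the guidance below insists on human labels.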

What makes a good test set

  • Human-labelled. Don’t auto-generate test labels; they are the yardstick the run is measured against.
  • Representative of production data. Sample inputs from the same distribution your application sees in production.
  • Size. Use at least 300 rows, with a minimum of 100 rows per label. Ideally, a test set has 1,000 to 3,000 rows.
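Before uploading, you may want to check the size guidance locally. The sketch below encodes the thresholds from the list above (at least 300 rows, at least 100 rows per label, ideally 1,000 to 3,000 rows). The function name and the split into hard errors versus soft warnings are hypothetical choices for illustration; Luna Studio's own validation may apply different rules.

```python
from collections import Counter

# Thresholds taken from the size guidance above.
MIN_ROWS = 300
MIN_ROWS_PER_LABEL = 100
RECOMMENDED_MIN, RECOMMENDED_MAX = 1000, 3000


def check_test_set_size(labels):
    """Hypothetical helper: return (errors, warnings) for a list of row labels."""
    errors, warnings = [], []
    total = len(labels)
    if total < MIN_ROWS:
        errors.append(f"only {total} rows; need at least {MIN_ROWS}")
    for label, n in sorted(Counter(labels).items()):
        if n < MIN_ROWS_PER_LABEL:
            errors.append(
                f"label {label!r} has {n} rows; need at least {MIN_ROWS_PER_LABEL}"
            )
    if not RECOMMENDED_MIN <= total <= RECOMMENDED_MAX:
        warnings.append(
            f"{total} rows is outside the recommended "
            f"{RECOMMENDED_MIN}-{RECOMMENDED_MAX} range"
        )
    return errors, warnings


errors, warnings = check_test_set_size(["billing"] * 250 + ["bug"] * 60)
print(errors)    # "bug" falls below the per-label minimum
print(warnings)  # 310 rows is below the recommended range
```

Treating the 1,000 to 3,000 range as a warning rather than an error mirrors the wording above, where it is a recommendation rather than a requirement.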

Required schema

See the Prerequisites section for details.

File formats

  • CSV — standard comma-separated. Headers required.
  • JSONL — one JSON object per line, with input and label keys.
Both formats accept up to a few hundred MB. Larger uploads work but take longer to validate.
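A minimal sketch of producing both formats with Python's standard library. The JSONL writer uses the `input` and `label` keys described above. For CSV, the column names `input` and `label` are an assumption (check the Prerequisites section for the exact required schema), as are the example rows and the helper names.

```python
import csv
import json

# Hypothetical example rows; replace with your own human-labelled data.
rows = [
    {"input": "Refund hasn't arrived after 10 days", "label": "billing"},
    {"input": "App crashes when I open settings", "label": "bug"},
]


def write_jsonl(f, rows):
    """Write one JSON object per line, each with `input` and `label` keys."""
    for row in rows:
        record = {"input": row["input"], "label": row["label"]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def write_csv(f, rows):
    """Write comma-separated rows with a header (column names assumed)."""
    writer = csv.DictWriter(f, fieldnames=["input", "label"])
    writer.writeheader()
    for row in rows:
        writer.writerow({"input": row["input"], "label": row["label"]})


if __name__ == "__main__":
    with open("test_set.jsonl", "w", encoding="utf-8") as f:
        write_jsonl(f, rows)
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open("test_set.csv", "w", encoding="utf-8", newline="") as f:
        write_csv(f, rows)
```

`csv.DictWriter` quotes fields that contain commas, quotes, or newlines, so free-text inputs do not need manual escaping.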

Add a test set

You can add a test set from three places:

All three paths open the same Add test set modal; see Add a dataset for the modal reference.

Where to go next

Add a dataset

Walk through the Upload / URL / Galileo flows.

Validation

What Luna Studio checks and what to do when validation fails.

Training sets

The other dataset type — used to fine-tune the base model.