What makes a good test set
- Human-labelled. Don’t auto-generate test labels — they’re the tape measure for evaluating the run.
- Representative of production data. Sample inputs from the same distribution your application sees in production.
- Size. Use at least 300 rows, with a minimum of 100 rows per label. Ideally, a test set has 1,000 to 3,000 rows.
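The size guidance above can be checked locally before uploading. The following is a minimal sketch, not part of Luna Studio: the function name `check_test_set_size` and its parameters are hypothetical, and it assumes you have already loaded the labels into a Python list.

```python
from collections import Counter


def check_test_set_size(labels, min_total=300, min_per_label=100):
    """Return a list of problems; an empty list means the size guidance is met.

    Hypothetical helper: the thresholds mirror the guidance in this page
    (at least 300 rows overall, at least 100 rows per label).
    """
    problems = []
    counts = Counter(labels)
    total = sum(counts.values())

    # Overall row count.
    if total < min_total:
        problems.append(f"only {total} rows, need at least {min_total}")

    # Per-label row count, reported in a stable order.
    for label, n in sorted(counts.items(), key=lambda kv: str(kv[0])):
        if n < min_per_label:
            problems.append(
                f"label {label!r} has {n} rows, need at least {min_per_label}"
            )
    return problems


if __name__ == "__main__":
    # Made-up example: 250 rows of one label and 50 of another.
    for problem in check_test_set_size(["spam"] * 250 + ["ham"] * 50):
        print(problem)
```

A check like this only covers row counts. It says nothing about whether the rows are human-labelled or representative of production data.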
Required schema
Check the pre-requisites section for details.
File formats
- CSV: standard comma-separated. Headers required.
- JSONL: one JSON object per line, with input and label keys.
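As an illustration of the JSONL format, the sketch below parses JSONL text and confirms that every row is a JSON object with input and label keys. It is a local pre-check under stated assumptions, not Luna Studio's own validation: the name `parse_jsonl` is hypothetical, and skipping blank lines is a choice made for this sketch, not documented behaviour.

```python
import json

REQUIRED_KEYS = ("input", "label")


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, checking the required keys.

    Hypothetical helper. Raises ValueError on the first row that is not
    a JSON object or that lacks one of the required keys.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption of this sketch: ignore blank lines
        obj = json.loads(line)  # json.JSONDecodeError is a ValueError subclass
        if not isinstance(obj, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in obj]
        if missing:
            raise ValueError(f"line {line_no}: missing key(s) {missing}")
        rows.append(obj)
    return rows


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = (
        '{"input": "Where is my order?", "label": "shipping"}\n'
        '{"input": "I want a refund", "label": "billing"}\n'
    )
    print(len(parse_jsonl(sample)), "rows parsed")
```

The parsed rows can then be checked against the size guidance, for example by collecting each row's label value into a list.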
Add a test set
You can add a test set in three places:
- The Datasets page → Add test set primary button.
- Step 2 of the run creation flow → the dropdown’s Add new test set action.
- (Indirectly) by importing from Galileo — see Galileo integration.
Where to go next
Add a dataset
Walk through the Upload / URL / Galileo flows.
Validation
What Luna Studio checks and what to do when validation fails.
Training sets
The other dataset type — used to fine-tune the base model.