The Datasets page (sidebar → Datasets) is where you manage every dataset in your org. Datasets are org-scoped — once added, they’re available across every project for any training run.

Test sets vs. training sets

Luna Studio splits datasets into two flavors, accessible via tabs on the page:

Test sets

Small, human-labelled datasets used to evaluate fine-tuned metrics. Required for every run.

Training sets

Larger datasets used to fine-tune the base model. Often generated from a test set.

How datasets relate to runs

Each training run consumes exactly one test set and one training set, and the same dataset can be reused across many runs. The "Used in metric" column in the datasets table shows which metrics' fine-tuning depends on a dataset, which is worth checking before you delete one.
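The relationship can be pictured with a small Python sketch. This is only an illustrative model, not Luna Studio's actual data model or API: the `Run` class, its field names, the `used_in_metrics` helper, and the dataset and metric names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical model of a training run (not a Luna Studio type)."""
    metric: str        # metric being fine-tuned by this run
    test_set: str      # exactly one test set per run
    training_set: str  # exactly one training set per run


def used_in_metrics(dataset: str, runs: list[Run]) -> list[str]:
    """Return the sorted metric names whose runs reference `dataset`."""
    return sorted(
        {r.metric for r in runs if dataset in (r.test_set, r.training_set)}
    )


# Made-up example data: one test set and one training set are each reused.
runs = [
    Run("toxicity", "tox-test-v1", "tox-train-v1"),
    Run("toxicity", "tox-test-v1", "tox-train-v2"),
    Run("tone", "tone-test-v1", "tox-train-v1"),
]
```

Under this model, a dataset with an empty result from `used_in_metrics` has no dependent metrics, which mirrors the check you would do in the "Used in metric" column before deleting it.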

Source types

Source | What it means
Upload | You uploaded a .csv or .jsonl file from your machine.
URL | Luna Studio fetched the dataset from an http/https/s3/gs URL.
Galileo | Luna Studio pulled the dataset from a project in your connected Galileo workspace.
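As a rough illustration of how the Upload and URL sources differ, the sketch below classifies a location string by its URL scheme or file extension. It is a client-side approximation under stated assumptions, not Luna Studio's own validation logic: the `classify_source` function is hypothetical, and the Galileo source is left out because it is selected from a connected workspace rather than given as a path or URL.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Schemes and extensions taken from the source table above.
URL_SCHEMES = {"http", "https", "s3", "gs"}
UPLOAD_EXTENSIONS = {".csv", ".jsonl"}


def classify_source(location: str) -> str:
    """Guess the source type ("URL" or "Upload") for a dataset location.

    Hypothetical helper for illustration only.
    """
    scheme = urlparse(location).scheme.lower()
    if scheme in URL_SCHEMES:
        return "URL"
    # No scheme: treat it as a local file and check the extension.
    if not scheme and PurePosixPath(location).suffix.lower() in UPLOAD_EXTENSIONS:
        return "Upload"
    raise ValueError(f"Unsupported dataset location: {location!r}")
```

For example, `classify_source("s3://bucket/train.jsonl")` returns `"URL"`, while a local `labels.csv` is classified as `"Upload"`; anything else, such as an `ftp://` address, raises `ValueError`.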
For training sets specifically, an additional source applies:

Where to go next

Test sets

What test sets are, schema rules, and best practices.

Training sets

What training sets are and how to create or reuse them.

Add a dataset

Reference for the three dataset sources (Upload, URL, Galileo).

Dataset validation

What Luna Studio checks when you add a dataset.