The Datasets page (sidebar → Datasets) is where you manage every dataset in your org. Datasets are org-scoped — once added, they’re available across every project for any training run.

Test sets vs. training sets

Luna Studio splits datasets into two flavors, accessible via tabs on the page:

Test sets

Small, hand-labelled datasets used to evaluate fine-tuned metrics. Required for every run.

Training sets

Larger datasets used to fine-tune the base model. Often generated from a test set.

The active tab determines what shows in the table and which Add button is shown.

Datasets table

Both tabs use the same column layout:
Column            What it shows
Dataset name      The dataset’s name.
Rows              Row count, with thousands separators.
Source            One of Galileo (Galileo glyph), Upload (upload icon), or URL (link icon).
Used in metric    Outline-style badges for each metric that uses this dataset. Empty if unused.
Created at        When the dataset was added.
Last updated at   When the dataset was most recently changed.

Click any row to view the dataset.
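
To make the column layout concrete, here is a minimal Python sketch of one table row. The `DatasetRow` class, its field names, and the sample values are hypothetical and not part of any Galileo SDK; only the column meanings come from the table above. The two timestamp columns are left out for brevity, and the comma as thousands separator is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRow:
    """Hypothetical model of one row in the datasets table."""
    name: str
    rows: int
    source: str  # one of "Galileo", "Upload", or "URL"
    used_in_metrics: list[str] = field(default_factory=list)  # empty if unused

    def rows_display(self) -> str:
        # Assumption: a comma is used as the thousands separator.
        return f"{self.rows:,}"


# Illustrative values only.
row = DatasetRow(name="support-tickets-test", rows=12500, source="Upload")
print(row.rows_display())   # 12,500
print(row.used_in_metrics)  # []
```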

Top-bar actions

  • Search — filter by dataset name.
  • Add test set / Add training set — primary button. The label tracks the active tab. Opens the Add dataset modal — see Add a dataset.
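
The two top-bar actions can be sketched in Python as follows. This is an illustration under stated assumptions, not Luna Studio's implementation: the tab keys `"test"` and `"training"` are hypothetical, and the page does not say how Search matches names, so case-insensitive substring matching is assumed here.

```python
def add_button_label(active_tab: str) -> str:
    """Return the primary button label for the active tab.

    The tab keys "test" and "training" are hypothetical identifiers.
    """
    labels = {"test": "Add test set", "training": "Add training set"}
    return labels[active_tab]


def search(names: list[str], query: str) -> list[str]:
    """Filter dataset names by a query.

    Assumption: matching is a case-insensitive substring test.
    """
    q = query.strip().lower()
    return [name for name in names if q in name.lower()]


# Illustrative dataset names only.
names = ["Toxicity test v1", "toxicity-train", "Tone test"]
print(add_button_label("training"))  # Add training set
print(search(names, "TOX"))          # ['Toxicity test v1', 'toxicity-train']
```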

How datasets relate to runs

Each training run consumes exactly one test set and one training set. The same dataset can be reused across many runs. The Used in metric column on the datasets table shows you which metrics’ fine-tuning depends on a dataset — useful before deleting one.
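
The relationship above can be modeled in a few lines of Python. This is a sketch, not a Galileo API: the `TrainingRun` record, the assumption that each run fine-tunes one named metric, and the sample names are all hypothetical. The `metrics_using` helper mirrors what the Used in metric column tells you before you delete a dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Hypothetical record: a run uses exactly one test set and one training set."""
    metric: str        # assumption: each run fine-tunes one named metric
    test_set: str
    training_set: str


def metrics_using(dataset: str, runs: list[TrainingRun]) -> list[str]:
    """Return the sorted metric names whose runs reference the dataset."""
    return sorted(
        {run.metric for run in runs if dataset in (run.test_set, run.training_set)}
    )


# Illustrative data: the same test set is reused across two runs.
runs = [
    TrainingRun("toxicity", "tox-test", "tox-train"),
    TrainingRun("tone", "tox-test", "tone-train"),
]

print(metrics_using("tox-test", runs))    # ['tone', 'toxicity']
print(metrics_using("tone-train", runs))  # ['tone']
print(metrics_using("unused", runs))      # []
```

An empty result corresponds to an empty Used in metric cell, meaning no metric's fine-tuning depends on the dataset.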

Source types

Source    What it means
Upload    You uploaded a .csv or .jsonl file from your machine.
URL       Luna fetched the dataset from an http/https/s3/gs URL.
Galileo   Luna pulled the dataset from a project in your connected Galileo workspace.

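
As a quick pre-check before adding a dataset, the Upload and URL rules in the table can be approximated in Python. The `classify_source` function is hypothetical and only encodes the file extensions and URL schemes listed above; it is not Luna's own validation, and the Galileo source is not covered because it is chosen from a connected workspace rather than given as a path or URL.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

URL_SCHEMES = {"http", "https", "s3", "gs"}
UPLOAD_SUFFIXES = {".csv", ".jsonl"}


def classify_source(location: str) -> str:
    """Return "URL" or "Upload" for a location string, else raise ValueError."""
    scheme = urlparse(location).scheme.lower()
    if scheme in URL_SCHEMES:
        return "URL"
    # No scheme: treat the string as a local file path and check its extension.
    if not scheme and PurePosixPath(location).suffix.lower() in UPLOAD_SUFFIXES:
        return "Upload"
    raise ValueError(f"Unsupported dataset location: {location!r}")


# Illustrative locations only.
print(classify_source("s3://my-bucket/train.jsonl"))  # URL
print(classify_source("data/test_set.csv"))           # Upload
```
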
For training sets specifically, an additional source applies:

Where to go next

Test sets

What test sets are, schema rules, and best practices.

Training sets

What training sets are and how to create or reuse them.

Add a dataset

Reference for the three dataset sources (Upload, URL, Galileo).

Dataset validation

What Luna checks when you add a dataset.