Every dataset added to Luna Studio (uploaded, fetched, generated, or imported) goes through validation. Validation determines whether the dataset can be used in a run.
States
A dataset is always in one of four validation states. The first is transient; the other three are end states:

| State | Meaning |
|---|---|
| Validating… | In progress. Usually completes within a few seconds. |
| Validated | Ready to use. |
| Validated with warnings | Usable, but Luna noticed something worth checking (e.g. mixed casing, partial UTF-8). |
| Validation error: {message} | Unusable until fixed. The dataset is saved in your org but can’t be selected for a run. |
What Luna checks
Schema checks
- The file parses as CSV or JSONL.
- Required columns are present:
  - Test sets: metric-specific feature columns and `label`.
  - Training sets: metric-specific feature columns, and `label` when the dataset is already labelled.
- Column types match the metric’s output type (e.g. labels are parseable as Boolean for a Boolean metric).
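The schema checks above can be approximated locally before uploading. The following is a minimal sketch, not Luna's own validator: `check_schema` is a hypothetical helper, the `input` and `label` column names are taken from the error messages listed under Common errors, and the accepted Boolean spellings come from the same table. The feature columns your metric actually requires may differ.

```python
import csv
import io

# Boolean spellings listed in the "Common errors" table (true/false or 1/0).
BOOLEAN_LABELS = {"true", "false", "1", "0"}


def check_schema(csv_text, feature_columns=("input",), require_label=True):
    """Return a list of problems found in a CSV string (empty list means none).

    Hypothetical pre-flight helper; it only mirrors the documented checks.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    required = list(feature_columns) + (["label"] if require_label else [])
    problems = [f'Missing column "{name}"' for name in required if name not in header]
    if problems or "label" not in header:
        return problems
    # Compare case-insensitively: mixed casing is a warning, not an error.
    for row_no, row in enumerate(reader, start=1):
        value = (row["label"] or "").strip().lower()
        if value not in BOOLEAN_LABELS:
            problems.append(f"Data row {row_no}: label {row['label']!r} is not Boolean")
    return problems
```

For an unlabelled training set, pass `require_label=False` so that a missing `label` column is not reported.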
Content checks
- File encoding (UTF-8 expected).
- Row count > 0.
- Empty rows are flagged as warnings.
- Inputs that exceed the model’s max token limit are flagged.
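A rough local equivalent of the encoding, row-count, and empty-row checks might look like the sketch below. `check_content` is a hypothetical helper, and treating a file whose data rows are all blank as an empty dataset is an assumption. The token-limit check is left out because the limit depends on the model and is not stated here.

```python
import csv
import io


def check_content(raw: bytes):
    """Return (errors, warnings) for raw CSV bytes. Hypothetical pre-flight helper."""
    errors, warnings = [], []
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        warnings.append("Detected non-UTF-8 characters.")
        # Keep going with replacement characters so the rows can still be counted.
        text = raw.decode("utf-8", errors="replace")
    rows = list(csv.reader(io.StringIO(text)))
    data_rows = rows[1:]  # the first row is the header
    empty = sum(1 for row in data_rows if not any(cell.strip() for cell in row))
    if empty:
        warnings.append(f"{empty} empty row(s).")
    # Assumption: a file with no non-empty data rows counts as an empty dataset.
    if len(data_rows) - empty == 0:
        errors.append("Empty dataset.")
    return errors, warnings
```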
File checks (uploads and URLs)
- File size within the upload limit.
- For URLs: the URL is reachable; the response content type is appropriate.
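For local uploads, the file-type and size checks can be sketched as follows. `check_file` is a hypothetical helper, and `max_bytes` is supplied by the caller because the actual upload limit is not stated on this page. URL reachability is not covered in this sketch.

```python
from pathlib import Path

# Supported formats, per the "Unsupported file type." error.
SUPPORTED_SUFFIXES = {".csv", ".jsonl"}


def check_file(path, max_bytes):
    """Return a list of problems for a local file. Hypothetical pre-flight helper."""
    path = Path(path)
    problems = []
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append("Unsupported file type.")
    # max_bytes is a placeholder; use the limit that applies to your org.
    if path.stat().st_size > max_bytes:
        problems.append(f"File is larger than {max_bytes} bytes.")
    return problems
```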
Common errors
| Error message | Cause | Fix |
|---|---|---|
| `Missing column "label" in uploaded CSV.` | Test set or labelled training set without a `label` column. | Add the column to your file and re-upload. For unlabelled training logs, use the label-only flow from Step 3. |
| `Missing column "input" in uploaded CSV.` | Required input column missing. | Rename your input column to `input`, or include one. |
| `Could not fetch URL.` | The URL was unreachable, returned a non-2xx status, or required auth. | Check the URL is correct; for cloud URLs, confirm the relevant integration has access. |
| `Unsupported file type.` | The file isn’t `.csv` or `.jsonl`. | Convert the file to one of the supported formats. |
| `Empty dataset.` | The file parsed but had zero data rows. | Confirm the file isn’t header-only. |
| `Label values don't match metric output type.` | Labels can’t be parsed as the metric’s expected type. | Re-check labels (e.g. for Boolean: only `true`/`false` or `1`/`0`). |
Common warnings
| Warning | Cause | Action |
|---|---|---|
| `Mixed casing in labels.` | Labels like `True` and `true` are mixed. | Normalize case in your file. Luna will still validate but may produce a noisier metric. |
| `Some rows exceed the model's max token limit.` | Long inputs that will be truncated. | Either trim the rows or accept the truncation. |
| `Detected non-UTF-8 characters.` | The file isn’t pure UTF-8. | Re-export the file as UTF-8. |
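Two of these warnings can be cleared with a small amount of preprocessing before re-adding the file. The sketch below is illustrative only: `normalize_label` and `reencode_as_utf8` are hypothetical helpers, and the `cp1252` default is an assumption about where the file came from. Pass the encoding your file was actually exported in.

```python
def normalize_label(value: str) -> str:
    """Trim and lowercase a Boolean label so 'True' and ' true ' both become 'true'."""
    return value.strip().lower()


def reencode_as_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return raw unchanged if it is valid UTF-8, otherwise transcode it.

    source_encoding is an assumption; decoding raises if it is wrong for the file.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")
```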
Unlabelled training logs
If uploaded or imported training logs are missing labels, Luna Studio does not train on them directly. It opens the label-only generation flow, uses the selected metric prompt to create labels, and saves a labelled training dataset. Generated training sets are always labelled.
When validation fails mid-run
If a dataset that was previously Validated later fails (e.g. the originating Galileo dataset changed), the run that consumed it can fail with a validation error. See Run failed.
Re-validating
Luna validates a dataset once when it is added and again at each run launch. There’s no manual “re-validate” button. To re-check a dataset, re-add it (or fetch the URL again).
Where to go next
Add a dataset
Walk through the three sources.
Troubleshooting
Run-time failures and how to recover.