Every dataset added to Luna Studio — uploaded, fetched, generated, or imported — goes through validation. Validation determines whether the dataset can be used in a run.

States

A dataset is always in one of four validation states. The first is transient; the other three are end states:
| State | Meaning |
| --- | --- |
| Validating… | In progress. Usually completes within a few seconds. |
| Validated | Ready to use. |
| Validated with warnings | Usable, but Luna noticed something worth checking (e.g. mixed casing, partial UTF-8). |
| Validation error: {message} | Unusable until fixed. The dataset is saved in your org but can’t be selected for a run. |
The state appears on the Selected dataset card in Step 2 and Step 3 of the run creation flow, where Luna Studio checks the dataset you selected.

What Luna checks

Schema checks

  • The file parses as CSV or JSONL.
  • Required columns are present:
    • Test sets — metric-specific feature columns and label.
    • Training sets — metric-specific feature columns and label when the dataset is already labelled.
  • Column types match the metric’s output type (e.g. labels are parseable as Boolean for a Boolean metric).
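
The schema checks above can be approximated locally before you upload. The following is a minimal sketch, not Luna's actual validator: the function names `load_table` and `check_schema` are hypothetical, the required columns `input` and `label` are taken from the error messages listed under Common errors, and the accepted Boolean spellings are an assumption based on the same table.

```python
import csv
import io
import json

# Assumption: Boolean labels are written as true/false or 1/0.
# The exact set Luna accepts may differ.
BOOLEAN_LABELS = {"true", "false", "1", "0"}


def load_table(text, file_type):
    """Parse CSV or JSONL text and return (column_names, rows_as_dicts)."""
    if file_type == "csv":
        reader = csv.DictReader(io.StringIO(text, newline=""))
        rows = list(reader)
        return list(reader.fieldnames or []), rows
    if file_type == "jsonl":
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
        columns = []
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)
        return columns, rows
    raise ValueError("Unsupported file type.")


def check_schema(columns, rows, required=("input", "label")):
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f'Missing column "{name}"' for name in required if name not in columns]
    if "label" in columns:
        # Compare case-insensitively so that True and true both pass here.
        bad = [r for r in rows if str(r.get("label")).strip().lower() not in BOOLEAN_LABELS]
        if bad:
            problems.append(f"{len(bad)} label value(s) are not Boolean")
    return problems
```

For a metric whose output type is not Boolean, the label test would need to be swapped for one matching that type; the required feature columns are metric-specific, so pass them through `required`.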

Content checks

  • File encoding (UTF-8 expected).
  • Row count > 0.
  • Empty rows are flagged as warnings.
  • Inputs that exceed the model’s max token limit are flagged.
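
A rough local equivalent of these content checks for a CSV file is sketched below. It rests on several assumptions: `check_content` is a hypothetical helper, the input text is assumed to live in a column named `input`, and a character count (`max_chars`) stands in for the model's token limit, which this page does not specify. Luna's own checks may behave differently.

```python
import csv
import io


def check_content(raw_bytes, max_chars=2000):
    """Inspect raw CSV bytes and return (errors, warnings)."""
    errors, warnings = [], []

    # Encoding: UTF-8 is expected; anything else is reported as a warning.
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        warnings.append("Detected non-UTF-8 characters.")
        text = raw_bytes.decode("utf-8", errors="replace")

    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None) or []
    rows = list(reader)

    # A row counts as empty when every cell is blank (or there are no cells).
    empty = [r for r in rows if not any(cell.strip() for cell in r)]
    data = [r for r in rows if any(cell.strip() for cell in r)]
    if not data:
        errors.append("Empty dataset.")
    if empty:
        warnings.append(f"{len(empty)} empty row(s).")

    # Assumption: character length is only a crude proxy for token count.
    if "input" in header:
        idx = header.index("input")
        too_long = [r for r in data if len(r) > idx and len(r[idx]) > max_chars]
        if too_long:
            warnings.append("Some rows exceed the model's max token limit.")
    return errors, warnings
```

Errors here correspond to the unusable state and warnings to the usable-with-warnings state, mirroring the distinction drawn in the States table.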

File checks (uploads and URLs)

  • File size within the upload limit.
  • For URLs: the URL is reachable; the response content type is appropriate.
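
The file checks can be illustrated with two small functions, shown here as a sketch under stated assumptions. The upload limit is not given on this page, so `limit_bytes` must be supplied by the caller. The set of accepted content types and the wording of the size and content-type messages are hypothetical; only "Unsupported file type." and "Could not fetch URL." come from the Common errors table.

```python
from pathlib import PurePosixPath

# Supported formats per the "Unsupported file type." error.
SUPPORTED_EXTENSIONS = {".csv", ".jsonl"}

# Hypothetical: content types a CSV or JSONL download might plausibly carry.
ACCEPTED_CONTENT_TYPES = {
    "text/csv",
    "text/plain",
    "application/json",
    "application/x-ndjson",
    "application/octet-stream",
}


def check_upload(filename, size_bytes, limit_bytes):
    """Check extension and size of a local file before uploading."""
    problems = []
    if PurePosixPath(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("Unsupported file type.")
    if size_bytes > limit_bytes:
        problems.append("File exceeds the upload limit.")  # illustrative wording
    return problems


def check_url_response(status, content_type):
    """Check an HTTP status code and Content-Type header from a URL fetch."""
    if not 200 <= status < 300:
        return ["Could not fetch URL."]
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_CONTENT_TYPES:
        return [f"Unexpected content type: {media_type}"]  # illustrative wording
    return []
```

Both functions take plain values rather than performing I/O, so they can be called with whatever your HTTP client or file system reports.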

Common errors

| Error message | Cause | Fix |
| --- | --- | --- |
| Missing column "label" in uploaded CSV. | Test set or labelled training set without a label column. | Add the column to your file and re-upload. For unlabelled training logs, use the label-only flow from Step 3. |
| Missing column "input" in uploaded CSV. | Required input column missing. | Rename your input column to input, or include one. |
| Could not fetch URL. | The URL was unreachable, returned a non-2xx status, or required auth. | Check the URL is correct; for cloud URLs, confirm the relevant integration has access. |
| Unsupported file type. | The file isn’t .csv or .jsonl. | Convert the file to one of the supported formats. |
| Empty dataset. | The file parsed but had zero data rows. | Confirm the file isn’t header-only. |
| Label values don't match metric output type. | Labels can’t be parsed as the metric’s expected type. | Re-check labels (e.g. for Boolean: only true/false or 1/0). |

Common warnings

| Warning | Cause | Action |
| --- | --- | --- |
| Mixed casing in labels. | Labels like True and true are mixed. | Normalize case in your file. Luna will still validate but may produce a noisier metric. |
| Some rows exceed the model's max token limit. | Long inputs that will be truncated. | Either trim the rows or accept the truncation. |
| Detected non-UTF-8 characters. | The file isn’t pure UTF-8. | Re-export the file as UTF-8. |
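
One way to clear the mixed-casing warning before re-uploading is to lower-case the label column, as in the sketch below. The helpers `has_mixed_casing` and `normalize_labels` are hypothetical, and the column name `label` is assumed from the error messages above. Lower-casing suits Boolean-style labels; for labels where case carries meaning, pick a different normalization.

```python
def has_mixed_casing(rows, label_column="label"):
    """True when two labels differ only by letter case (e.g. True and true)."""
    labels = {r[label_column] for r in rows if isinstance(r.get(label_column), str)}
    return len(labels) != len({label.lower() for label in labels})


def normalize_labels(rows, label_column="label"):
    """Return copies of the rows with string labels stripped and lower-cased."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are left untouched
        value = new_row.get(label_column)
        if isinstance(value, str):
            new_row[label_column] = value.strip().lower()
        cleaned.append(new_row)
    return cleaned
```

After normalizing, write the rows back out with an explicit UTF-8 encoding (for example `open(path, "w", encoding="utf-8", newline="")` with `csv.DictWriter`), which also addresses the non-UTF-8 warning.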

Unlabelled training logs

If uploaded or imported training logs are missing labels, Luna Studio does not train on them directly. It opens the label-only generation flow, uses the selected metric prompt to create labels, and saves a labelled training dataset. Generated training sets are always labelled.

When validation fails mid-run

If a dataset that was previously Validated later fails (e.g. the originating Galileo dataset changed), the run that consumed it can fail with a validation error. See Run failed.

Re-validating

Luna validates a dataset once at add-time and once per run launch. There’s no manual “re-validate” button — to re-check a dataset, re-add it (or fetch the URL again).

Where to go next

  • Add a dataset: walk through the three sources.
  • Troubleshooting: run-time failures and how to recover.