This page is the conceptual backbone of Luna Studio. If you’re new to the product, skim it once before working through the Quickstart. The vocabulary here is reused across every other page.
The mental model
A project holds a series of training runs that explore variations of a preset or custom metric. Each run combines a test set, a training set, and a base model to produce a fine-tuned metric. Once a run is Fine-tuned, you can register the metric back into the Galileo metrics store.
Projects
A project is the top-level container. It groups training runs that share a goal, typically tuning multiple metrics for a single application or domain. A Luna Studio project also maps closely to the concept of a project in Galileo. Examples of well-scoped projects:
- customer-support-copilot: improve an assistant that helps support teams draft accurate, on-brand responses.
- enterprise-search-assistant: improve a RAG-style assistant that answers employee questions from internal knowledge sources.
- sales-engineering-assistant: improve an assistant that helps teams respond to requests for proposals, questionnaires, and technical buyer questions.
Training runs
A training run is a single attempt at fine-tuning a metric. Each run captures four inputs:

| Input | What it is |
|---|---|
| Metric | A predefined metric template (e.g. Toxicity) or a custom prompt you wrote. |
| Test set | A small labelled dataset. About 20% of it seeds training-data generation, and the remaining 80% is used for evaluation at the end of the run. |
| Training set | A larger labelled dataset used to fine-tune the base model. Often generated from the test set. |
| Base model | The Luna model configured for your organization, shown in the run summary before launch. |
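
As a rough mental picture, the four inputs can be modeled as a small record. The sketch below is illustrative only: the class name `TrainingRunInputs`, its field names, and the example values are all hypothetical and are not part of any Luna Studio API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRunInputs:
    """Hypothetical record of the four inputs a training run captures."""

    metric: str        # predefined template name (e.g. "Toxicity") or a custom prompt
    test_set: str      # small labelled dataset: seeds generation, then evaluates the run
    training_set: str  # larger labelled dataset used to fine-tune the base model
    base_model: str    # Luna model configured for the organization


run = TrainingRunInputs(
    metric="Toxicity",
    test_set="toxicity-test.csv",          # made-up file name
    training_set="toxicity-train.jsonl",   # made-up file name
    base_model="example-luna-base",        # placeholder, not a real model identifier
)

# List any input that was left empty before "launching" the run.
missing = [name for name, value in vars(run).items() if not value]
print(missing)
```

Freezing the dataclass mirrors the idea that a run is a single attempt: to try a variation, you would create a new run rather than mutate an existing one.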
Metrics
A metric is a function that takes some part of an LLM trace and returns a score.
Output types
Luna Studio currently supports two output types:
- Boolean: true/false (e.g. “is this toxic?”).
- Categorical: one of a fixed set of labels.
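
To make "a function that returns a score" concrete, here is a minimal sketch of the two output shapes. These are toy, hand-written rules rather than fine-tuned Luna metrics, and every name in the block (`Score`, `Metric`, `is_shouting`, `length_bucket`, `LABELS`) is an assumption made for illustration. The "part of a trace" is simplified to a plain string.

```python
from typing import Callable, Union

Score = Union[bool, str]         # Boolean or Categorical output
Metric = Callable[[str], Score]  # assumption: the trace fragment is a plain string

LABELS = ("short", "medium", "long")  # fixed label set for the categorical toy metric


def is_shouting(text: str) -> bool:
    """Toy Boolean metric: True when more than half the letters are upper-case."""
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and sum(c.isupper() for c in letters) / len(letters) > 0.5


def length_bucket(text: str) -> str:
    """Toy Categorical metric: returns exactly one label from LABELS."""
    words = len(text.split())
    if words < 5:
        return LABELS[0]
    if words < 50:
        return LABELS[1]
    return LABELS[2]


metrics: dict[str, Metric] = {"shouting": is_shouting, "length": length_bucket}
scores = {name: fn("STOP THAT") for name, fn in metrics.items()}
print(scores)
```

Both functions share one signature, which is what lets a caller apply any metric to a trace fragment without knowing its output type in advance.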
Metric steps and input shape
Each metric has two related settings:
- Step: where the metric runs in a Galileo trace (LLM span, Retriever, Agent span, or Trace).
- Input step: what each training row contains (a single message, an input / output pair, a full trace, or a full session).
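
One way to keep the two settings straight is to treat each as a closed set of options. The enums below are a hypothetical sketch: the values are copied from the lists above, but the class and member names are invented here, and the block makes no claim about which combinations Luna Studio accepts.

```python
from enum import Enum


class Step(Enum):
    """Where the metric runs in a trace (values copied from the text)."""

    LLM_SPAN = "LLM span"
    RETRIEVER = "Retriever"
    AGENT_SPAN = "Agent span"
    TRACE = "Trace"


class InputShape(Enum):
    """What each training row contains (values copied from the text)."""

    SINGLE_MESSAGE = "single message"
    INPUT_OUTPUT_PAIR = "input / output pair"
    FULL_TRACE = "full trace"
    FULL_SESSION = "full session"


# Looking an enum up by value turns a display label into a typed constant.
step = Step("Retriever")
shape = InputShape("input / output pair")
print(step.name, shape.name)  # RETRIEVER INPUT_OUTPUT_PAIR
```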
Datasets
Luna Studio splits datasets into two flavors: test sets and training sets.
Test sets
A test set is a small, hand-labelled dataset used to evaluate a fine-tuned metric. Test sets are the “ground truth” for the run: they should be labelled carefully and should not be seen during training. Required columns depend on the metric’s input type. See Prerequisites for the full list.
Training sets
A training set is the dataset used to fine-tune the Luna metric. Training sets can be:
- Generated from a test set: Luna Studio uses ~20% of the test set as seed examples, samples 50 rows for review, and generates 2,000 labelled examples with the LLM-as-judge prompt and synthetic data generation pipeline.
- Uploaded: your own labelled production logs (CSV or JSONL). If the logs are unlabelled, Luna Studio labels them before using them for training.
- Imported from Galileo: pulled from a project in your connected Galileo workspace.
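
The numbers in the "Generated from a test set" option are easier to reason about with a worked example. The sketch below assumes a 200-row test set and a simple shuffled split; how Luna Studio actually selects seed rows, and how it rounds "~20%", is not stated on this page, so `split_test_set` and its parameters are hypothetical. The 50-row review sample is not modeled.

```python
import random

SEED_FRACTION = 0.20     # "~20%" from the text; exact rounding is an assumption
GENERATED_ROWS = 2_000   # labelled examples generated per the text


def split_test_set(rows, seed_fraction=SEED_FRACTION, rng_seed=0):
    """Shuffle a copy of rows and split it into (seed_rows, held_out_rows)."""
    shuffled = list(rows)
    random.Random(rng_seed).shuffle(shuffled)
    n_seed = round(len(shuffled) * seed_fraction)
    return shuffled[:n_seed], shuffled[n_seed:]


test_rows = [f"row-{i}" for i in range(200)]  # made-up 200-row test set
seed_rows, held_out = split_test_set(test_rows)
print(len(seed_rows), len(held_out))  # 40 160

# Each seed row would need to yield this many generated examples on average.
rows_per_seed = GENERATED_ROWS / len(seed_rows)
print(rows_per_seed)  # 50.0
```

The takeaway is the expansion ratio: with a small test set, every seed row stands in for many generated rows, which is why careful labelling of the test set matters.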
Base models
Luna Studio fine-tunes the Luna base model selected for your run. The available model list is configured by your Luna Studio deployment, so your workspace may show different options. Confirm the base model in Step 4: Config and launch of the run creation flow. For accuracy benchmarks, GPU latency tables, and the underlying SLM architecture behind these base models, see the Luna-2 overview.
Integrations
To run anything, Luna Studio needs provider credentials for the services it talks to:
- Metric generation (Step 3: Training set) needs the credentials for whichever configured provider and model you select in the Generate drawer.
- Galileo features (Import from Galileo, Register metric) need a Galileo API key.
Lifecycle statuses
The same five statuses appear on runs and metrics throughout the app:
- Queued: accepted but not yet started.
- Training: fine-tuning is in progress.
- Fine-tuned: training succeeded; not yet registered.
- Registered: the metric is live in the Galileo metrics store.
- Failed: training or registration failed; see the run details for the reason.
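
The statuses read naturally as a small state machine. The sketch below is an interpretation, not documented behavior: the transition table is inferred from the descriptions above (for example, that Failed can follow either Training or Fine-tuned because "training or registration failed"), and the names `Status`, `ALLOWED`, and `can_transition` are hypothetical.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "Queued"
    TRAINING = "Training"
    FINE_TUNED = "Fine-tuned"
    REGISTERED = "Registered"
    FAILED = "Failed"


# Assumed transitions, inferred from the status descriptions.
ALLOWED = {
    Status.QUEUED: {Status.TRAINING},
    Status.TRAINING: {Status.FINE_TUNED, Status.FAILED},    # training may fail
    Status.FINE_TUNED: {Status.REGISTERED, Status.FAILED},  # registration may fail
    Status.REGISTERED: set(),                               # treated as terminal here
    Status.FAILED: set(),                                   # treated as terminal here
}


def can_transition(current: Status, target: Status) -> bool:
    """Return True if the assumed table allows moving from current to target."""
    return target in ALLOWED[current]


print(can_transition(Status.FINE_TUNED, Status.REGISTERED))  # True
print(can_transition(Status.QUEUED, Status.REGISTERED))      # False
```

Whether a failed run can be retried in place or a registered metric can be unregistered is not covered on this page, so the two terminal states above are a simplification.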
Where to go next
- Quickstart: a 15-minute, end-to-end tour of the product.
- New run deep dive: step-by-step reference for the four-step run creation flow.
- FAQ: common questions about choosing test sets, picking models, and more.