Core concepts - Galileo

This page is the conceptual backbone of Luna Studio. If you’re new to the product, skim it once before working through the Quickstart. The vocabulary here is reused across every other page.

The mental model

A project holds a series of training runs that explore variations of a preset or custom metric. Each run combines a test set, a training set, and a base model to produce a fine-tuned metric. Once a run is Fine-tuned, you can register the metric back into the Galileo metrics store.

Projects

A project is the top-level container. It groups training runs that share a goal, typically tuning multiple metrics for a single application or domain. A Luna Studio project closely maps to a project in Galileo. Examples of well-scoped projects:

customer-support-copilot — improve an assistant that helps support teams draft accurate, on-brand responses.
enterprise-search-assistant — improve a RAG-style assistant that answers employee questions from internal knowledge sources.
sales-engineering-assistant — improve an assistant that helps teams respond to requests for proposals, questionnaires, and technical buyer questions.

You can create as many projects as you want. They appear on the Projects page. See Projects overview.

Training runs

A training run is a single attempt at fine-tuning a metric. Each run captures four inputs:

Input	What it is
Metric	A predefined metric template (e.g. Toxicity) or a custom prompt you wrote.
Test set	A small labelled dataset. About 20% seeds training data generation; the remaining 80% is used for evaluation at the end of the run.
Training set	A larger labelled dataset used to fine-tune the base model. Often generated from the test set.
Base model	The Luna model configured for your organization, shown in the run summary before launch.
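
The four inputs in the table can be pictured as a simple record. The following is a minimal sketch for illustration only: `TrainingRunInputs`, `missing_inputs`, and every value used with them are hypothetical names, not part of any Luna Studio or Galileo API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRunInputs:
    """Hypothetical container for the four inputs of a training run."""

    metric: str        # preset template name (e.g. "Toxicity") or a custom prompt
    test_set: str      # name of the small labelled test set
    training_set: str  # name of the larger labelled training set
    base_model: str    # Luna base model configured for the organization


def missing_inputs(inputs: TrainingRunInputs) -> list[str]:
    """Return the names of inputs that are empty, in declaration order."""
    return [name for name, value in vars(inputs).items() if not value.strip()]


# Placeholder values; a run with no training set chosen yet.
draft = TrainingRunInputs(
    metric="Toxicity",
    test_set="example-test-set",
    training_set="",
    base_model="example-base-model",
)
print(missing_inputs(draft))
```

A check like this mirrors the idea that a run is only ready to launch once all four inputs are set.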

Runs have a lifecycle status: Queued → Training → Fine-tuned → Registered. A run can also fail (status: Failed). See Run lifecycle for the full state machine.

Metrics

A metric is a function that takes some part of an LLM trace and returns a score.

Output types

Luna Studio currently supports two output types:

Boolean — true/false (e.g. “is this toxic?”).
Categorical — one of a fixed set of labels.

Other output types, such as floating-point, percentage, multilabel, and numeric metrics, can appear in Galileo but are not trainable in Luna Studio yet.
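
To make "a function that returns a score" concrete, here is a minimal sketch of the two trainable output types. The keyword check is a toy stand-in for a fine-tuned model, and `is_valid_score` and `toy_toxicity_metric` are hypothetical names rather than Luna Studio functions.

```python
from typing import Optional, Sequence, Union

# A score is either a Boolean or one label from a fixed set.
Score = Union[bool, str]


def is_valid_score(score: Score, labels: Optional[Sequence[str]] = None) -> bool:
    """Check a score against the two trainable output types.

    labels=None means a Boolean metric; otherwise a Categorical metric
    whose output must be one of the fixed labels.
    """
    if labels is None:
        return isinstance(score, bool)
    return isinstance(score, str) and score in labels


def toy_toxicity_metric(text: str) -> bool:
    """Toy Boolean metric: a keyword check standing in for a fine-tuned model."""
    blocked = {"idiot", "stupid"}
    return any(word.strip(".,!?").lower() in blocked for word in text.split())


score = toy_toxicity_metric("You are an idiot!")
print(score, is_valid_score(score))
```

A floating-point score such as `0.7` fails this check for both output types, which matches the note that such metrics are not trainable yet.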

Metric steps and input shape

Each metric has two related settings:

Step — where the metric runs in a Galileo trace: LLM span, Retriever, Agent span, or Trace.
Input step — what each training row contains: a single message, an input / output pair, a full trace, or a full session.

Full trace and full session inputs need user-supplied training data; Luna Studio does not generate synthetic training data for those shapes.

Multimodal support: Today, Luna Studio metrics operate over the Text modality only.

A metric flows through the same lifecycle as the run that produced it: Queued, Training, Fine-tuned, Registered, Failed. See Metrics overview.
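
The four input shapes and the restriction on synthetic data described above can be sketched as a small lookup. This is an illustration under assumptions: the enum and function names are hypothetical, and treating the two remaining shapes as eligible for generation is inferred from the text rather than stated by it.

```python
from enum import Enum


class InputShape(Enum):
    """The four input shapes a training row can take."""

    SINGLE_MESSAGE = "single message"
    INPUT_OUTPUT_PAIR = "input / output pair"
    FULL_TRACE = "full trace"
    FULL_SESSION = "full session"


# Shapes for which training data must be supplied by the user.
USER_SUPPLIED_ONLY = frozenset({InputShape.FULL_TRACE, InputShape.FULL_SESSION})


def supports_synthetic_generation(shape: InputShape) -> bool:
    """Assumed rule: every shape outside USER_SUPPLIED_ONLY can be generated."""
    return shape not in USER_SUPPLIED_ONLY


for shape in InputShape:
    print(shape.value, supports_synthetic_generation(shape))
```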

Datasets

Luna Studio splits datasets into two flavors:

Test sets

A test set is a small, hand-labelled dataset used to evaluate a fine-tuned metric. Test sets are the “ground truth” for the run — they should be labelled carefully and should not be seen during training. Required columns depend on the metric’s input type. See Prerequisites for the full list.

Training sets

A training set is the dataset used to fine-tune the Luna metric. Training sets can be:

Generated from a test set — Luna Studio uses ~20% of the test set as seed examples, samples 50 rows for review, and generates 2,000 labelled examples with the LLM-as-judge prompt and synthetic data generation pipeline.
Uploaded — your own labelled production logs (CSV or JSONL). If logs are unlabelled, Luna Studio labels them first before using them for training.
Imported from Galileo — pulled from a project in your connected Galileo workspace.
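
The roughly 20% seed and 80% evaluation division of a test set can be sketched as follows. The seeded shuffle is an assumption made for illustration; how Luna Studio actually selects seed rows is not described here, and `split_test_set` is a hypothetical helper.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_test_set(
    rows: Sequence[T], seed_fraction: float = 0.2, rng_seed: int = 0
) -> tuple[list[T], list[T]]:
    """Split rows into (seed examples, held-out evaluation rows).

    Assumption: rows are shuffled with a fixed seed so the split is
    reproducible; the real selection strategy may differ.
    """
    shuffled = list(rows)
    random.Random(rng_seed).shuffle(shuffled)
    n_seed = round(len(shuffled) * seed_fraction)
    return shuffled[:n_seed], shuffled[n_seed:]


seed_rows, eval_rows = split_test_set(list(range(100)))
print(len(seed_rows), len(eval_rows))
```

Keeping the two parts disjoint is what lets the evaluation rows act as ground truth that the generated training set never draws on.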

See Datasets overview.

Base models

Luna Studio fine-tunes the Luna base model selected for your run. The available model list is configured by your Luna Studio deployment, so your workspace may show different options. Confirm the base model in Step 4 — Config and launch of the run creation flow. For accuracy benchmarks, GPU latency tables, and the underlying SLM architecture behind these base models, see the Luna-2 overview.

Integrations

To run anything, Luna Studio needs provider credentials for the services it talks to:

For metric generation (Step 3 — Training set): the credentials for whichever configured provider and model you select in the Generate drawer.
For Galileo features (Import from Galileo, Register metric): a Galileo API key.

Integrations are org-scoped: once added, every member of your org and every project can use them. See Integrations overview.

Luna Studio can also be deployed against different training platforms, including Vertex AI Pipelines, AzureML Pipelines, SageMaker Pipelines, and Kubernetes. Those deployment-level integrations are fixed when the application is deployed, rather than configured by end users at runtime. For the full picture, see Availability and deployment.

Lifecycle statuses

The same five statuses appear on runs and metrics throughout the app:

Queued — accepted but not yet started.
Training — fine-tuning is in progress.
Fine-tuned — training succeeded; not yet registered.
Registered — the metric is live in the Galileo metrics store.
Failed — training or registration failed; see the run details for the reason.
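
The five statuses can be read as a small state machine. The sketch below is an assumption built from the descriptions above (a forward path plus failure from training or registration); the authoritative transitions are on the Run lifecycle page, and `Status`, `ALLOWED`, and `can_transition` are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "Queued"
    TRAINING = "Training"
    FINE_TUNED = "Fine-tuned"
    REGISTERED = "Registered"
    FAILED = "Failed"


# Assumed transitions: the forward path, plus Failed when training
# or registration fails. Registered and Failed are treated as terminal.
ALLOWED = {
    Status.QUEUED: {Status.TRAINING},
    Status.TRAINING: {Status.FINE_TUNED, Status.FAILED},
    Status.FINE_TUNED: {Status.REGISTERED, Status.FAILED},
    Status.REGISTERED: set(),
    Status.FAILED: set(),
}


def can_transition(current: Status, target: Status) -> bool:
    """Return True if the assumed state machine allows current -> target."""
    return target in ALLOWED[current]


print(can_transition(Status.TRAINING, Status.FINE_TUNED))
print(can_transition(Status.QUEUED, Status.REGISTERED))
```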

See Run lifecycle for the full state machine and what each status means.

Where to go next

Quickstart

A 15-minute, end-to-end tour of the product.

New run deep dive

Step-by-step reference for the four-step run creation flow.

FAQ

Common questions about choosing test sets, picking models, and more.
