Luna Studio is Galileo’s web app for fine-tuning custom evaluation metrics for LLM applications. You bring a small labelled test set, optionally generate a training set, fine-tune a Luna base model, and register the resulting metric back into the Galileo metrics store. See Welcome for the longer pitch.
Who is it for?
ML and AI engineers who need evaluation metrics tailored to a specific domain (legal, healthcare, RAG over internal docs, etc.) and don’t want to write fine-tuning code from scratch.
How is Luna Studio different from Galileo?
Galileo is the broader platform — evaluation, observability, guardrails. Luna Studio is the metric-fine-tuning workspace inside Galileo. Metrics produced in Luna Studio are registered to the Galileo metrics store, where they’re usable across the rest of the platform.
How do I get Luna Studio for my org?
Luna Studio is part of the enterprise tier of Galileo and is deployed by Galileo into your own cluster or cloud. See Availability and deployment, or contact us to get started.
Use at least 300 hand-labelled rows when possible. Beyond a few hundred rows, you start paying inference cost without much added signal for most metrics. Exact requirements can vary by metric and are checked during validation.
Do I have to upload a training set?
No. The most common path is Generate from test set: Luna Studio synthesizes a 2,000-example training set from 20% of your test set. See Step 3. Upload your own training set when you have labelled production logs that better represent the distribution you want to evaluate.
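To get a feel for the numbers, the sketch below works out how many test rows seed generation and how many generated examples each seed row accounts for on average. The 20% and 2,000 figures come from the answer above; the function name and the round-down behaviour are assumptions for illustration, not Luna Studio's documented logic.

```python
SEED_FRACTION_PERCENT = 20   # share of the test set used as seeds (from the answer above)
GENERATED_EXAMPLES = 2000    # size of the synthesized training set (from the answer above)


def generation_budget(test_rows: int) -> dict:
    """Estimate seed rows and the average expansion per seed row.

    Assumption: the seed count is rounded down. Luna Studio may round differently.
    """
    seed_rows = test_rows * SEED_FRACTION_PERCENT // 100
    if seed_rows == 0:
        raise ValueError("test set is too small to provide any seed rows")
    return {
        "seed_rows": seed_rows,
        "examples_per_seed_row": GENERATED_EXAMPLES / seed_rows,
    }


budget = generation_budget(300)
print(budget["seed_rows"])  # 60
```

With a 300-row test set, roughly 60 rows act as seeds, so each seed row stands behind about 33 generated examples.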
Can my training set be unlabelled?
Yes, for uploaded or imported logs. Luna Studio labels unlabelled logs with your LLM-as-judge prompt first, saves the labelled result as a training dataset, and then uses that labelled dataset for training. Generated training sets are always labelled.
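The flow above (label first, then train on the labelled result) can be sketched in a few lines. This is a conceptual illustration only: `label_logs` and the `judge` callable are hypothetical stand-ins for the LLM-as-judge step, and the `input` and `label` field names are assumed from the dataset requirements. Luna Studio does this for you in the app.

```python
from typing import Callable


def label_logs(rows: list[dict], judge: Callable[[str], str]) -> list[dict]:
    """Return a labelled copy of rows, calling judge only where a label is missing.

    `judge` is a hypothetical stand-in for an LLM-as-judge call: it takes the
    input text and returns a label.
    """
    labelled = []
    for row in rows:
        if row.get("label") is None:
            # Build a new dict so the original log rows stay untouched.
            row = {**row, "label": judge(row["input"])}
        labelled.append(row)
    return labelled


# Usage with a stub judge; a real judge would call an LLM with your prompt.
logs = [{"input": "Is this reply polite?"}, {"input": "Done.", "label": "true"}]
training_rows = label_logs(logs, judge=lambda text: "false")
```

Rows that already carry a label are passed through unchanged, which mirrors the idea that only unlabelled logs need the judge.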
Can I reuse a test set across multiple projects?
Yes. Datasets are org-scoped, not project-scoped. Once you’ve added a test set, every project in your org can use it.
What file formats are supported?
.csv and .jsonl. Both must include an input column; test sets and labelled training sets also need a label column. See Add a dataset.
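Before uploading, it can help to check a file locally for the required columns. The following is a minimal sketch using only the Python standard library. It assumes the columns are named exactly `input` and `label`, and the helper name is illustrative; Luna Studio's own validation is authoritative and may check more than this.

```python
import csv
import io
import json


def missing_columns(text: str, fmt: str, needs_label: bool) -> set[str]:
    """Return the required columns that are absent (an empty set means OK).

    Assumption: columns are named exactly "input" and "label".
    """
    required = {"input", "label"} if needs_label else {"input"}
    if fmt == "csv":
        # For CSV, the header row defines the columns.
        header = csv.DictReader(io.StringIO(text)).fieldnames or []
        return required - set(header)
    if fmt == "jsonl":
        # For JSONL, every non-empty line is its own JSON object.
        missing: set[str] = set()
        for line in text.splitlines():
            if line.strip():
                missing |= required - set(json.loads(line))
        return missing
    raise ValueError(f"unsupported format: {fmt}")


# A test set needs both columns; an unlabelled training set needs only input.
print(missing_columns('{"input": "hello"}\n', "jsonl", needs_label=True))  # {'label'}
```

Set `needs_label=False` when checking an unlabelled training set.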
What's the difference between a predefined metric and a custom metric?
Predefined metrics use battle-tested LLM-as-judge prompts curated by Galileo (e.g. Toxicity, Context adherence). Custom metrics let you write your own prompt. Both fine-tune the same way.
Which output type should I pick?
Pick the simplest trainable type that captures what you need: Boolean for yes/no questions, or Categorical for picking from a fixed list. Other Galileo metric output types are not trainable in Luna Studio yet.
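One way to sanity-check the choice is to confirm that every label in your test set fits the output type you plan to pick. The sketch below assumes Boolean labels are written as `true` or `false` and that Categorical labels must match a fixed list, compared case-insensitively. Both conventions, and the function name, are assumptions for illustration rather than Luna Studio's documented rules.

```python
def invalid_labels(labels, output_type, categories=()):
    """Return the labels that do not fit the chosen output type.

    Assumptions: Boolean labels are "true"/"false"; Categorical labels must
    appear in `categories`; matching ignores case and surrounding spaces.
    """
    if output_type == "boolean":
        allowed = {"true", "false"}
    elif output_type == "categorical":
        allowed = {str(c).strip().lower() for c in categories}
    else:
        raise ValueError(f"not a trainable output type: {output_type}")
    return [label for label in labels if str(label).strip().lower() not in allowed]


print(invalid_labels(["true", "False", "maybe"], "boolean"))  # ['maybe']
print(invalid_labels(["low", "high", "mid"], "categorical", ["low", "high"]))  # ['mid']
```

If a Boolean check flags many labels, the metric is probably better modelled as Categorical.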
What's a 'step' on a metric?
The step is which part of a Galileo trace the metric runs against: a single LLM call (LLM span), a retrieval step (Retriever), an agent step (Agent span), or a trace-level input. See Custom metrics.
Can I edit a registered metric?
No. Once registered, the metric is snapshotted in the Galileo metrics store. To iterate, launch a new run with the same metric template and register it under a new name (or unregister the old one in Galileo first).
Use the base model configured for your organization. Luna Studio loads available base models from your deployment and shows the selected model in Step 4.
How long does training take?
Depends on the base model and training set size. Most runs take a few hours, and larger models or larger datasets can take longer.
Can I cancel a training run?
Not today. Once a run is Training, it runs to completion or failure. We’re tracking this for a future release.
You can’t re-enter the onboarding wizard, but you can do the same things from the main app: add LLM providers from the Integrations page, and create projects from the Projects page.
Can I have multiple projects?
Yes. Most teams have one project per metric or per application. See Projects overview.