If your question isn’t answered here, check Troubleshooting for runtime issues.

General

Luna Studio is Galileo’s web app for fine-tuning custom evaluation metrics for LLM applications. You bring a small labelled test set, optionally generate a training set, fine-tune a Luna base model, and register the resulting metric back into the Galileo metrics store. See Welcome for the longer pitch.
Data scientists, ML engineers, and AI engineers who need evaluation metrics tailored to a specific domain (legal, healthcare, RAG over internal docs, etc.). Use the Luna Studio UI for a guided, no-code workflow. Use the Luna Studio SDK when you want more control or need to run fine-tuning on your own infrastructure.
Galileo is the broader platform — evaluation, observability, guardrails. Luna Studio is the metric-fine-tuning workspace inside Galileo. Metrics produced in Luna Studio are registered to the Galileo metrics store, where they’re usable across the rest of the platform.
Luna Studio is part of the enterprise tier of Galileo and is deployed by Galileo into your own cluster or cloud. See Availability and deployment, or contact us to get started.

Test sets and training sets

Ideally, a test set should contain 1,000 to 3,000 samples with a good distribution across the classes. We enforce a minimum of 300 human-labelled rows, with at least 100 samples per class. The minimum is kept low because quality matters more than quantity and human labelling isn’t cheap.
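If you want to check the size limits before uploading, a local pre-check along these lines may help. This is a minimal sketch, not Luna Studio's own validation: the function name `check_test_set` is hypothetical, and it assumes you have already pulled the label of each row into a plain Python list.

```python
from collections import Counter

MIN_ROWS = 300        # minimum human-labelled rows, as stated above
MIN_PER_CLASS = 100   # minimum samples per class, as stated above


def check_test_set(labels, min_rows=MIN_ROWS, min_per_class=MIN_PER_CLASS):
    """Return a list of problems; an empty list means the labels pass both limits.

    Hypothetical helper for a local pre-check only.
    """
    problems = []
    counts = Counter(labels)
    total = sum(counts.values())
    if total < min_rows:
        problems.append(f"only {total} rows, need at least {min_rows}")
    # Sort by the string form of the class so the report order is stable.
    for cls, n in sorted(counts.items(), key=lambda kv: str(kv[0])):
        if n < min_per_class:
            problems.append(
                f"class {cls!r} has {n} samples, need at least {min_per_class}"
            )
    return problems
```

For example, `check_test_set(["a"] * 250 + ["b"] * 50)` reports that class `'b'` is under the per-class minimum even though the total row count is sufficient.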
Not necessarily. If you already have a training set, you can upload it. If not, choose Generate from test set, and Luna Studio synthetically generates a training set from 20% of your test set. See Step 3. Upload your own training set when you have labelled production logs that better represent the distribution you want to evaluate.
Yes, for uploaded or imported logs. Luna Studio labels unlabelled logs with your LLM-as-judge prompt first, saves the labelled result as a training dataset, and then uses that labelled dataset for training. Generated training sets are always labelled.
Yes. Datasets are org-scoped, not project-scoped. Once you’ve added a test set, every project in your org can use it.
Currently we support .csv and .jsonl formats. See Add a dataset.
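If your data is in one of these formats and you need the other, the conversion is short. The sketch below uses only the Python standard library; the helper name `csv_to_jsonl` and the column names in the usage note are illustrative assumptions, not the schema Luna Studio expects, so check Add a dataset for the required columns.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text):
    """Convert CSV text with a header row into JSONL text.

    Hypothetical helper: each CSV row becomes one JSON object per line,
    keyed by the header names. All values stay strings, as csv reads them.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in reader)
```

Calling `csv_to_jsonl("input,output,label\nhi,hello,1\n")` yields a single line with the keys `input`, `output`, and `label`. Note that the label comes out as the string `"1"`, so cast it yourself if the target schema needs a number.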

Metrics

Predefined metrics use battle-tested LLM-as-judge prompts curated by Galileo (e.g. Toxicity, Context adherence). Custom metrics let you write your own prompt. Both fine-tune the same way.
The step is which part of a Galileo trace the metric runs against: a single LLM call (LLM span), a retrieval step (Retriever), an agent step (Agent span), or a trace-level input. See Custom prompts in Step 1.
No. Once registered, the metric is snapshotted in the Galileo metrics store. To iterate, launch a new run with the same metric template and register it under a new name (or unregister the old one in Galileo first).

Training

It depends on the base model, training set size, and GPU availability. Most runs take a few hours; larger models or larger datasets can take longer.
Not today. Once a run is Training, it runs to completion or failure. We’re tracking this for a future release.

Integrations

Luna Studio supports named hosted providers, Azure, Vertex AI, AWS-hosted models, custom model setups, and Galileo. See Integrations overview.
For in-house models, OpenAI-compatible proxies, or providers that aren’t covered by the named integrations. See Custom models and proxies.

Where to go next

Troubleshooting

Runtime errors and how to recover.

Quickstart

End-to-end walkthrough.