# Luna-2 Overview

> Discover Luna-2, Galileo's evaluation model that reduces the latency and cost of metric evaluations

**Luna-2** is the latest generation of our Luna small language models (SLMs), purpose-built for scaling AI evaluations. Luna-2 models are fine-tuned to provide low latency and reduced costs for metric evaluations. Luna-2 is designed to be further fine-tuned for your specific use cases and custom metrics, with the goal of providing scalable, real-time, customizable evaluations for enterprises.

Luna-based metrics offer highly accurate and efficient evaluations for AI applications, particularly those with agentic workflows.

<Note>
  Luna-2 is only available in the Enterprise tier of Galileo. [Contact us](https://galileo.ai/contact-sales) to learn more and get started.
</Note>

<CardGroup cols={2}>
  <Card title="Contact us" icon="comment" horizontal href="https://galileo.ai/contact-sales">
    Contact us to learn more about using Luna-2 in your evaluations
  </Card>

  <Card title="Read our research" icon="brain" horizontal href="https://galileo.ai/research">
    Learn how Galileo pushes the envelope on GenAI evaluation with our family of fine-tuned small language models.
  </Card>
</CardGroup>

## Overview

LLMs are powerful judges for evaluations, but as your application scales from tens or hundreds of traces a day to thousands or millions, they can fall short. Too often, organizations that rely solely on LLMs as judges incur major inference costs and don't get the low latency they need for real-time evaluations and runtime protection.

* LLMs are expensive
* LLMs don't provide the low latency needed, especially for runtime protection
* LLMs are general purpose, so even with [Autotune](/concepts/metrics/autotune-llm-as-a-judge-metrics) enhancing the evaluation prompts, they can still be less effective for your specific needs

The Luna-2 model mitigates these issues:

* Being an SLM, it is an **order of magnitude cheaper** to run than most LLMs
* SLMs run an **order of magnitude faster**, allowing for **runtime protection**
* Luna-2 is not only **fine-tuned for evaluations**, giving performance comparable to the top LLMs out of the box, but it can also be **further fine-tuned using your data** to improve accuracy beyond any general-purpose LLM.

The Luna-2 model works with most of the [out-of-the-box metrics](/sdk-api/metrics/metrics#luna-metrics), and with your [LLM-as-a-judge custom metrics](/concepts/metrics/custom-metrics/custom-metrics-ui-llm).

## Performance and cost comparison

### Comparison with different LLMs and content safety tools

| Model                | Cost per 1M tokens | Accuracy (F1 score) | Latency (avg) | Max tokens |
| :------------------- | -----------------: | ------------------: | ------------: | ---------: |
| **Luna-2**           |    **\$0.02** |            **0.95** |     **152ms** |   **128k** |
| GPT 4o               |        \$2.50 |                0.94 |       3,200ms |       128k |
| GPT 4o mini          |        \$0.60 |                0.90 |       2,600ms |       128k |
| Azure Content Safety |        \$1.52 |                0.62 |         312ms |         3k |
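
To put the per-token prices in context, the sketch below estimates the daily evaluation cost for a single metric. The cost and latency figures are copied from the table above; the workload (1M traces per day at 2,000 tokens per trace) and the `daily_cost` helper are hypothetical and only illustrate the arithmetic.

```python
# Figures copied from the comparison table above.
MODELS = {
    "Luna-2": {"cost_per_1m_tokens": 0.02, "avg_latency_ms": 152},
    "GPT 4o": {"cost_per_1m_tokens": 2.50, "avg_latency_ms": 3200},
    "GPT 4o mini": {"cost_per_1m_tokens": 0.60, "avg_latency_ms": 2600},
    "Azure Content Safety": {"cost_per_1m_tokens": 1.52, "avg_latency_ms": 312},
}


def daily_cost(model: str, traces_per_day: int, tokens_per_trace: int) -> float:
    """Estimated daily cost in USD of evaluating one metric on every trace."""
    total_tokens = traces_per_day * tokens_per_trace
    return total_tokens / 1_000_000 * MODELS[model]["cost_per_1m_tokens"]


# Hypothetical workload: 1M traces per day, 2,000 tokens per trace.
for name in MODELS:
    print(f"{name:22s} ${daily_cost(name, 1_000_000, 2_000):>10,.2f} per day")
```

Under these assumed volumes, the listed prices work out to roughly \$40 per day for Luna-2 and \$5,000 per day for GPT 4o. Your own token counts and the number of metrics you evaluate will change the totals.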

### Latency vs compute requirements

These are the measured latencies for Luna-2 across a range of GPUs for requests of different sizes.

#### H100/H200 GPU

| Model     | Small (500 tokens) | Medium (2K tokens) | Large (15K tokens) | Extra Large (100K tokens) |
| :-------- | -----------------: | -----------------: | -----------------: | ------------------------: |
| Luna-2 3B |               15ms |               15ms |              141ms |                      2.8s |
| Luna-2 8B |               16ms |               30ms |              277ms |                     4.71s |

#### RTX PRO 6000 GPU

| Model     | Small (500 tokens) | Medium (2K tokens) | Large (15K tokens) | Extra Large (100K tokens) |
| :-------- | -----------------: | -----------------: | -----------------: | ------------------------: |
| Luna-2 3B |               17ms |               32ms |              245ms |                      4.8s |
| Luna-2 8B |               28ms |               61ms |              514ms |                     8.05s |

#### B200 GPU

| Model     | Small (500 tokens) | Medium (2K tokens) | Large (15K tokens) | Extra Large (100K tokens) |
| :-------- | -----------------: | -----------------: | -----------------: | ------------------------: |
| Luna-2 3B |               15ms |               16ms |               81ms |                     1.37s |
| Luna-2 8B |               15ms |               19ms |              146ms |                     2.24s |

#### A100 GPU

| Model     | Small (500 tokens) | Medium (2K tokens) | Large (15K tokens) | Extra Large (100K tokens) |
| :-------- | -----------------: | -----------------: | -----------------: | ------------------------: |
| Luna-2 3B |               27ms |               85ms |              750ms |                     12.5s |
| Luna-2 8B |               51ms |              177ms |              1.51s |                     21.2s |

#### L40S GPU

| Model     | Small (500 tokens) | Medium (2K tokens) | Large (15K tokens) | Extra Large (100K tokens) |
| :-------- | -----------------: | -----------------: | -----------------: | ------------------------: |
| Luna-2 3B |               57ms |               91ms |              491ms |                     8.06s |
| Luna-2 8B |               86ms |              163ms |              1.01s |                    14.03s |

#### L4 GPU

<Note>
  L4 GPUs are only supported for calculating metrics for Log streams and experiments. These GPUs are not supported for runtime protection.
</Note>

| Model     | Small (500 tokens) | Medium (2K tokens) | Large (15K tokens) | Extra Large (100K tokens) |
| :-------- | -----------------: | -----------------: | -----------------: | ------------------------: |
| Luna-2 3B |               51ms |              155ms |              1.66s |                    29.45s |
| Luna-2 8B |              126ms |              364ms |              3.35s |                    50.78s |

<Note>
  Actual latencies can vary significantly with the load on the system (e.g., QPS). You can manage this by adding more GPUs, but the cost will increase.
</Note>
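
As a rough planning aid, the sketch below filters the measured latencies above against a latency budget. The numbers are copied from the tables (converted to milliseconds); the `configs_within_budget` helper and the 150ms budget are hypothetical. These are single measured values, so treat the result as a starting point for sizing rather than a guarantee under load.

```python
# Measured latencies in milliseconds, copied from the tables above.
# Each tuple follows the order of SIZES.
SIZES = ("small", "medium", "large", "extra_large")
LATENCY_MS = {
    ("H100/H200", "Luna-2 3B"): (15, 15, 141, 2800),
    ("H100/H200", "Luna-2 8B"): (16, 30, 277, 4710),
    ("RTX PRO 6000", "Luna-2 3B"): (17, 32, 245, 4800),
    ("RTX PRO 6000", "Luna-2 8B"): (28, 61, 514, 8050),
    ("B200", "Luna-2 3B"): (15, 16, 81, 1370),
    ("B200", "Luna-2 8B"): (15, 19, 146, 2240),
    ("A100", "Luna-2 3B"): (27, 85, 750, 12500),
    ("A100", "Luna-2 8B"): (51, 177, 1510, 21200),
    ("L40S", "Luna-2 3B"): (57, 91, 491, 8060),
    ("L40S", "Luna-2 8B"): (86, 163, 1010, 14030),
    ("L4", "Luna-2 3B"): (51, 155, 1660, 29450),
    ("L4", "Luna-2 8B"): (126, 364, 3350, 50780),
}
# L4 GPUs are not supported for runtime protection (see the note above).
RUNTIME_PROTECTION_UNSUPPORTED = {"L4"}


def configs_within_budget(size, budget_ms, runtime_protection=False):
    """Return (gpu, model, latency_ms) tuples at or under the budget, fastest first."""
    idx = SIZES.index(size)
    matches = []
    for (gpu, model), latencies in LATENCY_MS.items():
        if runtime_protection and gpu in RUNTIME_PROTECTION_UNSUPPORTED:
            continue
        if latencies[idx] <= budget_ms:
            matches.append((gpu, model, latencies[idx]))
    return sorted(matches, key=lambda match: match[2])


# Hypothetical requirement: 15K-token requests answered within 150ms.
for gpu, model, ms in configs_within_budget("large", 150, runtime_protection=True):
    print(f"{gpu:12s} {model}  {ms}ms")
```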

## Technical details

Galileo's Luna-2 metrics use fine-tuned Llama models (3B and 8B variants) to compute evaluation metrics for generative AI applications. The technical process involves:

* **Fine-Tuning:** Base Llama models are fine-tuned with proprietary data for specific metric needs.
* **Classification:** Models output normalized log-probabilities of True/False tokens to determine metric accuracy.
* **Optimized Infrastructure:** Metrics are hosted on Galileo's optimized inference engine with modern GPU hardware for low-latency and cost-effective evaluations. You can also self-host on-prem or on your cloud infrastructure.
* **Adapters for Custom Metrics:** Lightweight adapters on a shared base model enhance scalability and minimize infrastructure overhead for additional metrics.
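
The classification step above can be illustrated with a minimal sketch. It assumes the score is obtained by renormalizing the probabilities of the `True` and `False` tokens so they sum to 1 (a two-way softmax over their log-probabilities). This is one common reading of "normalized log-probabilities", not a description of Galileo's internal implementation, and the `normalized_true_probability` helper and input values are hypothetical.

```python
import math


def normalized_true_probability(logprob_true: float, logprob_false: float) -> float:
    """Renormalize the True/False token probabilities so they sum to 1."""
    # Subtract the larger log-probability before exponentiating
    # to keep the computation numerically stable.
    offset = max(logprob_true, logprob_false)
    p_true = math.exp(logprob_true - offset)
    p_false = math.exp(logprob_false - offset)
    return p_true / (p_true + p_false)


# Hypothetical model output: P(True) = 0.6 and P(False) = 0.2,
# with the remaining probability mass on other tokens.
score = normalized_true_probability(math.log(0.6), math.log(0.2))
print(round(score, 3))  # 0.75
```

Because only the two target tokens are compared, probability mass the model places on unrelated tokens does not affect the resulting score.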

<Note>
  Luna evaluation models are fine-tuned on open-source base models, including but not limited to Llama and Mistral. Where applicable, third-party license terms apply. For example, Llama is licensed under the [Meta Llama Community License](https://llama.meta.com/llama3/license), Copyright (c) Meta Platforms, Inc. All Rights Reserved.
</Note>

By leveraging fine-tuned Llama models, Luna-2 metrics provide significant enhancements over traditional methods:

* **Adaptability:** These models are most effective when fine-tuned, which requires approximately 4,000 samples for customer-specific use cases.
* **Efficiency and Cost-Effectiveness:** Luna-2 models enable simultaneous evaluation of multiple metrics with low latency and reduced costs, ideal for real-time, high-scale deployments.
* **Enhanced Accuracy:** Luna-2 demonstrates at least a 10% accuracy increase compared to traditional BERT-based models, making it well suited to precise monitoring in production environments.

## Get started with Luna-2

If you are using the Enterprise tier of Galileo, follow these steps to use Galileo's Luna-based metrics:

1. [Contact Galileo's customer support or account management](https://galileo.ai/contact-sales) to begin onboarding.
2. If you are using a Galileo-hosted instance, request L4 GPUs or higher, which are required to run Luna-2 models. Otherwise, you can deploy to your own infrastructure using L4 or higher GPUs.
3. Review the provided documentation and model cards for details on latency, accuracy, and comparisons to BERT-based metrics.
4. Provide Galileo with relevant labelled sample data to fine-tune the model. We can augment this with synthetic data if needed.
5. Galileo fine-tunes your model for you and deploys it.
6. Set up your experiments and Log streams to use [Luna-based metrics](/sdk-api/metrics/metrics#luna-metrics).

This is not a one-shot process. Your model can be tuned on a regular basis as required.

## Next steps

<CardGroup cols={1}>
  <Card title="Contact us" icon="comment" horizontal href="https://galileo.ai/contact-sales">
    Contact us to learn more about using Luna-2 in your evaluations
  </Card>
</CardGroup>

<CardGroup cols={2}>
  <Card title="Evaluate metrics with the Luna-2 model" icon="moon" horizontal href="/how-to-guides/luna/evaluate-with-luna/evaluate-with-luna">
    Learn how to evaluate metrics cheaper and faster using the Luna-2 model
  </Card>

  <Card title="Use Luna-2 in your experiments" icon="flask" horizontal href="/how-to-guides/luna/experiments-with-luna/experiments-with-luna">
    Learn how to use Luna-2 metrics when running experiments in code
  </Card>
</CardGroup>
