
Documentation Index

Fetch the complete documentation index at: https://docs.galileo.ai/llms.txt

Use this file to discover all available pages before exploring further.

BuiltInMetrics

Provides convenient access to built-in Galileo metrics (formerly “scorers”).

Examples
from galileo.metric import Metric

# Access built-in metrics
Metric.metrics.correctness
Metric.metrics.completeness
Metric.metrics.toxicity

Metric

Base class for all Galileo metrics. This is an abstract base class that defines common attributes and methods for all metric types. Use one of the concrete metric classes instead:
  • GalileoMetric: Built-in Galileo scorers (access via Metric.metrics)
  • LlmMetric: Custom LLM-based metrics with prompt templates
  • LocalMetric: Local function-based metrics
  • CodeMetric: Code-based metrics (future support)

Common Attributes

  • id (str | None): The unique metric identifier (UUID).
  • name (str): The metric name.
  • scorer_type (ScorerTypes | None): The type of scorer.
  • description (str): Description of the metric.
  • tags (list[str]): Tags associated with the metric.
  • created_at (datetime | None): When the metric was created.
  • updated_at (datetime | None): When the metric was last updated.
  • version (int | None): Metric version number.

Class Attributes

metrics (BuiltInMetrics): Access built-in Galileo metrics.

Examples
# 1. Use built-in Galileo scorers
from galileo import Metric, GalileoMetric, LlmMetric, LocalMetric, LogStream

log_stream = LogStream.get(name="my-stream", project_name="my-project")
log_stream.set_metrics([
    Metric.metrics.correctness,
    Metric.metrics.completeness,
])

# 2. Create custom LLM metric
llm_metric = LlmMetric(
    name="response_quality",
    prompt="Rate the quality...",
    model="gpt-4o-mini",
    judges=3,
).create()

# 3. Create local function-based metric
def my_scorer(trace_or_span):
    return 0.5

local_metric = LocalMetric(
    name="response_length",
    scorer_fn=my_scorer,
)

delete

def delete(self) -> None
Delete this metric. Only works for server-side metrics. Local metrics don’t need deletion.

Examples
metric = Metric.get(name="factuality-checker")
metric.delete()

delete_by_name

def delete_by_name(cls, name: str) -> None
Delete a metric by name without retrieving it first. This is more efficient than calling Metric.get(name=...).delete() when you only need to delete and don’t need the metric object.

Arguments
  • name: The name of the metric to delete.
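The efficiency claim above can be sketched with a hypothetical in-memory stand-in for the API client (FakeMetricAPI is not part of the SDK; it only counts round trips):

```python
# Hypothetical stand-in for the metrics API, illustrating why
# delete_by_name saves a round trip compared to get(...).delete().
class FakeMetricAPI:
    def __init__(self):
        self.calls = 0
        self.metrics = {"factuality-checker": {"id": "abc-123"}}

    def get(self, name):
        self.calls += 1  # one round trip to fetch the metric
        return self.metrics.get(name)

    def delete(self, name):
        self.calls += 1  # one round trip to delete it
        self.metrics.pop(name, None)

# Metric.get(name=...).delete() pattern: fetch, then delete (2 calls)
api = FakeMetricAPI()
api.get("factuality-checker")
api.delete("factuality-checker")
assert api.calls == 2

# Metric.delete_by_name(name=...) pattern: delete directly (1 call)
api2 = FakeMetricAPI()
api2.delete("factuality-checker")
assert api2.calls == 1
```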

get

def get(cls, *, id: str | None=None, name: str | None=None) -> Metric | None
Get an existing metric by ID or name. Returns the appropriate subclass instance based on scorer_type.

Arguments
  • id: The metric ID (UUID).
  • name: The metric name.
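The subclass dispatch described above can be sketched as a lookup table. The scorer-type strings and the mapping here are hypothetical illustrations, not the SDK's actual internals:

```python
# Illustrative dispatch: get() returns the subclass matching the
# metric's stored scorer_type. Type names below are assumptions.
class Metric: ...
class GalileoMetric(Metric): ...
class LlmMetric(Metric): ...
class CodeMetric(Metric): ...

SUBCLASS_BY_SCORER_TYPE = {
    "preset": GalileoMetric,
    "llm": LlmMetric,
    "code": CodeMetric,
}

def resolve(scorer_type):
    # Fall back to the base class for unrecognized types.
    return SUBCLASS_BY_SCORER_TYPE.get(scorer_type, Metric)

assert resolve("llm") is LlmMetric
```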

list

def list(cls,
         *,
         name_filter: str | None=None,
         scorer_types: list[ScorerTypes] | None=None) -> builtins.list[Metric]
List metrics with optional filtering. Returns appropriate subclass instances based on scorer_type.

Arguments
  • name_filter: Filter metrics by exact name match.
  • scorer_types: Filter by scorer types.
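The filter semantics described above (exact name match, scorer-type membership) can be sketched over plain dicts; the real method queries the Galileo API, so this is purely illustrative:

```python
# Stand-in data illustrating list()'s filtering behavior.
metrics = [
    {"name": "correctness", "scorer_type": "preset"},
    {"name": "response_quality", "scorer_type": "llm"},
    {"name": "response_length", "scorer_type": "code"},
]

def list_metrics(name_filter=None, scorer_types=None):
    results = metrics
    if name_filter is not None:
        # Exact name match, per the name_filter argument above.
        results = [m for m in results if m["name"] == name_filter]
    if scorer_types is not None:
        results = [m for m in results if m["scorer_type"] in scorer_types]
    return results

assert len(list_metrics()) == 3
assert list_metrics(name_filter="correctness")[0]["scorer_type"] == "preset"
```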

refresh

def refresh(self) -> None
Refresh this metric’s state from the API. Updates all attributes with the latest values from the remote API.

Examples
metric.refresh()
assert metric.is_synced()

to_legacy_metric

def to_legacy_metric(self) -> LegacyMetric
Convert to legacy galileo.schema.metrics.Metric format. This enables backward compatibility with existing code that uses the legacy Metric class.

Examples
metric = Metric.get(name="my-metric")
legacy = metric.to_legacy_metric()
# Use with existing APIs

update

def update(self, **kwargs: Any) -> Metric
Update this metric’s properties on the API. Only name, description, and tags can be updated via this method. On success the instance is updated with the API response and returned in SYNCED state.

Arguments
  • **kwargs (Any): Fields to update. Supported keys: name, description, tags.
Examples
metric = Metric.get(name="factuality-checker")
metric.update(name="new-name", description="Updated description")
assert metric.is_synced()

LlmMetric

LLM-based metric with custom prompt templates. This metric type allows you to create custom metrics evaluated by an LLM judge using a prompt template.

Configuration

Default values for model and judges can be configured via:
  • Configuration.default_scorer_model (env: GALILEO_DEFAULT_SCORER_MODEL)
  • Configuration.default_scorer_judges (env: GALILEO_DEFAULT_SCORER_JUDGES)
Examples
# Create custom LLM metric with string model name
metric = LlmMetric(
    name="response_quality",
    prompt='''
    Rate the quality of this response on a scale of 1-10.

    Question: {input}
    Answer: {output}

    Return only the numerical score (1-10).
    ''',
    model="gpt-4o-mini",  # String model name
    judges=3,
    node_level=StepType.llm,
    description="Rates response quality",
    tags=["quality", "custom"],
    output_type=OutputTypeEnum.PERCENTAGE,
    cot_enabled=True,
).create()

# Or use a Model object from Integration
from galileo.integration import Integration
gpt_model = Integration.openai.get_model(alias="gpt-4o-mini")
metric = LlmMetric(
    name="response_quality",
    prompt="Rate quality 1-10: {input} -> {output}",
    model=gpt_model,  # Model object
    judges=3,
).create()
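The {input} and {output} placeholders in the prompt template are presumably filled with the trace's input and output at evaluation time (the real substitution happens server-side when the LLM judge runs); a minimal sketch of that rendering:

```python
# Illustrative rendering of a prompt template with named placeholders.
template = "Rate quality 1-10: {input} -> {output}"
rendered = template.format(
    input="What is 2 + 2?",
    output="4",
)
assert rendered == "Rate quality 1-10: What is 2 + 2? -> 4"
```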

create

def create(self) -> LlmMetric
Persist this LLM metric to the API.

Examples
metric = LlmMetric(
    name="quality_check",
    prompt="Rate the quality...",
    model="gpt-4o-mini"
).create()
assert metric.is_synced()

CodeMetric

Code-based metric. This metric type is for code-based scorers that execute custom code to evaluate traces/spans.

Examples
# Get existing code metric
metric = Metric.get(name="my-code-metric")
assert isinstance(metric, CodeMetric)

# Create code metric with inline code
metric = CodeMetric(
    name="custom_code_scorer",
    code="def scorer_fn(step_object):\n    return 1.0",
    description="Custom code-based scorer",
    tags=["custom", "code"],
    node_level=StepType.llm,
).create()

# Load code from file
metric = CodeMetric(
    name="custom_code_scorer",
    node_level=StepType.llm,
).load_code("./scorers/my_scorer.py").create()

create

def create(self) -> CodeMetric
Persist this Code metric to the API. This method validates the code first by submitting it to the validation endpoint, polling for the result, and then creating the scorer with the validated result.

Examples
# Create with inline code
metric = CodeMetric(
    name="custom_code_scorer",
    code="def scorer_fn(step_object):\n    return 1.0",
    node_level=StepType.llm,
).create()
assert metric.is_synced()

# Create by loading from file
metric = CodeMetric(
    name="custom_code_scorer",
    node_level=StepType.llm,
).load_code("./scorers/my_scorer.py").create()
assert metric.is_synced()

load_code

def load_code(self, code_file_path: str) -> CodeMetric
Load code from a file into this metric instance.

Arguments
  • code_file_path: Path to the Python file containing the scorer code.
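Hypothetical contents of a scorer file like ./scorers/my_scorer.py, matching the scorer_fn shape used in the inline-code examples above (a function taking the step object and returning a numeric score); the output-presence check is only an illustration:

```python
# Assumed scorer file layout: a top-level scorer_fn taking the step
# object being evaluated and returning a numeric score.
def scorer_fn(step_object):
    output = getattr(step_object, "output", "") or ""
    # Score 1.0 when the step produced any output, else 0.0.
    return 1.0 if output else 0.0
```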

GalileoMetric

Built-in Galileo scorer metric. This metric type represents Galileo’s built-in scorers like correctness, completeness, toxicity, etc. Access these via Metric.metrics.

Examples
# Access built-in scorers
from galileo import Metric, LogStream

log_stream = LogStream.get(name="my-stream", project_name="my-project")
log_stream.set_metrics([
    Metric.metrics.correctness,
    Metric.metrics.completeness,
    Metric.metrics.toxicity,
])

# Or get by name
metric = Metric.get(name="correctness")
assert isinstance(metric, GalileoMetric)

LocalMetric

Local function-based metric. This metric type uses a Python function to score traces/spans locally without making API calls. Useful for simple, deterministic metrics.

Examples
# Create local function-based metric
def response_length_scorer(trace_or_span):
    if hasattr(trace_or_span, "output") and trace_or_span.output:
        return min(len(trace_or_span.output) / 100.0, 1.0)
    return 0.0

local_metric = LocalMetric(
    name="response_length",
    scorer_fn=response_length_scorer,
    scorable_types=[StepType.llm],
    aggregatable_types=[StepType.trace],
)

# Use with log stream
log_stream.set_metrics([local_metric])
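Because a LocalMetric runs its function locally, the scorer above can be unit-tested directly before wiring it to a log stream. A quick check using SimpleNamespace as a stand-in for the real trace/span object:

```python
from types import SimpleNamespace

# Same scorer as in the example above, repeated for self-containment.
def response_length_scorer(trace_or_span):
    if hasattr(trace_or_span, "output") and trace_or_span.output:
        return min(len(trace_or_span.output) / 100.0, 1.0)
    return 0.0

# 40-character output scores 0.4; empty output scores 0.0.
span = SimpleNamespace(output="x" * 40)
assert response_length_scorer(span) == 0.4
assert response_length_scorer(SimpleNamespace(output="")) == 0.0
```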

to_local_metric_config

def to_local_metric_config(self) -> LocalMetricConfig
Convert to LocalMetricConfig format.

Examples
def my_scorer(trace):
    return 0.5

metric = LocalMetric(name="test", scorer_fn=my_scorer)
config = metric.to_local_metric_config()