The Visual Fidelity metric is a rule-adherence check: using only visible evidence from the image and the provided brand rules, the evaluator determines whether the image satisfies every applicable rule. The metric is grounded in explicit rule compliance rather than pure aesthetics, prompt reconstruction, or any separate image-quality standard not written in the rules.
To use this metric, you will need to duplicate and edit the prompt to provide your rules in the specified section of the prompt.

Visual Fidelity at a glance

Property                  Description
Name                      Visual Fidelity
Category                  Multimodal Quality
Metric Level              LLM Span
LLM-as-a-judge Support
Luna Support
Protect                   Runtime Protection
Value Type                Boolean

Score interpretation

Score   Label           Meaning
False   Non-Compliant   One or more applicable provided rules are violated based on visible evidence in the image
True    Compliant       All applicable provided rules pass based on visible evidence in the image

When to use this metric

Example scenario

Brand rules compliance for a generated banner

  • Provided rules: “Logo must appear in the top-left”, “Primary color must be #E35454”, “No competitor logos”.
  • Compliant (True): the generated banner visibly satisfies each applicable rule (correct logo placement, correct primary color usage, no prohibited content).
  • Non-Compliant (False): the logo is missing or misplaced, the primary color rule is violated, or prohibited content is present; any single rule violation fails the metric.
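The scenario above can be sketched in code. This is an illustrative stub, not the metric's implementation: in the real metric each per-rule verdict comes from a vision-capable LLM judge inspecting the image, whereas here the verdicts are hard-coded.

```python
# Illustrative stub of the banner scenario. In practice, each verdict
# is produced by an LLM judge looking at the generated image; here the
# per-rule results are hard-coded for demonstration.

rules = [
    "Logo must appear in the top-left",
    "Primary color must be #E35454",
    "No competitor logos",
]

# Compliant banner: every applicable rule passes.
compliant_verdicts = {rule: True for rule in rules}
print(all(compliant_verdicts.values()))  # True -> Compliant

# Non-compliant banner: a single violation (misplaced logo) fails the metric.
violating_verdicts = dict(compliant_verdicts, **{rules[0]: False})
print(all(violating_verdicts.values()))  # False -> Non-Compliant
```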

Inputs considered

The evaluator examines the following when available:
  • The generated image produced by the LLM span (output image)
  • The set of provided brand or content rules that apply to the image
Only rules that are applicable to the generated image are evaluated; inapplicable rules are skipped and do not affect the score. Compliance is determined solely from what is visually observable — the evaluator does not infer intent or reconstruct the original prompt.

Calculation method

Visual Fidelity is computed through a multi-step process:
1. Rule scoping: Determine which provided rules are applicable to the generated image and should be evaluated.
2. Visible-evidence evaluation: Using only visible evidence in the image, evaluate each applicable rule as pass or fail. The evaluator does not reconstruct prompts or apply any external image-quality standard.
3. All-rules decision: Return True if and only if all applicable rules pass. Otherwise return False.
This metric is typically computed by prompting an LLM with access to the generated image and the provided rules, which may require additional LLM calls to compute and can impact usage and billing.
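The steps above can be sketched as a small function. This is a simplified sketch, assuming that rule applicability and per-rule verdicts have already been extracted as predicates over evidence from the image (in the real metric both are LLM judgments); the `Rule` type and its `applies`/`passes` fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    text: str
    # Step 1: does this rule apply to the image at all?
    applies: Callable[[dict], bool]
    # Step 2: given visible evidence, does the image satisfy the rule?
    passes: Callable[[dict], bool]

def visual_fidelity(evidence: dict, rules: list[Rule]) -> bool:
    # Step 1: rule scoping -- inapplicable rules are skipped entirely
    # and do not affect the score.
    applicable = [r for r in rules if r.applies(evidence)]
    # Steps 2-3: True iff every applicable rule passes.
    return all(r.passes(evidence) for r in applicable)

# Usage with stubbed evidence extracted from an image.
rules = [
    Rule("Logo must appear in the top-left",
         applies=lambda e: "logo_position" in e,
         passes=lambda e: e["logo_position"] == "top-left"),
    Rule("No competitor logos",
         applies=lambda e: True,
         passes=lambda e: not e.get("competitor_logo", False)),
]
print(visual_fidelity({"logo_position": "top-left"}, rules))  # True
print(visual_fidelity({"logo_position": "center"}, rules))    # False
```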

Best practices

Write observable rules

Each rule should describe something that can be confirmed or denied from visual inspection alone. Avoid rules that require knowledge of the generation prompt or model internals.

Keep rules atomic

Express one constraint per rule so that a failing rule identifies a specific violation rather than a bundle of requirements.
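As a concrete illustration (the rule texts are hypothetical), compare a bundled rule with its atomic equivalent:

```python
# Bundled: one rule covers three constraints, so a single failing
# verdict does not tell you which constraint was violated.
bundled = [
    "Logo is top-left, the primary color is #E35454, and no competitor logos appear",
]

# Atomic: one constraint per rule, so a failing rule names the exact violation.
atomic = [
    "Logo must appear in the top-left",
    "Primary color must be #E35454",
    "No competitor logos",
]

# With atomic rules, a verdict map pinpoints the failure.
verdicts = {atomic[0]: True, atomic[1]: False, atomic[2]: True}
failed = [rule for rule, ok in verdicts.items() if not ok]
print(failed)  # ['Primary color must be #E35454']
```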

Version your rule sets

Treat brand rule sets as versioned artifacts so changes in guidelines can be tracked and their effect on compliance scores can be measured.
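One way to treat a rule set as a versioned artifact (the field names here are a suggestion, not a Galileo schema) is to store it with an explicit version string so compliance scores can be attributed to the exact guidelines in effect:

```python
import json

rule_set = {
    "name": "brand-banner-rules",   # hypothetical identifier
    "version": "2.1.0",             # bump whenever the guidelines change
    "rules": [
        "Logo must appear in the top-left",
        "Primary color must be #E35454",
        "No competitor logos",
    ],
}

# Persisting the set as JSON makes diffs between versions reviewable,
# and each evaluation run can record which rule-set version it used.
serialized = json.dumps(rule_set, indent=2)
restored = json.loads(serialized)
print(restored["version"])  # 2.1.0
```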

Combine with Instruction Adherence

Pair this metric with Instruction Adherence to also check whether your image-generating LLM is following the instructions in your prompt, not just your brand rules.

Performance Benchmarks

We evaluated Visual Fidelity against human expert labels on an internal dataset of varied samples using top frontier models.
Model               F1 (True)
GPT-4.1             0.79
Claude Sonnet 4.6   0.76
Gemini 3.1 Flash    0.81
If you would like to dive deeper or start implementing Visual Fidelity, check out the following resources:

Examples

  • Visual Fidelity Examples - Log in and explore the “Visual Fidelity” Log Stream in the “Preset Metric Examples” Project to see this metric in action.