To use this metric, duplicate the prompt and edit the copy to add your rules in the designated section.
Visual Fidelity at a glance
| Property | Description |
|---|---|
| Name | Visual Fidelity |
| Category | Multimodal Quality |
| Metric Level | LLM Span |
| LLM-as-a-judge Support | ✅ |
| Luna Support | ❌ |
| Protect Runtime Protection | ❌ |
| Value Type | Boolean |
Score interpretation
| Score | Label | Meaning |
|---|---|---|
| False | Non-Compliant | One or more applicable provided rules are violated based on visible evidence in the image |
| True | Compliant | All applicable provided rules pass based on visible evidence in the image |
When to use this metric
Example scenario
Brand rules compliance for a generated banner
Provided rules: “Logo must appear in the top-left”, “Primary color must be #E35454”, “No competitor logos”.
Compliant (True): The generated banner visibly satisfies each applicable rule (logo placement, correct primary color usage, no prohibited content).
Non-Compliant (False): The logo is missing or misplaced, the primary color rule is violated, or prohibited content is present. Any single rule violation fails the metric.
Inputs considered
The evaluator examines the following when available:
- The generated image produced by the LLM span (output image)
- The set of provided brand or content rules that apply to the image
Calculation method
Visual Fidelity is computed through a multi-step process:
Rule scoping
Determine which provided rules are applicable to the generated image and should be evaluated.
Visible-evidence evaluation
Using only visible evidence in the image, evaluate each applicable rule as pass or fail. The evaluator does not reconstruct prompts or apply any external image-quality standard.
This metric is typically computed by prompting an LLM with access to the generated image and the provided rules, which may require additional LLM calls to compute and can impact usage and billing.
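
The two steps above can be sketched in Python as follows. This is a minimal illustration of the aggregation logic only, not the metric's actual implementation: `RuleVerdict` and `visual_fidelity` are hypothetical names, and the per-rule verdicts are assumed to come from an LLM judge that has already inspected the image. The description above does not say how an image with no applicable rules is scored; the sketch treats it as compliant, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RuleVerdict:
    """Hypothetical per-rule judgment, e.g. parsed from an LLM judge's response."""
    rule: str
    applicable: bool  # step 1: rule scoping
    passed: bool      # step 2: visible-evidence evaluation (ignored if not applicable)


def visual_fidelity(verdicts: list[RuleVerdict]) -> tuple[bool, list[str]]:
    """Return (compliant, violated_rules); any applicable failing rule makes the score False."""
    violations = [v.rule for v in verdicts if v.applicable and not v.passed]
    return (not violations, violations)


# Rules from the banner scenario; the verdict values here are made up for illustration.
verdicts = [
    RuleVerdict("Logo must appear in the top-left", applicable=True, passed=True),
    RuleVerdict("Primary color must be #E35454", applicable=True, passed=False),
    RuleVerdict("No competitor logos", applicable=True, passed=True),
]
compliant, violations = visual_fidelity(verdicts)
print(compliant, violations)
```

Returning the list of violated rules alongside the Boolean is a design choice of this sketch: it keeps the single failing rule visible even though the metric value itself is only True or False.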
Best practices
Write observable rules
Each rule should describe something that can be confirmed or denied from visual inspection alone. Avoid rules that require knowledge of the generation prompt or model internals.
Keep rules atomic
Express one constraint per rule so that a failing rule identifies a specific violation rather than a bundle of requirements.
Version your rule sets
Treat brand rule sets as versioned artifacts so changes in guidelines can be tracked and their effect on compliance scores can be measured.
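
One way to treat a rule set as a versioned artifact is to store a version label and a content fingerprint next to every score. The sketch below is only an illustration under that assumption: `RuleSet`, its fields, and the `record` layout are hypothetical and not part of any product API.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSet:
    """Hypothetical versioned container for brand or content rules."""
    name: str
    version: str
    rules: tuple[str, ...]

    def fingerprint(self) -> str:
        # Hash only the name and rule text, so any edit to the rules changes
        # the fingerprint even if the version label was not bumped.
        payload = json.dumps({"name": self.name, "rules": list(self.rules)}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


v1 = RuleSet(
    "brand-banner",
    "1.0.0",
    ("Logo must appear in the top-left", "Primary color must be #E35454"),
)
v2 = RuleSet("brand-banner", "1.1.0", v1.rules + ("No competitor logos",))

# Store the version and fingerprint with each score so that a shift in
# compliance can be traced back to a specific rule change.
record = {
    "rule_set": v2.name,
    "version": v2.version,
    "fingerprint": v2.fingerprint(),
    "score": True,  # illustrative value
}
```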
Combine with Instruction Adherence
Pair this metric with Instruction Adherence to also check whether the LLM generating the images follows your instructions.
Performance Benchmarks
We evaluated Visual Fidelity against human expert labels on an internal dataset of varied samples using top frontier models.
| Model | F1 (True) |
|---|---|
| GPT-4.1 | 0.79 |
| Claude Sonnet 4.6 | 0.76 |
| Gemini 3.1 Flash | 0.81 |
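
"F1 (True)" is read here as the F1 score with True (Compliant) as the positive class; the table does not spell out the formula, so this reading is an assumption. Under it, the score can be computed from human labels and judge predictions as in the sketch below. The function name and sample labels are illustrative only and are not the benchmark data.

```python
def f1_true(human: list[bool], predicted: list[bool]) -> float:
    """F1 score treating True (Compliant) as the positive class."""
    pairs = list(zip(human, predicted))
    tp = sum(1 for h, p in pairs if h and p)        # both say Compliant
    fp = sum(1 for h, p in pairs if not h and p)    # judge says Compliant, human disagrees
    fn = sum(1 for h, p in pairs if h and not p)    # judge misses a Compliant image
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
human = [True, True, True, False, False]
predicted = [True, True, False, True, False]
print(round(f1_true(human, predicted), 3))  # 0.667
```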
Related Resources
If you would like to dive deeper or start implementing Visual Fidelity, check out the following resources:
Examples
- Visual Fidelity Examples - Log in and explore the “Visual Fidelity” Log Stream in the “Preset Metric Examples” Project to see this metric in action.