How-To Guide | Galileo Evaluate

Follow these step-by-step guides to assess generative AI models in Galileo Evaluate, configure metrics, and analyze performance.

Logging Runs
  • Log Pre-generated Responses in Python
  • Experiment with Multiple Chain Workflows
  • Logging and Comparing Against Your Expected Answers

Use Cases
  • Evaluate and Optimize RAG Applications
  • Evaluate and Optimize Agents, Chains or Multi-step Workflows

Prompt Engineering
  • Evaluate and Optimize Prompts
  • Experiment with Multiple Prompts

Metrics
  • Choose your Guardrail Metrics
  • Enabling Scorers in Runs
  • Register Custom Metrics
  • Customize Chainpoll-powered Metrics

Getting Insights
  • Understand Your Metric's Values
  • A/B Compare Prompts
  • Evaluate with Human Feedback
  • Identify Hallucinations
  • Rank your Runs

Collaboration
  • Share a Project
  • Collaborate with Other Personas
  • Export Your Evaluation Runs

Advanced Features
  • Add Tags and Metadata to Prompt Runs
  • Programmatically Fetch Logged Data
  • Set up Access Controls

Best Practices
  • Prompt Management & Storage
  • Create an Evaluation Set
