# Eval Engineering for AI Developers

> Learn about eval engineering in our free 5-part course

**Eval engineering for AI developers** is a 5-part course run as a series of live streams, hosted by [Jim Bennett](https://linktr.ee/jimbobbennett), Principal Developer Advocate at Galileo.

90% of AI agents don't make it successfully to production. The biggest reason is that the AI engineers building these apps lack a clear way to evaluate whether their agents are doing what they should, and to use the results of that evaluation to fix them.

In this course, you will learn all about evals for AI applications. You'll start with some out-of-the-box metrics and learn about evals, then move on to understanding observability for AI apps, analyzing failure states, and defining custom metrics, before finally using these across your whole SDLC.

This is hands-on, so be prepared to write some code, create some metrics, and do some homework!

## Prerequisites

* A basic knowledge of Python, and Python 3.10 or higher installed
* An [OpenAI API key](https://openai.com/api/). Other LLMs are supported, but the code samples use the OpenAI SDK.
* A [Galileo account](https://app.galileo.ai/sign-up). The free account is fine for these lessons.
* A clone or download of the [Eval engineering GitHub repo](https://github.com/rungalileo/eval-engineering)

## Lessons

### Lesson 1 - Hello Evals

In this first lesson, you will:

* Learn what evals are
* Learn how you can use simple evals to detect issues in an AI application
* Get hands-on by adding an eval to an app

<iframe className="w-full aspect-video rounded-xl" src="https://www.youtube.com/embed/HnDnMFUTj2Y" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowFullScreen />

### Lesson 2 - Observability in AI apps

In this second lesson, you will:

* Use observability to visualize the components of a typical multi-agent AI application
* Learn about the different components that make up these applications
* Apply some out-of-the-box metrics to start understanding how your application is working

<iframe className="w-full aspect-video rounded-xl" src="https://www.youtube.com/embed/aZz3ncrafRw" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowFullScreen />

### Lesson 3 - Failure analysis

In this third lesson, you will:

* Learn the process for finding failures in your AI applications
* Build out rubrics for identifying failure cases
* Learn how to group failure cases into themes that can be used for building evals

<iframe className="w-full aspect-video rounded-xl" src="https://www.youtube.com/embed/lCAfEF_qQgQ" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowFullScreen />

### Lesson 4 - Build custom metrics

In this fourth lesson, you will:

* Build datasets of known inputs and outputs for cases that pass and fail
* Learn how to build custom metrics for your failure cases
* Determine the success of your metrics by measuring true and false positives and negatives

<iframe className="w-full aspect-video rounded-xl" src="https://www.youtube.com/embed/I9jSJ5_MobA" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowFullScreen />
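
As a rough illustration of the last point in this lesson, measuring true and false positives and negatives comes down to comparing what a metric flagged against labels you already trust. The sketch below is not taken from the course materials and does not use the Galileo SDK. The names `ConfusionCounts` and `score_metric` are hypothetical, and it assumes each dataset row has a known pass/fail label and a pass/fail result from your metric.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Tallies of how a metric's verdicts compare with known labels."""

    true_positives: int = 0
    false_positives: int = 0
    true_negatives: int = 0
    false_negatives: int = 0

    @property
    def precision(self) -> float:
        # Of everything the metric flagged, how much was really a failure?
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        # Of all the real failures, how many did the metric flag?
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0


def score_metric(expected: list[bool], predicted: list[bool]) -> ConfusionCounts:
    """Compare known labels with metric verdicts (True means 'failure case')."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must be the same length")

    counts = ConfusionCounts()
    for want, got in zip(expected, predicted):
        if want and got:
            counts.true_positives += 1
        elif not want and got:
            counts.false_positives += 1
        elif want and not got:
            counts.false_negatives += 1
        else:
            counts.true_negatives += 1
    return counts


# Hypothetical labels for five dataset rows, and what a custom metric returned.
known_failures = [True, True, False, False, True]
metric_flagged = [True, False, False, True, True]

counts = score_metric(known_failures, metric_flagged)
print(counts)
print(f"precision={counts.precision:.2f} recall={counts.recall:.2f}")
```

A metric with many false positives raises noisy alerts, while one with many false negatives misses real failures, so both counts are worth tracking as you iterate on a metric.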

### Lesson 5 - Eval engineering in your SDLC

In this final lesson, you will:

* Learn how evals fit into the SDLC
* Build unit tests using evals that can be run in your CI/CD pipeline
* Learn about using evals as guardrails at runtime
* Add observability and alerts to detect when your application is failing

<iframe className="w-full aspect-video rounded-xl" src="https://www.youtube.com/embed/RMF8tT-wo8U" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowFullScreen />
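
To give a feel for what a unit test built on an eval might look like, here is a minimal sketch, not the course's implementation. It assumes a test runner such as pytest collects functions named `test_*`. The `keyword_coverage` function and `PASS_THRESHOLD` value are hypothetical stand-ins for a real metric and its pass mark, and the response is hard-coded where a real test would call your application.

```python
def keyword_coverage(response: str, required_terms: list[str]) -> float:
    """Hypothetical stand-in eval: fraction of required terms found in the response."""
    if not required_terms:
        return 1.0
    text = response.lower()
    hits = sum(1 for term in required_terms if term.lower() in text)
    return hits / len(required_terms)


# Assumed pass mark; in practice you would tune this against a labeled dataset.
PASS_THRESHOLD = 0.8


def test_refund_answer_mentions_policy_terms() -> None:
    # Hard-coded here for illustration; a real test would call the app under test.
    response = "Refunds are available within 30 days with a receipt."
    score = keyword_coverage(response, ["refund", "30 days", "receipt"])
    # A failing assertion fails the test run, which in turn fails the CI job.
    assert score >= PASS_THRESHOLD, f"eval score {score:.2f} is below threshold"
```

Because the eval result is reduced to a single assertion, a regression in the application's output shows up as a failed build rather than something found later in production.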

## Course materials

All the course materials are available on the Galileo GitHub.

<CardGroup cols={2}>
  <Card title="eval-engineering repo on GitHub" icon="github" horizontal href="https://github.com/rungalileo/eval-engineering" />
</CardGroup>
