In the Log to Galileo guide, you logged your first trace to Galileo. In this guide, you will evaluate the LLM's response using the context adherence metric, then improve the prompt and re-evaluate your application.
1
Enable the context adherence metric on your Log stream
To evaluate the Log stream against context adherence, you need to enable this metric for your Log stream. Add the following import statements to the top of your app file:
from galileo import GalileoMetrics
from galileo.log_streams import enable_metrics
Next, add the following code to your app file. If you are using Python, add it after the call to galileo_context.init(). If you are using TypeScript, add it as the first line in the async block.
This code will enable the context adherence metric for your Log stream, and this metric will then be calculated for all LLM spans that are logged.
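To make the idea of "enable once, score every span" concrete, here is a toy stand-in that mirrors the mechanism. All names and the scoring heuristic below are hypothetical illustrations, not the Galileo SDK: a metric is registered on the stream once, and every LLM span logged afterwards is scored against each registered metric.

```python
# Toy stand-in for metric enablement (all names and the scoring
# heuristic are hypothetical -- this is NOT the Galileo SDK).
class ToyLogStream:
    def __init__(self):
        self.enabled_metrics = {}  # metric name -> scoring function
        self.spans = []

    def enable_metric(self, name, scorer):
        # Enabling a metric registers it on the stream once...
        self.enabled_metrics[name] = scorer

    def log_llm_span(self, output, context):
        # ...so every LLM span logged afterwards is scored against it.
        scores = {name: fn(output, context)
                  for name, fn in self.enabled_metrics.items()}
        self.spans.append({"output": output, "scores": scores})
        return scores


def toy_context_adherence(output, context):
    # Crude proxy: fraction of output words that appear in the context.
    words = output.lower().split()
    return sum(w in context.lower() for w in words) / max(len(words), 1)


stream = ToyLogStream()
stream.enable_metric("context_adherence", toy_context_adherence)
scores = stream.log_llm_span(
    "Galileo ships reliable apps",
    "Galileo is the fastest way to ship reliable apps",
)
```

The real metric is computed by Galileo's evaluation models, not a word-overlap heuristic; the sketch only shows why no per-span code is needed once the metric is enabled on the stream.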
2
Run your application
Now that you have metrics turned on for your Log stream, re-run your application to generate another trace. This time the context adherence metric will be calculated.
python app.py
3
Open the Log stream in the Galileo console
In the Galileo console, select your project, then select the Log stream.
4
Select the Traces tab
You can see the trace that was just logged in the Traces tab. The context adherence metric will be calculated, showing a low score.
5
Get more information on the evaluation
Select the trace to drill down for more information. Select the LLM span, and use the arrow next to the context adherence score to see an explanation of the metric.
This shows a typical problem with an AI application: the LLM doesn't have enough relevant context to answer a question correctly, so it hallucinates or uses irrelevant information from its training data. We are after information about Galileo, the AI reliability platform, and want to avoid irrelevant information about the astronomer Galileo Galilei.
Let's now fix this by giving the LLM more relevant context, and confirm the fix with an improved evaluation score.
To improve the context adherence score, you can provide relevant context to the LLM in the system prompt.
1
Add relevant context to your system prompt
To improve context adherence, you can add relevant context to the system prompt. This is similar to injecting extra information retrieved by a RAG system. Update your code, replacing the code that sets the system prompt with the following:
relevant_documents = [
    """
    Galileo is the fastest way to ship reliable apps. Galileo brings automation
    and insight to AI evaluations so you can ship with confidence.
    """,
    """
    Galileo has Automated evaluations
    Eliminate 80% of evaluation time by replacing manual reviews with
    high-accuracy, adaptive metrics. Test your AI features, offline and online,
    and bring CI/CD rigor to your AI workflows.
    """,
    """
    Galileo allows Rapid iteration
    Ship iterations 20% faster by automating testing numerous prompts and
    models. Find the best performance for any given test set. When something
    breaks, Galileo helps identify failure modes and root cause.
    """,
]

system_prompt = f"""You are a helpful assistant that wants to provide a user as much information
as possible. Avoid saying I don't know.

Here is some relevant information:

{relevant_documents}"""
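One refinement worth noting (an addition of ours, not part of the original guide): interpolating a Python list directly into an f-string embeds the list's repr, brackets and quote marks included. Joining the documents into plain text first produces a cleaner prompt for the model:

```python
relevant_documents = [
    "Galileo is the fastest way to ship reliable apps.",
    "Galileo brings automation and insight to AI evaluations.",
]

# Join the documents into plain text before building the prompt,
# instead of interpolating the list object (which embeds its repr).
context_text = "\n\n".join(doc.strip() for doc in relevant_documents)

system_prompt = f"""You are a helpful assistant that wants to provide a user as much
information as possible. Avoid saying I don't know.

Here is some relevant information:

{context_text}"""
```

Either form gives the LLM the context it needs; the joined version just avoids stray `['...']` punctuation appearing inside the prompt text.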
2
Run your application
Run your application again to log a new trace.
3
View the results in your terminal
Now the results should show relevant information:
Galileo is an advanced platform designed to streamline the development and deployment of reliable AI applications. It focuses on enhancing the efficiency of AI evaluations through automation and insightful metrics. Here are some of the key features and benefits of using Galileo:

1. **Automated Evaluations**: Galileo significantly reduces the time spent on manual reviews by automating the evaluation process. This can eliminate up to 80% of evaluation time through the use of high-accuracy, adaptive metrics. Both offline and online testing of AI features are supported, allowing for a more structured and rigorous CI/CD (Continuous Integration/Continuous Delivery) approach within AI workflows.

2. **Rapid Iteration**: The platform accelerates the iteration process, enabling teams to ship new features 20% faster. It automates the testing of multiple prompts and models, helping teams quickly identify the best performance for different test sets. When issues arise, Galileo aids in pinpointing failure modes and root causes, which streamlines the troubleshooting process.

3. **CI/CD Integration**: By introducing CI/CD rigor to AI workflows, Galileo ensures that AI models undergo continuous testing and improvement, ultimately boosting the quality and reliability of applications being deployed.

In summary, Galileo is a powerful tool for teams seeking to enhance their AI app development capabilities by utilizing automation and insightful metrics for evaluations, leading to faster iterations and improved reliability.
4
Check the new trace
A new trace will have been logged. This time, the context adherence score will be higher. Select the trace to see more details.
🎉 Congratulations, you have evaluated a trace, and used the results of the evaluation to improve your AI application.