Explore UI-driven prompt engineering in Galileo Evaluate to create, test, and refine prompts with intuitive interfaces and robust evaluation tools.
Quickstart for trying different templates, models, or model settings for an individual LLM call from the Galileo UI.

Looking to prompt engineer individual calls to an LLM? Prompt Runs are your answer. A Prompt Run is a quick and easy way to test a model + template + model settings combination for your use case. To create a Prompt Run, you'll need:
An Evaluation Set - a list of user queries / inputs that you want to run your evaluation over (a minimal example follows below)
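For concreteness, an evaluation set can be as simple as a small file with one row per user query. The sketch below builds one in Python; the file name and the column name `question` are illustrative assumptions, not a required format (in practice the column names should match the placeholder variables in your prompt template).

```python
import csv

# Illustrative user queries for the evaluation set.
queries = [
    "How do I reset my password?",
    "Why was I charged twice this month?",
    "Can I export my data to CSV?",
]

# Write one query per row; the column name should match your template variable.
with open("evaluation_set.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["question"])
    for q in queries:
        writer.writerow([q])
```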
Galileo offers a comprehensive selection of Guardrail Metrics for monitoring your LLM (Large Language Model) app in production. You choose the metrics that fit your specific use case, so you can evaluate your prompts and models effectively. Guardrail Metrics include:
Industry-Standard Metrics: These include well-known metrics such as BLEU (Bilingual Evaluation Understudy), ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation), and Perplexity (see the short example after this list).
Metrics from Galileo’s ML Research Team: Developed through rigorous research, our team has introduced innovative metrics like Uncertainty, Correctness, and Context Adherence. These metrics are designed to evaluate the reliability and authenticity of the generated content, ensuring it meets high standards of safety, accuracy, and relevance.
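As a point of reference for the industry-standard metrics above, the snippet below computes sentence-level BLEU with NLTK and ROUGE-1 with the `rouge-score` package for a single candidate/reference pair. It is only an illustration of what these scores measure, not of how Galileo computes them internally.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "The invoice was sent to the customer on Monday."
candidate = "The invoice was emailed to the customer Monday."

# BLEU: n-gram precision of the candidate against the reference,
# smoothed so short sentences don't score zero.
bleu = sentence_bleu(
    [reference.split()],
    candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE-1: unigram overlap (precision / recall / F1) between candidate and reference.
rouge = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True).score(
    reference, candidate
)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-1 F1: {rouge['rouge1'].fmeasure:.3f}")
```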
For detailed information on each metric and how it can be used to monitor your LLM app effectively in production, refer to the List of Metrics available through Galileo's platform.

Video Walkthrough: how to get started with Galileo Evaluate

The same workflow can also be executed with the Python client; check out Prompt Engineering with Galileo Evaluate here.
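As a rough sketch of what the Python-client workflow might look like, the snippet below runs a template over the evaluation set with a chosen model and a few scorers. The function and parameter names (pq.login, pq.run, pq.Settings, the Scorers identifiers) and the console URL are assumptions based on the promptquality client and may differ from the version you have installed; treat this as a sketch, not the definitive API.

```python
import promptquality as pq

# NOTE: names below are assumptions about the promptquality client and may
# differ from the installed version; check the client docs for exact usage.

pq.login("https://console.your-galileo-deployment.com")  # hypothetical console URL

template = "You are a helpful support agent. Answer the question: {question}"

results = pq.run(
    project_name="prompt-engineering-quickstart",
    template=template,
    dataset="evaluation_set.csv",  # the evaluation set built earlier
    settings=pq.Settings(model_alias="gpt-4o", temperature=0.2),
    scorers=[
        pq.Scorers.correctness,
        pq.Scorers.uncertainty,
    ],
)
```

Changing the template, model alias, or settings and re-running produces a new Prompt Run you can compare against earlier ones in the Galileo UI.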