Agentic metrics help you measure how well your AI agents perform complex, multi-step tasks, especially when those agents need to use tools, make decisions, or interact with external systems. These metrics are helpful for anyone building advanced AI assistants, workflow automation, or any system where the AI acts on behalf of a user. Use agentic metrics when you want to:
- Track whether your agent is making meaningful progress toward its goals.
- Detect and diagnose errors that occur when your agent uses tools or APIs.
- Ensure your agent is choosing the best tools or actions for each situation.
| Name | Description | Supported Nodes | When to Use | Example Use Case |
|---|---|---|---|---|
| Action advancement | Measures how effectively each action advances toward the goal. | Trace | When assessing whether an agent is making meaningful progress in multi-step tasks. | A travel planning agent that needs to book flights, hotels, and activities in the correct sequence. |
| Action completion | Determines whether the agent successfully accomplished all of the user’s goals. | Session | To assess whether an agent completed the desired goal. | A coding agent tasked with closing engineering tickets. |
| Agent efficiency | Determines whether an agent provides a precise answer or resolution to every user request via an efficient path. | Session | To assess whether an agent is taking the most efficient path to a solution. | A complex multi-agent chatbot that needs a fast response. |
| Agent flow | Measures the correctness and coherence of an agentic trajectory by validating it against user-specified natural language tests. | Session | To assess a multi-agent system, or a system with multiple tools. | An internal process agent that needs to follow strict process rules. |
| Conversation quality | A binary metric that assesses whether a chatbot interaction left the user feeling satisfied and positive or frustrated and dissatisfied. | Session (trace inputs/outputs only) | When building customer-facing chatbots. | A health insurance chatbot. |
| Tool error | Detects errors or failures during the execution of tools. | Tool span | When implementing AI agents that use tools and want to track error rates. | A coding assistant that uses external APIs to run code and must handle and report execution errors appropriately. |
| Tool selection quality | Evaluates whether the agent selected the most appropriate tools for the task. | LLM span | When optimizing agent systems for effective tool usage. | A data analysis agent that must choose the right visualization or statistical method based on the data type and user question. |
| Reasoning coherence | Assesses whether an agent’s reasoning steps are logically consistent and aligned with its plan. | LLM span | When validating multi-step planning and intermediate reasoning quality. | A planning agent that must follow a coherent plan across tool calls. |
| User intent change | Measures a significant shift in the user’s primary conversational goal or workflow during a session, relative to their initial stated intent. | Session (trace inputs/outputs only) | To take a holistic view of an entire session and understand which capabilities a user interacts with in a single session. | A multi-purpose chatbot for a bank. |
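The Supported Nodes column above can be encoded as a simple lookup so that an evaluation harness can reject a metric attached at the wrong level. The mapping mirrors the table; the metric keys and the `check_metric_level` helper are our own naming, not an official API:

```python
# Supported node level for each agentic metric, mirroring the table above.
METRIC_NODE = {
    "action_advancement": "trace",
    "action_completion": "session",
    "agent_efficiency": "session",
    "agent_flow": "session",
    "conversation_quality": "session",
    "tool_error": "tool_span",
    "tool_selection_quality": "llm_span",
    "reasoning_coherence": "llm_span",
    "user_intent_change": "session",
}

def check_metric_level(metric: str, node: str) -> bool:
    """Return True when `metric` is defined at the given node level."""
    return METRIC_NODE.get(metric) == node

print(check_metric_level("tool_error", "tool_span"))     # True
print(check_metric_level("action_advancement", "session"))  # False
```

A check like this catches configuration mistakes early, before any metric computation runs.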