Agentic metrics help you measure how well your AI agents perform complex, multi-step tasks, especially when those agents need to use tools, make decisions, or interact with external systems. These metrics are helpful for anyone building advanced AI assistants, workflow automation, or any system where the AI acts on behalf of a user. Use agentic metrics when you want to:
- Track whether your agent is making meaningful progress toward its goals.
- Detect and diagnose errors that occur when your agent uses tools or APIs.
- Ensure your agent is choosing the best tools or actions for each situation.
| Name | Description | Supported Nodes | When to Use | Example Use Case |
|---|---|---|---|---|
| Action advancement | Measures how effectively each action advances toward the goal. | Trace | When assessing whether an agent is making meaningful progress in multi-step tasks. | A travel planning agent that needs to book flights, hotels, and activities in the correct sequence. |
| Action completion | Determines whether the agent successfully accomplished all of the user’s goals. | Session | To assess whether an agent completed the desired goal. | A coding agent tasked with closing engineering tickets. |
| Agent efficiency | Determines whether an agent provides a precise answer or resolution to every user request via an efficient path. | Session | To assess whether an agent is taking the most efficient path to a solution. | A complex multi-agent chatbot that needs a fast response. |
| Agent flow | Measures the correctness and coherence of an agentic trajectory by validating it against user-specified natural language tests. | Session | To assess a multi-agent system, or a system with multiple tools. | An internal process agent that needs to follow strict process rules. |
| Conversation quality | A binary metric that assesses whether a chatbot interaction left the user feeling satisfied and positive or frustrated and dissatisfied. | Session (trace inputs/outputs only) | When building customer-facing chatbots. | A health insurance chatbot. |
| Tool error | Detects errors or failures during the execution of tools. | Tool span | When implementing AI agents that use tools and want to track error rates. | A coding assistant that uses external APIs to run code and must handle and report execution errors appropriately. |
| Tool selection quality | Evaluates whether the agent selected the most appropriate tools for the task. | LLM span | When optimizing agent systems for effective tool usage. | A data analysis agent that must choose the right visualization or statistical method based on the data type and user question. |
| Reasoning coherence | Assesses whether an agent’s reasoning steps are logically consistent and aligned with its plan. | LLM span | When validating multi-step planning and intermediate reasoning quality. | A planning agent that must follow a coherent plan across tool calls. |
| User intent change | Measures a significant shift in the user’s primary conversational goal or workflow during a session, relative to their initial stated intent. | Session (trace inputs/outputs only) | To take a holistic view of an entire session and understand which capabilities a user interacts with in a single session. | A multi-purpose chatbot for a bank. |
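The Supported Nodes column above can be encoded as a simple lookup so that an evaluation harness can reject a metric attached at the wrong level. The mapping mirrors the table; the metric keys and the `check_metric_level` helper are our own naming, not an official API:

```python
# Supported node level for each agentic metric, mirroring the table above.
METRIC_NODE = {
    "action_advancement": "trace",
    "action_completion": "session",
    "agent_efficiency": "session",
    "agent_flow": "session",
    "conversation_quality": "session",
    "tool_error": "tool_span",
    "tool_selection_quality": "llm_span",
    "reasoning_coherence": "llm_span",
    "user_intent_change": "session",
}

def check_metric_level(metric: str, node: str) -> bool:
    """Return True when `metric` is defined at the given node level."""
    return METRIC_NODE.get(metric) == node

print(check_metric_level("tool_error", "tool_span"))     # True
print(check_metric_level("action_advancement", "session"))  # False
```

A check like this catches configuration mistakes early, before any metric computation runs.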