## Overview

Action Advancement measures whether an assistant successfully accomplishes or makes progress toward at least one user goal in a conversation. It addresses a common pain point, unclear agent performance, by measuring whether AI agents are actually helping users achieve their objectives rather than just providing responses.

An assistant successfully advances a user's goal when it:

1. Provides a complete or partial answer to the user's question
2. Requests clarification or additional information to better understand the user's needs
3. Confirms that a requested action has been successfully completed

For an interaction to count as advancing the user's goal, the assistant's response must be:

* Factually accurate
* Directly addressing the user's request
* Consistent with any tool outputs used

### Action Advancement at a glance

| Property | Description |
| :----------------------------- | :---------------------------------------------------------------------------- |
| **Name of Metric** | Action Advancement |
| **Metric Category** | Agentic Metrics |
| **Use this metric for** | Evaluating whether AI agents make progress toward user goals in conversations |
| **Can be applied to** | session, trace, all span types (agent, workflow, retriever, LLM and tool) |
| **LLM/Luna Support** | Supported with both LLM + Luna models |
| **Protect Runtime Protection** | No - Not applicable for this metric |
| **Constants** | None - Uses dynamic evaluation |
| **Usage Context** | Agentic workflows, multi-step tasks, tool-using assistants |
| **Value Type** | Confidence score (0.0 to 1.0) - Confidence that at least one user goal was advanced |
| **Input/Output Requirements** | Requires conversation context, user goals, and assistant responses |

## When to Use This Metric


This metric shines when simple response quality metrics fall short, particularly for complex, multi-step interactions where progress toward goals matters more than individual response quality.

* **Agentic Workflows:** When an AI agent must decide on actions and select appropriate tools.
* **Multi-step Tasks:** When completing a user's request requires multiple steps or decisions.
* **Tool-using Assistants:** When evaluating if the assistant used available tools effectively.
* **Customer Service Agents:** Resolving user issues through multi-step problem-solving.
* **Task-Oriented Assistants:** Completing specific actions like booking flights or processing orders.
* **Research Assistants:** Gathering and synthesizing information across multiple sources.
* **Creative Assistants:** Understanding and building upon user requests iteratively.

### Calculation method

If the Action Advancement score is less than 100%, it means at least one evaluator determined the assistant failed to make progress on any user goal.

Action Advancement is calculated as follows:

1. Multiple evaluation requests are sent to an LLM evaluator to analyze the assistant's progress toward user goals.
2. A specialized chain-of-thought prompt guides the model to evaluate whether the assistant made progress on user goals based on the metric's definition.
3. Each evaluation analyzes the interaction and produces both a detailed explanation and a binary judgment (yes/no) on goal advancement.
4. The final Action Advancement score is computed as the confidence score, or probability, that at least one user ask is advanced.

We display one of the generated explanations alongside the score, choosing one that aligns with the majority judgment.

This metric requires multiple LLM calls to compute, which may impact usage and billing.

### Score Interpretation

**Expected Score:** 1.0 (Excellent) - The assistant made clear progress toward the booking goal by gathering necessary information and providing options.

### What different scores mean

* **0.0 - 0.3 (Poor):** The assistant completely failed to address the user's request or made no meaningful progress. Common causes include ignoring the user's question, providing irrelevant information, or failing to use available tools when needed.
* **0.4 - 0.7 (Fair):** The assistant made some progress but didn't fully accomplish the user's goal. This might include partial answers, requesting clarification when not needed, or missing key aspects of the request.
* **0.8 - 1.0 (Excellent):** The assistant successfully advanced the user's goal by providing complete answers, making appropriate requests for clarification, or confirming successful task completion.

## How to improve Action Advancement scores

To improve Action Advancement scores, focus on ensuring your AI agents make meaningful progress toward user goals in every interaction.
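Before tuning an agent, it can help to see how several yes/no verdicts might roll up into a single score. The sketch below is illustrative only: it assumes the score is the fraction of evaluators that answered "yes", which is one plausible reading of the calculation method and may not match the production scoring. `Judgment` and `aggregate` are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One evaluator verdict (hypothetical structure, for illustration only)."""
    advanced: bool    # binary yes/no judgment on goal advancement
    explanation: str  # the evaluator's written reasoning


def aggregate(judgments: list[Judgment]) -> tuple[float, str]:
    """Return (score, explanation) for a non-empty list of judgments.

    Assumption: the score is the share of "yes" verdicts. The explanation
    is the first one that agrees with the majority verdict.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    yes_votes = sum(j.advanced for j in judgments)
    score = yes_votes / len(judgments)
    # Ties resolve to "yes"; this is an arbitrary choice made for the sketch.
    majority = yes_votes * 2 >= len(judgments)
    explanation = next(j.explanation for j in judgments if j.advanced == majority)
    return score, explanation
```

Under this assumption, three evaluators voting yes, yes, no would yield a score of about 0.67, and the displayed explanation would come from one of the two "yes" verdicts. Any single "no" pulls the score below 1.0, which matches the statement that a score under 100% means at least one evaluator saw no progress.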
### Common issues and solutions

| Issue | Cause | Solution |
| :---------------------------------- | :----------------------------------------------- | :----------------------------------------------------------------------------------------------------------- |
| **Assistant ignores user requests** | Poor prompt engineering or context understanding | Improve system prompts to emphasize goal-oriented responses and ensure the assistant understands user intent |
| **Incomplete responses** | Insufficient context or tool usage | Provide better context and ensure the assistant uses available tools effectively |
| **Irrelevant information** | Lack of focus on user goals | Train the assistant to stay focused on the specific user request and avoid tangential information |
| **No progress on multi-step tasks** | Poor task breakdown | Implement better task decomposition and ensure the assistant can handle complex, multi-step processes |

### Best practices for optimization

* **Clear goal identification:** Ensure your assistant can identify and prioritize user goals
* **Progressive disclosure:** Break complex tasks into manageable steps
* **Tool integration:** Make sure the assistant effectively uses available tools and APIs
* **Context awareness:** Maintain conversation context to build on previous interactions

## Comparison to other metrics

| Property | Action Advancement | Instruction Adherence | Completeness |
| :----------------------------- | :---------------------------------------- | :----------------------------------------------- | :------------------------------- |
| **Metric Category** | Agentic Metrics | Response Quality | Response Quality |
| **Use this metric for** | Evaluating goal progress in conversations | Measuring how well responses follow instructions | Assessing response completeness |
| **Best for** | Multi-step tasks and agentic workflows | Single-turn instruction following | Ensuring comprehensive responses |
| **LLM/Luna Support** | Yes | Yes | Yes |
| **Protect Runtime Protection** | No | No | No |
| **Value Type** | Confidence score (0.0-1.0) | Percentage (0.0-1.0) | Percentage (0.0-1.0) |
| **Limitations** | Requires conversation context | May not capture goal progress | Doesn't measure goal advancement |

## Best practices

To effectively implement and optimize Action Advancement in your AI systems, consider these key practices:

### Track progress over time

Monitor Action Advancement scores across different versions of your agent to ensure improvements in task completion capabilities. This helps you identify whether your optimizations are actually improving goal advancement.

### Analyze failure patterns

When Action Advancement scores are low, examine the specific steps where agents fail to make progress to identify systematic issues. Look for patterns in where agents get stuck or fail to advance user goals.

### Combine with other metrics

Use Action Advancement alongside other agentic metrics to get a comprehensive view of your assistant's effectiveness. This provides a more complete picture of your agent's performance beyond just goal advancement.

### Test edge cases

Create evaluation datasets that include complex, multi-step tasks to thoroughly assess your agent's ability to advance user goals. This ensures your agent can handle challenging scenarios that require multiple steps.

When optimizing for Action Advancement, ensure you're not sacrificing other important aspects like safety, factual accuracy, or user experience in pursuit of task completion.

## Performance Benchmarks

We evaluated Action Advancement against human expert labels on an internal dataset of agentic conversation samples using top frontier models.

| Model | F1 (True) |
| :---------------------- | :-------: |
| GPT-4.1 | 0.87 |
| GPT-4.1-mini (judges=3) | 0.78 |
| Claude Sonnet 4.5 | 0.89 |
| Gemini 3 Flash | 0.85 |

### GPT-4.1 Classification Report

Benchmarks are based on an internal evaluation dataset. Performance may vary by use case.
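For readers who want to run a similar comparison on their own labeled data, the sketch below shows the standard definition of precision, recall, and F1 for the positive ("True") class. It is not the evaluation harness used for the benchmarks above; the function name `positive_class_report` and its inputs are hypothetical, and it assumes each sample has a boolean metric verdict and a boolean human label.

```python
def positive_class_report(predicted: list[bool], actual: list[bool]) -> dict[str, float]:
    """Precision, recall, and F1 for the positive ("True") class.

    `predicted` holds the metric's verdicts and `actual` holds the human
    labels, one entry per sample. Both names are illustrative.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)        # metric says True, human says True
    fp = sum(p and not a for p, a in pairs)    # metric says True, human says False
    fn = sum(not p and a for p, a in pairs)    # metric says False, human says True

    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

As a made-up usage example, verdicts `[True, True, False, True, False]` against labels `[True, False, False, True, True]` give two true positives, one false positive, and one false negative, so precision, recall, and F1 are all 2/3.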
## Related Resources

If you would like to dive deeper or start implementing Action Advancement, check out the following resources:

### Examples

* [Action Advancement Examples](https://app.galileo.ai) - Log in and explore the "Action Advancement" Log Stream in the "Preset Metric Examples" Project to see this metric in action.

### How-to guides

* [Agentic AI Basic Example](/how-to-guides/agentic-ai/basic-example)
* [Creating Custom Metrics](/how-to-guides/metrics/create-local-metric/create-local-metric)

### Related Concepts

* [Agentic Metrics Overview](/concepts/metrics/agentic/agentic-overview)
* [Action Completion](/concepts/metrics/agentic/action-completion)
* [Agent Efficiency](/concepts/metrics/agentic/agent-efficiency)