LLM spans with tools (Agentic)

Use this tutorial when your metric depends on the available tools, the chat history, and the assistant action or response. This is the standard pattern for agentic tool-use metrics.

Dataset schema

Required columns:

tools: the available tool definitions or tool context
input: the chat history or user context
output: the assistant action or response
label: the ground-truth class for the metric
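
As an illustration of the schema above, here is a minimal Python sketch of one dataset row and a check for the four required columns. The row contents, the JSON serialization of the tools column, the 0/1 label encoding, and the helper name missing_columns are all assumptions made for this example, not a documented format.

```python
import json

# The four columns this tutorial lists as required.
REQUIRED_COLUMNS = ("tools", "input", "output", "label")


def missing_columns(row: dict) -> list:
    """Return the required columns that are absent or empty in a row."""
    return [col for col in REQUIRED_COLUMNS if row.get(col) in (None, "")]


# Hypothetical row: the tool definition, chat text, and label encoding
# are illustrative assumptions, not a prescribed format.
example_row = {
    "tools": json.dumps(
        [{"name": "get_weather", "description": "Look up current weather for a city."}]
    ),
    "input": "user: What is the weather in Paris right now?",
    "output": 'assistant: call get_weather(city="Paris")',
    "label": 1,
}

print(missing_columns(example_row))
```

Running a check like this over every row before data generation can surface incomplete examples early.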

Config shape

Set:

data_generation.metric.input_format: "span_with_tools"
data_generation.source_data.dataset.columns.features: ["tools", "input", "output"]
data_generation.generation.context_examples: 1
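
The dotted keys above can be read as paths into the nested config. The sketch below expands them into a nested structure, assuming each dot corresponds to one level of nesting and that the generation block sits under data_generation, as it does in the minimal config. The helper set_path is hypothetical and not part of any library.

```python
import json


def set_path(config: dict, dotted_key: str, value) -> None:
    """Set a value in a nested dict using a dot-separated key path."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        # Create intermediate mappings on demand.
        node = node.setdefault(key, {})
    node[leaf] = value


config = {}
set_path(config, "data_generation.metric.input_format", "span_with_tools")
set_path(
    config,
    "data_generation.source_data.dataset.columns.features",
    ["tools", "input", "output"],
)
set_path(config, "data_generation.generation.context_examples", 1)

# Print the nested structure that the dotted keys describe.
print(json.dumps(config, indent=2))
```

Note that label does not appear in the features list; the schema describes it as the ground-truth class rather than an input feature.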

Minimal end-to-end config

run_steps:
  - data_generation
  - training

pipeline_provider: "local"
metric_name: "custom"

data_generation:
  metric:
    name: "Tool Selection Quality"
    type: "binary"
    input_format: "span_with_tools"
    llmaj_source_prompt: "Determine whether the bot's tool selection decision follows proper guidelines given the chat history and available tools."
  source_data:
    dataset:
      source_type: "huggingface"
      huggingface:
        name: "tool-selection-quality-dataset"
  generation:
    context_examples: 1
  output:
    dataset:
      repo_name: "tool-selection-quality-training-dataset"

training:
  dataset:
    name: "tool-selection-quality-training-dataset"
  output:
    model_name: "tool-selection-quality-model"
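
In the config above, training.dataset.name has the same value as data_generation.output.dataset.repo_name. The sketch below checks that the two names agree, assuming the training step is meant to consume the dataset that the data generation step produces; the config suggests this but does not state it. The dict is a hand-typed copy of the relevant keys, and dataset_handoff_matches is a hypothetical helper.

```python
# Hand-typed copy of the relevant keys from the config above
# (not loaded from a file).
config = {
    "run_steps": ["data_generation", "training"],
    "data_generation": {
        "output": {
            "dataset": {"repo_name": "tool-selection-quality-training-dataset"},
        },
    },
    "training": {
        "dataset": {"name": "tool-selection-quality-training-dataset"},
    },
}


def dataset_handoff_matches(cfg: dict) -> bool:
    """Return True when training reads the dataset name that generation writes."""
    generated = cfg["data_generation"]["output"]["dataset"]["repo_name"]
    consumed = cfg["training"]["dataset"]["name"]
    return generated == consumed


print(dataset_handoff_matches(config))
```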
