Skip to main content
Interruption Detection evaluates the full list of trace inputs and outputs in a session to determine whether any turn-taking violations occurred. It covers three interruption patterns:
  • Agent overlap: The agent speaks while the user is still speaking
  • Premature agent barge-in: The agent begins its response before the user’s intent is complete
  • User barge-in: The user speaks while the agent is still speaking

Interruption Detection at a glance

PropertyDescription
NameInterruption Detection
CategoryMultimodal Quality
Metric LevelSession (List of trace inputs / outputs only)
LLM-as-a-judge Support
Luna Support
Protect Runtime Protection
Value TypeBoolean

Score interpretation

ScoreLabelMeaning
FalseNo InterruptionNo turn-taking violations were detected in the session
TrueInterruption DetectedAt least one turn-taking violation was detected in the session
Use this score as a single session-level signal:
  • means no overlap or barge-in was detected.
  • means at least one interruption event occurred (agent overlap, premature agent barge-in, or user barge-in).

When to use this metric

Example scenario

endpoint too aggressive

User: “I need help booking a flight to—”
Agent: “Sure, what dates are you traveling?”
Interpretation: The agent started speaking before the user completed their intent, so the session should be labeled .

Interruption patterns

Types of Interruptions

Agent overlap: The agent begins or continues speaking while the user is still speaking, causing simultaneous audio output from both parties.
Premature agent barge-in: The agent starts its response before the user has finished expressing their intent, truncating incomplete utterances.
User barge-in: The user speaks while the agent is still delivering its response, often indicating frustration or an overly long agent turn.

Inputs considered

Interruption Detection operates at the session level and is computed over the full list of trace inputs and outputs for that session. The evaluator examines:
  • The ordered sequence of speaker turns (agent and user) across all traces in the session
  • Audio files of the user and assistant turns
Accuracy improves when trace inputs and outputs inlcude the user / assistant audio files alongside the transcripts.

Calculation method

Interruption Detection is computed through a multi-step process:
1

Session aggregation

Aggregate the full list of trace inputs and outputs for a session and identify speaker turns (user vs. agent).
2

Overlap and barge-in detection

Detect whether the agent speaks while the user is speaking, the agent begins responding before user intent completes, or the user speaks while the agent is speaking. Where available, use timing/overlap metadata to confirm simultaneous speech.
3

Binary decision

Return if any interruption event occurs in the session; otherwise return .
This metric is typically computed by prompting an LLM over the session trace (and any available timing metadata), which may require additional LLM calls to compute and can impact usage and billing.

Best practices

Keep the trace inputs and outputs clean

Ensure only the latest user query is the trace input (free of chat history), and the last LLM span’s output as the trace output

Include transcripts

Add the test versions of the trace inputs and outputs alongside the audio files.

Performance Benchmarks

We evaluated Interruption Detection against human expert labels on an internal dataset of varied samples using top frontier models.
ModelF1 (True)
GPT-audio0.69
Gemini 3 Flash0.93
Gemini 3 Pro0.94

Gemini 3 Flash Classification Report

If you would like to dive deeper or start implementing Interruption Detection, check out the following resources:

Examples

  • Interruption Detection Examples - Log in and explore the “Interruption Detection” Log Stream in the “Preset Metric Examples” Project to see this metric in action.