Interruption Detection

Interruption Detection evaluates the full list of trace inputs and outputs in a session to determine whether any turn-taking violations occurred. It covers three interruption patterns:

Agent overlap: The agent speaks while the user is still speaking
Premature agent barge-in: The agent begins its response before the user’s intent is complete
User barge-in: The user speaks while the agent is still speaking

Interruption Detection at a glance

Property	Description
Name	Interruption Detection
Category	Multimodal Quality
Metric Level	Session (List of trace inputs / outputs only)
LLM-as-a-judge Support	✅
Luna Support	❌
Protect Runtime Protection	❌
Value Type	Boolean

Score interpretation

Score	Label	Meaning
False	No Interruption	No turn-taking violations were detected in the session
True	Interruption Detected	At least one turn-taking violation was detected in the session

Use this score as a single session-level signal:

means no overlap or barge-in was detected.
means at least one interruption event occurred (agent overlap, premature agent barge-in, or user barge-in).

When to use this metric

Example scenario

endpoint too aggressive

User: “I need help booking a flight to—”

Agent: “Sure, what dates are you traveling?”

Interpretation: The agent started speaking before the user completed their intent, so the session should be labeled .

Interruption patterns

Types of Interruptions

Agent overlap: The agent begins or continues speaking while the user is still speaking, causing simultaneous audio output from both parties.

Premature agent barge-in: The agent starts its response before the user has finished expressing their intent, truncating incomplete utterances.

User barge-in: The user speaks while the agent is still delivering its response, often indicating frustration or an overly long agent turn.

Inputs considered

Interruption Detection operates at the session level and is computed over the full list of trace inputs and outputs for that session. The evaluator examines:

The ordered sequence of speaker turns (agent and user) across all traces in the session
Audio files of the user and assistant turns

Accuracy improves when trace inputs and outputs inlcude the user / assistant audio files alongside the transcripts.

Calculation method

Interruption Detection is computed through a multi-step process:

Session aggregation

Aggregate the full list of trace inputs and outputs for a session and identify speaker turns (user vs. agent).

Overlap and barge-in detection

Detect whether the agent speaks while the user is speaking, the agent begins responding before user intent completes, or the user speaks while the agent is speaking. Where available, use timing/overlap metadata to confirm simultaneous speech.

Binary decision

Return if any interruption event occurs in the session; otherwise return .

This metric is typically computed by prompting an LLM over the session trace (and any available timing metadata), which may require additional LLM calls to compute and can impact usage and billing.

Best practices

Keep the trace inputs and outputs clean

Ensure only the latest user query is the trace input (free of chat history), and the last LLM span’s output as the trace output

Include transcripts

Add the test versions of the trace inputs and outputs alongside the audio files.

Performance Benchmarks

We evaluated Interruption Detection against human expert labels on an internal dataset of varied samples using top frontier models.

Model	F1 (True)
GPT-audio	0.69
Gemini 3 Flash	0.93
Gemini 3 Pro	0.94

Gemini 3 Flash Classification Report

If you would like to dive deeper or start implementing Interruption Detection, check out the following resources:

Examples

Interruption Detection Examples - Log in and explore the “Interruption Detection” Log Stream in the “Preset Metric Examples” Project to see this metric in action.