- Agent overlap: The agent speaks while the user is still speaking
- Premature agent barge-in: The agent begins its response before the user’s intent is complete
- User barge-in: The user speaks while the agent is still speaking
Interruption Detection at a glance
| Property | Description |
|---|---|
| Name | Interruption Detection |
| Category | Multimodal Quality |
| Metric Level | Session (List of trace inputs / outputs only) |
| LLM-as-a-judge Support | ✅ |
| Luna Support | ❌ |
| Protect Runtime Protection | ❌ |
| Value Type | Boolean |
Score interpretation
| Score | Label | Meaning |
|---|---|---|
| False | No Interruption | No turn-taking violations were detected in the session |
| True | Interruption Detected | At least one turn-taking violation was detected in the session |
- means no overlap or barge-in was detected.
- means at least one interruption event occurred (agent overlap, premature agent barge-in, or user barge-in).
When to use this metric
Example scenario
endpoint too aggressive
User: “I need help booking a flight to—”
Agent: “Sure, what dates are you traveling?”
Interpretation: The agent started speaking before the user completed their intent, so the session should be labeled .
Interruption patterns
Types of Interruptions
Agent overlap: The agent begins or continues speaking while the user is still speaking, causing simultaneous audio output from both parties.
Premature agent barge-in: The agent starts its response before the user has finished expressing their intent, truncating incomplete utterances.
User barge-in: The user speaks while the agent is still delivering its response, often indicating frustration or an overly long agent turn.
Inputs considered
Interruption Detection operates at the session level and is computed over the full list of trace inputs and outputs for that session. The evaluator examines:- The ordered sequence of speaker turns (agent and user) across all traces in the session
- Audio files of the user and assistant turns
Accuracy improves when trace inputs and outputs inlcude the user / assistant audio files alongside the transcripts.
Calculation method
Interruption Detection is computed through a multi-step process:Session aggregation
Aggregate the full list of trace inputs and outputs for a session and identify speaker turns (user vs. agent).
Overlap and barge-in detection
Detect whether the agent speaks while the user is speaking, the agent begins responding before user intent completes, or the user speaks while the agent is speaking. Where available, use timing/overlap metadata to confirm simultaneous
speech.
This metric is typically computed by prompting an LLM over the session trace (and any available timing metadata), which may require additional LLM calls to compute and can impact usage and billing.
Best practices
Keep the trace inputs and outputs clean
Ensure only the latest user query is the trace input (free of chat history), and the last LLM span’s output as the trace output
Include transcripts
Add the test versions of the trace inputs and outputs alongside the audio files.
Performance Benchmarks
We evaluated Interruption Detection against human expert labels on an internal dataset of varied samples using top frontier models.| Model | F1 (True) |
|---|---|
| GPT-audio | 0.69 |
| Gemini 3 Flash | 0.93 |
| Gemini 3 Pro | 0.94 |
Gemini 3 Flash Classification Report
Related Resources
If you would like to dive deeper or start implementing Interruption Detection, check out the following resources:Examples
- Interruption Detection Examples - Log in and explore the “Interruption Detection” Log Stream in the “Preset Metric Examples” Project to see this metric in action.