Mid-conversation tangent
A customer is halfway through a return flow with your agent. They've shared the order number, the item, and the reason for the return. Then they pause to ask: "Wait, do you offer this in red?" Your agent promptly answers the new question.
Three turns later the customer says, "Okay, back to my return." This is where the agent can falter: it forgets the item and order number and asks the customer to start again.
This is a mid-conversation tangent, a common failure mode that breaks agents that look fine on the “happy path.” The agent gets a polite question about an unrelated topic, handles it competently, and then loses the thread of what the customer was actually doing.
Why this slips past most testing
This eludes most tests for two reasons. The first is structural: a prompt evaluation involves a single input and a single output.
It can't see a tangent because there's no flow to interrupt. Even “happy-path” multi-turn tests don't catch it, because the user never goes off-topic on a typical happy path. The failure only surfaces when someone deliberately breaks the flow and then comes back to it.
The second reason is practicality: it's tedious to test by hand. A few scenarios in, you stop noticing the failures you came to look for.
A 10-minute test in Voxli
Here’s a simple test that works whether you're testing a support agent, a sales SDR, or a checkout assistant.
1. Start from a real flow.
Pick a happy-path scenario your agent should handle, e.g. a return, a lead qualification, or a status check. The exact flow doesn't matter; what matters is that it's multi-turn and that the agent collects context along the way.
2. Write an instruction that drops in a tangent.
Voxli's AI tester plays the user, and you give it an instruction in plain English:
"You're returning a pair of running shoes. Order number is NS-28479, the issue is they're a size too small. Halfway through the conversation, ask the agent an unrelated question about gift wrapping. After it answers, return to the return. Don't volunteer information you've already given."
That last sentence is the load-bearing one. It forces the test to expose whether the agent remembers what it has, or just plays along with whatever the user says next.
3. Write assertions that check both halves.
Two things have to hold when running this test:
- The agent answered the tangent politely and didn't get stuck.
- The agent resumed the original task without re-collecting information it already had.
- ...
In Voxli's UI, add your assertions. Mark the tangent as a Blocker.
Polite tangent-handling is a bonus, but context retention is the key to an agent's reliability.
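If you export transcripts and want a rough offline version of the second assertion, it can be sketched as a small Python check. This is a minimal sketch, not Voxli's API: the transcript shape (a list of role/text pairs), the function names, and the sample conversations are all assumptions made for illustration, and the question-detection heuristic is deliberately crude.

```python
import re
from typing import List, Tuple

Turn = Tuple[str, str]  # (role, text), where role is "user" or "agent" (assumed shape)


def resume_index(transcript: List[Turn], resume_phrase: str) -> int:
    """Index of the first user turn that returns to the original task."""
    for i, (role, text) in enumerate(transcript):
        if role == "user" and resume_phrase.lower() in text.lower():
            return i
    raise ValueError("the user never resumed the original task")


def reasked_for(transcript: List[Turn], resume_phrase: str, field: str) -> bool:
    """True if any agent turn after the resume point asks a question about `field`."""
    start = resume_index(transcript, resume_phrase)
    for role, text in transcript[start + 1:]:
        if role != "agent":
            continue
        # Split into sentences so a statement that merely mentions the field
        # ("I have order number NS-28479.") is not counted as a re-ask.
        for sentence in re.split(r"(?<=[.?!])\s+", text):
            if sentence.strip().endswith("?") and field.lower() in sentence.lower():
                return True
    return False


# Made-up transcripts for illustration only.
good_run = [
    ("user", "I want to return running shoes, order NS-28479. They're a size too small."),
    ("agent", "Thanks, I have the order and the reason."),
    ("user", "Do you offer gift wrapping?"),
    ("agent", "Yes, we do."),
    ("user", "Okay, back to my return."),
    ("agent", "Sure. I have order number NS-28479 and the sizing issue. Refund or exchange?"),
]
bad_run = good_run[:-1] + [("agent", "Happy to help. What's your order number?")]

print(reasked_for(good_run, "back to my return", "order number"))  # False
print(reasked_for(bad_run, "back to my return", "order number"))   # True
```

A keyword check like this only catches the most literal re-asks. It is a sanity check to run alongside plain-English assertions, not a replacement for them.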
4. Run it ten times.
Multi-turn behavior is probabilistic. One pass proves nothing because you're sampling a distribution; ten runs give you a real read on how often the agent loses context under this specific stress.
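To see how far a pass rate from a handful of runs might be from the agent's true rate, one option is a Wilson score interval. The sketch below is plain Python with no Voxli dependency; the function name and the example count of 7 passes out of 10 are assumptions chosen for illustration, not results from any real test.

```python
import math
from typing import Tuple


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a pass rate; z=1.96 is roughly 95% confidence."""
    if runs <= 0 or not 0 <= passes <= runs:
        raise ValueError("need runs > 0 and 0 <= passes <= runs")
    p = passes / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical outcome: 7 clean passes in 10 runs.
low, high = wilson_interval(passes=7, runs=10)
print(f"observed pass rate 70%, plausible range {low:.0%} to {high:.0%}")
```

If the range is too wide to support a decision, running the same scenario more times narrows it.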

Typical results from your first run
The first time we ran this test against a pilot agent, we found that one in three runs passed cleanly.
The most common failure: the agent re-asked for the order number after the tangent.
The second most common failure: it carried the tangent topic into the resumed turn ("Got it–is there anything else you'd like to know about gift wrapping?").
Neither of these looks problematic on a single turn. Over a thousand customer conversations, however, they can become disastrous, especially when your CSAT scores start to drop for no obvious reason.
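The second failure, where the tangent topic leaks into the resumed turn, can also be flagged with a rough transcript check. As before, this is only a sketch under assumptions: the list-of-pairs transcript shape, the function name, and the sample conversation are hypothetical and not part of Voxli.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (role, text), where role is "user" or "agent" (assumed shape)


def tangent_leaked(transcript: List[Turn], resume_phrase: str, tangent_topic: str) -> bool:
    """True if the agent's first reply after the user resumes still mentions the tangent topic."""
    resumed = False
    for role, text in transcript:
        if not resumed:
            # Look for the user turn that returns to the original task.
            resumed = role == "user" and resume_phrase.lower() in text.lower()
        elif role == "agent":
            # Only the first agent reply after the resume point is inspected.
            return tangent_topic.lower() in text.lower()
    return False


# Made-up transcript for illustration only.
leaky_run = [
    ("user", "Do you offer gift wrapping?"),
    ("agent", "Yes, we do."),
    ("user", "Okay, back to my return."),
    ("agent", "Got it. Is there anything else you'd like to know about gift wrapping?"),
]
print(tangent_leaked(leaky_run, "back to my return", "gift wrapping"))  # True
```

Only the first reply is checked because that is where the carry-over described above shows up; a later, user-initiated mention of the topic would not be a failure.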
The deeper question
Mid-conversation tangents are a proxy for a deeper question: does your agent have memory for intent, or just memory of words?
Most agents handle the words fine; it's the intent that slips. A 10-minute Voxli test will tell you which one your agent is doing today, before a customer finds out for you.
Remember, test always.