Upfront information dump

A customer opens your support agent with this:

"hi! the running shoes are too small (size 10 not 9), they were a gift from my mom can you refund MINE not her card? order NS-28479 thanks - Daisy"

Your LLM parses it fine, but the stack downstream doesn't.

The slot-filler captured order_id and issue but missed refund_target_card. A retrieval call fires before the agent has confirmed that the order is within the refund window, then two turns later it asks: "What is the order number?"

This is the upfront info dump: a failure mode that breaks agents that look fine on the “happy path.”

Why this slips past most testing

All prompt evals score one input and one output which usually handles the information dump fine, it's the surrounding orchestration where it falls apart: slots half-filled, intents mis-classified, tools firing before the agent has the full picture. This is where a prompt eval typically fails.

Consequently, happy-path multi-turn tests don't catch it either: a scripted happy path opens with the bare minimum a customer would say, then layers context turn by turn. However, in reality, users don't stick to that script; they might pose a question, or provide five fields of data and then contradict themselves on the second turn. Neither of those shows up in a happy-path test.

I have found as models get more advanced, it is often the stack surrounding the model that is the issue. Slot filling and over prompting the underlying LLM with poorly constructed context tends to cause these issues. Before LLMs, a common complaint was about customer service agents getting stuck in a loop of repeating the same questions.

Fixing the failure with a 10-minute test in Voxli

To catch upfront failures before they catch you out, run these tests in Voxli:

1. Set up a test in a scenario for the flow: Returns, lead qualification, intake. Anywhere the agent fills fields or calls tools.

2. Write an instruction that opens with the dump: Voxli's AI tester plays the user, here you give it a persona, a numbered script, and an end condition e.g. "You're Daisy Andersson. Order NS-28479, running shoes a size too small, refund to your own card (gift from your mom).”

  1. Open: With the above message. Write the way a real customer would: lowercase, run-on, not perfectly composed.
  2. If the agent asks for something you already gave, point it out. Don't just repeat it.
  3. End: Once the agent confirms the refund details, or makes it clear it lost track.

3. Provide assertions:

  • The agent never re-asks for a field the user already gave.
  • The downstream actions (slots filled, tools called, retrieval grounded) match what the user said.

Mark the first as Blocker. The second is the one most teams skip, and it's the one that matters: a slot landing wrong in a CRM record is worse than the agent asking twice.

4. Run it ten times. Multi-turn behavior is probabilistic: one pass tells you almost nothing.

Possible Test Outcomes

When this test fails, it usually fails in one of two ways:

  1. The agent processes the return but forgets the refund-target-card, or some other detail buried inside the dump.
  2. It fires a tool before it has the full picture, like running the order lookup before noticing the buyer was actually the mom, not Daisy.

The failure is usually noticed later when a customer emails asking why the refund went to her mom's card, or when an internal CRM record has half the fields wrong.

Testing before the next change

Upfront dumps stress the seams of an Agent: slot-filler, routing, tool selection, order of operations.

A failure is typically caused by the agent stack, not the model. We think that's the bigger story behind a lot of agent regressions right now: a modern model is usually the safest part of your stack, however it's the orchestration around it that trips you up.

The key is to run tests before you swap a model or update a prompt, and then run it again after. The difference is what you find has shifted.

And if you skip ‌everything else, run step 4 ten times.

And remember, test always.

CTA Image

Prevent AI Agent Failure Modes today. Switch to automated testing with Voxli.

Get started today