Engineering · field notes

Voice with vision: when screen-share changes the conversation.

Tegan engineering · May 12, 2026 · 7 min read

The most expensive moment in any voice conversation isn't the question being asked — it's the customer saying "wait, where do I click?" A voice agent with no eyes guesses. A voice agent with eyes reads the live UI and points to the actual button.

For the last three months we've instrumented two cohorts of voice sessions through the Tegan platform: roughly 6,000 voice-only sessions and 6,000 voice + screen-share sessions across nine private-beta partners. Same Tegan persona, same docs, same playbook. The only difference is whether the user opted in to share their screen.

The headline gap

On the voice-only side, 23% of conversations resolved without escalation. On the voice + screen-share side, that number jumped to 61%. Same questions. Same docs. What changed is what Tegan could see when the customer said "I'm looking at the page and I don't see it."
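
To put the headline rates in absolute terms, here is a back-of-the-envelope calculation. It is a sketch that assumes exactly 6,000 sessions per cohort (the figures above are "roughly" 6,000). The pooled two-proportion z-test only measures sampling noise. Because users opted in to screen-share, the cohorts were not randomized, so the test says nothing about selection effects.

```python
import math


def resolution_gap(n_a: int, rate_a: float, n_b: int, rate_b: float) -> dict:
    """Compare two resolution rates with a pooled two-proportion z-test."""
    resolved_a = round(n_a * rate_a)
    resolved_b = round(n_b * rate_b)
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (resolved_a + resolved_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (resolved_b / n_b - resolved_a / n_a) / std_err
    return {
        "resolved_voice_only": resolved_a,
        "resolved_with_screen": resolved_b,
        "absolute_gap_pts": round((rate_b - rate_a) * 100, 1),
        "relative_lift": round(rate_b / rate_a, 2),
        "z": round(z, 1),
    }


# Assumption: exactly 6,000 sessions per cohort.
print(resolution_gap(6000, 0.23, 6000, 0.61))
```

With those inputs the function reports 1,380 versus 3,660 resolved sessions, a 38-point absolute gap, a lift of about 2.65×, and a z-score above 40.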

With no screen, the agent guesses based on the docs. The docs say Settings → Security → Audit log → Export. The actual UI in front of the user might be running an older release where the audit log lives under Compliance, not Security. The customer follows the instructions, hits a dead end, gets frustrated, and the conversation goes to a human. With screen on, Tegan reads the live navigation a frame at a time and walks them to the actual button — wherever it ended up.
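
The difference between the two modes can be framed as a lookup problem. The voice-only agent replays a fixed documented path. The screen-share agent searches whatever navigation it can read on the current frame for the target label. In this minimal sketch, the `NavNode` tree, its labels, and the idea that a frame parses cleanly into a tree are all hypothetical simplifications for illustration, not Tegan's actual pipeline.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class NavNode:
    """Hypothetical node in a navigation tree read from one screen frame."""
    label: str
    children: list[NavNode] = field(default_factory=list)


def follow_documented_path(root: NavNode, path: list[str]) -> bool:
    """Voice-only behavior: replay the documented path and report whether it exists."""
    node = root
    for label in path:
        match = next((c for c in node.children if c.label == label), None)
        if match is None:
            return False  # dead end: the live UI no longer matches the docs
        node = match
    return True


def find_live_path(root: NavNode, target: str) -> list[str] | None:
    """Screen-share behavior: breadth-first search of the live tree for the target."""
    queue = deque([(root, [])])
    while queue:
        node, path = queue.popleft()
        for child in node.children:
            child_path = path + [child.label]
            if child.label == target:
                return child_path
            queue.append((child, child_path))
    return None


# An older release where the audit log lives under Compliance, not Security.
live_ui = NavNode("root", [
    NavNode("Settings", [
        NavNode("Security", [NavNode("Two-factor")]),
        NavNode("Compliance", [NavNode("Audit log", [NavNode("Export")])]),
    ]),
])
docs_path = ["Settings", "Security", "Audit log", "Export"]

print(follow_documented_path(live_ui, docs_path))  # False
print(find_live_path(live_ui, "Export"))
```

The documented path dead-ends at Security. The live search returns Settings → Compliance → Audit log → Export, which is the route the customer actually needs.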

What the data showed beneath the headline

The gap is wider in three specific places:

Settings-heavy products. When the UI has more than ~50 distinct settings screens, voice-only guesses go wrong more often. Screen-share cuts that error rate by roughly 70%.
Recently shipped features. Anything released in the last 6 weeks is where docs lag the UI. With vision, Tegan answers from the actual product; without it, she answers from yesterday's docs.
Multi-step flows. Onboarding paths with 4+ steps, the kind where each step depends on the previous one, benefit most. Voice-only sessions tend to break at step 3; voice + screen-share sessions complete.

What broke (honestly)

Two things didn't go the way we expected.

First, modal dialogs are still hard. When a confirmation dialog overlays the UI, Tegan has to recognize that the interaction shifted into the dialog. Half of our early failures were the agent giving instructions for the background screen the customer was no longer interacting with. We added an "is this modal" check that runs every other frame; it's better, but it's still where we lose the most points on quality scoring.
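
The "is this modal" check can be pictured as a per-frame loop that decides which layer the agent's instructions should target. This sketch assumes each sampled frame already reports whether a dialog is on top, via a hypothetical `dialog_visible` field. It runs the check on every other frame, as described above, so it exposes the trade-off: a dialog that appears on a skipped frame is noticed one frame late. At the 1 Hz frame rate the team uses, that means about a second of instructions aimed at the background screen.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    dialog_visible: bool  # hypothetical output of a frame parser


def target_layers(frames: list[Frame], check_every: int = 2) -> list[str]:
    """Return the layer ('dialog' or 'background') instructions target on each frame.

    The modal check only runs on frames whose index is a multiple of
    `check_every`; in between, the agent keeps its last known answer.
    """
    layer = "background"
    targets = []
    for frame in frames:
        if frame.index % check_every == 0:
            layer = "dialog" if frame.dialog_visible else "background"
        targets.append(layer)
    return targets


# A confirmation dialog opens on frame 3 and closes on frame 6.
frames = [Frame(i, 3 <= i < 6) for i in range(8)]
for frame, layer in zip(frames, target_layers(frames)):
    actual = "dialog" if frame.dialog_visible else "background"
    print(frame.index, layer, "STALE" if layer != actual else "")
```

Only frame 3 is flagged as stale here. Passing `check_every=1` removes the lag at the cost of running the check on every frame.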

Second, fast-moving UIs (anything heavy with animation or live-updating numbers) confuse the frame-by-frame reading. We capped the frame rate at 1 Hz on purpose to keep cost and latency in check, but that means a UI that renames a button mid-conversation can throw the agent off. Customers who use Tegan on dashboards with rapid refresh see this most.
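
A toy timeline shows why a fast-refreshing UI can slip past a 1 Hz reader. The sketch assumes a hypothetical button whose label changes every 250 ms and samples it once per second. The labels, the refresh interval, and the `sample_labels` helper are illustrative assumptions, not measurements from the beta.

```python
def sample_labels(changes: list[tuple[int, str]], duration_ms: int,
                  interval_ms: int = 1000) -> list[str]:
    """Return the label visible at each sample time (0, interval, 2*interval, ...).

    `changes` is a time-sorted list of (timestamp_ms, new_label) events and
    must include an event at t=0 so every sample has a visible label.
    """
    samples = []
    for t in range(0, duration_ms, interval_ms):
        # The visible label is the most recent change at or before time t.
        visible = [label for ts, label in changes if ts <= t]
        samples.append(visible[-1])
    return samples


# Hypothetical live dashboard: the button label rotates every 250 ms.
labels = ["Refresh", "Loading", "Updated", "Refresh now"]
changes = [(i * 250, labels[i % 4]) for i in range(16)]  # 4 seconds of changes

seen = sample_labels(changes, duration_ms=4000)
print(seen)
print(f"{len(set(seen))} of {len(set(labels))} distinct labels observed")
```

At 1 Hz every sample lands on the same phase of the cycle, so the agent only ever sees "Refresh". Passing `interval_ms=250` recovers all four labels but processes four times as many frames. That is the cost and latency trade-off the 1 Hz cap is making.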

The implication for product teams

If your product has lots of settings, ships fast, or relies on multi-step flows, you have an outsized opportunity with screen-share voice. If your product is a single core surface (a chat box, a feed, a search bar), voice-only is likely enough.
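
The three signals from the data above can be folded into a rough triage checklist. The thresholds come from this post: more than about 50 settings screens, a feature shipped within the last 6 weeks, and flows of 4 or more dependent steps. Everything else is a hypothetical illustration, not a validated scoring model: the `ProductProfile` fields, the use of the last ship date as a proxy for "ships fast", and the any-signal rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ProductProfile:
    """Hypothetical summary of a product's UI surface."""
    settings_screens: int
    last_feature_ship: date  # assumed proxy for "ships fast"
    longest_flow_steps: int


def screen_share_signals(p: ProductProfile, today: date) -> list[str]:
    """Return which of the three signals from this post a product matches."""
    signals = []
    if p.settings_screens > 50:
        signals.append("settings-heavy")
    if today - p.last_feature_ship <= timedelta(weeks=6):
        signals.append("docs likely lag UI")
    if p.longest_flow_steps >= 4:
        signals.append("multi-step flows")
    return signals


def recommend(p: ProductProfile, today: date) -> str:
    """Any matching signal suggests trying screen-share voice."""
    if screen_share_signals(p, today):
        return "voice + screen-share"
    return "voice-only is likely enough"


today = date(2026, 5, 12)
admin_console = ProductProfile(settings_screens=120, last_feature_ship=date(2026, 4, 20),
                               longest_flow_steps=5)
search_bar = ProductProfile(settings_screens=3, last_feature_ship=date(2025, 11, 1),
                            longest_flow_steps=1)
print(recommend(admin_console, today), screen_share_signals(admin_console, today))
print(recommend(search_bar, today))
```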

The reason this matters: voice-only agents have been around for a while and most teams have decided they don't work well enough. Our read is that the missing piece was vision. Not because vision makes the voice better — but because it removes the worst failure mode (guessing about the UI) without asking the user to do anything different. They're already on the page they're asking about.

If you want to see this with your own data, the free trial gives you 30 minutes — plenty of time to run a voice-only and a voice + screen-share session on your own product and watch the difference.

Try it on your product

30 minutes free. No card.

Drop me into your product and see whether screen-share vision changes anything for your customers. The free trial includes full feature access.