Abstract
A proactive hearing assistant system identifies and separates conversation partners in real time using a dual-model architecture on binaural audio, adapting to conversational dynamics without explicit prompts.
We introduce proactive hearing assistants that automatically identify and separate the wearer's conversation partners, without requiring explicit prompts. Our system operates on egocentric binaural audio and uses the wearer's self-speech as an anchor, leveraging turn-taking behavior and dialogue dynamics to infer conversation partners and suppress others. To enable real-time, on-device operation, we propose a dual-model architecture: a lightweight streaming model runs every 12.5 ms for low-latency extraction of the conversation partners, while a slower model runs less frequently to capture longer-range conversational dynamics. Results on real-world 2- and 3-speaker conversation test sets, collected from 11 participants with binaural egocentric hardware and totaling 6.8 hours, show that the system generalizes to identifying and isolating conversation partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement. More information can be found on our website: https://proactivehearing.cs.washington.edu/
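To make the dual-rate design concrete, here is a minimal sketch of how such an inference loop could be wired up: a lightweight model processes every 12.5 ms audio hop for low-latency extraction, while a slower model refreshes a longer-range conversational context less often. The class names, hop size, context-embedding interface, and once-per-second slow update are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

SAMPLE_RATE = 16_000                  # assumed sample rate
HOP = int(0.0125 * SAMPLE_RATE)       # 12.5 ms streaming hop (200 samples at 16 kHz)
SLOW_EVERY = 80                       # assumed: slow model runs once per second (80 hops)


class FastStreamingModel:
    """Placeholder for the low-latency extraction model (binaural in, mono out)."""

    def extract(self, frame: np.ndarray, context: np.ndarray) -> np.ndarray:
        # A real model would extract the conversation partners conditioned on the
        # slow model's context embedding; here we simply downmix to mono.
        return frame.mean(axis=0)


class SlowContextModel:
    """Placeholder for the slower model tracking longer-range conversational dynamics."""

    def update(self, history: np.ndarray) -> np.ndarray:
        # A real model would infer who the wearer is talking to from turn-taking
        # over a longer window; here we return a dummy context embedding.
        return np.zeros(128, dtype=np.float32)


def run(binaural_stream):
    """Consume binaural frames of shape (2, HOP) and yield extracted mono frames."""
    fast, slow = FastStreamingModel(), SlowContextModel()
    context = np.zeros(128, dtype=np.float32)
    history = []
    for i, frame in enumerate(binaural_stream):
        history.append(frame)
        if i > 0 and i % SLOW_EVERY == 0:
            # Infrequent, higher-latency update of the conversational context.
            context = slow.update(np.concatenate(history, axis=-1))
            history.clear()
        # Low-latency extraction on every 12.5 ms hop.
        yield fast.extract(frame, context)


if __name__ == "__main__":
    # Example: one second of silence split into 12.5 ms binaural frames.
    frames = (np.zeros((2, HOP), dtype=np.float32) for _ in range(80))
    out = list(run(frames))
    print(len(out), out[0].shape)     # 80 mono frames of length HOP
```

The point of the split is that the per-hop path stays cheap enough for on-device streaming, while the expensive reasoning about conversational dynamics is amortized over a much longer update interval.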
Community
In this work, we introduce the first proactive hearing assistant that automatically identifies and isolates the speakers who are in the same conversation as the user, without requiring any user input or manual selection. Operating directly on egocentric binaural audio, the system uses the wearer’s own speech and natural turn-taking patterns to track who they are talking to and suppress interfering speakers in other conversations. A dual-model architecture enables real-time, on-device performance: a lightweight streaming model ensures low latency, while a slower model captures longer-range conversational dynamics. Across real-world 2- and 3-speaker conversations (6.8 hours, 11 participants), our method generalizes to identifying and isolating conversation partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement.
Chat is this real 😭😭😭😭
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AV-Dialog: Spoken Dialogue Models with Audio-Visual Input (2025)
- ConvFill: Model Collaboration for Responsive Conversational Voice Agents (2025)
- DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching (2025)
- FLEXI: Benchmarking Full-duplex Human-LLM Speech Interaction (2025)
- LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization (2025)
- EgoSocial: Benchmarking Proactive Intervention Ability of Omnimodal LLMs via Egocentric Social Interaction Perception (2025)
- EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning (2025)