arXiv:2511.11473

Proactive Hearing Assistants that Isolate Egocentric Conversations

Published on Nov 14 · Submitted by Guilin Hu on Nov 19

Abstract

A proactive hearing assistant system identifies and separates the wearer's conversation partners in real time, using a dual-model architecture on binaural audio and adapting to conversational dynamics without explicit prompts.

AI-generated summary

We introduce proactive hearing assistants that automatically identify and separate the wearer's conversation partners, without requiring explicit prompts. Our system operates on egocentric binaural audio and uses the wearer's self-speech as an anchor, leveraging turn-taking behavior and dialogue dynamics to infer conversational partners and suppress others. To enable real-time, on-device operation, we propose a dual-model architecture: a lightweight streaming model runs every 12.5 ms for low-latency extraction of the conversation partners, while a slower model runs less frequently to capture longer-range conversational dynamics. Results on real-world 2- and 3-speaker conversation test sets, collected with binaural egocentric hardware from 11 participants totaling 6.8 hours, show generalization in identifying and isolating conversational partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement. More information can be found on our website: https://proactivehearing.cs.washington.edu/
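
To make the timing concrete, here is a minimal sketch of the dual-model scheduling described above, not the authors' implementation: a placeholder fast model fires on every 12.5 ms hop (the rate stated in the abstract), while a placeholder slow model refreshes a conversation embedding less often. The 16 kHz sample rate, once-per-second slow-model interval, embedding size, and both model bodies are assumptions.

```python
import numpy as np

SR = 16_000                          # assumed sample rate
HOP = int(0.0125 * SR)               # 12.5 ms hop from the paper -> 200 samples
SLOW_PERIOD = 80                     # assumed: slow model fires every 80 hops (~1 s)
EMB_DIM = 64                         # assumed conversation-embedding size

rng = np.random.default_rng(0)

def fast_model(frame, emb):
    """Stand-in for the lightweight streaming extractor: gates the frame by
    a cosine score between a crude spectral feature and the embedding."""
    feat = np.abs(np.fft.rfft(frame[:, 0]))[:EMB_DIM]
    score = feat @ emb / (np.linalg.norm(feat) * np.linalg.norm(emb) + 1e-8)
    return frame * max(score, 0.0)

def slow_model(context, emb):
    """Stand-in for the slower long-context model: refreshes the embedding
    from the last second of audio (where turn-taking cues would live)."""
    feat = np.abs(np.fft.rfft(context[:, 0]))[:EMB_DIM]
    feat = feat / (np.linalg.norm(feat) + 1e-8)
    return 0.8 * emb + 0.2 * feat

emb = rng.standard_normal(EMB_DIM)
emb /= np.linalg.norm(emb)
stream = rng.standard_normal((SR * 3, 2))    # 3 s of fake binaural audio
out, history = [], []

for i in range(0, len(stream) - HOP + 1, HOP):
    frame = stream[i : i + HOP]
    history.append(frame)
    out.append(fast_model(frame, emb))       # low-latency path: every 12.5 ms
    if len(history) % SLOW_PERIOD == 0:      # slow path: roughly once per second
        emb = slow_model(np.concatenate(history[-SLOW_PERIOD:]), emb)

isolated = np.concatenate(out)               # partner-isolated binaural stream
```

The point of the split is that only the cheap per-hop path sits on the latency-critical output, while the expensive conversational reasoning amortizes over many hops.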

Community

Paper author · Paper submitter

In this work, we introduce the first proactive hearing assistant that automatically identifies and isolates the speakers who are in the same conversation as the user, without requiring any user input or manual selection. Operating directly on egocentric binaural audio, the system uses the wearer's own speech and natural turn-taking patterns to track who they are talking to and to suppress interfering speakers in other conversations (a toy illustration of this anchoring idea follows below). A dual-model architecture enables real-time, on-device performance: a lightweight streaming model ensures low latency, while a slower model captures longer-range conversational dynamics. Across real-world 2- and 3-speaker conversations (6.8 hours, 11 participants), our method generalizes to identifying and isolating conversational partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement.
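
As a toy illustration of the self-speech anchor, and my own sketch rather than the paper's algorithm: score each candidate speaker by how well their voice activity alternates with the wearer's turns, rewarding quick responses and penalizing sustained overlap. The frame granularity matches the 12.5 ms hop above; the 0.5 s response window and the scoring weights are assumptions.

```python
import numpy as np

FRAME = 0.0125  # 12.5 ms frames, matching the streaming rate above

def turn_taking_score(self_vad: np.ndarray, other_vad: np.ndarray) -> float:
    """self_vad / other_vad: boolean voice-activity arrays, one entry per
    12.5 ms frame. Rewards alternation with the wearer, penalizes overlap."""
    overlap = np.mean(self_vad & other_vad)               # talking over the wearer
    ends = np.flatnonzero(self_vad[:-1] & ~self_vad[1:])  # wearer turn ends
    win = int(0.5 / FRAME)                                # assumed 0.5 s window
    responses = sum(other_vad[e + 1 : e + 1 + win].any() for e in ends)
    response_rate = responses / max(len(ends), 1)
    return response_rate - 2.0 * overlap                  # assumed weighting

# Example: the partner replies after the wearer's turns; the bystander
# belongs to another conversation and talks over them.
self_vad      = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0], bool)
partner_vad   = np.array([0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0], bool)
bystander_vad = np.array([1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0], bool)

print(turn_taking_score(self_vad, partner_vad))    # high -> likely partner
print(turn_taking_score(self_vad, bystander_vad))  # low  -> other conversation
```

A real system would combine such turn-taking evidence with learned speaker representations; the heuristic here only shows why the wearer's own speech is a useful anchor signal.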

Chat is this real 😭😭😭😭

Models citing this paper: 1

Datasets citing this paper: 1

Spaces citing this paper: 0

Collections including this paper: 0