Abstract
A proactive hearing assistant system identifies and separates conversation partners in real time using a dual-model architecture on binaural audio, adapting to conversational dynamics without explicit prompts.
We introduce proactive hearing assistants that automatically identify and separate the wearer's conversation partners, without requiring explicit prompts. Our system operates on egocentric binaural audio and uses the wearer's self-speech as an anchor, leveraging turn-taking behavior and dialogue dynamics to infer conversation partners and suppress others. To enable real-time, on-device operation, we propose a dual-model architecture: a lightweight streaming model runs every 12.5 ms for low-latency extraction of the conversation partners, while a slower model runs less frequently to capture longer-range conversational dynamics. Results on real-world 2- and 3-speaker conversation test sets, collected from 11 participants with binaural egocentric hardware and totaling 6.8 hours, show that the system generalizes to identifying and isolating conversation partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement. More information can be found on our website: https://proactivehearing.cs.washington.edu/
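To make the dual-rate design concrete, here is a minimal sketch of how such an inference loop could be wired up: a lightweight model processes every 12.5 ms audio hop for low-latency extraction, while a slower model refreshes a longer-range conversational context less often. The class names, hop size, context-embedding interface, and once-per-second slow update are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

SAMPLE_RATE = 16_000                  # assumed sample rate
HOP = int(0.0125 * SAMPLE_RATE)       # 12.5 ms streaming hop (200 samples at 16 kHz)
SLOW_EVERY = 80                       # assumed: slow model runs once per second (80 hops)


class FastStreamingModel:
    """Placeholder for the low-latency extraction model (binaural in, mono out)."""

    def extract(self, frame: np.ndarray, context: np.ndarray) -> np.ndarray:
        # A real model would extract the conversation partners conditioned on the
        # slow model's context embedding; here we simply downmix to mono.
        return frame.mean(axis=0)


class SlowContextModel:
    """Placeholder for the slower model tracking longer-range conversational dynamics."""

    def update(self, history: np.ndarray) -> np.ndarray:
        # A real model would infer who the wearer is talking to from turn-taking
        # over a longer window; here we return a dummy context embedding.
        return np.zeros(128, dtype=np.float32)


def run(binaural_stream):
    """Consume binaural frames of shape (2, HOP) and yield extracted mono frames."""
    fast, slow = FastStreamingModel(), SlowContextModel()
    context = np.zeros(128, dtype=np.float32)
    history = []
    for i, frame in enumerate(binaural_stream):
        history.append(frame)
        if i > 0 and i % SLOW_EVERY == 0:
            # Infrequent, higher-latency update of the conversational context.
            context = slow.update(np.concatenate(history, axis=-1))
            history.clear()
        # Low-latency extraction on every 12.5 ms hop.
        yield fast.extract(frame, context)


if __name__ == "__main__":
    # Example: one second of silence split into 12.5 ms binaural frames.
    frames = (np.zeros((2, HOP), dtype=np.float32) for _ in range(80))
    out = list(run(frames))
    print(len(out), out[0].shape)     # 80 mono frames of length HOP
```

The point of the split is that the per-hop path stays cheap enough for on-device streaming, while the expensive reasoning about conversational dynamics is amortized over a much longer update interval.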
Community
In this work, we introduce the first proactive hearing assistant that automatically identifies and isolates the speakers who are in the same conversation as the user, without requiring any user input or manual selection. Operating directly on egocentric binaural audio, the system uses the wearer’s own speech and natural turn-taking patterns to track who they are talking to and suppress interfering speakers in other conversations. A dual-model architecture enables real-time, on-device performance: a lightweight streaming model ensures low latency, while a slower model captures longer-range conversational dynamics. Across real-world 2- and 3-speaker conversations (6.8 hours, 11 participants), our method generalizes to identifying and isolating conversation partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement.
Chat is this real 😭😭😭😭
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AV-Dialog: Spoken Dialogue Models with Audio-Visual Input (2025)
- ConvFill: Model Collaboration for Responsive Conversational Voice Agents (2025)
- DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching (2025)
- FLEXI: Benchmarking Full-duplex Human-LLM Speech Interaction (2025)
- LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization (2025)
- EgoSocial: Benchmarking Proactive Intervention Ability of Omnimodal LLMs via Egocentric Social Interaction Perception (2025)
- EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning (2025)