PAS-SE: Personalized Auxiliary-Sensor Speech Enhancement for Voice Pickup in Hearables

Mattes Ohlenbusch¹,², Mikolaj Kegler¹, Marko Stamenovic¹

¹Bose Corporation, USA   ²Fraunhofer IDMT, Germany

Abstract

Speech enhancement for voice pickup in hearables aims to improve the user’s voice by suppressing noise and interfering talkers, while maintaining own-voice quality. For single-channel methods, it is particularly challenging to distinguish the target from interfering talkers without additional context. In this paper, we compare two strategies to resolve this ambiguity: personalized speech enhancement (PSE), which uses enrollment utterances to represent the target, and auxiliary‑sensor speech enhancement (AS-SE), which uses in‑ear microphones as additional input. We evaluate the strategies on two public datasets, employing different auxiliary sensor arrays, to investigate their cross-dataset generalization. We propose training‑time augmentations to facilitate cross-dataset generalization of AS-SE systems. We also show that combining PSE and AS-SE (PAS-SE) provides complementary performance benefits, especially when enrollment speech is recorded with the in‑ear microphone. We further demonstrate that PAS-SE personalized with noisy in-ear enrollments maintains performance benefits over the AS-SE system.

Links

arXiv preprint:
https://arxiv.org/abs/2509.20875
Code:
https://github.com/Bose/passe
Vibravox Dataset:
http://vibravox.cnam.fr/
Oldenburg Dataset:
https://doi.org/10.5281/zenodo.10844598 (own voice),
https://doi.org/10.5281/zenodo.11196866 (impulse responses)

Audio Examples Oldenburg (out-of-domain)

Training: Vibravox dataset (training partition)
Evaluation: Oldenburg dataset (test partition)

Conditions: Noise only, Interferer only, Noise and Interferer

Systems:
Noisy outer microphone
SE
PSE, Outer mic enrollment
AS-SE (D)
PAS-SE (D), In-ear mic enrollment
PAS-SE (OL), In-ear mic enrollment (trained in-domain)
Clean target

Audio Examples Vibravox (in-domain)

Training: Vibravox dataset (training partition)
Evaluation: Vibravox dataset (test partition)

Conditions: Noise only, Interferer only, Noise and Interferer

Systems:
Noisy outer microphone
SE
PSE, Outer mic enrollment
AS-SE (D) (Interferer only: N/A; Noise and Interferer: N/A)
PAS-SE (D), In-ear mic enrollment (Interferer only: N/A; Noise and Interferer: N/A)
Clean target

N/A: Vibravox dataset does not contain isolated recordings of interfering talkers