PAS-SE: Personalized Auxiliary-Sensor Speech Enhancement for Voice Pickup in Hearables

Mattes Ohlenbusch¹,², Mikolaj Kegler¹, Marko Stamenovic¹

¹Bose Corporation, USA ²Fraunhofer IDMT, Germany

Abstract

Speech enhancement for voice pickup in hearables aims to enhance the user's own voice by suppressing noise and interfering talkers while preserving own-voice quality. For single-channel methods, it is particularly challenging to distinguish the target talker from interfering talkers without additional context. In this paper, we compare two strategies to resolve this ambiguity: personalized speech enhancement (PSE), which uses enrollment utterances to represent the target, and auxiliary-sensor speech enhancement (AS-SE), which uses in-ear microphones as additional input. We evaluate both strategies on two public datasets recorded with different auxiliary sensor arrays to investigate their cross-dataset generalization. We propose training-time augmentations to facilitate cross-dataset generalization of AS-SE systems. We also show that combining PSE and AS-SE (PAS-SE) provides complementary performance benefits, especially when the enrollment speech is recorded with the in-ear microphone. We further demonstrate that PAS-SE personalized with noisy in-ear enrollments maintains its performance benefits over the AS-SE system.

Audio Examples Oldenburg (out-of-domain)

Training: Vibravox dataset (training partition)
Evaluation: Oldenburg dataset (test partition)

System	Noise only	Interferer only	Noise and Interferer
Noisy outer microphone
SE
PSE, Outer mic enrollment
AS-SE (D)
PAS-SE (D), In-ear mic enrollment
PAS-SE (OL), In-ear mic enrollment (trained in-domain)
Clean target

Audio Examples Vibravox (in-domain)

Training: Vibravox dataset (training partition)
Evaluation: Vibravox dataset (test partition)

System	Interferer only	Noise and Interferer
Noisy outer microphone
SE
PSE, Outer mic enrollment
AS-SE (D)	N/A	N/A
PAS-SE (D), In-ear mic enrollment	N/A	N/A
Clean target

N/A: The Vibravox dataset does not contain isolated recordings of interfering talkers.
