¹Bose Corporation, USA; ²Fraunhofer IDMT, Germany
Speech enhancement for voice pickup in hearables aims to improve the user's voice by suppressing noise and interfering talkers while maintaining own-voice quality. For single-channel methods, it is particularly challenging to distinguish the target from interfering talkers without additional context. In this paper, we compare two strategies to resolve this ambiguity: personalized speech enhancement (PSE), which uses enrollment utterances to represent the target, and auxiliary-sensor speech enhancement (AS-SE), which uses in-ear microphones as additional input. We evaluate both strategies on two public datasets that use different auxiliary sensor arrays, in order to investigate their cross-dataset generalization. We propose training-time augmentations to facilitate cross-dataset generalization of AS-SE systems. We also show that combining PSE and AS-SE (PAS-SE) provides complementary performance benefits, especially when the enrollment speech is recorded with the in-ear microphone. We further demonstrate that PAS-SE personalized with noisy in-ear enrollments maintains its performance benefits over the AS-SE system.
Training: Vibravox dataset (training partition)
Evaluation: Oldenburg dataset (test partition)
| System | Noise only | Interferer only | Noise and Interferer |
|---|---|---|---|
| Noisy outer microphone | | | |
| SE | | | |
| PSE, Outer mic enrollment | | | |
| AS-SE (D) | | | |
| PAS-SE (D), In-ear mic enrollment | | | |
| PAS-SE (OL), In-ear mic enrollment (trained in-domain) | | | |
| Clean target | | | |
Training: Vibravox dataset (training partition)
Evaluation: Vibravox dataset (test partition)
| System | Noise only | Interferer only | Noise and Interferer |
|---|---|---|---|
| Noisy outer microphone | | | |
| SE | | | |
| PSE, Outer mic enrollment | | | |
| AS-SE (D) | | N/A† | N/A† |
| PAS-SE (D), In-ear mic enrollment | | N/A† | N/A† |
| Clean target | | | |

†N/A: the Vibravox dataset does not contain isolated recordings of interfering talkers.