LaSNet: An end-to-end network based on steering vector filter for sound source localization and separation


Bibliographic Details
Published in Applied Acoustics, Vol. 212; p. 109562
Main Authors Yang, Xiaokang; Zhang, Hongcheng; Lu, Yufei; A, Ying; Ren, Guangyi; Wei, Jianguo; Wang, Xianliang; Li, Wei
Format Journal Article
Language English
Published Elsevier Ltd 01.09.2023
ISSN 0003-682X, 1872-910X
DOI 10.1016/j.apacoust.2023.109562

Summary: In this paper, we propose a novel time-domain end-to-end network (LaSNet) for multiple sound source localization (SSL) and separation based on a microphone array. The traditional time-frequency (T-F) signal representation is subject to various prior conditions and fails to separate the different sound signal components. Even data-driven neural networks have not yielded an effectively integrated approach in which localization and separation interact to serve both challenges. To address this issue, we first propose a Separation Driving Localization Network (SDLNet), a framework that extracts latent features from a separation network and then uses them in a localization network. We then propose a simple multi-task network for both SSL and separation. By analyzing the steering vector filter, we find that the localization and separation problems can be linked by the pseudo-inverse (pinv) operation. To facilitate a synergistic relationship between SSL and sound separation while also enabling end-to-end network training, we develop a Pinv Module (PM). Finally, the Localization and Separation Network (LaSNet) structure is proposed. Inspired by the overlay mechanism of networks, LaSNet is extended to a multi-task, multi-layer network in which the separation task is divided into multiple subtasks. A fuzzy separation loss function is introduced for training the multi-layer network. Numerical experiments demonstrate that the proposed method clearly outperforms several well-known models. LaSNet greatly improves both separation and localization performance and achieves at least a 32% relative reduction in model size compared with the baseline models.
• Conventional multi-task networks for speech separation and source localization do not address the dependency between the multiple tasks.
• The Localization and Separation Network (LaSNet) (MMNet) can alleviate this dependency using the steering vector.
• A selectable channel with residual connection can reduce the computational complexity of LaSNet.
• At least a 32% relative reduction in model size is achieved.
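
The summary states that localization and separation can be linked through the pseudo-inverse of the steering vector filter. The sketch below illustrates that general idea only; it is not the paper's time-domain network or its Pinv Module. It assumes a narrowband, far-field, noiseless mixture on a uniform linear array, and every name and parameter (`steering_matrix`, `separate`, `fit_residual`, the array geometry, the frequency) is hypothetical.

```python
import numpy as np

def steering_matrix(angles_deg, n_mics, spacing_m, freq_hz, c=343.0):
    """Far-field steering vectors of a uniform linear array (assumed geometry).

    Returns an (n_mics, n_sources) complex matrix, one column per direction.
    """
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mic_pos = np.arange(n_mics) * spacing_m          # positions along the array axis
    delays = np.outer(mic_pos, np.sin(angles)) / c   # per-mic, per-source delay in seconds
    return np.exp(-2j * np.pi * freq_hz * delays)

def separate(mixture, angles_deg, spacing_m, freq_hz):
    """Separation step: apply the pseudo-inverse of the steering matrix."""
    A = steering_matrix(angles_deg, mixture.shape[0], spacing_m, freq_hz)
    return np.linalg.pinv(A) @ mixture

def fit_residual(mixture, angles_deg, spacing_m, freq_hz):
    """Localization cue: how badly the candidate directions explain the mixture."""
    A = steering_matrix(angles_deg, mixture.shape[0], spacing_m, freq_hz)
    return np.linalg.norm(mixture - A @ (np.linalg.pinv(A) @ mixture))

# Synthetic example: two sources, four microphones (all values are illustrative).
rng = np.random.default_rng(0)
true_angles = [-30.0, 20.0]
n_mics, spacing, freq = 4, 0.04, 1000.0
sources = rng.standard_normal((2, 256)) + 1j * rng.standard_normal((2, 256))
mixture = steering_matrix(true_angles, n_mics, spacing, freq) @ sources

wrong_angles = [-10.0, 45.0]
err_good = np.linalg.norm(separate(mixture, true_angles, spacing, freq) - sources)
err_bad = np.linalg.norm(separate(mixture, wrong_angles, spacing, freq) - sources)
res_good = fit_residual(mixture, true_angles, spacing, freq)
res_bad = fit_residual(mixture, wrong_angles, spacing, freq)

print(f"separation error: true angles {err_good:.2e}, wrong angles {err_bad:.2e}")
print(f"fit residual:     true angles {res_good:.2e}, wrong angles {res_bad:.2e}")
```

In this toy setting, correct direction estimates make the pseudo-inverse recover the sources, while wrong estimates degrade both the separation and the fit residual. That coupling is the kind of dependency between the two tasks that the summary describes.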