LaSNet: An end-to-end network based on steering vector filter for sound source localization and separation
| Published in | Applied acoustics Vol. 212; p. 109562 |
|---|---|
| Main Authors | , , , , , , , | 
| Format | Journal Article | 
| Language | English | 
| Published | Elsevier Ltd, 01.09.2023 |
| ISSN | 0003-682X 1872-910X  | 
| DOI | 10.1016/j.apacoust.2023.109562 | 
| Summary: | In this paper, we propose a novel time-domain end-to-end network (LaSNet) for multiple sound source localization (SSL) and separation with a microphone array. The traditional time-frequency (T-F) signal representation is subject to various prior conditions and fails to separate the different sound signal components. Even data-driven neural networks have not produced an effectively integrated approach in which localization and separation interact to serve both challenges. To address this issue, we first propose a Separation Driving Localization Network (SDLNet), which extracts latent features from a separation network and then uses them in a localization network. We then propose a simple multi-task network for both SSL and separation. Through analysis of the steering vector filter, we find that the localization and separation problems can be linked by the pseudo-inverse (pinv) operation. To foster synergy between SSL and sound separation while still enabling end-to-end network training, we develop a Pinv Module (PM). Finally, we propose the Localization and Separation Network (LaSNet) structure. Inspired by the overlay mechanism of networks, LaSNet is extended to a multi-task, multi-layer network in which the separation task is divided into multiple subtasks. A fuzzy separation loss function is introduced for training the multi-layer network. Numerical experiments demonstrate that the proposed method clearly outperforms several well-known models. LaSNet greatly improves both separation and localization performance and achieves at least a 32% relative reduction in model size compared with the baseline models.
• Conventional multi-task networks for speech separation and source localization do not address the dependency between the tasks. • The Localization and Separation Network (LaSNet) (MMNet) can alleviate this dependency using the steering vector. • A selectable channel with a residual connection can reduce the computational complexity of LaSNet. • At least a 32% relative reduction in model size is achieved. | 
|---|---|
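
The summary states that localization and separation can be linked through a pseudo-inverse of the steering vector filter. The sketch below illustrates that link only in the classic narrowband, far-field setting, and it is not the paper's Pinv Module: the uniform linear array, the half-wavelength spacing, the function names `steering_matrix` and `separate_with_pinv`, and the synthetic sources are all assumptions made for illustration. It shows that when the source directions are known, applying the pseudo-inverse of the steering matrix to the microphone mixture recovers the sources, and that wrong directions leave a large reconstruction residual.

```python
import numpy as np


def steering_matrix(angles_deg, n_mics, spacing_over_wavelength=0.5):
    """Narrowband far-field steering matrix of a uniform linear array.

    Column k is the steering vector for a source arriving from
    angles_deg[k]. Returned shape: (n_mics, n_sources).
    (Illustrative array model, not taken from the paper.)
    """
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mic_index = np.arange(n_mics)[:, None]
    phase = -2j * np.pi * spacing_over_wavelength * mic_index * np.sin(angles)[None, :]
    return np.exp(phase)


def separate_with_pinv(mixture, angles_deg, spacing_over_wavelength=0.5):
    """Estimate the sources as pinv(A) @ mixture for assumed directions."""
    n_mics = mixture.shape[0]
    A = steering_matrix(angles_deg, n_mics, spacing_over_wavelength)
    return np.linalg.pinv(A) @ mixture


# Synthetic example: two complex narrowband sources observed by four mics.
rng = np.random.default_rng(0)
n_mics = 4
true_angles = [-20.0, 35.0]
sources = rng.standard_normal((2, 64)) + 1j * rng.standard_normal((2, 64))
mixture = steering_matrix(true_angles, n_mics) @ sources

# Correct directions: the pseudo-inverse undoes the mixing.
estimate = separate_with_pinv(mixture, true_angles)
residual_true = np.linalg.norm(
    mixture - steering_matrix(true_angles, n_mics) @ estimate
)

# Wrong directions: the mixture is no longer explained by the model.
wrong_angles = [0.0, 60.0]
wrong_estimate = separate_with_pinv(mixture, wrong_angles)
residual_wrong = np.linalg.norm(
    mixture - steering_matrix(wrong_angles, n_mics) @ wrong_estimate
)

print("residual with true angles: ", residual_true)
print("residual with wrong angles:", residual_wrong)
```

In this toy setting the reconstruction residual acts as a localization cue while the pseudo-inverse output is the separation result, which is one simple way to see why the two tasks can share a single operation.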