Monaural Source Separation in Complex Domain With Long Short-Term Memory Neural Network

Sun Y., Xian Y., Wang W., Naqvi SM.

In recent research, deep neural networks (DNNs) have been used to solve the monaural source separation problem. According to their training objectives, DNN-based monaural speech separation methods fall into three categories: masking, mapping, and signal approximation based techniques. However, the performance of the traditional methods is not robust to variations in real-world environments. Moreover, vanilla DNN-based methods cannot fully utilize temporal information. Therefore, in this paper, a long short-term memory (LSTM) neural network is applied to exploit long-term speech contexts. We then propose complex signal approximation (cSA), which operates in the complex domain and uses the phase information of the desired speech signal to improve separation performance. The IEEE and TIMIT corpora are used to generate mixtures with noise and speech interference to evaluate the efficacy of the proposed method. The experimental results demonstrate the advantages of the proposed cSA-based LSTM recurrent neural network method in terms of different objective performance measures.
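To illustrate what a signal approximation objective in the complex domain might look like, the following is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the squared-error form, the function names `csa_loss` and `ideal_complex_mask`, and the stabilising constant `eps` are all assumptions made here for illustration. The idea shown is that a complex-valued mask is applied to the mixture spectrogram and the result is compared with the clean spectrogram in both real and imaginary parts, so phase errors are penalised as well as magnitude errors.

```python
import numpy as np


def csa_loss(mask_real, mask_imag, mix_stft, clean_stft):
    """Hypothetical complex signal approximation loss (assumed form).

    The complex mask is applied to the mixture spectrogram, and the squared
    error to the clean spectrogram is averaged over all time-frequency bins.
    """
    mask = mask_real + 1j * mask_imag      # complex-valued mask
    estimate = mask * mix_stft             # masked mixture = estimated speech
    diff = estimate - clean_stft
    # Squared error on the real and imaginary parts together.
    return np.mean(diff.real ** 2 + diff.imag ** 2)


def ideal_complex_mask(mix_stft, clean_stft, eps=1e-8):
    """Complex ratio clean / mixture, stabilised by eps (an assumed constant)."""
    return clean_stft * np.conj(mix_stft) / (np.abs(mix_stft) ** 2 + eps)
```

With synthetic complex spectrograms, the ideal complex mask drives this loss close to zero, whereas a magnitude-only mask (real-valued, reusing the mixture phase) leaves a residual error caused by the phase mismatch. In a trained system the two mask components would be network outputs, here assumed to come from an LSTM as in the paper.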

DOI

10.1109/JSTSP.2019.2908760

Type

Journal article

Journal

IEEE Journal of Selected Topics in Signal Processing

Publication Date

01/05/2019

Volume

13

Pages

359 - 369
