Convolutional fusion network for monaural speech enhancement.

Xian Y.; Sun Y.; Wang W.; Naqvi SM.

Convolutional neural network (CNN) based methods, such as the convolutional encoder-decoder network, offer state-of-the-art results in monaural speech enhancement. In the conventional encoder-decoder network, a large kernel size is often used to increase model capacity, which, however, results in low parameter efficiency. This can be addressed with group convolution, as in AlexNet, where the group convolutions in each layer are performed in parallel before their outputs are concatenated. With simple concatenation, however, inter-channel dependency information may be lost. To address this, the Shuffle network takes part of the whole input sequence as the input to each convolution group and re-arranges the outputs of the groups before concatenating them. In this work, we propose a new convolutional fusion network (CFN) for monaural speech enhancement that improves model performance, inter-channel dependency, information reuse and parameter efficiency. First, a new group convolutional fusion unit (GCFU), consisting of standard and depth-wise separable CNNs, is used to reconstruct the signal. Second, the whole input sequence (full information) is fed simultaneously to two convolution networks in parallel, and their outputs are re-arranged (shuffled) and then concatenated, in order to exploit the inter-channel dependency within the network. Third, an intra skip connection mechanism connects different layers inside the encoder as well as the decoder to further improve model performance. Extensive experiments show that the proposed method improves on three recent baseline methods.
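The parameter-efficiency argument and the shuffle step in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the 64-channel layer and the kernel size of 11 are assumptions chosen only for illustration, and the shuffle shown is the generic channel re-arrangement used by shuffle-style networks rather than the exact GCFU arrangement. It counts weights for a 1-D convolution (bias ignored) in its standard, grouped and depth-wise separable forms, then interleaves the channels of concatenated group outputs.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Weight count of a 1-D convolution layer, bias ignored."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each group maps c_in/groups input channels to c_out/groups output channels.
    return (c_in // groups) * (c_out // groups) * kernel * groups


def depthwise_separable_params(c_in, c_out, kernel):
    """Depth-wise conv (one filter per channel) followed by a 1x1 point-wise conv."""
    depthwise = conv_params(c_in, c_in, kernel, groups=c_in)
    pointwise = conv_params(c_in, c_out, 1)
    return depthwise + pointwise


def channel_shuffle(channels, groups):
    """Interleave channels so each output block mixes channels from every group."""
    if len(channels) % groups:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # Equivalent to reshape (groups, per_group) -> transpose -> flatten.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Illustrative layer sizes (assumed, not taken from the paper).
standard = conv_params(64, 64, 11)
grouped = conv_params(64, 64, 11, groups=4)
separable = depthwise_separable_params(64, 64, 11)
print(standard, grouped, separable)

# Channels 0-2 come from group A, 3-5 from group B.
print(channel_shuffle(list(range(6)), groups=2))
```

With these assumed sizes, grouping by 4 divides the weight count by 4, and the depth-wise separable form is smaller still. After the shuffle, channels from both groups sit next to each other, so a following grouped layer sees information from every group.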

Original publication

DOI

10.1016/j.neunet.2021.05.017

Type

Journal article

Journal

Neural Networks: the official journal of the International Neural Network Society

Publication Date

11/2021

Volume

143

Pages

97 - 107

Addresses

Intelligent Sensing and Communications Research Group, School of Engineering, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK; College of Computer and Communication Engineering, ZhengZhou University of Light Industry, Zhengzhou, China. Electronic address: Y.xian2@newcastle.ac.uk.

Keywords

Speech; Neural Networks, Computer
