Khaled Koutini, Hamid Eghbal-Zadeh, Matthias Dorfer, Gerhard Widmer,
"The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification"
in: Proceedings of the 27th European Signal Processing Conference (EUSIPCO), 9-2019
Original title:
The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification
Language of the title:
English
Original book title:
Proceedings of the 27th European Signal Processing Conference (EUSIPCO)
Original abstract:
Convolutional Neural Networks (CNNs) have had
great success in many machine vision as well as machine
audition tasks. Many image recognition network architectures
have consequently been adapted for audio processing tasks.
However, despite some successes, the performance of many of
these did not translate from the image to the audio domain.
For example, very deep architectures such as ResNet [1] and
DenseNet [2], which significantly outperform VGG [3] in image
recognition, do not perform better in audio processing tasks
such as Acoustic Scene Classification (ASC). In this paper, we
investigate the reasons why such powerful architectures perform
worse in ASC compared to simpler models (e.g., VGG). To this
end, we analyse the receptive field (RF) of these CNNs and
demonstrate the importance of the RF to the generalization
capability of the models. Using our receptive field analysis,
we adapt both ResNet and DenseNet, achieving state-of-the-art
performance and eventually outperforming the VGG-based
models. We introduce systematic ways of adapting the RF in
CNNs, and present results on three data sets that show how
changing the RF over the time and frequency dimensions affects
a model's performance. Our experimental results show that very
small or very large RFs can cause performance degradation, but
deep models can be made to generalize well by carefully choosing
an appropriate RF size within a certain range.
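
To make the notion of receptive field size concrete, the following minimal sketch computes the RF of a plain stack of convolution and pooling layers along one dimension (time or frequency) using the standard recurrence: each layer widens the RF by (kernel - 1) times the accumulated stride. The function name `receptive_field` and both layer lists are hypothetical illustrations, not the architectures or code from the paper, and replacing 3x3 kernels with 1x1 kernels is shown only as one assumed way of shrinking the RF.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Return the receptive field (in input samples) along one axis.

    `layers` is a sequence of (kernel_size, stride) pairs, listed from the
    input towards the output. Dilation and padding are ignored here
    (assumption: dilation 1; padding does not change the RF size).
    """
    rf = 1    # a single input sample sees only itself
    jump = 1  # distance in input samples between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical VGG-like stack: two 3-wide convs, 2-wide pooling, repeated.
vgg_like = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]

# Same depth, but the later 3-wide convs are replaced by 1-wide convs,
# which keeps the number of layers while limiting RF growth.
narrowed = [(3, 1), (3, 1), (2, 2), (1, 1), (1, 1), (2, 2)]

if __name__ == "__main__":
    print("VGG-like RF:", receptive_field(vgg_like))
    print("Narrowed RF:", receptive_field(narrowed))
```

Because the recurrence is applied per axis, the time and frequency dimensions can be analysed separately by passing the kernel sizes and strides for each axis in turn.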