Receptive-Field Regularized CNNs for Music Classification and Tagging
Convolutional Neural Networks (CNNs) have been used successfully in various Music Information Retrieval (MIR) tasks, both as end-to-end models and as feature extractors for more complex systems. However, the MIR field is still dominated by classical VGG-based CNN architecture variants, often combined with more complex modules such as attention, with techniques such as pre-training on large datasets, or with both. Deeper models such as ResNet, which surpassed VGG by a large margin in other domains, are rarely used in MIR. As we will show, one of the main reasons for this is that deeper CNNs generalize poorly in the music domain.

In this paper, we present a principled way to make deep architectures like ResNet competitive for music-related tasks, based on well-designed regularization strategies. In particular, we analyze the recently introduced Receptive-Field Regularization and Shake-Shake. We show that they significantly improve the generalization of deep CNNs on music-related tasks, and that the resulting deep CNNs can outperform current, more complex models such as CNNs augmented with pre-training and attention. We demonstrate this on two different MIR tasks and two corresponding datasets. Our deep regularized CNNs therefore offer a new baseline for these datasets, and they can also serve as a feature-extracting module in future, more complex approaches.
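Receptive-Field Regularization centres on the receptive field (RF) of a CNN, meaning the extent of the input (here, the spectrogram) that can influence a single output unit. The sketch below is illustrative only. It assumes a purely sequential stack of layers, with each layer described along one axis by a hypothetical `(kernel_size, stride)` pair, and computes the RF using the standard recurrence. The layer configurations are made up and are not the paper's architectures. The assumption that the RF is shrunk by replacing later 3x3 kernels with 1x1 kernels is also only one possible way to control it.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field (in input bins, along one axis) of a sequential stack
    of conv/pool layers, each given as a (kernel_size, stride) pair."""
    rf = 1    # a single input bin sees only itself
    jump = 1  # input-bin spacing between adjacent units at the current depth
    for kernel, stride in layers:
        # Each layer widens the RF by (kernel - 1) units of the current spacing.
        rf += (kernel - 1) * jump
        # Striding multiplies the spacing seen by all deeper layers.
        jump *= stride
    return rf


if __name__ == "__main__":
    # Hypothetical configurations, NOT the networks from the paper:
    # two 3x3 convs, a 2x2 stride-2 pool, then two more convs.
    wide = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1)]
    # Same stack with the last two kernels reduced to 1x1 (assumed RF-limiting step).
    narrow = [(3, 1), (3, 1), (2, 2), (1, 1), (1, 1)]
    print("RF with 3x3 kernels throughout:", receptive_field(wide))
    print("RF with 1x1 kernels after pooling:", receptive_field(narrow))
```

In this toy stack, the 1x1 layers add no spatial context, so the RF stops growing after the pooling layer. Adding such layers still deepens the network without enlarging the RF.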