"Learning to Pinpoint Singing Voice from Weakly Labeled Examples"
Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 8-2016
Building an instrument detector usually requires temporally accurate ground truth that is expensive to create.
However, song-wise information on the presence of instruments is often easily available. In this work, we investigate how well we can train a singing voice detection system merely from song-wise annotations of vocal
presence. Using convolutional neural networks, multiple-instance learning and saliency maps, we can not only detect singing voice in a test signal with a temporal accuracy
close to the state-of-the-art, but also localize the spectral
bins with precision and recall close to a recent source separation method. Our recipe may provide a basis for other
sequence labeling tasks, for improving source separation
or for inspecting neural networks trained on auditory spectrograms.