"Deep Learning for Event Detection, Sequence Labelling and Similarity Estimation in Music Signals"
When listening to music, some humans can easily recognize which instruments play
at what time or when a new musical segment starts, but cannot describe exactly
how they do this. To automatically describe particular aspects of a music piece
(be it for an academic interest in emulating human perception, or for practical
applications), we thus cannot directly replicate the steps taken by a human. We
can, however, exploit that humans can easily annotate examples, and optimize a
generic function to reproduce these annotations. In this thesis, I explore solving
different music perception tasks with deep learning, a recent branch of machine
learning that optimizes functions of many stacked nonlinear operations (referred
to as deep neural networks) and promises to obtain better results or require less
domain knowledge than more traditional techniques.
In particular, I employ fully-connected neural networks for music and speech
detection and to accelerate music similarity measures, and convolutional neural
networks for detecting note onsets, musical segment boundaries and singing voice.
In doing so, I evaluate both how well and in what way the networks solve the respective
tasks. Using the example of singing voice detection, I additionally develop
data augmentation methods to learn from only a few annotated music pieces, and a
recipe to obtain temporally accurate predictions from inaccurate training examples.
The results of my work surpass the previous state of the art in all the tasks considered.