B. Lehner, Gerhard Widmer, Sebastian Böck,
"A Low-Latency, Real-Time-Capable Singing Voice Detection Method with LSTM Recurrent Neural Networks"
in: Proceedings of the 23rd European Signal Processing Conference (EUSIPCO 2015), pp. 21-25, 2015
Singing voice detection aims at identifying the regions in a
music recording where at least one person sings. This is a
challenging problem that cannot be solved without analysing
the temporal evolution of the signal. Current state-of-the-art
methods combine timbral with temporal characteristics, by
summarising various feature values over time, e.g. by computing
their variance. This leads to more contextual information,
but also to increased latency, which is problematic if our
goal is on-line, real-time singing voice detection.
To overcome this problem and reduce the necessity to
include context in the features themselves, we introduce a
method that uses Long Short-Term Memory Recurrent Neural
Networks (LSTM-RNN). In experiments on several data sets,
the resulting singing voice detector outperforms the state-of-the-art
baselines in terms of accuracy, while at the same time
drastically reducing latency and increasing the time resolution
of the detector.
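
The latency cost of summarising features over time, as mentioned in the abstract, can be illustrated with a minimal sketch. It assumes a centered window for the variance computation; the function names, the feature values, the window length and the hop size are all hypothetical and are not taken from the paper.

```python
from statistics import pvariance


def centered_variance(values, half_window):
    """Variance over the neighbourhood [t - half_window, t + half_window].

    Frames near the edges use a truncated window.
    """
    out = []
    for t in range(len(values)):
        lo = max(0, t - half_window)
        hi = min(len(values), t + half_window + 1)
        out.append(pvariance(values[lo:hi]))
    return out


def lookahead_latency_seconds(half_window, hop_seconds):
    """Time spent waiting for the future half of a centered window."""
    return half_window * hop_seconds


# Hypothetical per-frame feature values (illustration only).
feature = [0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 2.0]
summary = centered_variance(feature, half_window=1)

# Assumed numbers: 40 future frames at a 10 ms hop.
latency = lookahead_latency_seconds(half_window=40, hop_seconds=0.01)
```

In this sketch, a decision for frame t cannot be made until frame t + half_window has arrived, so a wider summary window means a longer delay. A recurrent model that carries context in its internal state does not need such look-ahead, which is the motivation the abstract gives for using an LSTM-RNN.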