"Classical Estimation versus Machine Learning for Pitch Estimation"
Estimating the pitch of a periodic signal, also referred to as its fundamental frequency, plays an important role in many signal processing applications, ranging from audio and speech processing to industrial applications. The maximum likelihood (ML) pitch estimator is of great practical importance because it is optimal for additive white Gaussian noise (AWGN) in the sense that it is unbiased and its variance approximately attains the Cramér-Rao lower bound (CRLB) for large data records. It also allows for a low-complexity implementation based on the fast Fourier transform (FFT), subject to some mild restrictions that are often met in practice. Recently, machine learning approaches to pitch estimation, tracking, and detection have received considerable attention. Most of these approaches have been developed for specific tasks such as pitch tracking in music signals or heart rate estimation, and very little effort has been made to compare them with classical benchmarks such as the CRLB and the accuracy and computational complexity of the ML pitch estimator.
In this work, we describe the classical ML pitch estimator and derive its accuracy for the case where the data is windowed before the estimator is applied. We also train several neural networks, using both existing and new architectures, on simulated data and compare their accuracy and computational complexity with those of the classical ML pitch estimator. Finally, we provide a detailed description of the insights gained from training the neural networks, some of which may be useful for other estimation problems that use neural networks.
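To make the FFT-based implementation mentioned above more concrete, the following is a minimal Python sketch of one common way to approximate the ML pitch estimator in AWGN: harmonic summation over a zero-padded periodogram. It is not taken from this work. The function name `harmonic_summation_pitch`, its parameters, the assumption that the number of harmonics is known, and the zero-padding factor of 16 are all illustrative assumptions.

```python
import numpy as np

def harmonic_summation_pitch(x, fs, num_harmonics, f_min, f_max,
                             window=None, nfft=None):
    """Approximate ML pitch estimate on an FFT grid (hypothetical helper).

    Assumes a known number of harmonics and white Gaussian noise; under
    these assumptions, maximizing the summed periodogram at the harmonic
    frequencies approximates the ML estimate for long data records.
    """
    x = np.asarray(x, dtype=float)
    if window is not None:
        x = x * window                      # optional windowing before the FFT
    n = len(x)
    if nfft is None:
        # Zero-pad to refine the frequency grid (factor 16 is an assumption).
        nfft = 1 << int(np.ceil(np.log2(16 * n)))
    power = np.abs(np.fft.rfft(x, nfft)) ** 2

    # Candidate fundamental-frequency bins; the highest harmonic must stay
    # at or below the Nyquist bin nfft // 2.
    k_min = max(1, int(np.ceil(f_min * nfft / fs)))
    k_max = min(int(np.floor(f_max * nfft / fs)),
                (nfft // 2) // num_harmonics)
    candidates = np.arange(k_min, k_max + 1)

    # Harmonic summation: add the periodogram values at h * f0 for each h.
    cost = np.zeros(len(candidates))
    for h in range(1, num_harmonics + 1):
        cost += power[h * candidates]

    k_hat = candidates[np.argmax(cost)]
    return k_hat * fs / nfft                # grid estimate in Hz

# Simulated example: three harmonics of 220 Hz in weak white Gaussian noise.
rng = np.random.default_rng(0)
fs, n, f0 = 8000.0, 1024, 220.0
t = np.arange(n) / fs
signal = sum(a * np.cos(2 * np.pi * h * f0 * t)
             for h, a in zip((1, 2, 3), (1.0, 0.5, 0.25)))
noisy = signal + 0.1 * rng.standard_normal(n)
f_hat = harmonic_summation_pitch(noisy, fs, num_harmonics=3,
                                 f_min=100.0, f_max=400.0)
```

The estimate is restricted to the FFT grid, so its resolution is `fs / nfft`. A practical implementation would typically refine the grid maximum, for example by interpolation or a local search, before comparing its accuracy against the CRLB.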