Frame-level Audio Similarity - A Codebook Approach.
Proceedings of the 11th International Conference on Digital Audio Effects (DAFx 2008)
Modeling audio signals by the long-term statistical distribution
of their local spectral features - often denoted as bag of frames
approach (BOF) - is a popular and powerful method to describe
audio content. While modeling the distribution of local spectral
features by semi-parametric distributions (e.g. Gaussian Mixture
Models) has been studied intensively, we investigate a non-parametric
variant based on vector quantization (VQ) in this paper.
The essential advantage of the proposed VQ approach over state-of-the-art
similarity measures is that the proposed audio similarity
metric forms a normed vector space. This allows for more powerful
search strategies, e.g. KD-Trees or Locality-Sensitive Hashing
(LSH), making content-based audio similarity available for
even larger music archives. Standard VQ approaches are known
to be computationally very expensive; to counter this problem,
we propose a multi-level clustering architecture. Additionally, we
show that the multi-level vector quantization approach (ML-VQ),
in contrast to standard VQ approaches, is comparable to state-of-the-art
frame-level similarity measures in terms of quality. Another
important finding w.r.t. the ML-VQ approach is that, in contrast
to GMM-based models of songs, our approach does not seem to
suffer from the recently discovered hub problem.
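
To illustrate the general codebook idea described above, the following is a minimal single-level sketch, not the multi-level (ML-VQ) architecture of the paper, whose details the abstract does not give. It assumes a codebook is already available (in practice it might be learned by clustering), maps each frame to its nearest codeword, and represents a song as a normalized histogram of codeword usage. Two songs are then compared with an L1 distance, which is a norm-induced metric on the histogram vectors. The function names, the toy two-dimensional "features", and the choice of the L1 norm are all assumptions made for illustration.

```python
import math


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to `frame` (squared Euclidean)."""
    best_index, best_dist = 0, math.inf
    for index, codeword in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(frame, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def codebook_histogram(frames, codebook):
    """Normalized histogram of codeword usage over all frames of one song."""
    if not frames:
        raise ValueError("at least one frame is required")
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    return [count / len(frames) for count in counts]


def l1_distance(hist_a, hist_b):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


# Hypothetical toy data: a 3-entry codebook and two "songs" with 4 frames each.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
song_a = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [3.9, 4.2]]
song_b = [[0.2, 0.1], [0.0, 0.3], [4.1, 3.8], [4.0, 4.0]]

hist_a = codebook_histogram(song_a, codebook)  # [0.25, 0.5, 0.25]
hist_b = codebook_histogram(song_b, codebook)  # [0.5, 0.0, 0.5]
distance = l1_distance(hist_a, hist_b)         # 1.0
```

Because every song becomes a fixed-length vector compared under a norm, such signatures could in principle be indexed with the search structures mentioned in the abstract (KD-Trees, LSH) instead of being compared pairwise.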