Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2017.
String-based (or viewpoint) models of tonal harmony often
struggle with data sparsity in pattern discovery and prediction
tasks, particularly when modeling composite events
like triads and seventh chords, since the number of distinct
n-note combinations in polyphonic textures is potentially
enormous. To address this problem, this study examines
the efficacy for music research of skip-grams, an alternative
viewpoint method developed in corpus linguistics and natural
language processing. The method includes sub-sequences of n
events (or n-grams) in a frequency distribution if their constituent
members occur within a specified number of skips, whether or not
they are contiguous in the original sequence.
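
As a rough illustration of the skip-gram idea described above, the following Python sketch enumerates n-event sub-sequences from a sequence of chord labels. It is a minimal sketch under stated assumptions, not the implementation used in this study: the function name skip_grams, the parameter max_skip, and the chord labels are hypothetical, and the skip limit is assumed to apply between each pair of adjacent members rather than across the whole n-gram.

```python
from collections import Counter


def skip_grams(events, n, max_skip):
    """Yield n-tuples of events in their original order.

    Assumption: at most `max_skip` events may be skipped between any
    two adjacent members of a tuple. With max_skip=0 this reduces to
    ordinary contiguous n-grams.
    """
    def extend(prefix, last):
        # A complete n-gram has been assembled; emit it.
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # The next member may sit 1 .. max_skip + 1 positions after `last`.
        stop = min(last + max_skip + 2, len(events))
        for nxt in range(last + 1, stop):
            yield from extend(prefix + [events[nxt]], nxt)

    for start in range(len(events)):
        yield from extend([events[start]], start)


# Hypothetical chord sequence, used only for illustration.
progression = ["I", "ii6", "V", "I"]
contiguous = Counter(skip_grams(progression, n=2, max_skip=0))
with_skips = Counter(skip_grams(progression, n=2, max_skip=1))
```

Here max_skip=0 yields only the three contiguous 2-grams, whereas max_skip=1 also admits pairs such as ("I", "V") that straddle one intervening event, so the frequency distribution gains additional counts from the same sequence.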
Using a corpus consisting of four datasets of Western
classical music in symbolic form, we found that including
skip-grams reduces data sparsity in n-gram distributions
by (1) minimizing the proportion of n-grams with negligible
counts, and (2) increasing the coverage of contiguous
n-grams in a test corpus. What is more, skip-grams significantly
outperformed contiguous n-grams in discovering
conventional closing progressions (called cadences).
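
The two sparsity measures named above, (1) the proportion of n-grams with negligible counts and (2) the coverage of contiguous n-grams in a test corpus, can be sketched as follows. This is an illustrative sketch only: the function names, the threshold of one occurrence for a "negligible" count, and the choice to measure coverage over n-gram tokens rather than types are all assumptions, not details taken from the study.

```python
from collections import Counter


def contiguous_ngrams(events, n):
    """Return the contiguous n-grams of a sequence as a list of tuples."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]


def rare_proportion(counts, threshold=1):
    """Proportion of n-gram types whose count is at or below `threshold`.

    Assumption: `threshold=1` stands in for a "negligible" count.
    """
    if not counts:
        return 0.0
    rare = sum(1 for c in counts.values() if c <= threshold)
    return rare / len(counts)


def coverage(train_counts, test_events, n):
    """Proportion of contiguous test n-gram tokens seen in training."""
    test = contiguous_ngrams(test_events, n)
    if not test:
        return 0.0
    seen = sum(1 for gram in test if gram in train_counts)
    return seen / len(test)


# Hypothetical training and test sequences, used only for illustration.
train = Counter(contiguous_ngrams(["I", "IV", "V", "I", "ii", "V", "I"], 2))
rare = rare_proportion(train)
cov = coverage(train, ["ii", "V", "I", "vi"], 2)
```

Passing a skip-gram distribution as train_counts in place of the contiguous one would be one way to compare how the two kinds of distribution fare on each measure.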