LFM-2b: A Dataset of Enriched Music Listening Events for Recommender Systems Research and Fairness Analysis
Proceedings of the 7th ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2022)
We present the LFM-2b dataset containing the listening records of over 120,000 users of the music platform Last.fm. These users
provide a total of more than two billion individual listening events that span a time range of over 15 years, from February 2005 until
March 2020. These listening events refer to a total of 50 million distinct tracks by 5 million distinct artists. Besides the common
metadata (i.e., artist and track name), LFM-2b contains additional information about both users and items. This includes the
demographic information of users, namely country, gender, and age, and the fine-grained genre and style of items, together with
vector embeddings of their lyrics. LFM-2b is a rich dataset that enables research on a variety of
recommender system algorithms, such as those based on collaborative filtering (e.g., leveraging the user-item interactions in the
form of listening events), content-based approaches (e.g., exploiting genres and lyrics), or hybrid combinations thereof. Users'
demographic information furthermore enables experimentation on identifying and mitigating various data and algorithmic biases of
recommender systems, and on investigating fairness aspects of such systems, e.g., with respect to gender.
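To make the collaborative-filtering and fairness use cases described in the abstract more concrete, here is a minimal Python sketch that is not taken from the paper. It assumes a hypothetical in-memory record layout of (user_id, track_id, unix_timestamp) tuples and a hypothetical per-user gender mapping; the actual LFM-2b file formats and field names are not described in this abstract. The illustrative helper build_play_counts aggregates individual listening events into user-item play counts, the implicit-feedback signal commonly used by collaborative filtering. The equally illustrative mean_score_by_group averages a per-user evaluation score within each demographic group, which is one simple way to surface performance gaps between, e.g., gender groups.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Iterable

# ASSUMPTION: hypothetical record layout (user_id, track_id, unix_timestamp).
# The real LFM-2b file format and field names are not given in the abstract.
Event = tuple[str, str, int]


def build_play_counts(events: Iterable[Event]) -> dict[str, Counter]:
    """Aggregate raw listening events into per-user track play counts,
    i.e. the implicit user-item feedback used by collaborative filtering."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for user_id, track_id, _timestamp in events:
        counts[user_id][track_id] += 1  # each event counts as one play
    return dict(counts)


def mean_score_by_group(scores: dict[str, float],
                        group_of: dict[str, str]) -> dict[str, float]:
    """Average a per-user score (e.g. a recommendation accuracy metric)
    within each demographic group; users without a label go to 'unknown'."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for user_id, score in scores.items():
        buckets[group_of.get(user_id, "unknown")].append(score)
    return {group: mean(values) for group, values in buckets.items()}


if __name__ == "__main__":
    # Made-up events for illustration only (not real LFM-2b records).
    events = [
        ("u1", "t1", 1110000000),
        ("u1", "t1", 1110000300),
        ("u1", "t2", 1110000600),
        ("u2", "t2", 1580000000),
    ]
    plays = build_play_counts(events)
    print(plays["u1"]["t1"])  # 2

    # Hypothetical per-user scores and gender labels, for illustration only.
    scores = {"u1": 0.4, "u2": 0.2, "u3": 0.6}
    gender = {"u1": "f", "u2": "m", "u3": "f"}
    print(mean_score_by_group(scores, gender))
```

Comparing group means is only a first diagnostic: it does not explain why a gap arises, and the choice of metric and of group definitions (including how users with missing demographic data are handled, here bucketed as "unknown") is itself an assumption of this sketch.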