The Yahoo! Music Dataset and KDD-Cup '11
Authors
Abstract
The theme of the KDD Cup 2011 challenge was to identify user tastes in music by leveraging the actual Yahoo! Music dataset. Two datasets were sampled from the raw data: a larger dataset containing 262,810,175 ratings of 624,961 music items by 1,000,990 users was created for Track1, and a smaller dataset with 62,551,438 ratings of 296,111 music items by 249,012 users was created for Track2. A distinctive feature of the datasets is that there are four types of musical items: tracks, albums, artists, and genres, forming a four-level hierarchy. The challenge started on March 15, 2011 and ended on June 30, 2011. It attracted 2389 participants, 2100 of whom were active by the end of the competition. The popularity of the challenge is related to the fact that learning a large-scale recommender system is a generic problem that is highly relevant to industry. In addition, the competition drew interest by introducing a number of scientific and technical challenges, including dataset size, the hierarchical structure of items, high-resolution timestamps of ratings, and a nonconventional ranking-based task.
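To give a feel for the "dataset size" challenge, the counts quoted above can be turned into two simple scale measures: the density of the user-item rating matrix and the average number of ratings per user. The sketch below is only illustrative arithmetic on the abstract's figures; the names `DATASETS`, `density`, and `ratings_per_user` are assumptions introduced here and are not part of the challenge materials.

```python
# Counts copied from the abstract; all names here are hypothetical helpers.
DATASETS = {
    "Track1": {"ratings": 262_810_175, "items": 624_961, "users": 1_000_990},
    "Track2": {"ratings": 62_551_438, "items": 296_111, "users": 249_012},
}


def density(ratings: int, users: int, items: int) -> float:
    """Fraction of all possible user-item pairs that carry a rating."""
    return ratings / (users * items)


def ratings_per_user(ratings: int, users: int) -> float:
    """Average number of ratings contributed by each user."""
    return ratings / users


if __name__ == "__main__":
    for name, d in DATASETS.items():
        print(
            f"{name}: density = {density(**d):.4%}, "
            f"ratings per user = {ratings_per_user(d['ratings'], d['users']):.1f}"
        )
```

Under these figures, both rating matrices are well under 0.1% dense, which is one reason sparse representations are the natural choice for data of this kind.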
Similar resources
Based Prediction System for Recommendation: KDD Cup 2011, Track 2
This paper describes a solution to the 2011 KDD Cup competition, Track2: discriminating between highly rated tracks and unrated tracks in a Yahoo! Music dataset. Our approach was to use supervised learning based on 65 features generated using various techniques such as collaborative filtering, SVD, and similarity scoring. During our modeling stage, we created a number of predictors including lo...
Collaborative Filtering Ensemble
This paper provides the solution of the team “commendo” on the Track1 dataset of the KDD Cup 2011 (Dror et al.). Yahoo Labs provides a snapshot of their music-rating database as the dataset for the competition. We get approximately 260 million ratings from 1 million users on 600k items. Timestamp and taxonomy information are added to the ratings. The goal of the competition was to predict unknown rat...
Collaborative Filtering Ensemble for Ranking
This paper provides the solution of the team “commendo” on the Track2 dataset of the KDD Cup 2011 (Dror et al.). Yahoo Labs provides a snapshot of their music-rating database as the dataset for the competition, consisting of approximately 62 million ratings from 250k users on 300k items. The dataset includes hierarchical information about the items. The goal of the competition is to distinguish between...
COMP621U Project Proposal
People have remarkably diverse tastes in music, which reflect diversity in personalities, cultures, and age groups. Yahoo! Music offers a wealth of information and services related to many aspects of music, such as user ratings, which can be used to analyze the encoded information on how songs are grouped, which artists complement each other, and which songs users would like to list...
Feature Engineering in User's Music Preference Prediction
The second track of this year’s KDD Cup asked contestants to separate a user’s highly rated songs from unrated songs for a large set of Yahoo! Music listeners. We cast this task as a binary classification problem and addressed it utilizing gradient boosted decision trees. We created a set of highly predictive features, each with a clear explanation. These features were grouped into five categor...
Publication date: 2012