This year's tasks employ the Netflix Prize training data set. This data set consists of more than 100 million ratings from over 480 thousand randomly chosen, anonymous customers on nearly 18 thousand movie titles. The data were collected between October 1998 and December 2005 and reflect the distribution of all ratings received by Netflix during this period. The ratings are on a scale from 1 to 5 (integral) stars. (See below for details on downloading this data set.)
This year's competition consists of two tasks. Each team may participate in either task or in both.
- Task 1 (Who Rated What in 2006): Your task is to predict which users rated which movies in 2006. We will provide a list of 100,000 (user_id, movie_id) pairs where the users and movies are drawn from the Netflix Prize training data set. None of the pairs were rated in the training set. Your task is to predict the probability that each pair was rated in 2006 (i.e., the probability that user_id rated movie_id in 2006). (The actual rating is irrelevant; we just want whether the movie was rated by that user sometime in 2006. The date in 2006 when the rating was given is also irrelevant.)
- Task 2 (How Many Ratings in 2006): Your task is to predict the number of additional ratings the users from the Netflix Prize training data set gave to a subset of the movies in the training data set. We provide a list of 8,863 movie_ids drawn from the Netflix Prize training data set. You need to predict the number of additional ratings that all users in the Netflix Prize training data set provided in 2006 for each of those movie titles. (Again, the actual rating given by each user is irrelevant; we just want the number of times that the movie was rated in 2006. The date in 2006 when the rating was given is also irrelevant.)
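To make the two prediction targets concrete, the following is a minimal Python sketch of how the quantities being predicted could be derived from a set of 2006 rating records. It is an illustration only, not the organizers' scoring code: the function names and the (user_id, movie_id, date) record layout are assumptions made for this example, and the sample records are toy values.

```python
from collections import Counter

def task1_labels(pairs, ratings_2006):
    """For each (user_id, movie_id) pair, return 1 if it was rated in 2006, else 0.

    ratings_2006 is a hypothetical list of (user_id, movie_id, date) records.
    The rating value and the date within 2006 are ignored, as in the task text.
    """
    rated = {(user_id, movie_id) for (user_id, movie_id, _date) in ratings_2006}
    return [1 if pair in rated else 0 for pair in pairs]

def task2_counts(movie_ids, ratings_2006):
    """For each movie_id, return the number of 2006 rating records for that movie.

    This counts records, assuming one record per (user, movie) rating.
    """
    counts = Counter(movie_id for (_user_id, movie_id, _date) in ratings_2006)
    return [counts[movie_id] for movie_id in movie_ids]

# Toy records in the assumed (user_id, movie_id, date) layout.
ratings_2006 = [
    (1, 10, "2006-01-05"),
    (2, 10, "2006-03-17"),
    (1, 20, "2006-07-30"),
]

labels = task1_labels([(1, 10), (2, 20), (1, 20)], ratings_2006)
counts = task2_counts([10, 20, 30], ratings_2006)
print(labels)  # [1, 0, 1]
print(counts)  # [2, 1, 0]
```

A Task 1 submission would supply a probability for each pair rather than the 0/1 value itself, and a Task 2 submission would supply a predicted count for each listed movie.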
More information regarding the competition is available here.