The NetFlix Prize

NetFlix is an innovative movie rental company from the other side of the Atlantic. For a fixed monthly fee, you can watch the movies of your choice directly on your computer; see also the accompanying figure.

With millions of customers and movie titles available, NetFlix holds a great many movie ratings (1-5 stars). What matters most to the company, however, is predicting the rating a user will give to a specific movie. Such predictions have several uses, for example recommending a set of movies to a user with a given 'viewing history', so as to maximize the number of users who choose to watch the recommended movies. If nothing else, these predictions are clearly valuable. This is shown by NetFlix's interesting decision to open the problem to the international research community: it provides a relevant dataset and asks for the best algorithm and its predictions. The name of the competition: The NetFlix Prize.

The incentive is quite significant, the competition is open to everyone, and rest assured that the course has given you the knowledge you need to take part. So here are the details for anyone who may be interested:


We’re quite curious, really. To the tune of one million dollars.


  • Contest begins October 2, 2006 and continues through at least October 2, 2011.
  • Contest is open to anyone, anywhere (except certain countries listed below).
  • You have to register to enter.
  • Once you register and agree to these Rules, you’ll have access to the Contest training data and qualifying test sets.
  • To qualify for the $1,000,000 Grand Prize, the accuracy of your submitted predictions on the qualifying set must be at least 10% better than the accuracy Cinematch can achieve on the same training data set at the start of the Contest.
  • To qualify for a year’s $50,000 Progress Prize the accuracy of any of your submitted predictions that year must be less than or equal to the accuracy value established by the judges the preceding year.
  • To win and take home either prize, your qualifying submissions must have the largest accuracy improvement verified by the Contest judges, you must share your method with (and non-exclusively license it to) Netflix, and you must describe to the world how you did it and why it works.

Upon registration, Participants may access the Contest training data and qualifying test sets. The training data set consists of more than 100 million ratings from over 480 thousand randomly-chosen, anonymous customers on nearly 18 thousand movie titles. The data were collected between October, 1998 and December, 2005 and reflect the distribution of all ratings received by Netflix during this period. The ratings are on a scale from 1 to 5 (integral) stars.

In addition to the training data set, a qualifying test set is provided containing over 2.8 million customer/movie id pairs with rating dates but with the ratings withheld. These pairs were selected from the most recent ratings from a subset of the same customers in the training data set, over a subset of the same movies.

The qualifying set is divided into two disjoint subsets containing randomly selected pairs from the qualifying set. The assignment of pairs to these subsets is not disclosed. The Site will score each subset by computing the square root of the averaged squared difference between each prediction and the actual rating (the root mean squared error or “RMSE”) in the subset, rounded to the nearest .0001.
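To make the scoring rule above concrete, here is a minimal Python sketch of the RMSE computation. The function name `rmse` and the toy ratings are my own illustration, not part of the contest material, and Python's `round` may treat exact ties differently from the Site's rounding.

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error, rounded to 4 decimal places.

    Hypothetical helper for illustration; the Site's own scoring
    code is not published in the rules quoted here.
    """
    if not predictions or len(predictions) != len(actuals):
        raise ValueError("need two non-empty sequences of equal length")
    # Squared difference between each prediction and the actual rating.
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    # Average the squared differences, then take the square root.
    mean_squared_error = sum(squared_errors) / len(squared_errors)
    return round(math.sqrt(mean_squared_error), 4)


if __name__ == "__main__":
    # Toy example: predicting 3.5 stars for every pair.
    predicted = [3.5, 3.5, 3.5, 3.5]
    actual = [3, 4, 3, 4]
    print(rmse(predicted, actual))
```

Note that predictions may be fractional even though the actual ratings are whole stars, which is why a constant guess of 3.5 is off by half a star on every pair in this toy example.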

The RMSE for the first “quiz” subset will be reported publicly on the Site; the RMSE for the second “test” subset will not be reported publicly but will be employed to qualify a submission as described below. The reported RMSE scores on the quiz subset provide a public announcement that a potential qualifying score has been reached and provide feedback to Participants on both their absolute and relative performance.

To qualify for the Grand Prize the RMSE of a Participant’s submitted predictions on the test subset must be less than or equal to 90% of 0.9525, or 0.8572 (the “qualifying RMSE”).
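As a quick sanity check of the numbers above, the following Python sketch compares a score against the qualifying RMSE and expresses it as a percentage improvement over the 0.9525 baseline. The two constants come from the rules; the function names are hypothetical and only illustrate the arithmetic.

```python
# Baseline RMSE quoted in the rules, and the stated qualifying RMSE.
BASELINE_RMSE = 0.9525
QUALIFYING_RMSE = 0.8572


def improvement_percent(rmse):
    """Percentage by which `rmse` improves on the baseline (illustrative helper)."""
    return (1 - rmse / BASELINE_RMSE) * 100


def qualifies_for_grand_prize(rmse):
    """True if `rmse` is at or below the qualifying RMSE stated in the rules."""
    return rmse <= QUALIFYING_RMSE


if __name__ == "__main__":
    for score in (0.9525, 0.9000, 0.8572):
        print(score, f"{improvement_percent(score):.2f}%", qualifies_for_grand_prize(score))
```

Lower RMSE is better, so an "improvement" here means a smaller number than the baseline.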

Contest Prizes:

  1. Grand Prize: $1,000,000 (USD) Cash
  2. Progress Prizes: $50,000 (USD) Cash each award
