Dan Jones's Blog

Netflix data

By Dan Jones in Reader

Posted in Netflix on October 12, 2006 at 6:08 pm

I’ve been busy lately and haven’t had time to post here, because I’ve been spending my spare time on the Netflix Prize, and it’s mesmerised me for the last week at least. What is it? Basically, $1 million to anyone who can improve on the algorithm Netflix uses by 10%. I’m not going to win it.

For me, that’s not the interesting bit. The best bit is the dataset you get for working on it: 700MB compressed, 2GB uncompressed. By my calculations it holds ~17,000 movies with ~100,000,000 ratings. Oh, you get movie titles too. It has all obviously been scrubbed of ANY personal data.

With some simple database work, it allows clever queries such as the most popular film of a given year according to Netflix.
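
As a rough illustration of that kind of query, here is a minimal sketch. It makes several assumptions that are not from the post: "most popular" is taken to mean "most ratings", the table and column names (`movies`, `ratings`, `movie_id`, `year`, `title`) are hypothetical, the rows are made-up demo data, and SQLite stands in for MySQL only so that the example runs on its own.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical schema and demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, year INTEGER, title TEXT);
        CREATE TABLE ratings (movie_id INTEGER, customer_id INTEGER, rating INTEGER);
        -- Index the join column so lookups by movie avoid a full table scan.
        CREATE INDEX idx_ratings_movie ON ratings (movie_id);
        """
    )
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?, ?)",
        [(1, 2001, "Film A"), (2, 2001, "Film B"), (3, 1999, "Film C")],
    )
    conn.executemany(
        "INSERT INTO ratings VALUES (?, ?, ?)",
        [(1, 10, 4), (2, 10, 5), (2, 11, 3), (2, 12, 4), (3, 10, 5), (3, 11, 2)],
    )
    return conn


def most_popular_film(conn, year):
    """Return (title, rating_count) for the most-rated film of a year, or None."""
    return conn.execute(
        """
        SELECT m.title, COUNT(*) AS n
        FROM movies m
        JOIN ratings r ON r.movie_id = m.movie_id
        WHERE m.year = ?
        GROUP BY m.movie_id, m.title
        ORDER BY n DESC, m.title ASC
        LIMIT 1
        """,
        (year,),
    ).fetchone()


conn = build_demo_db()
print(most_popular_film(conn, 2001))
```

Counting ratings is only one possible reading of "popular"; swapping `COUNT(*)` for `AVG(r.rating)` would rank by average score instead.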

What it’s best at, though, is making people like me who haven’t coded in years realise that all those university lectures on code optimisation may actually prove useful. Take 100 million records in MySQL, for example: done wrong, it crawls. Done right, queries respond quickly, but not quickly enough for any really meaningful analysis of the data. Now I’m going to have to go back to the drawing board and look at storing the entire dataset in memory for analysis.
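
One way the in-memory idea might look is sketched below. This is an assumption-laden illustration, not the approach the post settles on: the `RatingStore` class and its field layout are hypothetical. It relies on ~17,000 movie IDs fitting in a 16-bit integer and a rating fitting in one byte. If customer IDs take 4 bytes, that is 7 bytes per rating, or about 700MB for 100 million ratings.

```python
from array import array


class RatingStore:
    """Hypothetical compact store: one typed array per column, no per-row objects."""

    def __init__(self):
        self.movie_ids = array("H")     # unsigned short, enough for ~17,000 movies
        self.customer_ids = array("I")  # unsigned int, typically 4 bytes
        self.ratings = array("B")       # unsigned char, one byte per rating

    def add(self, movie_id, customer_id, rating):
        # The three arrays stay in step: index i describes one rating.
        self.movie_ids.append(movie_id)
        self.customer_ids.append(customer_id)
        self.ratings.append(rating)

    def bytes_per_rating(self):
        # Actual element sizes on this platform, summed across the columns.
        return (
            self.movie_ids.itemsize
            + self.customer_ids.itemsize
            + self.ratings.itemsize
        )

    def mean_rating(self, movie_id):
        # Simple linear scan; returns None if the movie has no ratings.
        total = 0
        count = 0
        for m, r in zip(self.movie_ids, self.ratings):
            if m == movie_id:
                total += r
                count += 1
        return total / count if count else None


store = RatingStore()
store.add(1, 10, 5)
store.add(1, 11, 3)
store.add(2, 10, 4)
print(store.mean_rating(1), store.bytes_per_rating())
```

The linear scan is deliberately naive; sorting the arrays by movie ID, or keeping a per-movie offset table, would be the obvious next step for faster lookups.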

What do you do in your spare time?
