This document summarizes Mark Levy's presentation on using Hadoop for algorithms on user data at Last.fm. It describes how Last.fm uses Hadoop to compute music charts from over 1 billion user scrobbles per month and calculate royalties. It also discusses using Hadoop for topic modeling of user profiles and documents through LDA, graph recommendations through label propagation on the user-item graph, and processing audio files to extract metadata through a map-reduce approach.
41. Topic Modelling: AD-LDA

    class GibbsSamplingMapper:
        init():
            load current word-topic matrix
        map(docID, doc):
            for w, z in doc:
                compute p(z|w) from matrix, doc
                sample new_z from p(z|w)
                doc[w] = new_z
            yield docID, doc
            for w, z in doc:
                yield (w, z), 1
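The mapper on this slide leaves "compute p(z|w)" unspecified. One common choice is the standard collapsed Gibbs conditional for LDA, sketched below in Python. The function name `gibbs_map`, the constants, and the representation of a document as a list of (word, topic) pairs are all assumptions for illustration; none of them come from the talk.

```python
import random
from collections import Counter

NUM_TOPICS = 3      # hypothetical settings, not from the slides
ALPHA = 0.1         # document-topic smoothing
BETA = 0.01         # word-topic smoothing
VOCAB_SIZE = 1000


def gibbs_map(doc_id, doc, word_topic, topic_totals, rng=random):
    """Resample the topic of every token in one document.

    doc is a list of (word, topic) pairs; word_topic maps (word, topic)
    to a count and topic_totals maps topic to its total token count.
    Both count tables are read-only snapshots from the previous iteration.
    """
    doc_topics = Counter(z for _, z in doc)
    new_doc = []
    for w, z in doc:
        doc_topics[z] -= 1  # leave the current token out of the document counts
        weights = []
        for k in range(NUM_TOPICS):
            own = 1 if k == z else 0  # and out of the global counts
            n_wk = max(word_topic.get((w, k), 0) - own, 0)
            n_k = max(topic_totals.get(k, 0) - own, 0)
            weights.append((doc_topics[k] + ALPHA) * (n_wk + BETA)
                           / (n_k + VOCAB_SIZE * BETA))
        new_z = rng.choices(range(NUM_TOPICS), weights=weights)[0]
        doc_topics[new_z] += 1
        new_doc.append((w, new_z))
    # First the updated document, then one count per (word, topic) pair
    yield doc_id, new_doc
    for w, z in new_doc:
        yield (w, z), 1
```

Because the count tables are not updated while sampling, every mapper works from the same stale snapshot, which is the approximation that lets the documents be processed in parallel.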
42. Topic Modelling: AD-LDA

    class Reducer:
        reduce(key, val):
            if key is a docID:
                # save new topic assignments
                yield key, val
            else:
                # update word-topic matrix
                matrix[key] += val
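A runnable sketch of this reduce step, assuming document keys are strings and count keys are (word, topic) tuples. That convention, the names `lda_reduce` and `run_reduce`, and the in-memory grouping that stands in for Hadoop's shuffle are illustrative assumptions rather than details from the talk. The matrix is rebuilt from scratch out of the emitted counts.

```python
from collections import defaultdict


def lda_reduce(key, values, matrix):
    """Handle one key group from the sampling step."""
    if isinstance(key, tuple):
        # (word, topic) key: sum the 1s emitted for this pair
        matrix[key] += sum(values)
    else:
        # docID key: pass the resampled document through for the next iteration
        for doc in values:
            yield key, doc


def run_reduce(pairs):
    """Group mapper output by key in memory, standing in for Hadoop's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    matrix = defaultdict(int)
    docs = {}
    for key, values in groups.items():
        for doc_id, doc in lda_reduce(key, values, matrix):
            docs[doc_id] = doc
    return docs, dict(matrix)
```

Calling `run_reduce` on the mapper's output returns the saved documents and the new word-topic counts, which would be loaded by the mappers in the next iteration.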
85. Label Propagation

    class Reducer:
        reduce(nodeID, msgs):
            # accumulate
            labels = defaultdict(lambda: 0)
            for msg in msgs:
                for label, w in msg:
                    labels[label] += w
            # normalise, prune
            normalise(labels, MAX_LABELS_PER_NODE)
            yield nodeID, labels
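The slide does not show `normalise`. A minimal sketch of the reducer with one plausible version, assuming each message is a dict of label to weight, that pruning keeps the heaviest labels, and that the survivors are rescaled to sum to 1. The value of `MAX_LABELS_PER_NODE` is made up, and this `normalise` returns a new dict instead of modifying its argument in place as the slide implies.

```python
from collections import defaultdict

MAX_LABELS_PER_NODE = 2  # hypothetical value; the slides do not give one


def normalise(labels, max_labels):
    """Keep the max_labels heaviest labels and rescale their weights to sum to 1."""
    top = sorted(labels.items(), key=lambda kv: kv[1], reverse=True)[:max_labels]
    total = sum(w for _, w in top)
    return {label: w / total for label, w in top} if total else {}


def label_reduce(node_id, msgs):
    """Sum incoming label weights for a node, then prune and normalise."""
    labels = defaultdict(float)
    for msg in msgs:
        for label, w in msg.items():
            labels[label] += w
    yield node_id, normalise(labels, MAX_LABELS_PER_NODE)
```

Pruning before the labels are passed on keeps each node's message small, so the data shuffled between iterations stays bounded as labels spread through the graph.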