This talk essentially tells the story of the data science team at Massive Media, the company behind Netlog.com and Twoo.com. After obtaining invaluable first-hand experience in working with big data as a member of the information retrieval team at the music discovery website Last.fm, I joined Massive Media to conceive, build and lead a brand new team around big data and data science for them. In doing so, I developed a pretty clear perspective on how to introduce big data within a company and create added value from it, which is precisely what I would like to share in this talk.
2. MY CAREER PATH SO FAR
2007: Began working with big data as PhD student
2009: Embarked on a data science career at Last.fm
2011: Joined Massive Media as Lead Data Scientist
Last.fm: data company at heart; one of the earliest Hadoop adopters world-wide; inventors of Ketama; organised the first “NoSQL” meetup in SF.
Massive Media: huge audience and tremendous potential, but a data science newcomer at the time.
3. Second big product of Massive Media, after Netlog
2011: Initial launch of Twoo.com
2012: Biggest dating site world-wide on comScore
2013: Massive Media acquired by InterActiveCorp
4. IT’S A BIG FAMILY
IAC’s main personals brands: (logos shown on the slide)
Some other well-known IAC brands: (logos shown on the slide)
6. BOOTSTRAP BY SAVING OR GAINING MONEY
You need to get some capital to get started
Saving money tends to be easier in practice
Real-world example:
• Analyzing CDN logs unveiled abuse
• Stopping the abuse greatly reduced the bills
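As an illustration of the kind of analysis behind this example, here is a minimal sketch that sums the bytes served per client and reports how much of the traffic the heaviest client accounts for. The log format ("<client> <path> <bytes>"), the function names and the sample data are assumptions made for this sketch; the talk does not describe the actual CDN logs or tooling used.

```python
from collections import Counter


def bytes_by_client(log_lines):
    """Sum the bytes served per client.

    Assumed (hypothetical) line format: "<client> <path> <bytes>".
    """
    totals = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # skip malformed lines rather than fail
        client, _path, size = parts
        totals[client] += int(size)
    return totals


def traffic_share(totals, top_n=1):
    """Return the fraction of all bytes consumed by the top_n clients."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    top = sum(size for _client, size in totals.most_common(top_n))
    return top / total


# Made-up sample data, for illustration only.
sample = [
    "10.0.0.1 /img/a.jpg 500",
    "10.0.0.2 /img/b.jpg 700",
    "10.0.0.9 /img/a.jpg 4000",
    "10.0.0.9 /img/c.jpg 4800",
    "garbage line",
]
totals = bytes_by_client(sample)
print(totals.most_common(1))  # heaviest consumer first
print(traffic_share(totals))  # share of traffic taken by that consumer
```

A single client responsible for most of the bandwidth is the sort of outlier worth investigating; at real log volumes the same aggregation would typically run as a distributed job rather than in one process.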
8. HADOOP
Not the holy grail, but deserves a central role
It has a vibrant community and is proven to be:
• ECONOMICAL: runs on commodity hardware
• SCALABLE: smart distributed processing
• MAINTAINABLE: very robust and fault-tolerant
• FLEXIBLE: predefined schemas not required
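To make the processing model and the "no predefined schema" point concrete, here is a small sketch of the map / shuffle / reduce pattern in plain Python. It only imitates the pattern in a single process and does not use any Hadoop API; the record fields ("event", "user", and so on) and the function names are hypothetical.

```python
import json
from collections import defaultdict


def map_event(line):
    """Map step: emit (event_type, 1); fields are read on demand, no schema."""
    record = json.loads(line)
    yield record.get("event", "unknown"), 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_event(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# Records deliberately carry different fields: nothing is declared up front.
lines = [
    '{"event": "login", "user": 1}',
    '{"event": "message", "user": 2, "length": 42}',
    '{"event": "login", "user": 3, "country": "BE"}',
    '{"user": 4}',
]
result = run_job(lines)
print(result)
```

Because the map and reduce steps only depend on their own inputs, a framework can spread them over many machines, which is where the scalability comes from.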
15. THE SECRET IS IN THE MIX
Hadoop’s tricks also apply to data science teams
• Avoid specialisation to allow easy distribution and scaling
• Exploit data locality by hiring people with wide skill set
Great Data Scientists have the right mix of skills
• Hackers with solid technical background
• Analytical mind that knows statistics and machine learning
• Clever and creative in everything they do
16. CHEAPER TECH MAKES PEOPLE MORE EXPENSIVE
Graph by Trifacta. Source: John C. McCallum, Wikipedia and Federal Reserve Bank of St Louis. Inflation adjusted to 2011 dollars.
18. SOME TIPS AND TRICKS
Dare to fail and/or start from estimates
Introduce data exploration/innovation days
• Basically 20% time devoted to playing with data
• Incorporate collaborative brainstorming
• Goal is to find promising new projects to work on
Communicate findings to the rest of the company
• Fun and silliness are allowed
• Prototype early and often
19. PRODUCT INSIGHTS & EXTENSIONS
E.g. recommendations and
activity patterns analysis
20. CUTE OBSERVATIONS FOR PR
http://www.twoo.com/blog/2012/04/twoos-great-global-vocabulary-experiment
21. FIVE SIMPLE STEPS IS ALL IT TAKES
1 FOLLOW THE MONEY
2 EMBRACE HADOOP
3 BUILD DASHBOARDS
4 ASSEMBLE A TEAM
5 EXPLORE & INNOVATE
22. FIVE SIMPLE STEPS IS ALL IT TAKES
1 FOLLOW THE MONEY
2 EMBRACE HADOOP
3 BUILD DASHBOARDS
4 ASSEMBLE A TEAM
5 EXPLORE & INNOVATE
Thanks! Questions?