CREATING ADDED VALUE WITH BIG DATA

by KLAAS BOSTEELS (@klbostee)
MY CAREER PATH SO FAR


2007: Began working with big data as PhD student
2009: Embarked on a data science career at Last.fm
   • Data company at heart; one of the earliest Hadoop adopters world-wide; inventors of Ketama; organised first “NoSQL” meetup in SF.
2011: Joined Massive Media as Lead Data Scientist
   • Huge audience and tremendous potential, but data science newcomer at the time.
Second big product of Massive Media, after Netlog

2011: Initial launch of Twoo.com
2012: Biggest dating site world-wide on comScore
2013: Massive Media acquired by InterActiveCorp
IT’S A BIG FAMILY


IAC’s main personals brands:




Some other well-known IAC brands:
STEP 1
FOLLOW THE MONEY

                   photo by Chris Isherwood
BOOTSTRAP BY SAVING OR GAINING MONEY


You need to get some capital to get started

Saving money tends to be easier in practice

Real-world example:

   • Analyzing CDN logs unveiled abuse
   • Stopping the abuse greatly reduced the bills
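
As a rough sketch of the kind of analysis behind this example, the snippet below totals the bytes served per referring host from CDN access logs and flags external hosts that account for a large share of the traffic. The log format (`client_ip referrer_host bytes_sent`), the host names, the 20% threshold and the function names are all assumptions made for illustration; the talk does not describe the actual logs or the nature of the abuse.

```python
from collections import Counter

# Hypothetical log format: "<client_ip> <referrer_host> <bytes_sent>"
SAMPLE_LOG = """\
10.0.0.1 mysite.example 1200
10.0.0.2 hotlinker.example 500000
10.0.0.3 mysite.example 800
10.0.0.4 hotlinker.example 700000
10.0.0.5 - 300
"""

# Assumption: "-" means the request carried no referrer.
OWN_HOSTS = {"mysite.example", "-"}


def bytes_per_referrer(lines):
    """Sum bytes served per referrer host, skipping malformed lines."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        _, referrer, sent = parts
        totals[referrer] += int(sent)
    return totals


def suspicious_referrers(totals, own_hosts=OWN_HOSTS, min_share=0.2):
    """Return external referrers causing at least min_share of all traffic."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return sorted(
        (host for host, sent in totals.items()
         if host not in own_hosts and sent / grand_total >= min_share),
        key=lambda host: -totals[host],
    )


totals = bytes_per_referrer(SAMPLE_LOG.splitlines())
print(suspicious_referrers(totals))
```

On real logs the same aggregation would typically run as a distributed job rather than in a single process, but the logic stays the same: group by a suspect dimension, sum the cost, and rank.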
STEP 2
EMBRACE HADOOP

                 photo by Doug Kukurudza
HADOOP


Not the holy grail, but deserves a central role

It has a vibrant community and is proven to be:

    ECONOMICAL      runs on commodity hardware
    SCALABLE        smart distributed processing
    MAINTAINABLE    very robust and fault-tolerant
    FLEXIBLE        predefined schemas not required
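
To make "smart distributed processing" and "predefined schemas not required" a little more concrete, here is a minimal single-process sketch of the MapReduce model applied to JSON log lines whose fields differ from record to record. It only imitates the map, shuffle and reduce phases in plain Python; it is not Hadoop code, and the event names and fields are made up.

```python
import json
from collections import defaultdict

# Made-up log lines; note that the records do not share a fixed schema.
LINES = [
    '{"event": "login", "user": "a"}',
    '{"event": "message", "user": "a", "to": "b"}',
    '{"event": "login", "user": "b", "country": "BE"}',
    'not json at all',
]


def mapper(line):
    """Emit (event, 1) for each parseable record; skip anything else."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return
    if isinstance(record, dict) and "event" in record:
        yield record["event"], 1


def reducer(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Imitate map -> shuffle (group by key) -> reduce in one process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


counts = run_job(LINES)
print(counts)
```

Because the mapper interprets each line at read time, new fields such as `country` can appear in the logs without any schema migration; a real cluster would run many mappers and reducers in parallel over file blocks.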
STEP 3
BUILD DASHBOARDS

                   photo by Dawn Hopkins
STATS PIPELINE BASED ON HADOOP

Cf. “lambda architecture” coined by @nathanmarz

[Diagram] Components: Log collector, HDFS, MapReduce, HBase, Dashboards, Realtime processing, Ad-hoc results
[Diagram] Data flow legend: in batches / continuous
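
The idea behind the "lambda architecture" referenced on this slide is that a query combines a batch view, recomputed periodically over all raw data, with a realtime view that covers only the events that arrived since the last batch run. Below is a minimal in-memory sketch of that idea. Plain Python containers stand in for HDFS and HBase, and the class and method names are hypothetical; this is not the pipeline from the talk.

```python
from collections import Counter


class StatsStore:
    """Toy stand-in for the serving layer: batch view plus realtime view."""

    def __init__(self):
        self.master_log = []            # append-only raw events (HDFS stand-in)
        self.batch_view = Counter()     # rebuilt by each batch run
        self.realtime_view = Counter()  # incremented as events arrive

    def record(self, metric):
        """Continuous path: store the raw event and bump the realtime count."""
        self.master_log.append(metric)
        self.realtime_view[metric] += 1

    def run_batch(self):
        """Batch path: recompute counts from all raw events, then reset the
        realtime view, since the batch view now covers those events."""
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, metric):
        """Dashboards read the sum of both views."""
        return self.batch_view[metric] + self.realtime_view[metric]


store = StatsStore()
store.record("signup")
store.record("signup")
store.run_batch()
store.record("signup")
print(store.query("signup"))  # 3
```

Recomputing the batch view from the raw log means that a bug in the realtime path is corrected by the next batch run, which is one of the main arguments usually made for this design.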
CUSTOM-TAILORED WEB INTERFACE

   • Annotation & exporting functionality
   • Supports A/B testing and cohort analysis
   • Various other nifty extras
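
For the A/B testing support mentioned on this slide, one common way to compare two variants is a two-proportion z-test on their conversion rates. The sketch below is an assumption about what such a dashboard feature might compute, not a description of the actual interface, and the numbers are invented.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be due to chance alone; in practice the sample size and significance threshold should be fixed before the experiment starts.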
STEP 4
ASSEMBLE A TEAM

                  photo by Jean-François Schmitz
THE SECRET IS IN THE MIX


Hadoop’s tricks also apply to data science teams
   • Avoid specialisation to allow easy distribution and scaling
   • Exploit data locality by hiring people with a wide skill set
Great Data Scientists have the right mix of skills
   • Hackers with a solid technical background
   • Analytical minds that know statistics and machine learning
   • Clever and creative in everything they do
CHEAPER TECH MAKES PEOPLE MORE EXPENSIVE




Graph by Trifacta. Source: John C. McCallum, Wikipedia and Federal Reserve Bank of St Louis. Inflation adjusted to 2011 dollars.
STEP 5
EXPLORE & INNOVATE

                     photo by NASAr
SOME TIPS AND TRICKS


Dare to fail and/or start from estimates
Introduce data exploration/innovation days
  • Basically 20% time devoted to playing with data
  • Incorporate collaborative brainstorming
  • Goal is to find promising new projects to work on
Communicate findings to the rest of the company
  • Fun and silliness are allowed
  • Prototype early and often
PRODUCT INSIGHTS & EXTENSIONS


E.g. recommendations and activity pattern analysis
CUTE OBSERVATIONS FOR PR




http://www.twoo.com/blog/2012/04/twoos-great-global-vocabulary-experiment
FIVE SIMPLE STEPS IS ALL IT TAKES



1   FOLLOW THE MONEY

2   EMBRACE HADOOP

3   BUILD DASHBOARDS

4   ASSEMBLE A TEAM

5   EXPLORE & INNOVATE
Thanks! Questions?

