Winning with Big Data: Secrets of the Successful Data Scientist

A new class of professionals, called data scientists, has emerged to address the Big Data revolution. In this talk, I discuss nine skills for munging, modeling, and visualizing Big Data. I then present a case study that applies these skills: the analysis of billions of call records to predict customer churn at a North American telecom.

http://en.oreilly.com/datascience/public/schedule/detail/15316

Published in: Technology
  • Analysis of telecom using data to predict/stop churn

  1. WINNING WITH BIG DATA
     Secrets of the Successful Data Scientist
     Making Data Work
     June 9, 2010
     Michael Driscoll
     @dataspora
  2. WHY DATA MATTERS
  3. THE INDUSTRIAL AGE OF DATA
  4. WHAT IS BIG DATA?
     Data that is distributed.
  5. WHAT IS DATA SCIENCE?
  6. NINE WAYS TO WIN
  7. 1. CHOOSE THE RIGHT TOOL
     You don’t need a chainsaw to cut butter.
  8. 2. COMPRESS EVERYTHING
     mysqldump -u myuser -pmypass sourceDB | gzip | ssh mike@dataspora.com "cat - | gunzip | mysql -u myuser -pmypass targetDB"
     The world is IO-bound.
  9. 3. SPLIT UP YOUR DATA
     Split, apply, combine.
     See Hadley Wickham’s paper at http://had.co.nz/plyr/plyr-intro-090510.pdf
  10. 4. WORK WITH SAMPLES
      perl -ne "print if (rand() < 0.01)" data.csv > sample.csv
      Big Data is heavy, samples are light.
  11. 5. USE STATISTICS
  12. 6. COPY FROM OTHERS
      git clone git://github.com/kevinweil/hadoop-lzo
      Use open source.
  13. 7. ESCAPE CHART TYPOLOGIES
      Charts are compositions, not containers.
  14. 8. USE COLOR WISELY
      Color can enhance or insult.
  15. 9. TELL A STORY
      People are listening.
  16. ONE SUCCESS STORY
  17. WHY DO TELCO CUSTOMERS LEAVE?
      Sign up
      Leave
      Goal: “less churn.”
  18. DATA: BILLIONS OF CALLS
      … and millions of callers.
  19. DOES CALL QUALITY MATTER?
      … a difference, but not significant.
  20. WHAT ABOUT SOCIAL NETWORKS?
      Hmmm...
  21. BUILD THE CALL GRAPH
      … but is it predictive?
  22. EVOLUTION OF A CALL GRAPH
      April
  23. EVOLUTION OF A CALL GRAPH
      May
  24. EVOLUTION OF A CALL GRAPH
      June
  25. EVOLUTION OF A CALL GRAPH
      July
  26. 700% INCREASE IN CHURN
      when a cancellation occurs in a call network.
  27. THANKS! QUESTIONS?
      Michael Driscoll
      twitter @dataspora
      http://www.dataspora.com/blog
      Making Data Work
      June 9, 2010
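
The sampling one-liner on the "WORK WITH SAMPLES" slide keeps each input line independently with probability 0.01. A rough Python equivalent might look like the sketch below. The function name `sample_lines` and the `seed` and `keep_header` parameters are illustrative assumptions and do not come from the talk; the Perl version has no seed and treats a CSV header like any other line.

```python
import random
from typing import Iterable, Iterator, Optional


def sample_lines(lines: Iterable[str], rate: float = 0.01,
                 seed: Optional[int] = None,
                 keep_header: bool = True) -> Iterator[str]:
    """Yield each line independently with probability `rate`.

    Hypothetical helper: if `keep_header` is true, the first line is
    always kept so that a CSV sample still has its column names.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    for index, line in enumerate(lines):
        if keep_header and index == 0:
            yield line
        elif rng.random() < rate:
            yield line


# In-memory stand-in for data.csv: one header line plus 10,000 rows.
rows = ["id,value\n"] + [f"{i},{i * 2}\n" for i in range(10_000)]
sample = list(sample_lines(rows, rate=0.01, seed=42))
```

Because the generator streams line by line, passing an open file handle instead of `rows` and writing the result with `writelines` would sample a file without loading it into memory. The sample size is random, roughly `rate` times the number of rows.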

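
The case study builds a call graph and asks whether a cancellation in a customer's call network predicts that customer's own churn. The slides do not show how this was computed, so the following is only a minimal sketch under stated assumptions: call records are (caller, callee) pairs, the graph is undirected, and "exposed" means having at least one neighbor who cancelled in an earlier period. The function names and the toy data are made up for illustration and are not the telecom's data. The comparison is also a small instance of the split, apply, combine pattern from the "SPLIT UP YOUR DATA" slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_call_graph(calls: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected graph: two customers are neighbors if either called the other."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in calls:
        if caller != callee:  # ignore self-calls
            graph[caller].add(callee)
            graph[callee].add(caller)
    return graph


def churn_rates_by_exposure(graph: Dict[str, Set[str]],
                            cancelled_earlier: Set[str],
                            churned_later: Set[str]) -> Dict[bool, float]:
    """Split remaining customers by exposure, then compute each group's churn rate."""
    counts = {True: [0, 0], False: [0, 0]}  # exposed -> [churned, total]
    for customer, neighbors in graph.items():
        if customer in cancelled_earlier:
            continue  # already gone, so not at risk in the later period
        exposed = bool(neighbors & cancelled_earlier)
        counts[exposed][1] += 1
        if customer in churned_later:
            counts[exposed][0] += 1
    return {exposed: (churned / total if total else float("nan"))
            for exposed, (churned, total) in counts.items()}


# Toy data (hypothetical): "a" cancels first; "b" and "h" churn afterwards.
calls = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("e", "f"), ("f", "g"), ("g", "h")]
graph = build_call_graph(calls)
rates = churn_rates_by_exposure(graph, cancelled_earlier={"a"},
                                churned_later={"b", "h"})
```

Here `rates[True]` is the churn rate among customers with a cancelled neighbor and `rates[False]` is the rate among everyone else; the ratio of the two gives the kind of relative increase the final results slide reports. On real data, such a comparison would also need a significance test, in the spirit of the "USE STATISTICS" slide.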