
Big Data, Big Deal: For Future Big Data Scientists


  1. Big Data, Big Deal: For Future Big Data Scientists. Prepared by: Wei-Yen Lin, May 2013
  2. Outline: Trying To Answer ...
     • A Buzzword: Big Data
     • What Is Big Data: Big Data 101
     • What Makes It Happen: Drivers For Big Data
     • How To Deal With It: Existing Big Data Technologies
     • How To Improve: Challenges For Big Data
     • How To Be A Good Big Data Scientist
     Big Data, Big Deal 2013, Page 2
  4. Everyone Is Talking About Big Data...
  5. A Truth: We Already Live In A Big Data World
  6. The Units Of Big Data Are Different... (Source: Computer Sciences Corporation, 2012)
  7. So, Big Data = Data Is Big?
  9. Origin Of The Term
     • First ACM article to use the term (Michael Cox and David Ellsworth, Ames Research Center, 1997): “…data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data.”
     • First definition (Francis Diebold, University of Pennsylvania, 2000): “Big Data refers to the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology.”
  10. The commonly accepted 3 V's of Big Data (Doug Laney, Meta Group, 2001)
  11. Volume, Velocity, Variety: Examples
      • Volume: terabytes of records, transactions, tables, files
        – A jumbo jet creates 640 TB on one Atlantic crossing; multiply that by the 25,000 flights flown each day.
      • Velocity: batch, near-time, real-time, streams
        – Today's online ad serving requires a decision within 40 ms.
        – Financial services need about 1 ms to calculate customer scoring probabilities.
      • Variety: structured, unstructured, semi-structured, and all of the above in a mix
        – WalMart processes 1M customer transactions per hour and feeds the information to a database estimated at 2.5 PB (petabytes).
        – There are old and new data sources such as RFID, sensors, mobile payments, in-vehicle tracking, etc.
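The volume example on slide 11 is a multiplication, and a few lines of Python make the scale concrete. This is a rough sketch: it assumes every one of the 25,000 daily flights generates as much data as one Atlantic crossing (an upper bound the slide implies but does not state) and uses decimal units (1 PB = 1,000 TB). The function and constant names are illustrative only.

```python
# Back-of-the-envelope check of the slide's volume figures.
TB_PER_CROSSING = 640        # quoted on the slide
FLIGHTS_PER_DAY = 25_000     # quoted on the slide
WALMART_TX_PER_HOUR = 1_000_000  # quoted on the slide


def daily_volume_tb(tb_per_flight, flights_per_day):
    """Total terabytes per day if every flight produced tb_per_flight."""
    return tb_per_flight * flights_per_day


def per_second(events_per_hour):
    """Convert an hourly event count into an average per-second rate."""
    return events_per_hour / 3600


total_tb = daily_volume_tb(TB_PER_CROSSING, FLIGHTS_PER_DAY)
total_pb = total_tb / 1_000   # decimal petabytes
total_eb = total_pb / 1_000   # decimal exabytes
walmart_tps = per_second(WALMART_TX_PER_HOUR)

print(f"{total_tb:,} TB/day = {total_pb:,.0f} PB/day = {total_eb:,.0f} EB/day")
print(f"WalMart: about {walmart_tps:.0f} transactions per second")
```

Under these assumptions the aviation figure comes to 16,000,000 TB (16 EB) per day, which is the point the slide is making about volume.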
  12. Three Top-Level Elements (Source: Martin Hall, 2011)
      • Data Management: data storage infrastructure, and resources to manipulate it
      • Data Analysis: technologies and tools to analyze the data and glean insight from it
      • Data Use: putting Big Data insights to work in Business Intelligence and end-user applications
  13. To Sum Up, Big Data Is …
      Big Data is high-volume, high-velocity, and/or high-variety information assets (characteristic) that require new forms of processing (solution) to enable enhanced decision making, insight discovery and process optimization (goal).
  15. Key Drivers of Big Data Technology Demand: Technical Drivers (1)
      • Modern science in search of new knowledge
        – Scientific experiments and tools are becoming heavily based on data processing.
      • Google and Facebook have driven many advances in Big Data efficiency
        – Google handles about 3 billion search queries per day.
        – Twitter handles some 400 million tweets per day, which amounts to 12 terabytes per day.
  16. Key Drivers of Big Data Technology Demand: Technical Drivers (2)
      • Data collected and stored continues to grow exponentially
        – The McKinsey Quarterly: the demand for storage has grown more than 50% annually in recent years.
      • Data is increasingly everywhere and in many formats
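To see what "more than 50% annually" implies, the growth can be compounded over several years. The sketch below assumes a constant 50% annual rate, which is a simplification of the slide's "more than 50%"; the function names are illustrative.

```python
import math


def growth_factor(annual_rate, years):
    """Multiplicative growth after `years` at a constant annual rate."""
    return (1 + annual_rate) ** years


def doubling_time_years(annual_rate):
    """Years needed for the quantity to double at a constant annual rate."""
    return math.log(2) / math.log(1 + annual_rate)


rate = 0.5  # 50% per year, the lower bound quoted on the slide
for years in (1, 3, 5, 10):
    print(f"after {years:2d} years: x{growth_factor(rate, years):.1f}")
print(f"doubling time: {doubling_time_years(rate):.2f} years")
```

At this rate storage demand doubles in well under two years, which is why the slide describes the growth as exponential.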
  17. Key Drivers of Big Data Technology Demand: Business Drivers (1)
      • Traditional data-intensive industry
        – Genomic research, drug development, healthcare
        – High-tech industry, CAD/CAM, weather/climate, etc.
      • Business (retail) uses Big Data technologies “to search” for customers
        – Delivering directly to customers requires prediction of customer behavior.
  18. Key Drivers of Big Data Technology Demand: Business Drivers (2)
      • Consumer products and services delivery
        – Captures the user's preferences and makes recommendations based on previous records.
      • Social media: the rise of public opinion stored in platforms
        – Managing public campaigns, e.g. elections, integrated public relations.
  20. Big Data Techniques: A Few Examples
      • Supervised learning: Support Vector Machine
      • Unsupervised learning: Cluster Analysis
      • Data fusion: Signal Processing, Natural Language Processing
      • Optimization: Genetic Algorithm, Neural Networks
      • Predictive modeling: Regression, Time Series Analysis
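As one concrete instance of the "Predictive modeling: Regression" entry on slide 20, here is a minimal ordinary-least-squares fit of a straight line in plain Python. The deck names no algorithm or library, so this is only an illustrative sketch: the function names and sample data are made up, and real workloads would use a statistics or machine-learning library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate a fitted (slope, intercept) model at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical sample: the points lie exactly on y = 2x + 1.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(model, predict(model, 5))
```

The fitted model is then used to predict an unseen value, which is the "predictive" part of the technique.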
  21. Big Data Technologies
      • Where is processing hosted? Distributed servers/cloud (e.g. Amazon EC2)
      • Where is data stored? Distributed storage (e.g. Hadoop DFS)
      • What is the programming model? Distributed processing (e.g. MapReduce)
      • How is data stored and indexed? High-performance schema-free database (e.g. Cassandra)
      • What operations are performed? Data analytics, semantic processing (e.g. R)
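The MapReduce programming model named on slide 21 can be illustrated with the classic word count. The sketch below runs in a single Python process and only mimics the three phases (map, shuffle, reduce). It does not use Hadoop or any other framework API, and all function names are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data big deal", "big data"]))
```

In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data across the network; the division of work into independent per-document and per-key steps is what makes that distribution possible.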
  22. Big Data Landscape (Source: Forbes, 2012)
  24. From Data Mining To Big Data Mining (Source: Robert J. Abate, 2012)
  25. The Life Cycle Of Big Data Method Should Be ... (Source: Robert J. Abate, 2012)
  26. Challenge For Big Data: Data
      • Data quality
        – How to find high-quality data from the vast collections of data? How good is the data? How broad is the coverage?
      • Data comprehensiveness
        – Are there areas without coverage? What are the implications?
      • Data reliability and validity
        – How to determine the quality of data sets and their relevance to particular issues.
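One simple way to start answering "How broad is the coverage?" is to measure, per field, what fraction of records actually carries a value. The sketch below is only an illustration of that idea: the record layout, field names, and the rule that `None` or an empty string counts as missing are all assumptions, not something the deck specifies.

```python
def field_coverage(records, fields):
    """Return, for each field, the fraction of records with a usable value.

    A value counts as missing when the key is absent, None, or "".
    """
    total = len(records)
    coverage = {}
    for field in fields:
        present = sum(1 for record in records if record.get(field) not in (None, ""))
        coverage[field] = present / total if total else 0.0
    return coverage


# Hypothetical records with gaps in the "region" field.
sample = [
    {"id": 1, "region": "north"},
    {"id": 2, "region": None},
    {"id": 3},
]
print(field_coverage(sample, ["id", "region"]))
```

Fields with low coverage point to the "areas without coverage" the slide asks about, and they are the places where conclusions drawn from the data deserve the most caution.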
  27. Challenge For Big Data: Processing
      • Data mining/data intelligence algorithms
        – To handle/discover new data structures and multi-type data relations
        – To respond to specific use cases and operations over data
      • Data interpretation
        – Understand the output and model it through some form of simulation. Domain experts must continue to play a role. We must be wary of becoming too beholden to the numbers.
  28. Challenge For Big Data: Management
      • Infrastructure support for storing and moving data, and for on-demand processing
        – Is cloud computing the right technology? Are there alternatives?
        – High-speed network infrastructure, on-demand provisioning
        – To respond to specific use cases and operations over data
      • Security, trustworthiness and data-centric security
        – Much of this information is about people. How to extract enough information to help people without extracting so much as to compromise their privacy?
  30. Big Data Talent (Source: U.S. Bureau Of Labor Statistics, McKinsey)
      Three groups: Deep Analytical, Big Data Savvy, Supporting Tech.
  31. Qualities Of Data Scientists: Advice From DJ Patil, “The World's 7 Most Powerful Data Scientists” (Forbes)
      • Technical expertise: deep expertise in some scientific discipline.
      • Curiosity: a desire to go beneath the surface.
      • Storytelling: the ability to use data to tell a story and to communicate it effectively.
      • Cleverness: the ability to look at a problem in different, creative ways.
