
Big Data Story - From An Engineer's Perspective

Big Data Story
Big Data Technology
Big Data Challenges



  1. 1. BIG DATA STORY FROM AN ENGINEER'S PERSPECTIVE Hien Luu
  2. 2. Agenda • Big Data Story • Big Data Technology • Big Data Challenges
  3. 3. About Me • Engineering Manager @LinkedIn • Instructor at UCSC Silicon Valley Extension school • Big data enthusiast
  4. 4. Big Data Story THE BIG DATA REVOLUTION Transform how we live, work and think Create opportunities
  5. 5. Big Data Story • 2.5 exabytes of data per day • 90% of the data in the world has been created in the last two years • The world's data doubles every two years
  6. 6. Big Data Story World's largest library • ~37 million books • ~10 terabytes • TERABYTE (TB) [1,000 Gigabytes]
  7. 7. Big Data Story Cost of 1 TB: 1995 - $1,000,000 • 2015 - $80
  8. 8. Big Data Story PETABYTE (PB) [1,000 Terabytes]: 50 petabytes, the entire written works of mankind • EXABYTE (EB) [1,000 Petabytes]: 5 exabytes, all the words ever spoken by mankind
  9. 9. Big Data Story ZETTABYTE (ZB) [1,000 Exabytes]: 250 billion DVDs • YOTTABYTE (YB) [1,000 Zettabytes]: size of the entire World Wide Web; 11 trillion years to download a yottabyte file
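The two prices on this slide imply a steep decline in storage cost. A short Python sketch of the arithmetic, assuming a constant (geometric) yearly rate of decline between 1995 and 2015; that is a simplification, since real prices did not fall smoothly:

```python
# Worked arithmetic on the slide's 1 TB prices (1995 vs. 2015).
PRICE_1995 = 1_000_000.0  # USD per TB, as quoted on the slide
PRICE_2015 = 80.0         # USD per TB, as quoted on the slide
YEARS = 2015 - 1995


def price_drop_factor(old_price: float, new_price: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price


def annual_decline(old_price: float, new_price: float, years: int) -> float:
    """Average fractional price drop per year, assuming a constant geometric rate."""
    return 1.0 - (new_price / old_price) ** (1.0 / years)


factor = price_drop_factor(PRICE_1995, PRICE_2015)
rate = annual_decline(PRICE_1995, PRICE_2015, YEARS)
print(f"1 TB became {factor:,.0f}x cheaper, about {rate:.0%} per year")
```

Under this assumption the price falls by a bit more than a third every year, compounding to a 12,500-fold drop over the two decades.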
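The unit ladder on these slides uses decimal (SI) prefixes, where each step is a factor of 1,000. A minimal Python sketch of that ladder, using it to relate two figures from earlier slides (2.5 exabytes per day and the ~10 TB "world's largest library"); the helper names are illustrative only:

```python
# Decimal (SI) byte units as used on the slides: each step is x1,000.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
BYTES = {unit: 1000 ** (i + 1) for i, unit in enumerate(UNITS)}


def to_bytes(amount: float, unit: str) -> float:
    """Convert an amount in the given unit to bytes."""
    return amount * BYTES[unit]


def humanize(n_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    for unit in reversed(UNITS):
        if n_bytes >= BYTES[unit]:
            return f"{n_bytes / BYTES[unit]:g} {unit}"
    return f"{n_bytes:g} B"


daily = to_bytes(2.5, "EB")   # data generated per day (slide 5)
library = to_bytes(10, "TB")  # world's largest library (slide 6)
print(daily / library)        # libraries' worth of new data per day
print(humanize(to_bytes(1000, "PB")))
```

By this arithmetic, one day of data at 2.5 EB is equivalent to about 250,000 copies of the world's largest library.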
  10. 10. Big Data Story US NSA data center – capable of storing a yottabyte of data
  11. 11. Big Data Story 10 exabytes, 300 petabytes, 1 exabyte: Social Networking, Ecommerce, Search & others
  12. 12. Big Data Story The next wave of revolution (IoT)
  13. 13. Big Data Story Wearable: wrist/arm bands, watches, eyewear • Transportation: cars, navigation devices • Buildings: heating/ventilation systems, air conditioners • Health Technology: body sensors, body implants, pills • Cities: traffic/street lights, traffic sensors and signs
  14. 14. Big Data Story Tile is the smart companion for all the things you can't stand to lose
  15. 15. Big Data Story Tile community find
  16. 16. Big Data Story Hardware + Software + Connectivity + Data → Intelligent Devices: efficient, safer, accessible
  17. 17. Big Data Story Tesla: over-the-air updates
  18. 18. Big Data Story 26 billion devices to be operational by 2020 (Source: IDC)
  19. 19. Big Data Story In the age of big data, algorithms will be doing more of the thinking for people - Economist 2010
  20. 20. Big Data Story Google self-driving car
  21. 21. Big Data Story Google self-driving car
  22. 22. Big Data Story Gathers nearly 1 GB of data per second
  23. 23. Big Data Story Fueling knowledge economies, sparking innovation, and unleashing waves of creative destruction Data is the oil of the 21st century
  24. 24. Big Data Story Perfect Storm
  25. 25. Big Data Story Data → Information → Actionable Insight → Business Value
  26. 26. Big Data Story Big Data: Prediction • Recommendation • Detection • Personalization • Inference
  27. 27. Big Data Story Recommendation
  28. 28. Big Data Story $1,000,000 Netflix Prize (2009) • Data set: 100 million movie ratings
  29. 29. Big Data Story Now-casting: Forecasting in near real time
  30. 30. Big Data Story Prediction – Google Flu Trends
  31. 31. Big Data Story Historical query-based estimates vs. official influenza surveillance data • Early detection of a disease outbreak can save lives • http://www.google.org/flutrends/intl/en_us/about/how.html
  32. 32. Big Data Story The US spends 15B hours and $120B in lost productivity and fuel
  33. 33. Big Data Story http://inrix.com/xd-traffic/ • 180 million real-time vehicles and devices • Real-time traffic information in 37 countries
  34. 34. Big Data Story The Power Of 1 Percent
  35. 35. Big Data Story ~1 TB/flight • $200B/year
  36. 36. Big Data Story "Big data is becoming an effective basis of competition in pretty much every industry" - Michael Chui McKinsey Global Institute
  37. 37. Big Data Story Digitally mapping the global economy to connect talent with opportunity at massive scale Connections between people, jobs, skill, companies, and professional knowledge
  38. 38. Big Data Story Using Big Data To Solve Social Problems USING DATA IN THE SERVICE OF HUMANITY
  39. 39. Big Data Story DJ Patil U.S. Chief Data Scientist Weather, health care, climate, flight Responsibly unleash the power of data
  40. 40. Big Data Story “What we realized is data, when used responsibly, is a force multiplier” DJ Patil U.S. Chief Data Scientist White House Office of Science and Technology Policy
  41. 41. Big Data Technology
  42. 42. Big Data Technology In the beginning… (2003) • Google File System, Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, research.google.com/archive/gfs.html • MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean, Sanjay Ghemawat, research.google.com/archive/mapreduce.html
  43. 43. Big Data Technology 2003: Google publishes GFS • 2004: Google publishes MapReduce • 2006: Hadoop is born • 2008: Hadoop becomes a top-level Apache project; Y! sorts 1 TB in 209 seconds on a 900-node cluster • 2009: Google sorts 1 TB in 69 seconds on a 100-node cluster • 2012: Hadoop version 1.0 • 2014: Apache Spark 1.0 • 2015: Spark 1 TB sorting
  44. 44. Big Data Technology Batch King • An M/R application that works on 10 GB of data will also run on 10 PB of data • Automatic parallelism and fault tolerance • Too low-level and limiting
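To make the "works on 10 GB, also runs on 10 PB" point concrete, here is a single-process Python simulation of the map, shuffle, and reduce phases for the classic word-count job. It is only an illustration of the programming model, not the Hadoop API (Hadoop jobs are typically written against its Java API): mappers only ever see one input split and reducers only ever see one key, which is what lets a framework spread the same program across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all counts for one word."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # On a cluster, each split would be mapped on a different node and each key
    # reduced on some node; here the phases just run in sequence in one process.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data story", "big data technology", "big data challenges"]))
```

The same structure also hints at the "too low-level" criticism: even a simple count has to be phrased as explicit map and reduce functions.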
  45. 45. Big Data Technology
  46. 46. Big Data Technology http://www.hortonworks.com
  47. 47. Big Data Technology
  48. 48. Big Data Technology http://hortonworks.com/hdp/ Big Data Solution
  49. 49. Big Data Technology Hadoop Ecosystem
  50. 50. Big Data Technology More recently… (2010) Lightning-fast cluster computing • Spark: Cluster Computing with Working Sets, people.csail.mit.edu/matei/papers/2010/hotcloud_spark.pdf • Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
  51. 51. Big Data Technology Speed: in-memory computing • Versatile: general execution model • Ease of use: APIs in Scala, Java, Python
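The RDD paper cited above gets fault tolerance for in-memory data from lineage: each dataset remembers how it was derived, so a lost in-memory copy can be recomputed rather than replicated. The sketch below is a hypothetical, single-process toy class (`ToyRDD` is not part of Spark's API) that only mimics three ideas: lazy transformations, caching, and rebuilding from lineage after the cache is lost.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical single-process stand-in for a Spark RDD (not Spark's API).

    Each dataset keeps its parent plus the function that derives it (its lineage).
    A lost in-memory copy is rebuilt by replaying that lineage.
    """

    def __init__(self, compute: Callable[[], List], parent: Optional["ToyRDD"] = None) -> None:
        self._compute = compute          # how to (re)build this dataset
        self.parent = parent             # lineage pointer, kept for illustration
        self._persist = False
        self._cache: Optional[List] = None
        self.computations = 0            # how many times this dataset was materialized

    def map(self, f: Callable) -> "ToyRDD":
        # Transformations are lazy: they only record how to derive the new dataset.
        return ToyRDD(lambda: [f(x) for x in self.collect()], parent=self)

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: [x for x in self.collect() if pred(x)], parent=self)

    def cache(self) -> "ToyRDD":
        # Mark the dataset to be kept in memory once it is first computed.
        self._persist = True
        return self

    def collect(self) -> List:
        # Action: return the data, computing it (and its ancestors) only if needed.
        if self._cache is not None:
            return list(self._cache)
        data = self._compute()
        self.computations += 1
        if self._persist:
            self._cache = data
        return list(data)

    def lose_cache(self) -> None:
        # Simulate losing the in-memory copy, e.g. when a worker fails.
        self._cache = None


base = ToyRDD(lambda: list(range(10)))
squares = base.map(lambda x: x * x).cache()
evens = squares.filter(lambda x: x % 2 == 0)

print(evens.collect())    # first action computes squares and keeps them in memory
print(evens.collect())    # squares come from memory, base is not touched
squares.lose_cache()
print(evens.collect())    # cache lost: squares are rebuilt from lineage
print(squares.computations)
```

Real Spark partitions the data across a cluster and recomputes only the lost partitions; this toy recomputes the whole dataset, which keeps the example short.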
  52. 52. Big Data Technology Spark Technology Stack
  53. 53. Big Data Technology New record in large-scale sorting: Hadoop MR record vs. Spark record • Data size: 100 TB vs. 100 TB • Elapsed time: 72 min vs. 23 min • Nodes: 2100 vs. 206 • Cores: 50400 vs. 6592 • Sort rate: 1.42 TB/min vs. 4.27 TB/min • Sort rate per node: 0.67 GB/min vs. 20.7 GB/min • http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html
  54. 54. Big Data Technology Lambda Architecture: an approach to building big data systems • Human fault tolerance • Data immutability • Re-computation • query = function(all data) • http://lambda-architecture.net/
  55. 55. Big Data Technology [Diagram: Lambda Architecture. Data feeds a Batch Layer (master dataset) and a Speed Layer; the Serving Layer holds batch views, the Speed Layer holds a speed view; a Query reads both]
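The per-node column of the sort table is derived from the other columns. A small Python check of that arithmetic, using only the figures on the slide and decimal units (1 TB = 1,000 GB):

```python
def per_node_rate_gb(tb_per_min: float, nodes: int) -> float:
    """Sort throughput per node in GB/min, using decimal units (1 TB = 1,000 GB)."""
    return tb_per_min * 1000 / nodes


# (sort rate in TB/min, nodes, cores), copied from the slide
records = {
    "Hadoop MR": (1.42, 2100, 50400),
    "Spark": (4.27, 206, 6592),
}

for name, (tb_per_min, nodes, cores) in records.items():
    rate = per_node_rate_gb(tb_per_min, nodes)
    print(f"{name}: {rate:.2f} GB/min per node, {cores / nodes:.0f} cores per node")
```

This gives about 0.676 GB/min per node for Hadoop (shown as 0.67 on the slide) and about 20.7 GB/min for Spark, roughly a 30x difference per node, while Spark also ran with more cores per node (32 vs. 24).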
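A minimal in-memory sketch of the Lambda Architecture idea, query = function(all data), using a hypothetical page-view count. Assumptions: the master dataset is a plain append-only list, the batch layer is a full recount, and the speed layer is an in-memory counter; none of this reflects a specific framework. It shows the three properties on the slide: immutable data is only appended, the batch view is recomputed from scratch, and queries merge the batch view with the real-time view.

```python
from collections import Counter
from typing import Dict, Iterable


def batch_view(master_dataset: Iterable[str]) -> Dict[str, int]:
    """Batch layer: recompute page-view counts from scratch over all immutable data."""
    return dict(Counter(master_dataset))


class SpeedLayer:
    """Speed layer: incrementally counts events that the last batch run has not seen."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def ingest(self, event: str) -> None:
        self.view[event] += 1

    def reset(self) -> None:
        # After a new batch run absorbs these events, the real-time view is discarded.
        self.view.clear()


def query(batch: Dict[str, int], speed: SpeedLayer, key: str) -> int:
    """Serving side: merge the batch view with the real-time view."""
    return batch.get(key, 0) + speed.view[key]


master = ["home", "jobs", "home"]     # append-only, never updated in place
batch = batch_view(master)
speed = SpeedLayer()

for event in ["home", "feed"]:        # new events arrive after the batch run
    master.append(event)              # stored in the master dataset...
    speed.ingest(event)               # ...and counted immediately by the speed layer

before = query(batch, speed, "home")  # batch count plus real-time count

batch = batch_view(master)            # the next batch run recomputes everything
speed.reset()
after = query(batch, speed, "home")
print(before, after)
```

Because the batch view is always recomputed from the raw, immutable data, a bug in the processing code can be fixed and the view rebuilt, which is the "human fault tolerance" argument.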
  56. 56. Big Data Challenges
  57. 57. Big Data Challenges Top three analytics challenges: • Translating analytical insights into business actions • Aggregating multiple data sources • Lack of appropriate analytical skills (MIT Sloan, The Talent Dividend research report)
  58. 58. Big Data Challenges Big Data → Fast Data • Data in motion has equal or greater value than "historical" data
  59. 59. Big Data Challenges Data-driven culture • Data-driven decisions
  60. 60. Big Data Challenges Data Scientist • "The sexiest job of the 21st century" - Harvard Business Review, 10/2012 • "Tend to be better programmers than most statisticians and better statisticians than most programmers" - Jeanne Harris
  61. 61. Big Data Challenges Analytics talent: supply vs. demand • Shortage of 1.5 million big data experts by 2018
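"Data in motion" usually means answering questions over a recent window of events as they arrive, instead of over a historical batch. A minimal Python sketch of a sliding-window event counter; `SlidingWindowCounter` is a hypothetical helper, and it assumes timestamps arrive in non-decreasing order (real stream processors also handle late and out-of-order events):

```python
from collections import deque
from typing import Deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events: Deque[float] = deque()

    def add(self, timestamp: float) -> None:
        self.events.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        self._evict(now)
        return len(self.events)

    def _evict(self, now: float) -> None:
        # Drop events that fell out of the window (assumes in-order timestamps).
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()


counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 30, 65]:
    counter.add(t)
print(counter.count(now=65), counter.count(now=100))
```

Each event is appended once and evicted once, so keeping the answer current costs amortized constant time per event rather than a rescan of history.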
  62. 62. Big Data Challenges Master's Program in Data Science
  63. 63. Big Data Challenges Data Science As A Service
  64. 64. Big Data Challenges Machine learning will be everywhere • Prediction phase • "Machine learning is to big data as human learning is to life experience"
  65. 65. Big Data Challenges
  66. 66. Thank You
