Data Science Presentation.pdf


  1. DATA SCIENCE ECOSYSTEM
     M. Tamer Özsu (U. Waterloo), Nancy Reid (U. Toronto), Raymond Ng (UBC)
  2. DATA SCIENCE/BIG DATA IN THE NEWS…
     Canadian Data Science Workshop
  3. DATA SCIENCE EVERYWHERE!...
  6. DATA SCIENCE VOCABULARY
  9. WHAT IS DATA SCIENCE?
     • “Data science, also known as data-driven science, is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.”
     • “Data science intends to analyze and understand actual phenomena with ‘data’. In other words, the aim of data science is to reveal the features or the hidden structure of complicated natural, human, and social phenomena with data from a different point of view from the established or traditional theory and method.”
  10. WHAT IS DATA SCIENCE?
      • Fourth paradigm
        • “… change of all sciences moving from observational, to theoretical, to computational and now to the 4th Paradigm – Data-Intensive Scientific Discovery”
  11. WHAT IS IMPORTANT?
      Need to solve a real problem using data… No applications, no data science.
  12. DATA SCIENCE AS A UNIFIER
      [Diagram labels: Data Science; Humanities; Machine/Statistical Learning; Application Domain Expertise; Visualization; Mathematical Optimization; Social Science; Law; Data Management]
  13. DATA SCIENCE AND BIG DATA
      • They are not the “same thing”
      • Big data = crude oil
      • Big data is about extracting “crude oil”, transporting it in “mega tankers”, siphoning it through “pipelines”, and storing it in “massive silos”
      • Data science is about refining the “crude oil”
      Carlos Somohano, Founder, Data Science London
  15. DATA SCIENCE AND ARTIFICIAL INTELLIGENCE
      [Diagram labels: Data Science; Artificial Intelligence; ML/DM/Analytics]
      “Data science produces insights. Machine learning produces predictions”
  16. DATA SCIENCE APPLICATION EXAMPLES
      • Fraud detection
        • Investigate fraud patterns in past data
        • Early detection is important
          • Before damage propagates
          • Harder than late detection
        • Precision is important
          • False positive and false negative are both bad
        • Real-time analytics
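The fraud detection slide notes that precision matters and that false positives and false negatives are both costly. A minimal sketch of how those two error types feed into precision and recall is given here; the class name, function names, and all counts are hypothetical and chosen only for illustration, not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing fraud alerts against known outcomes."""
    true_positive: int   # fraud that was correctly flagged
    false_positive: int  # legitimate transaction wrongly flagged
    false_negative: int  # fraud that was missed


def precision(c: ConfusionCounts) -> float:
    """Share of flagged transactions that really were fraud."""
    flagged = c.true_positive + c.false_positive
    return c.true_positive / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of actual fraud cases that were flagged."""
    actual = c.true_positive + c.false_negative
    return c.true_positive / actual if actual else 0.0


# Hypothetical counts, invented for illustration only.
counts = ConfusionCounts(true_positive=80, false_positive=20, false_negative=40)
print(f"precision = {precision(counts):.2f}")  # 80 / (80 + 20)
print(f"recall    = {recall(counts):.2f}")     # 80 / (80 + 40)
```

False positives lower precision and false negatives lower recall, so a detector tuned for only one of the two metrics can still perform poorly on the other.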
  17. DATA SCIENCE APPLICATION EXAMPLES
      • Recommender systems
        • The ability to offer unique personalized service
        • Increase sales, click-through rates, conversions, …
        • Netflix recommender system valued at $1B per year
        • Amazon recommender system drives a 20-35% lift in sales annually
        • Collaborative filtering at scale
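The recommender systems slide mentions collaborative filtering. One common variant, user-based collaborative filtering with cosine similarity, can be sketched as follows. This is a toy illustration and not the method used by Netflix or Amazon; the ratings, user names, and function names are hypothetical, and nothing here addresses the "at scale" part.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}), invented for illustration.
ratings = {
    "ana":   {"A": 5, "B": 4, "C": 1},
    "ben":   {"A": 4, "B": 5, "D": 4},
    "chris": {"A": 1, "C": 5, "D": 2},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity over the items that both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user: str, item: str) -> float:
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        weight = cosine(ratings[user], their)
        num += weight * their[item]
        den += weight
    return num / den if den else 0.0


# Estimate how "ana" might rate item "D", which she has not rated yet.
print(round(predict("ana", "D"), 2))
```

Because "ana" agrees with "ben" far more than with "chris" on the items they share, the estimate lands closer to ben's rating of D than to chris's.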
  18. DATA SCIENCE APPLICATION EXAMPLES
      • Predicting why patients are being readmitted
        • Reduce costs
        • Improve population health
        • Find the “why” behind specific populations being readmitted
        • Data lakes of multiple data sources
        • Investigate ties between readmission and socioeconomic data points, patient history, genetics, …
  21. DATA SCIENCE APPLICATION EXAMPLES
      • “Smart cities”
        • Not well-defined
        • Generally refers to using data and ICT to
          • Better plan communities
          • Better manage assets
          • Reduce costs
          • Deploy open data to better engage with community
  22. DATA SCIENCE APPLICATION EXAMPLES
      • Moneyball
        • How to build a baseball team on a very low budget by relying on data
        • Sabermetrics: the statistical analysis of baseball data to objectively evaluate performance
        • 2002 record of 103-59 was joint best in MLB
          • Team salary budget: $40 million
        • Other team: Yankees
          • Team salary budget: $120 million
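The Moneyball figures can be turned into a rough cost-per-win comparison. The sketch below uses only the record and budgets quoted on the slide. It assumes, based on the phrase "joint best", that the Yankees finished with the same number of wins, and the function names are hypothetical.

```python
def win_pct(wins: int, losses: int) -> float:
    """Fraction of games won."""
    return wins / (wins + losses)


def cost_per_win(budget_usd: float, wins: int) -> float:
    """Salary budget divided by the number of wins."""
    return budget_usd / wins


WINS, LOSSES = 103, 59        # 2002 record quoted on the slide
LOW_BUDGET = 40_000_000       # low-budget team's salary budget (from the slide)
YANKEES_BUDGET = 120_000_000  # Yankees' salary budget (from the slide)

print(f"win percentage: {win_pct(WINS, LOSSES):.3f}")
print(f"cost per win, low-budget team: ${cost_per_win(LOW_BUDGET, WINS):,.0f}")
# Assumption: the Yankees are credited with the same 103 wins ("joint best").
print(f"cost per win, Yankees:         ${cost_per_win(YANKEES_BUDGET, WINS):,.0f}")
```

Under that assumption the budget ratio carries straight through: each win cost the Yankees about three times as much.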
  23. HOLISTIC APPROACH TO DATA SCIENCE
      [Diagram labels: Dissemination & Visualization; Ethics, Policy & Social Impact; Core; Data Acquisition; Data Preservation; Modeling & Analysis; Management of Big Data; Making Data Trustable & Usable; Data Security & Privacy; Application (×4)]
  29. CORE RESEARCH ISSUES & INTERACTIONS
      Making Data Trustable & Usable
        • Data cleaning
        • Sampling
        • Data provenance
      Big Data Management
        • Data lakes
        • Batch & online access
        • Platforms
      Modelling & Analysis
        • Models & methods for data lakes
        • Unsupervised classification & AI
      Data Visualization & Dissemination
        • Visualization for wider audience
        • Visualization for data exploration
        • Open data technologies
      Interactions
        • DM support for provenance
        • Data preparation for big data management
        • Cleaning for data analysis
        • DM for ML
        • ML for DM
        • Visual analytics
        • …
