
Intro to Data Science by DatalentTeam at Data Science Clinic#11


  1. Introduction to Data Science @ Data Science Clinic #11 • 6-Jun-2017 • All Season Place • Dr. Sotarat Thammaboosadee @DatalentTeam
  2. Agenda • What is Data Science? • Motivation • Data Scientist’s Skill • Data Science Process • Relational Database vs NoSQL Database • Data Warehouse vs Data Lake • AI / Machine Learning / Data Mining • Data Visualization • Courses
  3. About Instructor
  4. Datalent Team Member • Wichian Boonyaprapa (Aod) – Business Analyst and Intelligence • Siriraj Hospital • Chanwit Onsumran (Earth) – Financial Analyst • Mahidol University
  5. Datalent Team Member • Teerapat Kansadub (Champ) – Data Engineer • Faculty of Physical Therapy, Mahidol University • Chanon Srisuwan (Fern) – Process Engineer • Ramathibodi Hospital
  6. Datalent Team: Data Talent Development Research Group • FB: Datalent Team
  7. Datalent Team: Data Talent Development Research Group • Website:
  8. What is Data Science? • Data Science – is the study of the generalizable extraction of knowledge from data (Wikipedia) – is getting predictive and/or actionable insight from data (Neil Raden) – involves extracting, creating, and processing data to turn it into business value (Vincent Granville)
  9. What’s new? • Data science is not new. Data science is just modernizing existing reporting solution, analytics solution, data warehousing solution, business intelligence solution, and even data management solution. (Jothi Periasamy) • So, Data science is… – New thinking – New ideas – New data source – New data structure – New data architecture – New data processing mechanism – New innovation on data – New way of solving problems
  10. Motivation: Why data science now?
  11. Why Cloud Computing?
  12. Big Data
  13. Why Big Data?
  14. Internet of Things
  15. Data Scientist’s Skill
  16. Data Scientist’s Skill
  17. Data Science Skills Comparison https://www.datasciencetec
  18. Data Scientist
  19. Data Analyst
  20. Big Data Architect
  21. Data Consultant
  22. Chief Data Officer
  23. Data Scientist Roadmap
  24. Data Science Team (source link truncated: evaluate-use-results-build-amazing-data-science-teams/)
  25. Data Science Process
  26. Data Science Life Cycle
  27. Data Analytics Levels
  28. Six Types of Databases • Relational • Analytical (OLAP) • Key-Value • Column-Family • Document • Graph (source link truncated: nonrelational-databases-to-business-needs)
  29. Relational • Data is usually stored row by row (row store) • Standardized query language (SQL) • Data model is defined before you add data • Joins merge data from multiple tables • Results are tables • Pros: mature ACID transactions with fine-grained security controls • Cons: requires up-front data modeling, does not scale well (source link truncated: nonrelational-databases-to-business-needs)
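The relational points above (schema first, joins across tables, tabular results) can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `patient` and `visit` tables and their rows are hypothetical, invented only for this example.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")

# The data model is declared before any data is added.
conn.executescript("""
    CREATE TABLE patient (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE visit (
        id         INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patient(id),
        ward       TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO patient (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob")],
)
conn.executemany(
    "INSERT INTO visit (patient_id, ward) VALUES (?, ?)",
    [(1, "ER"), (1, "ICU"), (2, "ER")],
)

# A join merges rows from both tables; the result is itself a table of rows.
rows = conn.execute("""
    SELECT p.name, COUNT(*) AS visits
    FROM patient AS p
    JOIN visit AS v ON v.patient_id = p.id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(rows)
```

Inserting a visit with a column the schema does not declare would fail, which is the "up-front data modeling" cost the slide mentions.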
  30. Analytical (OLAP) • Based on a "star" schema with a central fact table for each event • Optimized for read-oriented analysis of historical data • Uses the MDX language to query "measures" (such as counts) for "categories" of data • Pros: fast queries over large data • Cons: not optimized for transactions and updates (source link truncated: nonrelational-databases-to-business-needs)
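A star schema can be sketched as one fact table surrounded by dimension tables. The example below is an assumption-laden illustration: it uses plain SQL through `sqlite3` as a stand-in for MDX and a real OLAP engine, and the `fact_sales`, `dim_product`, and `dim_date` tables are hypothetical.

```python
import sqlite3

# Hypothetical star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date    VALUES (10, 2016), (11, 2017);
    INSERT INTO fact_sales  VALUES (1, 10, 5.0), (1, 11, 7.0),
                                   (2, 11, 20.0), (2, 11, 4.0);
""")

# Aggregate a measure (amount) over categories taken from the dimensions.
totals = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date    AS d ON d.date_id    = f.date_id
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()

for year, category, total in totals:
    print(year, category, total)
```

Each fact row records one event, and every analytical question becomes a grouping of a measure by dimension attributes.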
  31. Key-Value Stores • Keys used to access opaque blobs of data • Values can contain any type of data (images, video) • Pros: scalable, simple API (put, get, delete) • Cons: no way to query based on the content of the value (source link truncated: nonrelational-databases-to-business-needs)
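The put/get/delete API named above can be sketched with an in-memory dictionary. This `KeyValueStore` class is a hypothetical toy, not the API of any particular product; it only shows that values are opaque bytes reachable solely through their key.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it is an opaque blob.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("avatar:42", b"\x89PNG-bytes-go-here")
store.put("note:7", b"hello")

print(store.get("note:7"))
store.delete("note:7")
print(store.get("note:7"))
```

There is deliberately no method for searching by value content, which mirrors the "Cons" bullet on the slide.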
  32. Column-Family • Key includes a row, column family and column name • Stores versioned blobs in one large table • Queries can be done on rows, column families and column names • Pros: scales out well • Cons: cannot query blob content; row and column designs are critical • Examples: HBase, Cassandra (source link truncated: nonrelational-databases-to-business-needs)
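One way to picture the composite key (row, column family, column name) with versioned values is a nested mapping. The `ColumnFamilyTable` class below is a hypothetical in-memory sketch and does not reproduce the HBase or Cassandra client APIs.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Each cell keeps (timestamp, value) versions, newest first.
Cell = List[Tuple[int, bytes]]


class ColumnFamilyTable:
    """Toy column-family table (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # row key -> column family -> column name -> versions
        self._rows: Dict[str, Dict[str, Dict[str, Cell]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row: str, family: str, column: str,
            value: bytes, timestamp: int) -> None:
        versions = self._rows[row][family].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row: str, family: str, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(family, {}).get(column)
        return versions[0][1] if versions else None

    def columns(self, row: str, family: str) -> List[str]:
        """List column names in one family of one row."""
        return sorted(self._rows.get(row, {}).get(family, {}))


table = ColumnFamilyTable()
table.put("user:1", "profile", "name", b"Ann", timestamp=1)
table.put("user:1", "profile", "name", b"Ann B.", timestamp=2)
table.put("user:1", "profile", "email", b"ann@example.com", timestamp=1)

latest_name = table.get("user:1", "profile", "name")
profile_columns = table.columns("user:1", "profile")
print(latest_name, profile_columns)
```

Lookups go through row, family, and column names only, so the choice of row keys and column layout determines which queries are possible at all.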
  33. Graph Store • Data is stored in a series of nodes and properties • Queries are really graph traversals • Ideal when relationships between data are key: – e.g. social networks • Pros: fast network search, works with public linked data sets • Cons: poor scalability when graphs don't fit into RAM, specialized query language • Examples: Neo4j, AllegroGraph (source link truncated: nonrelational-databases-to-business-needs)
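The remark that queries are really graph traversals can be sketched with a breadth-first search over an adjacency mapping. The `within_hops` function and the small `friends` network are hypothetical; a real graph database would express this in its own query language.

```python
from collections import deque
from typing import Dict, Set


def within_hops(graph: Dict[str, Set[str]], start: str, max_hops: int) -> Set[str]:
    """Return every node reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if distance == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    seen.discard(start)
    return seen


# Hypothetical social network stored as an undirected adjacency mapping.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob", "eve"},
    "eve": {"dan"},
}

reachable = within_hops(friends, "ann", 2)
print(sorted(reachable))  # ['bob', 'cat', 'dan']
```

A "friends of friends" question is answered by walking edges from one node rather than joining tables, which is why relationship-heavy data suits this model.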
  34. Document Store • Data stored in nested hierarchies • Logical data remains stored together as a unit • Any item in the document can be queried • Pros: no object-relational mapping layer, ideal for search • Cons: complex to implement, incompatible with SQL • Examples: MongoDB, Couchbase (source link truncated: nonrelational-databases-to-business-needs)
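The idea that any item inside a nested document can be queried can be sketched with plain dictionaries and a dotted-path lookup. The helpers `get_path` and `find` and the sample `orders` are hypothetical, and this is not the query API of MongoDB or Couchbase.

```python
from typing import Any, Dict, List


def get_path(document: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'customer.address.city' into a document."""
    current: Any = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(documents: List[Dict[str, Any]], path: str, expected: Any) -> List[Dict[str, Any]]:
    """Return the documents whose value at path equals expected."""
    return [doc for doc in documents if get_path(doc, path) == expected]


# Hypothetical documents: each order keeps its related data together as a unit.
orders = [
    {
        "_id": 1,
        "customer": {"name": "Ann", "address": {"city": "Bangkok"}},
        "items": [{"sku": "A1", "qty": 2}],
    },
    {
        "_id": 2,
        "customer": {"name": "Bob", "address": {"city": "Chiang Mai"}},
        "items": [],
    },
]

matches = find(orders, "customer.address.city", "Bangkok")
print([doc["_id"] for doc in matches])  # [1]
```

Because the customer and line items live inside the order itself, no join or object-relational mapping step is needed to rebuild it.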
  35. NewSQL? (source link truncated: and-beyond-the-answer-to-sprained-relational-databases-too-much-information)
  36. Data Warehouse vs Data Lake
  37. now
  38. differences.html
  39. What is data mining?
  40. AI / Machine Learning / Data Mining (source link truncated: backwards-looking-forwards-sas-data-mining-and-machine-learning/)
  41. Data Analytics Levels
  42. Data Mining Tasks
  43. Data Mining Tasks
  44. CRISP-DM: CRoss-Industry Standard Process for Data Mining
  45. CRISP-DM: Overview
  46. CRISP-DM: Phases • Business Understanding: understanding project objectives and requirements; defining the data mining problem • Data Understanding: initial data collection and familiarization; identifying data quality problems • Data Preparation: selecting tables, records and attributes; transforming and cleaning data • Modeling: selecting and applying modeling techniques; calibrating parameters • Evaluation: evaluating whether business objectives and issues have been addressed • Deployment: deploying the resulting model; implementing a repeatable data mining process
  47. Phases and Tasks • Business Understanding: Determine Business Objectives, Assess Situation, Determine Data Mining Goals, Produce Project Plan • Data Understanding: Collect Initial Data, Describe Data, Explore Data, Verify Data Quality • Data Preparation: Select Data, Clean Data, Construct Data, Integrate Data, Format Data • Modeling: Select Modeling Technique, Generate Test Design, Build Model, Assess Model • Evaluation: Evaluate Results, Review Process, Determine Next Steps • Deployment: Plan Deployment, Plan Monitoring & Maintenance, Produce Final Report, Review Project
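The six CRISP-DM phases and their tasks can be kept as an ordered mapping, for example to drive a project checklist. This is only a sketch: the `CRISP_DM` constant and `next_phase` helper are hypothetical names, and the sketch treats the phases as a straight sequence even though projects often loop back to earlier phases.

```python
from typing import Dict, List, Optional

# Phase -> tasks, in the order the phases are listed on the slides.
CRISP_DM: Dict[str, List[str]] = {
    "Business Understanding": [
        "Determine Business Objectives", "Assess Situation",
        "Determine Data Mining Goals", "Produce Project Plan",
    ],
    "Data Understanding": [
        "Collect Initial Data", "Describe Data",
        "Explore Data", "Verify Data Quality",
    ],
    "Data Preparation": [
        "Select Data", "Clean Data", "Construct Data",
        "Integrate Data", "Format Data",
    ],
    "Modeling": [
        "Select Modeling Technique", "Generate Test Design",
        "Build Model", "Assess Model",
    ],
    "Evaluation": [
        "Evaluate Results", "Review Process", "Determine Next Steps",
    ],
    "Deployment": [
        "Plan Deployment", "Plan Monitoring & Maintenance",
        "Produce Final Report", "Review Project",
    ],
}


def next_phase(current: str) -> Optional[str]:
    """Return the phase after current, or None after the last phase."""
    phases = list(CRISP_DM)  # dicts preserve insertion order
    index = phases.index(current)
    return phases[index + 1] if index + 1 < len(phases) else None


for phase, tasks in CRISP_DM.items():
    print(f"{phase}: {len(tasks)} tasks")
```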
  48. Data Visualization
  49. Courses • [Microsoft] us/professional-program/ • [IBM] • [Datacamp] • DataScience Thailand • Even more…
  50. Dr. Sotarat Thammaboosadee (ดร. โษฑศ์รัตต ธรรมบุษดี, "โอม") • Line: @zotarutto • FB: Datalent Team