The path to be a data scientist

Data Scientist, Consultant, Trainer
25 Jan 2017


  1. The path to be a Data Scientist Poo Kuan Hoong, Ph.D Senior Manager Data Science, Nielsen Malaysia
  2. Disclaimer: The views and opinions expressed in these slides are those of the author and do not necessarily reflect the official policy or position of Nielsen Malaysia. The analyses performed within these slides are examples only. They should not be used in real-world analytic products, as they are based on very limited and dated open source information. Assumptions made within the analysis do not reflect the position of Nielsen Malaysia.
  3. Agenda • What is a data scientist? • What kinds of companies employ data scientists? • What are the key functions of a data scientist? • What type of work does a data scientist do? • General aptitude to be a data scientist • What skillsets are needed to be a data scientist? • What is data science? • Where do I begin? • MDEC National Big App Challenge 3.0 Knowledge Sharing
  4. Self Introduction Poo Kuan Hoong, http://www.linkedin.com/in/kuanhoong • Senior Manager Data Science • Senior Lecturer • Chairperson Data Science Institute • Coursera Facilitator • Consultant • Funding mentor • Founder • Speaker/Trainer
  5. https://www.meetup.com/MY-RUserGroup/
  6. https://www.facebook.com/rusergroupmalaysia/
  7. What is a Data Scientist?
  8. Data Scientist The term "data scientist" has been around for years, and the various advanced analytics specialties that fall under it are even older. However, with the recent explosion of data, the term has come to describe a convergence of these disciplines, which explains its soaring popularity.
  9. What are the job titles? • Data Scientist • Data Engineer • Big Data Engineer • Machine Learning Scientist • Business Analytics Specialist • Data Visualization Developer • BI Solutions Architect / BI Specialist • Operations Research Analyst • Analytics Manager • Machine Learning Engineer • Statistician • Business Intelligence (BI) Engineer
  10. Why the Global Need? • Abundance of data • Availability of affordable compute resources • Internet of Things (IoT) sensor data
  11. • 950 Data Analyst (India) • 8,411 Data Scientist (US) • 808 Data Analyst (UK) • 1,188 Data Manager (US) • 81 Data Analyst (Australia)
  12. 80 in April 2015 • 1,500 by 2020 • Source: The Star, Friday, 24 April 2015, “Malaysia needs 1,500 data scientists by 2020”
  13. What kinds of companies employ data scientists?
  14. • MNCs • Government • Banks
  15. What are the key functions of a data scientist?
  16. Key functions of a data scientist • Devising business strategies from the insights • Descriptive and predictive analytics • Data mining and analysis design • Understanding the business problem
  17. Scenario 1: Customer Churn Analytics
  18. Churn analytics • Predicting who will switch mobile operator
  19. Customer churn - why do customers change operators? • The top 3 reasons why subscribers change providers: • They want a new handset • They believe they pay too much for calls/data • Providers do not offer additional loyalty benefits
  20. Machine Learning Framework • Initialization step: data collection, data preprocessing, attribute selection (Attribute 1, Attribute 2, Attribute 3) • Learn step: algorithm, training model, score model • Apply step: apply data / test data, predicting output
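
The three steps of the framework on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the attribute names (`monthly_bill`, `support_calls`) and the choice of a nearest-centroid classifier are hypothetical and do not come from the slides.

```python
# Hypothetical toy records: (monthly_bill, support_calls, churned)
RECORDS = [
    (80.0, 5, 1), (75.0, 4, 1), (90.0, 6, 1), (85.0, 7, 1),
    (30.0, 0, 0), (35.0, 1, 0), (40.0, 1, 0), (25.0, 0, 0),
]

def scale(row, bounds):
    """Rescale each attribute to the range [0, 1] using (min, max) bounds."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(row, bounds)]

def preprocess(records):
    """Initialization step: scale the attributes and split off the label."""
    columns = list(zip(*[r[:-1] for r in records]))
    bounds = [(min(c), max(c)) for c in columns]
    features = [scale(r[:-1], bounds) for r in records]
    labels = [r[-1] for r in records]
    return features, labels, bounds

def train(features, labels):
    """Learn step: compute one centroid per class (nearest-centroid classifier)."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, y in zip(features, labels) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, row):
    """Apply step: assign the class whose centroid is closest to the row."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def accuracy(centroids, features, labels):
    """Score the model: fraction of rows whose predicted label is correct."""
    hits = sum(predict(centroids, f) == y for f, y in zip(features, labels))
    return hits / len(labels)

features, labels, bounds = preprocess(RECORDS)

# Hold out every fourth record as test data; train on the rest.
train_idx = [i for i in range(len(RECORDS)) if i % 4 != 3]
test_idx = [i for i in range(len(RECORDS)) if i % 4 == 3]
model = train([features[i] for i in train_idx], [labels[i] for i in train_idx])
score = accuracy(model, [features[i] for i in test_idx], [labels[i] for i in test_idx])

# Apply the trained model to a new, unseen customer.
prediction = predict(model, scale((82.0, 5), bounds))
print(score, prediction)
```

For brevity the scaling bounds are computed on all records; in real work they would normally be derived from the training split only, so that no information leaks from the test data.
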
  21. Correlation Matrix
  22. Feature selection
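
One common way to connect a correlation matrix to feature selection is to keep only the attributes that correlate strongly with the target. The sketch below assumes Pearson correlation, a cut-off of 0.5 and made-up column names and values; none of these are taken from the slides.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Pairwise correlations between all named columns, as a nested dict."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical data: three candidate attributes and the churn label.
DATA = {
    "monthly_bill": [80.0, 75.0, 90.0, 30.0, 35.0, 40.0],
    "support_calls": [5, 4, 6, 0, 1, 1],
    "last_digit_of_id": [3, 7, 5, 5, 3, 7],
    "churned": [1, 1, 1, 0, 0, 0],
}

matrix = correlation_matrix(DATA)

# Keep attributes whose absolute correlation with the target reaches the cut-off.
THRESHOLD = 0.5
selected = [
    name for name in DATA
    if name != "churned" and abs(matrix[name]["churned"]) >= THRESHOLD
]
print(selected)
```

Correlation only captures linear relationships, so a filter like this is a first pass rather than a complete feature selection method.
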
  23. Models comparison • The receiver operating characteristic (ROC) curve illustrates the performance of a binary classifier as its discrimination threshold is varied.
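
To make the idea of varying the threshold concrete, the sketch below computes ROC points and the area under the curve for a handful of made-up labels and scores. The data and the function names `roc_points` and `auc` are illustrative assumptions, not the models compared on the slide.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    The threshold is swept from the highest score down to the lowest; a sample
    is predicted positive when its score is at or above the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical ground truth (1 = churned) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
area = auc(curve)
print(curve)
print(round(area, 4))
```

When comparing models, the curve closer to the top-left corner (and the larger area) indicates better separation of the two classes; an area of 0.5 corresponds to random guessing.
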
  24. Scenario 2: Market Basket Analysis
  25. Market Basket Analysis Where should detergents be placed in the store to maximize sales? Are bleach products purchased when detergents and orange juice are bought together? Is cola typically purchased with bananas? Does the brand of cola make a difference? How are the demographics of the neighbourhood affecting what customers are buying?
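
Questions like these are usually answered with association-rule measures: support, confidence and lift. The sketch below computes them on a tiny, invented set of transactions; the baskets are hypothetical and only reuse the product names from the slide.

```python
# Hypothetical shopping baskets, one set of items per transaction.
TRANSACTIONS = [
    {"detergent", "bleach", "orange juice"},
    {"detergent", "bleach"},
    {"detergent", "orange juice"},
    {"cola", "bananas"},
    {"cola", "detergent"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is on its own.

    A lift above 1 suggests the items are bought together more often than
    would be expected if they were independent.
    """
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# "Are bleach products purchased when detergents and orange juice are bought together?"
bleach_conf = confidence({"detergent", "orange juice"}, {"bleach"}, TRANSACTIONS)
bleach_lift = lift({"detergent", "orange juice"}, {"bleach"}, TRANSACTIONS)

# "Is cola typically purchased with bananas?"
cola_lift = lift({"cola"}, {"bananas"}, TRANSACTIONS)

print(bleach_conf, bleach_lift, cola_lift)
```

On real data the same measures are typically computed over frequent itemsets found by an algorithm such as Apriori, because enumerating every possible rule by hand does not scale.
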
  26. What type of work does a data scientist do?
  27. http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#f37c7f758459
  28. http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#f37c7f758459
  29. General aptitude to be a Data Scientist
  30. Data Scientist • Common sense • Curious mind • Clear and simple thinking • Love to solve puzzles • Good listening, writing and communication skills • Maths & Stats • Business sense
  31. I have 4 red, 18 black and 8 brown socks in my sock drawer. If it is completely dark and I cannot see the colour of the socks that I am picking, how many socks do I need to take from the drawer to be sure that I have at least one pair of socks that are the same colour?
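
The sock puzzle is an instance of the pigeonhole principle: with three colours, three socks can still all be different, so a fourth sock must repeat a colour. The brute-force check below is a small sketch that confirms this reasoning by trying every possible draw; the function names are made up for the illustration.

```python
from collections import Counter
from itertools import combinations

# The drawer described on the slide.
SOCKS = ["red"] * 4 + ["black"] * 18 + ["brown"] * 8

def guarantees_pair(socks, draws):
    """True if every possible draw of `draws` socks contains two of the same colour."""
    return all(
        max(Counter(hand).values()) >= 2
        for hand in combinations(socks, draws)
    )

def min_draws_for_pair(socks):
    """Smallest number of socks that always yields a matching pair."""
    return next(k for k in range(2, len(socks) + 1) if guarantees_pair(socks, k))

answer = min_draws_for_pair(SOCKS)
print(answer)  # number of colours + 1
```

The counts of each colour do not matter here as long as every colour has at least two socks; only the number of distinct colours drives the answer.
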
  32. What is the hidden number under the car?
  33. What skillsets are needed to be a data scientist?
  34. Data scientist skillsets • Data Mining • Machine Learning • R/Python • Data Analysis • Statistics • SQL • Java • Algorithms Image Source: http://imgur.com/hoyFT4t
  35. What is the average salary?
  36. Average salary: Data Scientist
  37. What is data science?
  38. Data Science • Data science is an evolutionary step in interdisciplinary fields like business analysis that incorporate computer science, modeling, statistics, analytics, and mathematics. • At its core, data science involves using automated methods to analyze massive amounts of data and to extract knowledge from them. • Drawing insight from a piece of data involves understanding how it fits into the larger picture of an organization.
  39. Where do I begin?
  40. Massive Open Online Course (MOOC) • MSC Malaysia MyProCert (SRI) – Data Science Massive Open Online Courses (MOOC) • The Center of Applied Data Science (MDEC & HRDF) • Johns Hopkins University – Data Science Specialization • University of Washington – Data Science at Scale Specialization • Data Analyst Nanodegree – Udacity • CSCI E-109 Data Science (Harvard Extension School) • Machine Learning – Stanford University
  41. BDA Undergraduate & Postgraduate Programme Undergraduate • Multimedia University – Bachelor of Computer Science (Data Science Specialization) • Sunway University – BSc (Hons) Information Systems (Business Analytics) • Universiti Teknologi Malaysia (UTM), International Islamic University Malaysia, Monash University, Universiti Teknologi MARA (UiTM) & Universiti Teknologi PETRONAS (UTP). Postgraduate • Big Data Analytics Post Graduate Programme
  42. Kaggle • Real problems with raw, unprocessed data sets. • Going through past competitions is recommended. • Read the forums of particular competitions for discussions and tips that will help in solving future problems. • https://www.kaggle.com/
  43. UC Irvine Machine Learning Repository • 360 data sets as a service to the machine learning community http://archive.ics.uci.edu/ml/
  44. Open data • Open data from various countries • Malaysia - http://www.data.gov.my/ • Singapore - https://data.gov.sg/
  45. MDEC National Big App Challenge 3.0
  46. • June 4th – June 5th 2016, Berjaya Times Square • The themes for AHKL2016 were as follows: 1. Big Data Analytics --- Powered by MDEC. Access to 65 million rows of real datasets sponsored by iProperty.com Malaysia 2. O2O Commerce --- Powered by MOLWallet MOLPay 3. Smart Living --- Powered by TIME Internet
  47. National MDEC Big App Challenge 3.0
  48. PropertySenze • B2B business model • Provide machine learning and AI services to customers • Visual Search • Personalized customer experience
  49. BUSINESS MODEL Big Data becomes Smart Data 1. PropertySenze contracts with property sites and property developers to generate analytics and visual search 2. PropertySenze’s machine learning algorithm enables users to search for and buy properties similar to those they see on the sites, in user-generated photos and in user-uploaded images 3. Enhanced search experience and personalized results for users 4. Improved platform that recognizes properties for retrieval purposes or instant purchases 5. Analytics at the fingertips for both buyers and sellers 6. Improved user experience that leads to more engagement and sale transactions 7. PropertySenze verifies all transactions and charges commission fees every month
  50. PropertySenze
  51. Hackathon: Tips • Have a well-balanced team with no more than one server-side developer with relevant experience, one good designer and one amazing storyteller • Understand the expected outcomes of the hackathon • Develop something whose benefits everyone can see • Have an impressive aim or objective • Start promoting your product during the hackathon • Hit the demo 100%. The pitch is for the product to shine
  52. Thanks! Questions? @kuanhoong https://www.linkedin.com/in/kuanhoong kuanhoong@gmail.com