The path to be
Poo Kuan Hoong, Ph.D
Senior Manager Data Science,
Disclaimer: The views and opinions expressed in these slides are those of
the author and do not necessarily reflect the official policy or position
of Nielsen Malaysia. The analyses shown in these slides
are examples only. They should not be used in real-world analytic
products, as they are based on very limited and dated open source
information. Assumptions made within the analysis do not reflect
the position of Nielsen Malaysia.
• What is a data scientist?
• What kinds of companies employ data scientists?
• What are the key functions of a data scientist?
• What type of work does a data scientist do?
• General aptitude to be a data scientist
• What skillsets are needed to be a data scientist?
• What is data science?
• Where do I begin?
• MDEC National Big App Challenge 3.0 Knowledge Sharing
The term "data scientist" has been
around for years, and the various
advanced analytics specialties that
fall under it are even older.
However, due to the recent explosion
of data, the term has been used in
the convergence of disciplines and
that leads to the soaring
What are the job titles?
• Data Scientist
• Data Engineer
• Big Data Engineer
• Machine Learning Scientist
• Business Analytics Specialist
• Data Visualization Developer
• BI Solutions Architect/ BI Specialist
• Operations Research Analyst
• Analytics Manager
• Machine Learning Engineer
• Business Intelligence (BI) Engineer
Why the Global Need?
950 Data Analyst (India)
8,411 Data Scientist (US)
808 Data Analyst (UK)
1,188 Data Manager (US)
81 Data Analyst (Australia)
80 in April 2015 1,500 by 2020
The Star, Friday, 24 April 2015
“Malaysia needs 1,500 data scientists by 2020”
Customer churn - why do customers change
• The top 3 reasons why
subscribers change providers:
• They want a new handset
• They believe they pay too
much for calls/data
• Providers do not offer
additional loyalty benefits
Machine Learning Framework
[Diagram labels: Attribute 1, Attribute 2, Attribute 3; Initialization Step, Learn Step, Apply Step; Training Model, Score Model]
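
The initialization, learn and apply steps can be sketched in a few lines of Python. This is only a minimal illustration, not the model used in the slides: it assumes a plain logistic regression trained by gradient descent, and it assumes the three attributes stand for handset age, relative bill size and loyalty benefits (following the three churn reasons listed earlier). All data values and function names are hypothetical.

```python
import math

# Hypothetical toy data, invented for illustration only.
# Each row is (attribute_1, attribute_2, attribute_3), assumed here to mean:
# handset age in years, monthly bill scaled to 0-1, has loyalty benefits (0/1).
TRAIN_X = [
    (2.5, 0.9, 0), (3.0, 0.8, 0), (2.0, 0.7, 0), (2.8, 0.6, 0),
    (0.5, 0.3, 1), (1.0, 0.2, 1), (0.8, 0.4, 1), (0.3, 0.5, 1),
]
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = churned, 0 = stayed


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def initialize(n_features):
    """Initialization step: start from zero weights and zero bias."""
    return [0.0] * n_features, 0.0


def learn(xs, ys, weights, bias, rate=0.1, epochs=2000):
    """Learn step: fit a logistic regression with batch gradient descent."""
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - rate * g / n for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / n
    return weights, bias


def apply_model(x, weights, bias):
    """Apply step: score one subscriber with a churn probability."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


weights, bias = initialize(3)
weights, bias = learn(TRAIN_X, TRAIN_Y, weights, bias)
likely_churner = apply_model((2.7, 0.85, 0), weights, bias)
likely_stayer = apply_model((0.6, 0.3, 1), weights, bias)
print(f"old handset, high bill, no benefits: {likely_churner:.2f}")
print(f"new handset, low bill, with benefits: {likely_stayer:.2f}")
```

In practice the learn step would run on historical subscriber records and the apply step on current subscribers, so that those with high scores can be targeted before they leave.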
Market Basket Analysis
Where should detergents be placed in the
store to maximize sales?
Are bleach products purchased when
detergents and orange juice are bought
Is cola typically purchased with bananas?
Does the brand of cola make a difference?
How are the demographics of the
neighbourhood affecting what customers
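
Questions like these are usually answered with three simple measures per item pair: support (how often both items appear together), confidence (how often the second item appears given the first) and lift (confidence relative to the second item's overall frequency). The sketch below computes them in plain Python; the baskets are hypothetical and invented for illustration, and `pair_rules` is an illustrative helper, not a library function.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
BASKETS = [
    {"detergent", "bleach", "orange juice"},
    {"detergent", "bleach"},
    {"cola", "bananas"},
    {"detergent", "orange juice", "cola"},
    {"cola", "crisps"},
    {"detergent", "bleach", "cola"},
]


def pair_rules(baskets):
    """Return {(lhs, rhs): (support, confidence, lift)} for every rule lhs -> rhs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), both in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            support = both / n                           # P(lhs and rhs)
            confidence = both / item_counts[lhs]         # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)   # P(rhs | lhs) / P(rhs)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules


rules = pair_rules(BASKETS)
for lhs, rhs in [("detergent", "bleach"), ("cola", "bananas")]:
    support, confidence, lift = rules[(lhs, rhs)]
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items are bought together more often than chance would predict, which is the kind of signal used when deciding shelf placement.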
• Common sense
• Curious mind
• Clear and simplify
• Love to solve
• Good listening,
• Maths & Stats
I have 4 red, 18 black and 8 brown socks in my sock drawer. If it is
completely dark and I cannot see the colour of the socks that I am
picking, how many socks do I need to take from the drawer to be sure
that I have at least one pair of socks that are the same colour?
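
The puzzle is an instance of the pigeonhole principle: with three colours, three socks could still all be different, so a fourth sock must repeat a colour. As a quick sanity check, the sketch below brute-forces every way a draw could split across the colours; the function name is illustrative.

```python
from itertools import product

SOCKS = {"red": 4, "black": 18, "brown": 8}  # counts from the puzzle


def min_draws_for_pair(socks):
    """Smallest n such that every possible draw of n socks contains a matching pair."""
    total = sum(socks.values())
    for n in range(1, total + 1):
        # Enumerate every way n drawn socks could split across the colours.
        splits = [
            split
            for split in product(*(range(count + 1) for count in socks.values()))
            if sum(split) == n
        ]
        # The draw is safe only if every split has some colour at least twice.
        if all(max(split) >= 2 for split in splits):
            return n
    return None


answer = min_draws_for_pair(SOCKS)
print(answer)  # prints 4
```

Note that the number of socks of each colour does not matter, only the number of colours: the answer is always one more than the number of colours.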
• Data science is an evolutionary step in interdisciplinary fields like
business analysis that incorporate computer science, modeling, statistics,
analytics, and mathematics.
• At its core, data science involves using automated methods to analyze
massive amounts of data and to extract knowledge from them.
• Drawing insight from a piece of data involves understanding how it fits
into the larger picture of an organization,
Massive Open Online Course (MOOC)
• MSC Malaysia MyProCert (SRI) – Data Science Massive Open Online
• The Center of Applied Data Science (MDEC & HRDF)
• Johns Hopkins University – Data Science Specialization
• University of Washington - Data Science at Scale Specialization
• Data Analyst Nanodegree - Udacity
• CSCI E-109 Data Science (Harvard Extension School)
• Machine Learning - Stanford University
BDA Undergraduate & Postgraduate
• Multimedia University – Bachelor of Computer Science (Data Science
• Sunway University - BSc (Hons) Information Systems (Business
• Universiti Teknologi Malaysia (UTM), International Islamic University
Malaysia, Monash University, Universiti Teknologi MARA
(UiTM) & Universiti Teknologi PETRONAS (UTP).
• Big Data Analytics Post Graduate Programme
• Data sets, real problems, in
• Recommend to go through
• Read through the forums
competitions to find out
useful discussion and
tips/hints that will be
useful for solving future
UC Irvine Machine Learning Repository
• 360 data sets as a service to the machine learning community
• Open data from various countries
• Malaysia - http://www.data.gov.my/
• Singapore - https://data.gov.sg/
• June 4th – June 5th 2016, Berjaya Times Square
• The themes for AHKL2016 were as follows:
1. Big Data Analytics --- Powered by MDEC. Access to 65 million
rows of real datasets sponsored by iProperty.com Malaysia
2. O2O Commerce --- Powered by MOLWallet MOLPay
3. Smart Living --- Powered by TIME Internet
Big Data becomes Smart Data
[Diagram text, partly recoverable:]
• ... property sites and ... analytics and visual ...
• ... machine learning algorithm enables search and buy similar properties that user sees on the sites, from user-generated photos and from user-uploaded images ... results for users
4. Improved platform ... properties for retrieval purposes or instant ...
5. Analytics at the fingertips for both buyers and sellers
6. Improved user ... leads to more ...
• Have a well-shaped team with not more than one
server-side developer with relevant experience,
one good designer and one amazing storyteller
• Understand the expected outcomes of the
• Develop something that everyone can see the
• Have an impressive aim or objective
• Start promoting your product during the
• Hit the demo 100%. The pitch is for the product to