How to crack down
BIG DATA?
Required Skills for Data Scientist
hello!
I AM
DAVID
HUANG
I am here because I
want to find more people
who love data science.
You can find me at:
tawei.huang1@gmail.com
My Experience
• Data Scientist Intern, Yoctol
• Data & Strategy Intern, Chocolabs
• Summer Intern Student, Institute of
Mathematics, Academia Sinica
My Education Background
• Master in Statistics, NTU
• BSc in Quantitative Finance, NTHU
• Research Student, PKU
“Big data is a big trend, but it is very
difficult to hire a data scientist.
It’s also hard to find a job in TW XD”
1. Who is a Data
Scientist?
The skill sets you need to be a
data scientist.
In a big data project, we need these people!
• Data Backend Engineer: Develop and operate backend systems related to data access, collection, processing and storage.
• Database Architect: Architect and design database solutions for the enterprise, and lead the effort on database performance and optimization.
• Data Analyst / Data Scientist: Use advanced quantitative analysis, data mining techniques and strong industry acumen to interpret, connect and predict data to deliver insight and recommendations for decisions.
• Domain Expert: Assist the data team to understand the domain problem & knowledge.
Data analyst / Machine Learning
Lots of people say that they are different, but I think “every data
analyst should be a data scientist, and the converse holds!”
Explanatory Analytics (Data analyst)
• Theory-based, statistical testing of causal hypotheses (commonly seen in economics)
• Strength of relationship in a statistical model
Predictive Analytics (Data scientist)
• Empirical method for predicting new observations (in statistical / math / CS ways)
• Ability to accurately predict new individuals
Both fields are important for discovering knowledge.
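To make the contrast concrete, here is a minimal sketch in plain Python. The synthetic data (y = 1 + 2x + noise), the function names, and the 150/50 split are all assumptions for illustration, not something from the slides. The explanatory view asks how strong the relationship is (a t-statistic on the slope); the predictive view asks how well the fitted line predicts observations it has not seen (hold-out mean squared error).

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def slope_t_statistic(xs, ys):
    """Explanatory view: strength of the x-y relationship."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b / se_b

def holdout_mse(train, test):
    """Predictive view: error on observations not used for fitting."""
    a, b = fit_line([x for x, _ in train], [y for _, y in train])
    return sum((y - (a + b * x)) ** 2 for x, y in test) / len(test)

# Synthetic data (assumed for illustration): y = 1 + 2x + Gaussian noise
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
data = [(x, 1 + 2 * x + rng.gauss(0, 1)) for x in xs]
train, test = data[:150], data[150:]

t_stat = slope_t_statistic([x for x, _ in data], [y for _, y in data])
mse = holdout_mse(train, test)
print(f"t-statistic of slope: {t_stat:.1f}")
print(f"hold-out MSE: {mse:.2f}")
```

Both numbers come from the same fitted line; only the question being asked differs.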
Data Unicorn
A data unicorn has expertise in all
fields… Mission impossible?
The Data Scientist Venn Diagram
Math & Statistics
Hacking Skills
Domain Expertise
Machine Learning
Research
Program
Unicorn
First become a
(1) researcher,
(2) machine learner,
(3) programmer,
and then find your own
way to be a data scientist.
Skill Sets for Data Scientist – Math & Stat
Mathematics & Statistics
Multivariable Calculus
Linear Algebra
Probability Theory
Statistics / Math Statistics
Convex Optimization
Discrete Analysis
Basic Knowledge
Regression Analysis / GLM
Experimental Design
Causal Inference
Multivariate Analysis
Biz Analytics & Data Mining
Data Mining
Machine Learning
Deep Learning (ANN/CNN)
Machine Learning
Time Series Analysis
Forecasting
1
Skill Sets for Data Scientist – Programming
Programming Skills
Python
(Scripting Language)
R
(Statistical Software)
Matlab
(Super Fast but Expensive)
Programming Skill
SQL & Relational Algebra
NoSQL / Cassandra / etc.
HDFS / Map Reduce
Hadoop and Hive /Pig
Spark & Scala
Database Querying
A little bit of Java
Data Structure & Algorithm
Data Munging (python!)
Data Viz (d3.js / Tableau)
Software Engineering
2
1. D3.js visualization: http://goo.gl/cVlTX7
2. Spark MLlib: http://goo.gl/VNMQ97
Skill Sets for Data Scientist – Business Sense
Business Professionalism
Hypothesis Thinking
Pyramid Principles
BizPro is a good choice!
Logical Thinking
To be honest, the crucial truth is that “this part is very important, but
it is the less important skill set compared with the other two!”
Presentation & Presence
Communication Skill
Upward Management
Communication Skill
I think this is the niche for business school students. Specific
knowledge about marketing, financial analysis, etc. helps a lot.
3
My Learning Path for you – Math matters!
Calculus
Linear Algebra
Probability Theory
Math Statistics
Freshman - Junior
1
Programming
C / Java / R
2
Financial Market
Marketing
Management
3
Advanced Statistics
Data Mining
Econometrics
Senior
R Programming
Matlab (Basic)
Competitions
Advanced Finance
Macroeconomics
Statistical Learning
Compressed Sensing
Current
Python & SQL
Hadoop & Spark
BizPro Training
Logical Thinking
Marketing Analytics
2. A Master's in Data
Science for free!
How to become the data unicorn
without any tuition fee
Data Scientist 101: Johns Hopkins MOOC
The Coursera Specializations offered by Johns Hopkins University give a
very good general exposure to the world of data science.
Executive Data Science
I think this specialization is designed for those who don’t want to become a
data scientist but may work in a data-driven company.
URL: https://goo.gl/ZNBF7N
Data Science
I think this specialization is designed for those who don’t have a very strong
academic background but want to become a data scientist.
URL: https://goo.gl/8OzBhe
Difficulty: ★
Difficulty: ★★★
Basic Math: Calculus & Linear Algebra
Calculus and linear algebra are fundamental tools for data scientists and
statisticians. Having a solid foundation will help a lot.
Calculus I & II, NTHU
This course gives you a solid foundation of Euclidean space and
multivariable calculus, which is very important for a data scientist.
URL: http://ocw.nthu.edu.tw/ocw/index.php?page=course&cid=7&
Linear Algebra, NCTU
A data scientist usually thinks of data in matrix form. The concept
of vector algebra helps a lot in high-dimensional data analysis.
URL: http://goo.gl/KFdJTT
Difficulty: ★★
Difficulty: ★★
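As a small illustration of thinking about data as a matrix, the sketch below stores observations as rows and features as columns, then gets the sample covariance matrix from a single matrix product. The toy numbers and helper names are assumptions for illustration; in practice a library such as NumPy would do the same work.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# Toy data matrix (assumed): 4 observations (rows) x 2 features (columns)
X = [[1.0, 2.0],
     [2.0, 4.1],
     [3.0, 5.9],
     [4.0, 8.0]]
n = len(X)

# Center each column, then covariance = Xc^T Xc / (n - 1)
means = [sum(col) / n for col in transpose(X)]
Xc = [[v - m for v, m in zip(row, means)] for row in X]
cov = [[v / (n - 1) for v in row] for row in matmul(transpose(Xc), Xc)]

print("column means:", means)
print("covariance matrix:", cov)
```

The off-diagonal entry of `cov` measures how the two features move together, which is the starting point for methods such as PCA.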
Advanced Math: Convex Optimization
This is a very advanced topic that we use when doing machine learning.
However, I don’t think every data scientist needs to understand this field.
Convex Optimization, Stanford
This course should benefit anyone who uses or will use scientific computing
or optimization in engineering or related work (e.g., machine learning,
finance, operations research).
URL: http://stanford.edu/class/ee364a/
MOOC: https://goo.gl/KBQ473
Difficulty: ★★★★★
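To show where convex optimization meets machine learning, here is a minimal sketch of gradient descent on a least-squares loss, which is a convex function. The toy data, step size, and function names are assumptions for illustration; this is not taken from the Stanford course material.

```python
def gradient_descent(grad, w0, step=0.05, iters=5000):
    """Repeatedly move against the gradient with a fixed step size."""
    w = list(w0)
    for _ in range(iters):
        g = grad(w)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w

# Toy data (assumed): points lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def loss_grad(w):
    """Gradient of the mean squared error of the line y = a + b*x."""
    a, b = w
    n = len(xs)
    ga = sum(2 * (a + b * x - y) for x, y in zip(xs, ys)) / n
    gb = sum(2 * (a + b * x - y) * x for x, y in zip(xs, ys)) / n
    return [ga, gb]

a, b = gradient_descent(loss_grad, [0.0, 0.0])
print(f"intercept = {a:.4f}, slope = {b:.4f}")
```

Because the loss is convex, any sufficiently small fixed step reaches the single global minimum; with a non-convex loss the same loop could stop at a local minimum instead.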
Basic Stat: Probability & Math Statistics
If you don’t have a foundation in probability & math statistics, you can’t learn any advanced
data analytics methods. Please learn them!
Probability, NTHU
This course gives you a solid foundation of Euclidean space and
multivariable calculus, which is very important for a data scientist.
URL: http://goo.gl/G4MhIj
Math Statistics, NTHU
A data scientist usually thinks of data in matrix form. The concept
of vector algebra helps a lot in high-dimensional data analysis.
URL: http://goo.gl/nQ2cE2
Difficulty: ★★★
Difficulty: ★★★
Stat Method: Advanced Methods
These three fields are core data analytics methods. You will find them
everywhere, like in econometrics, machine learning, and so on.
Regression Analysis, NTHU
URL: http://goo.gl/YQBAla
Difficulty: ★★★★
Multivariate Analysis, NTHU
URL: http://goo.gl/934GKd
Difficulty: ★★★★
Experimental Design, NTHU
URL: http://goo.gl/ED9HMr
Difficulty: ★★★★
Data Mining: Illinois & Stanford MOOC
Data mining is one of the most powerful tools for business analytics. It can be
applied to user behavior data, questionnaire design, and financial markets.
Data Mining, UIUC
The Data Mining Specialization teaches data mining techniques for both structured data,
which conform to a clearly defined schema, and unstructured data, which exist in the form of
natural language text.
URL: https://goo.gl/Tyzm6Z
Difficulty: ★★★★
Mining Massive Datasets, Stanford
Introduces participants to modern distributed file systems and MapReduce, including what
distinguishes good MapReduce algorithms from good algorithms in general. The rest of the
course is devoted to algorithms for extracting models and information from large datasets.
URL: https://goo.gl/NYyxy9
Difficulty: ★★★★★
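For readers who have not seen MapReduce, the idea can be sketched on a single machine in plain Python: a map step emits key-value pairs, a shuffle step groups them by key, and a reduce step combines each group. The word-count task, the sample documents, and the function names are assumptions for illustration; a real cluster runs the same three steps across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

# Sample input (assumed): each string stands in for one document
docs = ["big data is big", "data science needs data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)
```

The map and reduce functions never share state, which is what lets a framework run them in parallel on separate chunks of data.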
Machine Learning: Stanford / NTU MOOC
Machine learning is the science of getting computers to act without being
explicitly programmed.
Machine Learning, Stanford
This course provides a broad introduction to machine learning, data mining,
and statistical pattern recognition.
URL: https://www.coursera.org/learn/machine-learning
Difficulty: ★★★★
Machine Learning, NTU
The students shall enjoy a story-like flow moving from "When Can Machines
Learn" to "Why", "How" and beyond. (Very tough course!)
URL: https://www.coursera.org/course/ntumlone
Difficulty: ★★★★★
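"Acting without being explicitly programmed" can be illustrated with a perceptron, one of the simplest learning algorithms: nobody writes the decision rule, it is adjusted from labeled examples. This is a minimal sketch; the six toy points and the function names are assumptions for illustration and are not taken from either course.

```python
def predict(w, x):
    """Classify x as +1 or -1 using weights w (w[0] is the bias)."""
    score = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if score > 0 else -1

def train_perceptron(samples, max_epochs=1000):
    """Perceptron learning: nudge the weights on every misclassified point."""
    w = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            if predict(w, x) != y:
                w[0] += y
                for i, xi in enumerate(x):
                    w[i + 1] += y * xi
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return w

# Toy, linearly separable data (assumed): label +1 when x1 + x2 is large
samples = [((2, 2), 1), ((3, 1), 1), ((1, 3), 1),
           ((0, 0), -1), ((-1, 1), -1), ((1, -1), -1)]
w = train_perceptron(samples)
print("learned weights:", w)
print("prediction for (4, 4):", predict(w, (4, 4)))
```

The loop is guaranteed to stop only when the data are linearly separable, which is why `max_epochs` caps the number of passes.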
3. What I’ve done
in practice!
How to become the data unicorn
without any tuition fee
SOP for a Data Analytics Project
Data Task Formulation
Data Collection
Data Cleaning
Data Exploration
Data Modeling
Define Purpose
Model Selection
Performance Evaluation
Model Deployment
Initial Phase: 90% Efforts
Middle Phase: 90% Professions
Final Phase: 90% Domain
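The stages above can be sketched as a chain of small functions. Everything here is hypothetical: the records (daily visits and signups), the ratio model, and the function names are assumptions chosen only to show how collection, cleaning, exploration, modeling, and evaluation hand data to each other.

```python
from statistics import mean

def collect():
    """Data collection: hypothetical (daily_visits, daily_signups) records; None = missing."""
    return [(100, 9), (150, 16), (None, 12), (200, 21),
            (250, 24), (300, None), (350, 36), (400, 39)]

def clean(rows):
    """Data cleaning: drop records with any missing field."""
    return [r for r in rows if None not in r]

def explore(rows):
    """Data exploration: a few summary statistics."""
    return {"n": len(rows),
            "mean_visits": mean(r[0] for r in rows),
            "mean_signups": mean(r[1] for r in rows)}

def fit(rows):
    """Data modeling: simple ratio model, signups ~ rate * visits."""
    return sum(r[1] for r in rows) / sum(r[0] for r in rows)

def evaluate(rate, rows):
    """Performance evaluation: mean absolute error on held-out records."""
    return mean(abs(r[1] - rate * r[0]) for r in rows)

rows = clean(collect())
train, test = rows[:4], rows[4:]
summary = explore(rows)
rate = fit(train)
mae = evaluate(rate, test)
print(summary)
print(f"rate = {rate:.3f}, hold-out MAE = {mae:.2f}")
```

Keeping each stage as its own function makes it easy to swap in a better model later without touching collection or cleaning.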
25,054,386 vc
Monthly View Counts
751,631,580 values
Lots of user behavior!
1,785,244 users
Monthly Active Users
My workspace
R, Google Analytics, Spark
Big Data
All about math, statistics, and coding.
But how about business knowledge?
thanks!
ANY
QUESTIONS?
You can find me at:
tawei.huang1@gmail.com
