Building Data Scientists
Machine Learning Mastery in Python
Mitch Sanders
Jan 10th 2018
Internal Use - Confidential
the Context
Industry Trends for 2018 – How what we’re doing fits into the future
Trend #1: Data Science continues to develop specialties - this means the mythical ‘full stack’ data scientist will disappear
• Data Scientist
• Data Engineer
• Algorithm Coder
• Data Storyteller
Trend #2: Non-Data Scientists will perform more fairly sophisticated analytics alongside data scientists
• Data Scientist
• Algorithm Coder
• Data Science Citizens
• Advanced Analytics
• Programmers
• Statisticians
• Business Analyst
• Coders
the Course
Machine Learning Mastery
- Understand Your Data
- Create Accurate Models
- Work Projects End-To-End
• 16 weeks – May-Oct., 2017
• 20+ class hours – 20% homework, 80% live coding
• 17 notebooks – Python code templates
• 4 Prerequisites – Coding, statistics, algorithms, thirst to learn
• 1 Textbook – Machine Learning Mastery w/ Python -Dr. Jason Brownlee
• 1 Teacher – Mitch Sanders w/ Assistant – Uday Waghmare
• 14 Students – global: software engineers, adv. analysts, statisticians
• Platform – Jupyter, Python 2.7, Anaconda
• Code Repository – GitHub
• NPS Survey – Survey Monkey, LTR = 90
• Awarded – “On the Spot”
the Content
Prepare & Explore | Model | Improve Accuracy & Finalize
Python ML
Ecosystem
SciPy
Scikit-learn
Crash Courses
NumPy
Matplotlib
Pandas
Load Libraries & Data
Descriptive Statistics
Attribute Data Types
Class Distribution
Correlation Analysis
Skew of Univariates
Pre Processing
Rescale
Standardize
Normalize
Binarize
Feature Selection
Tree & Univariate
Recursive - RFE
Principal Comp.
Analysis - PCA
Feature Importance
Resampling
Split into Train/Test
K-fold Cross Validation
Leave One Out
Repeated Random
Evaluation Metrics
For Classification
For Regression
Spot Check
Classification Algorithms
Linear –
• Logistic Regression
• Linear Discriminant Analysis (LDA)
Non-linear –
• K-Nearest Neighbor (KNN)
• Naïve Bayes
• Classification & Regression Trees (CART)
• Support Vector Machines (SVM)
Compare Algorithms
Spot Check
Regression Algorithms
Linear – LR, LASSO,
ElasticNet (EN)
Non-Linear – CART, SVR,
KNN
Automate w/ Pipelines
Preparation Pipelines
Feature Extraction Pipelines
Modeling Pipelines
Ensembles - Performance
Improvements
Boosting –
• AdaBoost,
• Gradient Boosting (GBM)
Bagging –
• Random Forest, Extra Trees
• Voting
Algorithm
Parameter Tuning
Parameters
Grid Search
Random Search
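A minimal sketch of the two tuning strategies named above, assuming scikit-learn and SciPy on Python 3 (the course itself ran on Python 2.7). The Ridge model, the synthetic data and the candidate alpha values are illustrative choices, not taken from the course notebooks.

```python
from scipy.stats import uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic regression data (an assumption; stands in for a real dataset).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=7)

# Grid search: try every listed value of the regularization strength alpha.
alphas = [10.0, 1.0, 0.1, 0.01]
grid = GridSearchCV(Ridge(), param_grid={"alpha": alphas}, cv=5).fit(X, y)

# Random search: draw 20 alpha values from a uniform distribution on [0, 1].
rand = RandomizedSearchCV(
    Ridge(),
    param_distributions={"alpha": uniform(0, 1)},
    n_iter=20,
    cv=5,
    random_state=7,
).fit(X, y)

print("grid best alpha:", grid.best_params_["alpha"])
print("random best alpha:", rand.best_params_["alpha"])
```

Grid search is exhaustive over a short list, while random search samples a fixed budget of candidates from a continuous range.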
Finalize Model
Predict on Validation Data
Create Standalone on Entire Data
Save Model for Production
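The three finalizing steps above can be sketched as follows, assuming scikit-learn on Python 3 and the standard-library pickle module. The logistic regression model and the synthetic data are placeholders chosen for illustration.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption); hold out 20% as a validation set.
X, y = make_classification(n_samples=200, n_features=8, random_state=7)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=7
)

# 1) Predict on validation data to estimate accuracy on unseen rows.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)

# 2) Create the standalone model by refitting on the entire dataset.
final_model = LogisticRegression(max_iter=1000).fit(X, y)

# 3) Save the model: serialize to bytes here; a file would work the same way.
blob = pickle.dumps(final_model)
restored = pickle.loads(blob)

print("validation accuracy: %.3f" % val_accuracy)
print("round trip ok:", np.array_equal(restored.predict(X), final_model.predict(X)))
```

A pickled model should only be loaded with the same library versions that produced it, and only from a trusted source.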
Visualization
Univariate Plots
Multivariate Plots
Case Studies #1 & #2
Key concepts – and flow – the 17 notebooks (#1 through #17)
Reference Material
the Course Syllabus
Python Ecosystem for Machine
Learning
• Python
• SciPy
• Scikit-learn
• Python Ecosystem Installation
• Summary
Crash Course in Python and SciPy
• Python Crash Course
• NumPy Crash Course
• Matplotlib Crash Course
• Pandas Crash Course
• Summary
How To Load Machine Learning Data
• Considerations When Loading CSV
Data
• Pima Indians Dataset
• Load CSV Files with the Python
Standard Library
• Load CSV Files with NumPy
• Load CSV Files with Pandas
• Summary
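A minimal sketch of the three loading approaches listed above, assuming Python 3 with NumPy and pandas installed. The column names and the three sample rows are made up for illustration (they are not the real Pima Indians data), and an in-memory string stands in for a CSV file on disk.

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical sample rows; a real run would open the dataset's CSV file.
CSV_TEXT = "preg,plas,age,class\n6,148,50,1\n1,85,31,0\n8,183,32,1\n"

# 1) Python standard library: csv.reader yields each row as a list of strings.
rows = list(csv.reader(io.StringIO(CSV_TEXT)))
header, records = rows[0], rows[1:]
data_stdlib = np.array(records, dtype=float)

# 2) NumPy: loadtxt parses numeric data directly; skip the header row.
data_numpy = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", skiprows=1)

# 3) Pandas: read_csv returns a DataFrame with named columns.
data_pandas = pd.read_csv(io.StringIO(CSV_TEXT))

print(data_stdlib.shape, data_numpy.shape, data_pandas.shape)
```

Things to check when loading any CSV include whether a header row is present, which delimiter is used, and whether every field is numeric.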
Understand Your Data With
Visualization
• Univariate Plots
• Multivariate Plots
• Summary
Prepare Your Data For Machine Learning
• Need For Data Pre-processing
• Data Transforms
• Rescale Data
• Standardize Data
• Normalize Data
• Binarize Data (Make Binary)
• Summary
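The four transforms above can be sketched with scikit-learn's preprocessing classes, assuming Python 3. The toy array and the binarization threshold are invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import (
    Binarizer,
    MinMaxScaler,
    Normalizer,
    StandardScaler,
)

# Toy data: 4 rows, 2 features on very different scales (values are made up).
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]])

# Rescale: map each column into the range [0, 1].
rescaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# Standardize: shift and scale each column to mean 0, standard deviation 1.
standardized = StandardScaler().fit_transform(X)

# Normalize: scale each ROW to unit (L2) length.
normalized = Normalizer().fit_transform(X)

# Binarize: values above the threshold become 1, all others become 0.
binarized = Binarizer(threshold=2.5).fit_transform(X)

print(rescaled[:, 0])
print(binarized[:, 0])
```

Rescaling and standardizing work per column, whereas normalizing works per row, which is easy to mix up.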
Feature Selection For Machine Learning
• Feature Selection
• Univariate Selection
• Recursive Feature Elimination
• Principal Component Analysis
• Feature Importance
• Summary
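A hedged sketch of the four techniques in this chapter, assuming scikit-learn on Python 3. The data is synthetic, and the univariate score function (ANOVA F-test) is chosen here because the synthetic features can be negative; the textbook may use a different test.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption): 8 features, 3 of them informative.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=7
)

# Univariate selection: keep the 3 features with the best F-test scores.
univariate = SelectKBest(score_func=f_classif, k=3).fit(X, y)
X_univariate = univariate.transform(X)

# Recursive feature elimination: repeatedly drop the weakest feature.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)

# Principal component analysis: project onto 3 new, uncorrelated axes.
pca = PCA(n_components=3).fit(X)
X_pca = pca.transform(X)

# Feature importance: tree ensembles expose a score for every feature.
forest = ExtraTreesClassifier(n_estimators=100, random_state=7).fit(X, y)

print("RFE kept columns:", rfe.support_)
print("importances:", forest.feature_importances_.round(3))
```

PCA differs from the other three in that it builds new features rather than choosing a subset of the original ones.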
Evaluate the Performance of Machine
Learning Algorithms with Resampling
• Evaluate Machine Learning Algorithms
• Split into Train and Test Sets
• K-fold Cross-Validation
• Leave One Out Cross-Validation
• Repeated Random Test-Train Splits
• What Techniques to Use When
• Summary
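The four resampling schemes above can be compared side by side. This is a minimal sketch assuming scikit-learn on Python 3; the synthetic data, the logistic regression model and the split sizes are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    KFold,
    LeaveOneOut,
    ShuffleSplit,
    cross_val_score,
    train_test_split,
)

# Synthetic data (an assumption; stands in for a real dataset).
X, y = make_classification(n_samples=100, n_features=5, random_state=7)
model = LogisticRegression(max_iter=1000)

# Split into train and test sets: one fit, one accuracy estimate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=7
)
split_score = model.fit(X_train, y_train).score(X_test, y_test)

# K-fold cross-validation: 10 folds, each used once as the test set.
kfold_scores = cross_val_score(
    model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=7)
)

# Leave-one-out: one fold per row, so each score is either 0 or 1.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())

# Repeated random test-train splits: 10 independent 67/33 splits.
shuffle_scores = cross_val_score(
    model, X, y, cv=ShuffleSplit(n_splits=10, test_size=0.33, random_state=7)
)

print("train/test: %.3f" % split_score)
print("k-fold: %.3f (+/- %.3f)" % (kfold_scores.mean(), kfold_scores.std()))
print("leave-one-out: %.3f" % loo_scores.mean())
print("shuffle-split: %.3f" % shuffle_scores.mean())
```

A single split is fastest but noisiest; leave-one-out costs one model fit per row.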
Machine Learning Algorithm
Performance Metrics
• Algorithm Evaluation Metrics
• Classification Metrics
• Regression Metrics
• Summary
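A small worked example of a few classification and regression metrics, assuming scikit-learn on Python 3. The label and prediction lists are invented so that each value can be checked by hand.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Classification: made-up true labels and predictions (4 of 6 correct).
y_true_cls = [1, 0, 1, 1, 0, 0]
y_pred_cls = [1, 0, 0, 1, 0, 1]
accuracy = accuracy_score(y_true_cls, y_pred_cls)
# Rows are actual classes, columns are predicted classes.
matrix = confusion_matrix(y_true_cls, y_pred_cls)

# Regression: made-up targets and predictions with errors -1, 0, +1, 0.
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.0, 5.0, 8.0, 9.0]
mae = mean_absolute_error(y_true_reg, y_pred_reg)  # (1 + 0 + 1 + 0) / 4
mse = mean_squared_error(y_true_reg, y_pred_reg)   # (1 + 0 + 1 + 0) / 4
r2 = r2_score(y_true_reg, y_pred_reg)              # 1 - 2 / 20

print("accuracy: %.3f" % accuracy)
print(matrix)
print("MAE: %.2f  MSE: %.2f  R^2: %.2f" % (mae, mse, r2))
```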
Spot-Check Classification Algorithms
• Algorithm Spot-Checking
• Algorithms Overview
• Linear Machine Learning Algorithms
• Nonlinear Machine Learning
Algorithms
• Summary
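Spot-checking can be sketched as a loop that scores several classifiers on the same folds. This assumes scikit-learn on Python 3; the synthetic data and the default hyperparameters are illustrative, and the short names are labels picked for this example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption; stands in for a real dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=7)

# Two linear and four non-linear algorithms, mostly with default settings.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "CART": DecisionTreeClassifier(random_state=7),
    "SVM": SVC(),
}

# Reuse one seeded KFold so every model is scored on identical splits.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
results = {
    name: cross_val_score(model, X, y, cv=kfold, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print("%s: %.3f" % (name, score))
```

Sharing the same folds across models is also what makes a later comparison of algorithms consistent.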
Spot-Check Regression Algorithms
• Algorithms Overview
• Linear Machine Learning Algorithms
• Nonlinear Machine Learning
Algorithms
• Summary
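The same spot-check loop applies to regression. A minimal sketch, assuming scikit-learn on Python 3, with synthetic data and default hyperparameters; mean squared error is one of several metrics that could be used here.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption; stands in for a real dataset).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=7)

# Three linear and three non-linear regressors.
regressors = {
    "LR": LinearRegression(),
    "LASSO": Lasso(),
    "EN": ElasticNet(),
    "KNN": KNeighborsRegressor(),
    "CART": DecisionTreeRegressor(random_state=7),
    "SVR": SVR(),
}

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
# scikit-learn reports negated MSE so that higher is better; flip the sign back.
mse_by_model = {
    name: -cross_val_score(
        model, X, y, cv=kfold, scoring="neg_mean_squared_error"
    ).mean()
    for name, model in regressors.items()
}

for name, mse in sorted(mse_by_model.items(), key=lambda kv: kv[1]):
    print("%s: MSE %.1f" % (name, mse))
```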
Compare Machine Learning Algorithms
• Choose The Best Machine Learning
Model
• Compare Machine Learning
Algorithms Consistently
• Summary
Automate Machine Learning Workflows
with Pipelines
• Automating Machine Learning
Workflows
• Data Preparation and Modeling
Pipeline
• Feature Extraction and Modeling
Pipeline
• Summary
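Both kinds of pipeline named above can be sketched with scikit-learn's Pipeline and FeatureUnion, assuming Python 3. The step names, the synthetic data and the number of components are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption; stands in for a real dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=7)
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

# Data preparation and modeling pipeline: the scaler is refit inside each
# training fold, so no statistics leak in from the held-out fold.
prep_pipeline = Pipeline([
    ("standardize", StandardScaler()),
    ("lda", LinearDiscriminantAnalysis()),
])
prep_score = cross_val_score(prep_pipeline, X, y, cv=kfold).mean()

# Feature extraction and modeling pipeline: concatenate 3 PCA components
# with the 4 best univariate features, then fit a classifier on all 7.
features = FeatureUnion([
    ("pca", PCA(n_components=3)),
    ("select_best", SelectKBest(f_classif, k=4)),
])
feature_pipeline = Pipeline([
    ("features", features),
    ("logistic", LogisticRegression(max_iter=1000)),
])
feature_score = cross_val_score(feature_pipeline, X, y, cv=kfold).mean()

print("prep pipeline: %.3f" % prep_score)
print("feature pipeline: %.3f" % feature_score)
```

Wrapping preparation steps in the pipeline is the main point: it keeps test-fold data out of the fitting of transforms.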
Improve Performance with Ensembles
• Combine Models Into Ensemble
Predictions
• Bagging Algorithms
• Boosting Algorithms
• Voting Ensemble
• Summary
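A hedged sketch of bagging, boosting and voting ensembles, assuming scikit-learn on Python 3. The synthetic data, the tree counts and the two members of the voting ensemble are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption; stands in for a real dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=7)
kfold = KFold(n_splits=5, shuffle=True, random_state=7)

ensembles = {
    # Bagging: many trees, each trained on a bootstrap sample of the rows.
    "Bagged CART": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=7
    ),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=7),
    "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=7),
    # Boosting: models are added in sequence to correct earlier errors.
    "AdaBoost": AdaBoostClassifier(n_estimators=30, random_state=7),
    "Gradient Boosting": GradientBoostingClassifier(
        n_estimators=50, random_state=7
    ),
    # Voting: combine the predictions of different model types.
    "Voting": VotingClassifier(estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("cart", DecisionTreeClassifier(random_state=7)),
    ]),
}

scores = {
    name: cross_val_score(model, X, y, cv=kfold).mean()
    for name, model in ensembles.items()
}
for name, score in scores.items():
    print("%s: %.3f" % (name, score))
```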
data science student questions - 1
“So you do Data Science work. What does that really involve? And how is that different from programming, statistical work or data engineering?”
“I want to learn Data Science. Between R, Python and SAS, where should I start and what are the Pros and Cons of each?”
“What are OOP (object-oriented programming) and structured programming, and what’s the difference between them?"
“What are the main differences between Python 2.7 and Python 3.x? And why do so many developers stay with Python 2.7?”
"What is the difference between Supervised Learning and Unsupervised Learning?"
"How might the graphs for a univariate analysis differ from those for a bivariate analysis? Can you graph multivariate data?"
"How do you explain machine learning to an 8-year-old child?"
"What is Gradient Descent?"
"What is multicollinearity and how can you overcome it?"
data science student questions - 2
"What is the curse of dimensionality?"
"What do you understand by Hypothesis in the context of Machine Learning?"
"What's the difference between a Test Set and a Validation Set?"
"What is cross-validation and what is it used for?"
"What's the difference between a Classification and Regression Tree algorithm and a Random Forest? And when is one better than the other?"
"What are the basic assumptions to be made for linear regression?"
"Can you explain in simple language what is an Eigenvalue and Eigenvector?"
"Do gradient descent methods always converge to the same point?"
"What's the difference between continuous, ordinal and categorical variables?"
"What is K-means? How can you select K for K-means?"
data science student questions - 3
"Why is naive Bayes so ‘naive’?"
"OLS is to linear regression as Maximum likelihood is to logistic regression. Explain the statement."
"What do you understand by Bias Variance trade off?"
"Do you suggest that treating a categorical variable as continuous variable would result in a better predictive model?"
"When does regularization become necessary in Machine Learning?"
"Explain a model and its dimensions to an 8 year old."
"How do you determine and deal with correlated features in your data set, and how do you reduce the dimensionality of data?"
"During analysis, how do you treat missing values?"
"What is Regularization and what kind of problems does regularization solve?"
Extras
the Data Scientist Roles
Roles Defined by 3 Different Data Science Authors
Columns: Data Scientist Core Skills | How To Build A Successful Data Science Team | The seven people you need on your Big Data team (with descriptions)
Capture | Data Engineer
• Handyman: Expert in Dell EDW, D3, BO, Hana/BMS, other RDBMS, and ETL work
• Open Source Guru (plus Data Modeler): Hadoop stack, Cloudera, Linux, data structures and network
Analyze | Machine Learning Expert
• Data Modeler (plus all aspects of Data Engineer and Business Analyst): SQL, RDBMS, Teradata, Dell infrastructure
• Deep Diver: Machine Learning, R, Python, SQL, ETL work, algorithm modeling, statistics
Present | Business Analyst
• Story Teller: PowerPoint, Design, Tableau, understands customers' business language and technical, artistic eye
• Snoop (plus Handyman skills): Enthusiastic, deeply creative, super savvy in Dell environments, finds contacts and not hesitant to do work-arounds
• Privacy Wonk: Dell policy meticulous, socially aware, foresees roadblocks

Más contenido relacionado

La actualidad más candente

Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Simplilearn
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data ScienceSpotle.ai
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientistryanorban
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Ilkay Altintas, Ph.D.
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxShanmugasundaram M
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First CourseArnab Majumdar
 
How To Become a Data Scientist in Iran Marketplace
How To Become a Data Scientist in Iran Marketplace How To Become a Data Scientist in Iran Marketplace
How To Become a Data Scientist in Iran Marketplace Mohamadreza Mohtat
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceANOOP V S
 
Introduction to Machine Learning & AI
Introduction to Machine Learning & AIIntroduction to Machine Learning & AI
Introduction to Machine Learning & AIMichael Eydman
 
Data science
Data scienceData science
Data science9diov
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceGabriel Moreira
 
General introduction to AI ML DL DS
General introduction to AI ML DL DSGeneral introduction to AI ML DL DS
General introduction to AI ML DL DSRoopesh Kohad
 
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Simplilearn
 

La actualidad más candente (20)

Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 
Data Science using Python
Data Science using PythonData Science using Python
Data Science using Python
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First Course
 
Unit 3 part 2
Unit  3 part 2Unit  3 part 2
Unit 3 part 2
 
How To Become a Data Scientist in Iran Marketplace
How To Become a Data Scientist in Iran Marketplace How To Become a Data Scientist in Iran Marketplace
How To Become a Data Scientist in Iran Marketplace
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
NLP & ML Webinar
NLP & ML WebinarNLP & ML Webinar
NLP & ML Webinar
 
Introduction to Machine Learning & AI
Introduction to Machine Learning & AIIntroduction to Machine Learning & AI
Introduction to Machine Learning & AI
 
Introduction to Data Science by Datalent Team @Data Science Clinic #9
Introduction to Data Science by Datalent Team @Data Science Clinic #9Introduction to Data Science by Datalent Team @Data Science Clinic #9
Introduction to Data Science by Datalent Team @Data Science Clinic #9
 
Data science
Data scienceData science
Data science
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
General introduction to AI ML DL DS
General introduction to AI ML DL DSGeneral introduction to AI ML DL DS
General introduction to AI ML DL DS
 
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
 
Data Science in Action
Data Science in ActionData Science in Action
Data Science in Action
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 

Similar a Building Data Scientists in Python

JavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceJavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceMark West
 
Abhishek Training PPT.pptx
Abhishek Training PPT.pptxAbhishek Training PPT.pptx
Abhishek Training PPT.pptxKashishKashish22
 
NDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceNDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceMark West
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxsumitkumar600840
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfRAKESHG79
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceMark West
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceMark West
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchRachel Berryman
 
How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesUpXAcademy
 
L2 DS Tools and Application.pptx
L2 DS Tools and Application.pptxL2 DS Tools and Application.pptx
L2 DS Tools and Application.pptxShambhavi Vats
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...jybufgofasfbkpoovh
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Rohit Dubey
 
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfArmyTrilidiaDevegaSK
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachMihai Criveti
 
Introduction to Data Science.pdf
Introduction to Data Science.pdfIntroduction to Data Science.pdf
Introduction to Data Science.pdfUniversity of Sindh
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification courseKumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)SayyedYusufali
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadVamsiNihal
 

Similar a Building Data Scientists in Python (20)

JavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceJavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data Science
 
Abhishek Training PPT.pptx
Abhishek Training PPT.pptxAbhishek Training PPT.pptx
Abhishek Training PPT.pptx
 
NDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceNDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data Science
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdf
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
 
How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science roles
 
L2 DS Tools and Application.pptx
L2 DS Tools and Application.pptxL2 DS Tools and Application.pptx
L2 DS Tools and Application.pptx
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
 
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
Introduction to Data Science.pdf
Introduction to Data Science.pdfIntroduction to Data Science.pdf
Introduction to Data Science.pdf
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 

Último

UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 

Último (20)

UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts

Building Data Scientists in Python

  • 1. Building Data Scientists: Machine Learning Mastery in Python. Mitch Sanders, Jan 10th 2018. Internal Use - Confidential
  • 2. the Context: Industry Trends for 2018 – how what we’re doing fits into the future. Trend #1: Data Science continues to develop specialties; this means the mythical ‘full stack’ data scientist will disappear (Data Scientist, Data Engineer, Algorithm Coder, Data Storyteller). Trend #2: Non-Data Scientists will perform more fairly sophisticated analytics alongside data scientists (Data Scientist, Algorithm Coder, Data Science Citizens, Advanced Analytics, Programmers, Statisticians, Business Analyst, Coders).
  • 3. the Course: Machine Learning Mastery - Understand Your Data - Create Accurate Models - Work Projects End-To-End • 16 weeks – May-Oct., 2017 • 20+ class hours – 20% homework, 80% live coding • 17 notebooks – Python code templates • 4 Prerequisites – Coding, statistics, algorithms, thirst to learn • 1 Textbook – Machine Learning Mastery w/ Python - Dr. Jason Brownlee • 1 Teacher – Mitch Sanders w/ Assistant – Uday Waghmare • 14 Students – global: software engineers, adv. analysts, statisticians • Platform – Jupyter, Python 2.7, Anaconda • Code Repository – GitHub • NPS Survey – Survey Monkey, LTR = 90 • Awarded – “On the Spot”
  • 4. the Content: key concepts and flow of the 17 notebooks (#1 to #17), in three phases: Prepare & Explore; Model; Improve Accuracy & Finalize. Python ML Ecosystem: SciPy, Scikit-learn. Crash Courses: NumPy, Matplotlib, Pandas. Load Libraries & Data: Descriptive Statistics, Attribute Data Types, Class Distribution, Correlation Analysis, Skew of Univariates. Pre Processing: Rescale, Standardize, Normalize, Binarize. Feature Selection: Tree & Univariate, Recursive (RFE), Principal Component Analysis (PCA), Feature Importance. Resampling: Split into Train/Test, K-fold Cross Validation, Leave One Out, Repeated Random. Evaluation Metrics: For Classification, For Regression. Spot Check Classification Algorithms: Linear – Logistic Regression, Linear Discriminant Analysis (LDA); Non-linear – K-Nearest Neighbor (KNN), Naïve Bayes, Classification & Regression Trees (CART), Support Vector Machines (SVM). Compare Algorithms. Spot Check Regression Algorithms: Linear – LR, LASSO, ElasticNet (EN); Non-Linear – CART, SVR, KNN. Automate w/ Pipelines: Preparation Pipelines, Feature Extraction Pipelines, Modeling Pipelines. Ensembles - Performance Improvements: Boosting – AdaBoost, Gradient Boosting (GBM); Bagging – Random Forest, Extra Trees; Voting. Algorithm Parameter Tuning: Parameters, Grid Search, Random Search. Finalize Model: Predict on Validation Data, Create Standalone on Entire Data, Save Model for Production. Visualization: Univariate Plots, Multivariate Plots. Case Studies #1 & #2.
  • 6. the Course Syllabus. Python Ecosystem for Machine Learning: Python; SciPy; Scikit-learn; Python Ecosystem Installation; Summary. Crash Course in Python and SciPy: Python Crash Course; NumPy Crash Course; Matplotlib Crash Course; Pandas Crash Course; Summary. How To Load Machine Learning Data: Considerations When Loading CSV Data; Pima Indians Dataset; Load CSV Files with the Python Standard Library; Load CSV Files with NumPy; Load CSV Files with Pandas; Summary. Understand Your Data With Visualization: Univariate Plots; Multivariate Plots; Summary. Prepare Your Data For Machine Learning: Need For Data Pre-processing; Data Transforms; Rescale Data; Standardize Data; Normalize Data; Binarize Data (Make Binary); Summary. Feature Selection For Machine Learning: Feature Selection; Univariate Selection; Recursive Feature Elimination; Principal Component Analysis; Feature Importance; Summary. Evaluate the Performance of Machine Learning Algorithms with Resampling: Evaluate Machine Learning Algorithms; Split into Train and Test Sets; K-fold Cross-Validation; Leave One Out Cross-Validation; Repeated Random Test-Train Splits; What Techniques to Use When; Summary. Machine Learning Algorithm Performance Metrics: Algorithm Evaluation Metrics; Classification Metrics; Regression Metrics; Summary. Spot-Check Classification Algorithms: Algorithm Spot-Checking; Algorithms Overview; Linear Machine Learning Algorithms; Nonlinear Machine Learning Algorithms; Summary. Spot-Check Regression Algorithms: Algorithms Overview; Linear Machine Learning Algorithms; Nonlinear Machine Learning Algorithms; Summary. Compare Machine Learning Algorithms: Choose The Best Machine Learning Model; Compare Machine Learning Algorithms Consistently; Summary. Automate Machine Learning Workflows with Pipelines: Automating Machine Learning Workflows; Data Preparation and Modeling Pipeline; Feature Extraction and Modeling Pipeline; Summary. Improve Performance with Ensembles: Combine Models Into Ensemble Predictions; Bagging Algorithms; Boosting Algorithms; Voting Ensemble; Summary.
  • 7. data science student questions - 1: "So you do Data Science work. What really does that involve? And how is that different from programming, statistical work or data engineering?" "I want to learn Data Science. Between R, Python and SAS, where should I start and what are the pros and cons of each?" "What are OOP (object-oriented programming) and Structured Programming, and what's the difference between them?" "What are the main differences between Python 2.7 and Python 3.x? And why do so many developers stay with Python 2.7?" "What is the difference between Supervised Learning and Unsupervised Learning?" "What different graphs might a univariate analysis use compared to a bivariate analysis? Can you graph multivariate data?" "How do you explain machine learning to an 8-year-old child?" "What is Gradient Descent?" "What is multicollinearity and how can you overcome it?"
  • 8. data science student questions - 2: "What is the curse of dimensionality?" "What do you understand by Hypothesis in the context of Machine Learning?" "What's the difference between a Test Set and a Validation Set?" "What is cross-validation and what is it used for?" "What's the difference between a Classification and Regression Tree (CART) algorithm and a Random Forest? And when is one better than the other?" "What are the basic assumptions to be made for linear regression?" "Can you explain in simple language what an Eigenvalue and an Eigenvector are?" "Do gradient descent methods always converge to the same point?" "What's the difference between continuous, ordinal and categorical variables?" "What is K-means? How can you select K for K-means?"
  • 9. data science student questions - 3: "Why is naive Bayes so ‘naive’?" "OLS is to linear regression as Maximum likelihood is to logistic regression. Explain the statement." "What do you understand by the Bias-Variance trade-off?" "Do you suggest that treating a categorical variable as a continuous variable would result in a better predictive model?" "When does regularization become necessary in Machine Learning?" "Explain a model and its dimensions to an 8-year-old." "How do you determine and deal with correlated features in your data set, and how do you reduce the dimensionality of data?" "During analysis, how do you treat missing values?" "What is Regularization and what kind of problems does regularization solve?"
  • 11. the Data Scientist Roles: Roles Defined by 3 different Data Science Authors (Data Scientist Core Skills; How To Build A Successful Data Science Team; The seven people you need on your Big Data team). Descriptions: Capture: Data Engineer; Handyman: expert in Dell EDW, D3, BO, Hana/BMS, other RDBMS, and ETL work; Open Source Guru (plus Data Modeler): Hadoop stack, Cloudera, Linux, data structures and network. Analyze: Machine Learning Expert; Data Modeler (plus all aspects of Data Engineer and Business Analyst): SQL, RDBMS, Teradata, Dell infrastructure; Deep Diver: Machine Learning, R, Python, SQL, ETL work, algorithm modeling, statistics. Present: Business Analyst; Story Teller: PowerPoint, Design, Tableau, understands customers' business language and technical, artistic eye; Snoop (plus Handyman skills): enthusiastic, deeply creative, super savvy in Dell environments, finds contacts and not hesitant to do work-arounds; Privacy Wonk: Dell policy meticulous, socially aware, foresees roadblocks.
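
The resampling and spot-check steps listed in the content and syllabus slides (K-fold Cross-Validation, then comparing algorithms on the same folds) can be sketched roughly as follows. This is a minimal illustration, not the course's notebook code: the course used scikit-learn on Python 2.7, whereas this sketch uses only the Python 3 standard library so it runs without dependencies. The helper names (`k_fold_indices`, `cross_val_accuracy`, `majority_class`, `nearest_neighbor`) and the toy two-cluster dataset are assumptions made up for the example.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=7):
    """Shuffle indices 0..n_samples-1 and deal them into k folds
    whose sizes differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_class(train_X, train_y, x):
    """Baseline: always predict the most common training label."""
    return max(set(train_y), key=train_y.count)


def nearest_neighbor(train_X, train_y, x):
    """1-nearest-neighbor using squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]


def cross_val_accuracy(predict, X, y, k=5):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(predict(train_X, train_y, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return mean(scores)


# Hypothetical toy data: two well-separated 4x5 grids of points,
# class 0 near (0, 0) and class 1 near (5, 5), 20 samples each.
grid = [(i * 0.1, j * 0.1) for i in range(4) for j in range(5)]
X = grid + [(5 + a, 5 + b) for a, b in grid]
y = [0] * len(grid) + [1] * len(grid)

# Spot-check: evaluate every candidate with the same folds, then compare.
for name, model in [("majority baseline", majority_class),
                    ("1-nearest neighbor", nearest_neighbor)]:
    print(name, cross_val_accuracy(model, X, y, k=5))
```

Because every candidate is scored on identical folds, the accuracies are directly comparable, which is the point of the "Compare Machine Learning Algorithms Consistently" item in the syllabus.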

Editor's notes

  1. https://www.datasciencecentral.com/profiles/blogs/6-predictions-about-data-science-machine-learning-and-ai-for-2018