Practical Predictive
Modeling in Python
Robert Dempsey
robertwdempsey.com
robertwdempsey
rdempsey
pythonbicookbook.com
Doing All Things In SQL
Makes Panda sad and confused
Each New Thing You Learn
Leads to another new thing to learn, and another, and…
So Many Things
1. Which predictive modeling technique to use
2. How to get the data into a format for modeling
3. How to ensure the “right” data is being used
4. How to feed the data into the model
5. How to validate the model results
6. How to save the model to use in production
7. How to implement the model in production and apply it to new observations
8. How to save the new predictions
9. How to ensure, over time, that the model is correctly predicting outcomes
10. How to later update the model with new training data
Choose Your Model
Model Selection
• How much data do you have?
• Are you predicting a category? A quantity?
• Do you have labeled data?
• Do you know the number of categories?
Regression
• Used for estimating the relationships among
variables
• Use when:
• Predicting a quantity
• More than 50 samples
Classification
• Used to answer “what is this object”
• Use when:
• Predicting a category
• Have labeled data
Clustering
• Used to group similar objects
• Use when:
• Predicting a category
• Don’t have labeled data
• Number of categories is known or unknown
• Have more than 50 samples
Dimensionality Reduction
• Process for reducing the number of random
variables under consideration (feature selection
and feature extraction)
• Use when:
• Not predicting a category or a quantity
• Just looking around
Model Selection
http://scikit-learn.org/stable/tutorial/machine_learning_map/
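The questions above can be read as a small decision procedure. The sketch below is only a restatement of the slides. The function name and its parameters are hypothetical, and the "get more data" branch for 50 samples or fewer is taken from the linked scikit-learn map rather than from the slides themselves.

```python
def suggest_model_family(predicting_category, predicting_quantity,
                         has_labels, n_samples):
    """Rough restatement of the model-selection questions; not a scikit-learn API."""
    if n_samples <= 50:
        # Assumption: the linked cheat sheet suggests collecting more data first
        return "get more data"
    if predicting_category:
        # Labeled data points to classification, unlabeled data to clustering
        return "classification" if has_labels else "clustering"
    if predicting_quantity:
        return "regression"
    # Neither a category nor a quantity: "just looking around"
    return "dimensionality reduction"


print(suggest_model_family(True, False, True, 10_000))  # prints "classification"
```

Treat the result as a starting point; the cheat sheet has further branches for picking a specific estimator.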
Format Thine Data
Format The Data
• Pandas FTW!
• Use the map() function to convert any text to a
number
• Fill in any missing values
• Split the data into features (the data) and targets
(the outcome to predict) using .values on the
DataFrame
map()
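def update_failure_explanations(failure_type):
    # Convert a failure-explanation string into an integer code.
    # Any value not listed below falls through and returns None.
    if failure_type == 'dob':
        return 0
    elif failure_type == 'name':
        return 1
    elif failure_type == 'ssn dob name':
        return 2
    elif failure_type == 'ssn':
        return 3
    elif failure_type == 'ssn name':
        return 4
    elif failure_type == 'ssn dob':
        return 5
    elif failure_type == 'dob name':
        return 6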
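A mapping like the one above is applied to a column with pandas' Series.map(). The sketch below uses a dictionary with the same codes instead of a function; the column names failure_explanation and failure_code and the sample rows are hypothetical.

```python
import pandas as pd

# Same text-to-number codes as the function above, expressed as a dictionary
codes = {'dob': 0, 'name': 1, 'ssn dob name': 2, 'ssn': 3,
         'ssn name': 4, 'ssn dob': 5, 'dob name': 6}

# Hypothetical data: the last value has no code on purpose
df = pd.DataFrame({'failure_explanation': ['dob', 'ssn name', 'unknown']})

# Series.map() looks up each value; unmapped values become NaN
df['failure_code'] = df['failure_explanation'].map(codes)
print(df)
```

Values without a code come back as NaN, which is one reason the next step fills in missing values.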
Fill In Missing Values
# Assign the result back; in-place fills on a single column are unreliable in recent pandas
df['my_field'] = df['my_field'].fillna('Missing')
df = df.fillna(0)
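A small runnable illustration of both fills, assuming a made-up DataFrame with one text column and one numeric column (my_field is the name used on the slide, score is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({'my_field': ['a', None, 'c'],
                   'score': [1.0, np.nan, 3.0]})

# Fill the text column with a placeholder label first
df['my_field'] = df['my_field'].fillna('Missing')

# Then fill every remaining gap with 0
df = df.fillna(0)
print(df)
```

The order matters here: filling the whole frame with 0 first would put a numeric 0 into the text column.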
Split the Data
1. Create a matrix of values
t_data = raw_data.iloc[:,0:22].values
2. Create a matrix of targets
t_targets = raw_data['verified'].values
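To see what the split produces, here is the same pattern on a tiny made-up frame. The column names f1 and f2 are hypothetical, and the slice is 0:2 instead of 0:22 because this toy frame has only two feature columns.

```python
import pandas as pd

# Hypothetical data: two feature columns followed by the target column
raw_data = pd.DataFrame({'f1': [0.1, 0.9, 0.4],
                         'f2': [1, 0, 1],
                         'verified': [1, 0, 1]})

# Features: every row, first two columns, as a NumPy array
t_data = raw_data.iloc[:, 0:2].values

# Targets: the outcome column as a one-dimensional NumPy array
t_targets = raw_data['verified'].values

print(t_data.shape, t_targets.shape)  # prints "(3, 2) (3,)"
```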
Get the (Right) Data
Get The Right Data
• This is called “Feature selection”
• Univariate feature selection
• SelectKBest removes all but the k highest scoring features
• SelectPercentile removes all but a user-specified highest scoring percentage of features
• SelectFpr, SelectFdr, and SelectFwe apply common univariate statistical tests to each feature, selecting by false positive rate, false discovery rate, and family-wise error respectively
• GenericUnivariateSelect performs univariate feature selection with a configurable strategy
http://scikit-learn.org/stable/modules/feature_selection.html
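As an illustration of univariate selection, the sketch below runs SelectKBest on synthetic data. The dataset, the choice of k=5, and the f_classif scoring function are assumptions made for the example, not settings from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for real data: 200 rows, 22 features, 5 of them informative
X, y = make_classification(n_samples=200, n_features=22,
                           n_informative=5, random_state=111)

# Keep the 5 features with the highest ANOVA F-scores against the target
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

# Column indices of the features that survived
kept = selector.get_support(indices=True)
print(X_selected.shape, kept)
```

The same fitted selector should be reused at prediction time so that new observations are reduced to the same columns.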
Feed Your Model
Data => Model
1. Build the model
http://scikit-learn.org/stable/modules/cross_validation.html
from sklearn import linear_model
logClassifier = linear_model.LogisticRegression(C=1,
random_state=111)
2. Train the model
from sklearn.model_selection import train_test_split
# train_test_split has no cv argument; cv belongs to helpers such as cross_val_score
X_train, X_test, y_train, y_test = train_test_split(the_data,
                                                    the_targets,
                                                    test_size=0.20,
                                                    random_state=111)
logClassifier.fit(X_train, y_train)
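The build and train steps can be run end to end on synthetic data, as in this sketch. The generated dataset and max_iter=1000 are assumptions added so the example runs on its own. The cv argument is accepted by the cross-validation helpers such as cross_val_score, not by train_test_split, so the 12-fold run is shown separately here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the features and targets
X, y = make_classification(n_samples=500, n_features=22, random_state=111)

# Hold out 20% of the rows for validation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=111)

# Build and train the classifier on the training portion only
clf = LogisticRegression(C=1, random_state=111, max_iter=1000)
clf.fit(X_train, y_train)

# Separately: 12-fold cross-validation, one accuracy score per fold
scores = cross_val_score(
    LogisticRegression(C=1, random_state=111, max_iter=1000), X, y, cv=12)
print(X_train.shape, X_test.shape, len(scores))
```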
Validate That!
Validation
1. Accuracy Score
http://scikit-learn.org/stable/modules/cross_validation.html
from sklearn import metrics
# Predict on the held-out test set before scoring
predicted = logClassifier.predict(X_test)
metrics.accuracy_score(y_test, predicted)
2. Confusion Matrix
metrics.confusion_matrix(y_test, predicted)
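A worked example with hand-written labels shows how the two metrics relate. The six labels below are made up for illustration.

```python
from sklearn import metrics

# Hypothetical true outcomes and model predictions for six records
y_test = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1]

# Fraction of predictions that match the true outcome: 4 of 6
acc = metrics.accuracy_score(y_test, predicted)

# Rows are true classes (0, then 1); columns are predicted classes
cm = metrics.confusion_matrix(y_test, predicted)
print(acc)
print(cm)
```

Here the matrix is [[2, 1], [1, 2]]: the diagonal holds the four correct predictions, and the off-diagonal cells split the two errors into one false positive and one false negative, which accuracy alone does not show.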
Save Your Model
Save the Model
Pickle it!
https://docs.python.org/3/library/pickle.html
import pickle
model_file = "/lr_classifier_09.29.15.dat"
with open(model_file, "wb") as f:
    pickle.dump(logClassifier, f)
Did it work?
with open(model_file, "rb") as f:
    logClassifier2 = pickle.load(f)
print(logClassifier2)
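Printing the reloaded object only confirms that it deserialized. A slightly stronger check, sketched below on synthetic data, is to confirm that the restored model gives the same predictions as the original. The temporary directory and file name are placeholders for the example.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a small model on synthetic data
X, y = make_classification(n_samples=100, n_features=5, random_state=111)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Serialize to a file in a temporary directory (placeholder path)
path = os.path.join(tempfile.mkdtemp(), "lr_classifier.dat")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Deserialize and compare predictions with the original model
with open(path, "rb") as f:
    restored = pickle.load(f)
same = bool((model.predict(X) == restored.predict(X)).all())
print(same)
```

Pickle files should only be loaded from trusted sources, and a model is best loaded with the same scikit-learn version that saved it.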
Ship It
Implement in Production
• Clean the data the same way you did for the model
• Feature mappings
• Column re-ordering
• Create a function that returns the prediction
• Deserialize the model from the file you created
• Feed the model the data in the same order
• Call .predict() and get your answer
Example
import pickle

def verify_record(record_scores):
    # Reload the trained model
    tcf = "models/t_lr_classifier_07.28.15.dat"
    with open(tcf, "rb") as f:
        log_classifier = pickle.load(f)
    # Return the prediction; record_scores must be 2D (one row per record)
    return log_classifier.predict(record_scores)[0]
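The "same order" point is easy to get wrong when records arrive as dictionaries. The sketch below fixes the column order in one place and reuses it for training and prediction. The feature names, the training rows, and passing the model as an argument (instead of unpickling it inside the function) are all assumptions made to keep the example self-contained.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical feature names; the order is defined once and reused everywhere
FEATURE_ORDER = ['name_score', 'dob_score', 'ssn_score']

# Tiny made-up training set
train = pd.DataFrame({'name_score': [0.9, 0.2, 0.8, 0.1],
                      'dob_score': [1.0, 0.0, 0.9, 0.2],
                      'ssn_score': [0.8, 0.1, 1.0, 0.0],
                      'verified': [1, 0, 1, 0]})
model = LogisticRegression().fit(train[FEATURE_ORDER].values,
                                 train['verified'].values)


def verify_record(record, model):
    # Re-order the incoming fields to match training, as a single 2D row
    row = pd.DataFrame([record])[FEATURE_ORDER].values
    return int(model.predict(row)[0])


# Keys arrive in a different order than the model was trained on
result = verify_record(
    {'ssn_score': 1.0, 'name_score': 0.95, 'dob_score': 1.0}, model)
print(result)
```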
Save The Predictions
Save Your Predictions
As you would any other piece of data
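One possible shape for a stored prediction, sketched with the standard csv module and an in-memory buffer standing in for a real file or database table. The column names and the record identifier are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def save_prediction(writer, record_id, prediction):
    # Store what was predicted, for which record, and when (UTC)
    writer.writerow([record_id, prediction,
                     datetime.now(timezone.utc).isoformat()])


# In-memory buffer used here in place of a file opened for appending
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['record_id', 'prediction', 'predicted_at'])
save_prediction(writer, 'rec-001', 1)

rows = buffer.getvalue().splitlines()
print(rows)
```

Keeping the record identifier and timestamp alongside the prediction makes it possible to join predictions to validated outcomes later.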
(Keep) Getting it Right
Unleash the minion army!
… or get more creative
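One way to keep checking the model, assuming some predictions are later validated by people, is to score each batch of validated records and flag batches that fall below a chosen accuracy. The function name, the 0.9 threshold, and the sample labels are all hypothetical.

```python
from sklearn.metrics import accuracy_score


def needs_attention(validated, predicted, threshold=0.9):
    # Flag a batch when accuracy against validated outcomes drops below the threshold
    return accuracy_score(validated, predicted) < threshold


# Hypothetical batch: 4 of 5 predictions were confirmed, so accuracy is 0.8
flag = needs_attention([1, 0, 1, 1, 0], [1, 0, 1, 0, 0])
print(flag)  # prints "True"
```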
Update It
Be Smart
Train it again, but with validated predictions
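A minimal sketch of retraining, assuming the original training data is still available and that the new rows carry outcomes confirmed by review rather than the model's own output. All arrays below are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Original training data (hypothetical)
X_old = np.array([[0.9, 1.0], [0.1, 0.0], [0.8, 0.9], [0.2, 0.1]])
y_old = np.array([1, 0, 1, 0])

# New observations whose outcomes have been validated
X_new = np.array([[0.7, 0.8], [0.3, 0.2]])
y_new = np.array([1, 0])

# Combine old and new rows, then fit a fresh model on everything
X_all = np.vstack([X_old, X_new])
y_all = np.concatenate([y_old, y_new])
updated = LogisticRegression(C=1, random_state=111).fit(X_all, y_all)
print(X_all.shape, updated.predict([[1.0, 1.0]]))
```

Training on unvalidated predictions would feed the model's own mistakes back into it, which is why only confirmed outcomes are added.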
Review
Step Review
1. Select a predictive modeling technique to use
2. Get the data into a format for modeling
3. Ensure the “right” data is being used
4. Feed the data into the model
5. Validate the model results
Step Review
6. Save the model to use in production
7. Implement the model in production and apply it to
new observations
8. Save the new predictions
9. Ensure the model is correctly predicting outcomes
over time
10. Update the model with new training data
Image Credits
• Format: https://www.flickr.com/photos/zaqography/3835692243/
• Get right data: https://www.flickr.com/photos/encouragement/14759554777/
• Feed: https://www.flickr.com/photos/glutnix/4291194/
• Validate: https://www.flickr.com/photos/lord-jim/16827236591/
• Save: http://www.cnn.com/2015/09/13/living/candice-swanepoel-victorias-secret-model-falls-feat/
• Ship It: https://www.flickr.com/photos/oneeighteen/15492277272/
• Save Predictions: https://www.flickr.com/photos/eelssej_/486414113/
• Get it right: https://www.flickr.com/photos/clickflashphotos/3402287993/
• Update it: https://www.flickr.com/photos/dullhunk/5497202855/
• Review: https://www.flickr.com/photos/pluggedmind/10714537023/

Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 

Practical Predictive Modeling in Python

  • 1. Practical Predictive Modeling in Python Robert Dempsey robertwdempsey.com
  • 4. Doing All Things In SQL Makes Panda sad and confused
  • 5. Each New Thing You Learn Leads to another new thing to learn, and another, and…
  • 6. So Many Things 1. Which predictive modeling technique to use 2. How to get the data into a format for modeling 3. How to ensure the “right” data is being used 4. How to feed the data into the model 5. How to validate the model results 6. How to save the model to use in production 7. How to implement the model in production and apply it to new observations 8. How to save the new predictions 9. How to ensure, over time, that the model is correctly predicting outcomes 10. How to later update the model with new training data
  • 9. Model Selection • How much data do you have? • Are you predicting a category? A quantity? • Do you have labeled data? • Do you know the number of categories?
  • 10. Regression • Used for estimating the relationships among variables • Use when: • Predicting a quantity • More than 50 samples
  • 11. Classification • Used to answer “what is this object” • Use when: • Predicting a category • Have labeled data
  • 12. Clustering • Used to group similar objects • Use when: • Predicting a category • Don’t have labeled data • Number of categories is known or unknown • Have more than 50 samples
  • 13. Dimensionality Reduction • Process for reducing the number of random variables under consideration (feature selection and feature extraction) • Use when: • Not predicting a category or a quantity • Just looking around
  • 16. Format The Data • Pandas FTW! • Use the map() function to convert any text to a number • Fill in any missing values • Split the data into features (the data) and targets (the outcome to predict) using .values on the DataFrame
  • 17. map()
      def update_failure_explanations(failure_type):
          # Convert a text failure explanation to its numeric code
          if failure_type == 'dob':
              return 0
          elif failure_type == 'name':
              return 1
          elif failure_type == 'ssn dob name':
              return 2
          elif failure_type == 'ssn':
              return 3
          elif failure_type == 'ssn name':
              return 4
          elif failure_type == 'ssn dob':
              return 5
          elif failure_type == 'dob name':
              return 6
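The same text-to-number conversion can be written as a dictionary handed to pandas `Series.map`. This is a minimal sketch: the codes are the ones from the slide, while the column names `failure_explanation` and `failure_code` and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Codes taken from the slide's if/elif chain.
FAILURE_CODES = {
    "dob": 0,
    "name": 1,
    "ssn dob name": 2,
    "ssn": 3,
    "ssn name": 4,
    "ssn dob": 5,
    "dob name": 6,
}

# Hypothetical sample data; the column name is an assumption.
df = pd.DataFrame({"failure_explanation": ["dob", "ssn name", "unknown"]})

# Series.map with a dict leaves unmapped values as NaN, which makes
# unexpected categories easy to spot before modeling.
df["failure_code"] = df["failure_explanation"].map(FAILURE_CODES)
unmapped = df[df["failure_code"].isna()]
```

Checking `unmapped` before training is one way to catch categories the mapping does not cover.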
  • 18. Fill In Missing Values
      df['my_field'] = df['my_field'].fillna('Missing')
      df.fillna(0, inplace=True)
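Fill values can also be set per column in a single call by passing a dict to `DataFrame.fillna`. A minimal sketch, assuming a hypothetical frame with one text column and one numeric column (the `score` column is not from the slides):

```python
import pandas as pd

# Hypothetical data with gaps in a text column and a numeric column.
df = pd.DataFrame({"my_field": ["a", None, "c"], "score": [1.0, None, 3.0]})

# A dict gives each column its own fill value; assigning the result back
# avoids relying on in-place modification.
df = df.fillna({"my_field": "Missing", "score": 0})
```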
  • 19. Split the Data
      # 1. Create a matrix of values
      t_data = raw_data.iloc[:, 0:22].values
      # 2. Create a matrix of targets
      t_targets = raw_data['verified'].values
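When column positions may shift, selecting the feature columns by name is a possible alternative to a positional slice. A minimal sketch: the `verified` target column comes from the slide, while the feature columns `score_a` and `score_b` and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the 'verified' column name comes from the slide.
raw_data = pd.DataFrame({
    "score_a": [0.1, 0.9, 0.4],
    "score_b": [1, 0, 1],
    "verified": [0, 1, 0],
})

# Every column except the target is treated as a feature.
feature_columns = [c for c in raw_data.columns if c != "verified"]
t_data = raw_data[feature_columns].values   # shape (n_samples, n_features)
t_targets = raw_data["verified"].values     # shape (n_samples,)
```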
  • 21. Get The Right Data • This is called “feature selection” • Univariate feature selection • SelectKBest removes all but the k highest-scoring features • SelectPercentile removes all but a user-specified highest-scoring percentage of features • SelectFpr, SelectFdr, and SelectFwe apply common univariate statistical tests to each feature, based on the false positive rate, the false discovery rate, or the family-wise error, respectively • GenericUnivariateSelect performs univariate feature selection with a configurable strategy. http://scikit-learn.org/stable/modules/feature_selection.html
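As an illustration of univariate selection, the sketch below runs `SelectKBest` on synthetic data. The data, the choice of `f_classif` as the scoring function, and `k=1` are all assumptions made for the example, not settings from the talk.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: column 1 tracks the target closely, the rest is noise.
rng = np.random.RandomState(111)
y = np.array([0, 1] * 50)
informative = y + rng.normal(scale=0.1, size=100)
noise = rng.normal(size=(100, 3))
X = np.column_stack([noise[:, 0], informative, noise[:, 1], noise[:, 2]])

# Keep only the single highest-scoring feature under an ANOVA F-test.
selector = SelectKBest(score_func=f_classif, k=1)
X_selected = selector.fit_transform(X, y)

# Indices of the columns that survived selection.
kept = selector.get_support(indices=True)
```

In production the fitted `selector` would need to be applied to new data with `transform`, so that the model sees the same columns it was trained on.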
  • 23. Data => Model http://scikit-learn.org/stable/modules/cross_validation.html
      # 1. Build the model
      from sklearn import linear_model
      logClassifier = linear_model.LogisticRegression(C=1, random_state=111)
      # 2. Train the model on an 80/20 train/test split
      from sklearn.model_selection import train_test_split
      X_train, X_test, y_train, y_test = train_test_split(
          the_data, the_targets, test_size=0.20, random_state=111)
      logClassifier.fit(X_train, y_train)
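Cross-validation, which the linked scikit-learn page covers, can be run on the training portion alongside a hold-out split. A minimal sketch, assuming synthetic data from `make_classification` as a stand-in for `the_data` and `the_targets`; the 12 folds and the dataset size are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the_data / the_targets.
the_data, the_targets = make_classification(
    n_samples=240, n_features=6, random_state=111)

# Hold out 20% of the rows for the final check.
X_train, X_test, y_train, y_test = train_test_split(
    the_data, the_targets, test_size=0.20, random_state=111)

log_classifier = LogisticRegression(C=1, random_state=111)

# One accuracy score per fold, computed on the training portion only.
cv_scores = cross_val_score(log_classifier, X_train, y_train, cv=12)

# Fit on the full training portion; X_test stays untouched until validation.
log_classifier.fit(X_train, y_train)
```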
  • 25. Validation http://scikit-learn.org/stable/modules/cross_validation.html
      from sklearn import metrics
      predicted = logClassifier.predict(X_test)
      # 1. Accuracy Score
      metrics.accuracy_score(y_test, predicted)
      # 2. Confusion Matrix
      metrics.confusion_matrix(y_test, predicted)
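To show how the two metrics relate, the sketch below computes both on a small hand-written set of labels. The label values are made up for illustration and do not come from the talk's data.

```python
from sklearn import metrics

# Hypothetical true labels and model predictions for eight records.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Fraction of records predicted correctly.
accuracy = metrics.accuracy_score(y_test, predicted)

# Rows are true classes, columns are predicted classes (labels sorted: 0, 1).
matrix = metrics.confusion_matrix(y_test, predicted)
tn, fp, fn, tp = matrix.ravel()
```

Here six of eight predictions are correct (accuracy 0.75), and the confusion matrix shows that the two errors are one false positive and one false negative, a distinction the accuracy score alone hides.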
  • 27. Save the Model: Pickle it! https://docs.python.org/3/library/pickle.html
      import pickle
      model_file = "/lr_classifier_09.29.15.dat"
      with open(model_file, "wb") as f:
          pickle.dump(logClassifier, f)
      # Did it work?
      with open(model_file, "rb") as f:
          logClassifier2 = pickle.load(f)
      print(logClassifier2)
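Printing the reloaded object confirms only that it deserialized. Comparing predictions from the original and the restored model is a stronger check. A minimal sketch, assuming a tiny stand-in model and a temporary directory; the file name `lr_classifier.dat` is hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.linear_model import LogisticRegression

# Tiny stand-in model trained on made-up data.
model = LogisticRegression(C=1, random_state=111).fit(
    [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    model_file = Path(tmp_dir) / "lr_classifier.dat"  # hypothetical file name
    with open(model_file, "wb") as fh:
        pickle.dump(model, fh)
    with open(model_file, "rb") as fh:
        restored = pickle.load(fh)

# The restored model should predict exactly what the original predicts.
samples = [[0.5], [2.5]]
same_predictions = (restored.predict(samples) == model.predict(samples)).all()
```

Unpickling executes code from the file, so only files from a trusted source should be loaded this way.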
  • 29. Implement in Production • Clean the data the same way you did for the model • Feature mappings • Column re-ordering • Create a function that returns the prediction • Deserialize the model from the file you created • Feed the model the data in the same order • Call .predict() and get your answer
  • 30. Example
      import pickle
      def verify_record(record_scores):
          # Reload the trained model
          tcf = "models/t_lr_classifier_07.28.15.dat"
          with open(tcf, "rb") as f:
              log_classifier = pickle.load(f)
          # Return the prediction
          return log_classifier.predict(record_scores)[0]
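Reloading the model on every call can be avoided by caching the deserialized object. The sketch below is one possible arrangement, not the author's implementation: `load_model`, `MODEL_PATH`, and the stand-in model written to a temporary file are all assumptions made so the example runs on its own.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression

# Setup for the sketch: train and save a stand-in model (hypothetical path).
MODEL_PATH = Path(tempfile.gettempdir()) / "t_lr_classifier_sketch.dat"
_stand_in = LogisticRegression().fit(
    [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]], [0, 0, 1, 1])
with open(MODEL_PATH, "wb") as fh:
    pickle.dump(_stand_in, fh)


@lru_cache(maxsize=1)
def load_model():
    """Deserialize the model once and reuse it across calls."""
    with open(MODEL_PATH, "rb") as fh:
        return pickle.load(fh)


def verify_record(record_scores):
    """Return the predicted class for one record's feature values."""
    # scikit-learn expects a 2D array: one row per record, with the
    # features in the same order used during training.
    row = np.asarray(record_scores, dtype=float).reshape(1, -1)
    return int(load_model().predict(row)[0])
```

A call such as `verify_record([1.0, 0.9])` then returns a plain Python integer, which is convenient to store or serialize.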
  • 32. Save Your Predictions As you would any other piece of data
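One way to store predictions is a database table that also records which model produced each prediction and when. A minimal sketch using the standard-library `sqlite3` module with an in-memory database; the table layout, the `save_prediction` helper, and the sample values are assumptions, not part of the talk.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
conn.execute(
    "CREATE TABLE predictions ("
    "record_id TEXT, prediction INTEGER, model_version TEXT, predicted_at TEXT)"
)


def save_prediction(conn, record_id, prediction, model_version):
    """Store one prediction along with the model version and a UTC timestamp."""
    conn.execute(
        "INSERT INTO predictions VALUES (?, ?, ?, ?)",
        (record_id, int(prediction), model_version,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


save_prediction(conn, "rec-001", 1, "lr_classifier_09.29.15")
rows = conn.execute(
    "SELECT record_id, prediction, model_version FROM predictions"
).fetchall()
```

Keeping the model version next to each prediction makes it possible to compare model versions later, once the true outcomes are known.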
  • 34. Unleash the minion army! … or get more creative
  • 36. Be Smart Train it again, but with validated predictions
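Retraining with validated predictions can be sketched as appending the checked records to the original training set and fitting a fresh model. All data values below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Original training set (stand-in values).
X_train = np.array([[0.0], [0.2], [0.8], [1.0]])
y_train = np.array([0, 0, 1, 1])

# New observations whose predictions were later checked against the
# true outcome, so the labels here are the validated ones.
X_validated = np.array([[0.1], [0.9]])
y_validated = np.array([0, 1])

X_all = np.vstack([X_train, X_validated])
y_all = np.concatenate([y_train, y_validated])

# By default LogisticRegression.fit starts from scratch, so the model
# is refit on the combined data rather than updated incrementally.
updated_model = LogisticRegression(C=1, random_state=111).fit(X_all, y_all)
```

The updated model would then go through the same validation and pickling steps as the original before replacing it in production.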
  • 38. Step Review 1. Select a predictive modeling technique to use 2. Get the data into a format for modeling 3. Ensure the “right” data is being used 4. Feed the data into the model 5. Validate the model results
  • 39. Step Review 6. Save the model to use in production 7. Implement the model in production and apply it to new observations 8. Save the new predictions 9. Ensure the model is correctly predicting outcomes over time 10. Update the model with new training data
  • 42. Image Credits • Format: https://www.flickr.com/photos/zaqography/3835692243/ • Get right data: https://www.flickr.com/photos/encouragement/14759554777/ • Feed: https://www.flickr.com/photos/glutnix/4291194/ • Validate: https://www.flickr.com/photos/lord-jim/16827236591/ • Save: http://www.cnn.com/2015/09/13/living/candice-swanepoel-victorias-secret-model-falls-feat/ • Ship It: https://www.flickr.com/photos/oneeighteen/15492277272/ • Save Predictions: https://www.flickr.com/photos/eelssej_/486414113/ • Get it right: https://www.flickr.com/photos/clickflashphotos/3402287993/ • Update it: https://www.flickr.com/photos/dullhunk/5497202855/ • Review: https://www.flickr.com/photos/pluggedmind/10714537023/