Machine Learning in Action
Andres Kull
Product Analyst @ Pipedrive
Machine Learning Estonia meetup, October 11, 2016
About me
• Pipedrive:
  • product analyst, since Feb 2016
• Funderbeam:
  • data scientist, 2013-2015
• Elvior:
  • CEO, 1992-2012
  • software development, test automation
  • model-based testing (PhD, 2009)
Topics
• About Pipedrive
• Predictive analytics in Pipedrive
• A closer look at one predictive model
  • Thought process
  • Tools and methods used
  • Deployment
  • Monitoring
About Pipedrive (2 min video)
Some facts about Pipedrive
• > 30k paying customers
  • from 155 countries
  • biggest markets: US and Brazil
• > 220 employees
  • ~180 in Estonia (Tallinn, Tartu)
  • ~40 in New York
  • 20 different nationalities
My tasks @Pipedrive
• Serve product teams with data insight
• Predictive analytics / towards sales AI
CRM companies are rushing to AI
$75 mil
CRM AI opportunities
• Predictive lead scoring
• Predict deal outcomes
  • likelihood to close
  • estimated close date
• Recommend user actions
  • type of next action
  • email content
  • next action date/time
• Teach users how to improve
Predictive analytics solutions at Pipedrive
For users
• Deal success prediction
  • predicts the value of the open deals pipeline
  • provides a means to adjust the selling process to meet sales goals
For marketing, sales and support
• Churn prediction
  • identifies customers who are about to churn
  • provides a health score for each subscription
• Trial conversion prediction
  • identifies inactive companies in trial
My R toolbox
• IDE: RStudio
• R packages:
  • Storage access: RPostgreSQL, aws.s3
  • Data frame operations: dplyr, tidyr (Hadley Wickham)
  • Machine learning: Boruta, randomForest, caret, ROCR, AUC
  • Visual data exploration: ggplot2
R references
All you need to do is follow
Hadley Wickham
http://hadley.nz/
@hadleywickham
#rstats
… and you are in good hands
Trials conversion model
Business goal
• increase the conversion rate of trial users
• identify trial companies that need engagement triggers
Initial questions from business development
• What do converting trial users do differently from non-converting ones?
• Which actions must be done during the trial period for a trial to convert?
• actions:
  • add deal
  • add activity/reminder to deals
  • invite other users
  • …
Actions of successful trial companies
Percentage of successful trial companies that had performed each action by day 7, 14 and 30
Actions split by successful and unsuccessful trials
• percentage of companies that had performed each action by day 7, split into converting and non-converting companies
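The day-7 split above is a group-wise share. A minimal base-R sketch, assuming a hypothetical per-company data frame with one count column per action (counted over the first 7 days) and a logical `converted` column; the column names and toy values are illustrative, not Pipedrive data:

```r
# Hypothetical input: one row per trial company, action counts for the
# first 7 days, and whether the trial converted (toy values).
trials <- data.frame(
  company_id  = 1:6,
  added_deal  = c(3, 0, 5, 1, 0, 2),
  user_joined = c(1, 0, 2, 0, 0, 1),
  converted   = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Percentage of companies that performed each action at least once,
# split by converting vs non-converting companies.
action_share <- function(df, actions, outcome = "converted") {
  did_action <- as.data.frame(lapply(df[actions], function(x) x > 0))
  shares <- aggregate(did_action, by = list(converted = df[[outcome]]),
                      FUN = mean)
  shares[actions] <- 100 * shares[actions]
  shares
}

action_share(trials, c("added_deal", "user_joined"))
```

The same function works for day 14 or day 30 by feeding it counts over the longer period.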
Training a decision tree model
(figure: selected features and the resulting model)
Decision tree model (resulting model)
IF activities_add < 5.5 THEN
    IF user_joined < 0.5 THEN success = FALSE
    ELSE IF user_joined < 1.5 THEN success = FALSE
    ELSE success = TRUE
ELSE
    IF user_joined < 0.5 THEN
        IF activities_add < 13.5 THEN success = FALSE
        ELSE IF activities_add >= 179 THEN success = FALSE
        ELSE success = TRUE
    ELSE success = TRUE
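The rule set maps directly onto nested conditionals. A minimal base-R sketch that re-implements exactly these thresholds; it is a hand translation for illustration, not the fitted model object, and `predict_trial_success` is a hypothetical name:

```r
# Vectorised version of the decision tree rules above.
# Inputs are per-company counts; returns TRUE where success is predicted.
predict_trial_success <- function(activities_add, user_joined) {
  ifelse(
    activities_add < 5.5,
    # Few activities: both user_joined splits below 1.5 predict FALSE,
    # so success requires at least two joined users.
    user_joined >= 1.5,
    ifelse(
      user_joined < 0.5,
      # Many activities but nobody joined: success only in a middle band.
      activities_add >= 13.5 & activities_add < 179,
      # Many activities and at least one user joined.
      TRUE
    )
  )
}

predict_trial_success(activities_add = c(3, 3, 50, 200, 10),
                      user_joined    = c(1, 2, 0, 0, 1))
# FALSE TRUE TRUE FALSE TRUE
```

Writing the tree out this way makes it easy to see that, for low activity, the two `user_joined` splits collapse into a single threshold.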
ROC curve of decision tree model
(figure: ROC curve, true positive rate vs. false positive rate; area under the ROC curve (AUC) = 0.7)
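The AUC has a useful interpretation: it is the probability that a randomly chosen converting company gets a higher score than a randomly chosen non-converting one. The deck uses the ROCR/AUC packages; the base-R sketch below computes the same quantity from ranks (the Mann-Whitney formulation), assuming labels coded as 0/1 or FALSE/TRUE:

```r
# AUC via the rank-sum (Mann-Whitney U) formula.
# scores: predicted success likelihoods; labels: 0/1 or logical outcomes.
auc <- function(scores, labels) {
  labels <- as.logical(labels)
  r <- rank(scores)                 # average ranks, so ties count as 1/2
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc(scores = c(0.9, 0.4, 0.6, 0.2), labels = c(1, 1, 0, 0))
# 0.75: three of the four positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, which puts the decision tree's 0.7 in context.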
Can we do any better?
• Sure!
  • Better feature selection
  • Better ML algorithm (random forest)
  • Better model evaluation with cross-validation
Let’s revisit model prediction goals
• act before most users are done
• predict trial success using actions from the first 5 days
Model development workflow
(figure: Feature selection → Model training → Model evaluation → good enough? Yes: Model deployment; No: back to feature selection)
• Iteratively:
  • remove less important features
  • add some new features
  • evaluate whether the added or removed features increased or decreased model accuracy
  • continue until satisfied
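The removal half of that loop can be written as a greedy backward elimination. A sketch under stated assumptions: `score_fn` (for example a cross-validated AUC) and `importance_fn` (for example random forest importances) are hypothetical user-supplied functions, the loop stops as soon as dropping the least important feature lowers the score, and adding new features remains a manual step:

```r
# Greedy backward elimination: repeatedly drop the least important
# feature while the evaluation score does not get worse.
backward_eliminate <- function(features, score_fn, importance_fn) {
  best_score <- score_fn(features)
  while (length(features) > 1) {
    imp <- importance_fn(features)            # named numeric vector
    weakest <- names(imp)[which.min(imp)]
    candidate <- setdiff(features, weakest)
    candidate_score <- score_fn(candidate)
    if (candidate_score < best_score) break   # removal hurt accuracy: stop
    features <- candidate
    best_score <- candidate_score
  }
  list(features = features, score = best_score)
}
```

Because it recomputes importances after every removal, each step reflects the current feature set rather than the original ranking.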
Feature selection
• Select all relevant features
• Let the ML algorithm do the work:
  • filter out irrelevant features
  • order features by importance
(figure: from "all relevant features I can imagine" to "selected features")
Filter out irrelevant features
• The R Boruta package was used
• bor <- Boruta(y ~ ., data = train)
• bor <- TentativeRoughFix(bor) # resolves features left as Tentative
• bor$finalDecision # Confirmed / Rejected decision for every feature
• Only confirmed features are passed to the model training phase
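Passing only the confirmed features on is a column filter over the decision vector. A base-R sketch using a mock of `bor$finalDecision` (a factor named by feature); the toy data frame, the outcome column `y` and the helper `select_confirmed` are hypothetical. With a real Boruta object, `getSelectedAttributes(bor)` returns the confirmed feature names directly.

```r
# Mock of bor$finalDecision after TentativeRoughFix(): one decision per feature.
final_decision <- factor(
  c(persons_add = "Confirmed",
    clicked_taf_twitter_invite_button = "Rejected",
    logged_in = "Confirmed"),
  levels = c("Tentative", "Confirmed", "Rejected")
)

# Keep only Confirmed features, plus the outcome column.
select_confirmed <- function(df, decision, outcome = "y") {
  keep <- names(decision)[decision == "Confirmed"]
  df[, c(keep, outcome), drop = FALSE]
}

train <- data.frame(
  persons_add = c(2, 0, 7),
  clicked_taf_twitter_invite_button = c(0, 1, 0),
  logged_in = c(5, 1, 9),
  y = factor(c("yes", "no", "yes"))
)
names(select_confirmed(train, final_decision))
# returns c("persons_add", "logged_in", "y")
```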
List of features
• activities_edit
• deals_edit
• organizations_add
• persons_add
• added_activity
• changed_deal_stage
• clicked_import_from_other_crm
• clicked_taf_facebook_invite_button
• clicked_taf_invites_send_button
• clicked_taf_twitter_invite_button
• completed_activity
• edited_stage
• enabled_google_calendar_sync
• enabled_google_contacts_sync
• feature_deal_probability
• feature_products
• invite_friend
• invited_existing_user
• invited_new_user
• logged_in
• lost_a_deal
• user_joined
• won_a_deal
Order features by importance
• A trained randomForest model object includes feature importances
• First you train the model
• rf_model <- randomForest(y ~ ., data = train)
• … and then access the features' importances
• imp_var <- importance(rf_model) # for a caret train() model: varImp(rf_model)$importance, scaled 0-100
Features ordered by relative importance
Rank  Feature                     Relative importance
1     persons_add                 100.000000
2     organizations_add           88.070291
3     added_deal                  85.828879
4     logged_in                   84.296198
5     deals_edit                  74.448121
6     added_activity              69.044263
7     changed_deal_stage          61.072545
8     activities_edit             51.355769
9     user_joined                 28.947384
10    invited_new_user            28.329157
11    completed_activity          21.877124
12    won_a_deal                  17.906090
13    lost_a_deal                 12.477377
14    feature_products            9.518529
15    feature_deal_probability    8.309032
16    invited_existing_user       3.781910
17    edited_stage                0.000000
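The values run from exactly 100 down to exactly 0, which is consistent with min-max rescaling of raw importances; caret's `varImp()` applies this kind of scaling by default for train objects. A base-R sketch of that rescaling (`scale_importance` is a hypothetical helper, and the raw values below are made up):

```r
# Rescale raw importances so the most important feature gets 100
# and the least important gets 0.
scale_importance <- function(raw) {
  shifted <- raw - min(raw)
  100 * shifted / max(shifted)
}

# Illustrative raw importances (not the values behind the table above).
raw_imp <- c(persons_add = 12.4, logged_in = 9.1, edited_stage = 0.7)
round(sort(scale_importance(raw_imp), decreasing = TRUE), 2)
```

A consequence of this scaling is that the bottom feature always shows 0, even if its raw importance is not zero.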
Split data into training and test sets
• inTrain <- createDataPartition(y = all_ev[y][,1], p = 0.7, list = FALSE)
• train <- all_ev[inTrain, ]
• test <- all_ev[-inTrain, ]
• training set: 70% of companies
• hold-out test set: 30% of companies
• R caret package createDataPartition() function was used to split data
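For a factor outcome, `createDataPartition()` samples within each class so that the training and test sets keep roughly the original conversion rate. A base-R sketch of that stratified sampling idea; it is not caret's exact algorithm, and `stratified_split` is a hypothetical name:

```r
# Stratified random split: sample the same fraction p from every class,
# so train and test keep roughly the original class balance.
stratified_split <- function(y, p = 0.7, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  idx_by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(idx_by_class, function(i) {
    i[sample.int(length(i), size = round(p * length(i)))]
  }), use.names = FALSE)
  sort(train_idx)
}

y <- rep(c("yes", "no"), times = c(30, 70))
in_train <- stratified_split(y, p = 0.7, seed = 1)
table(y[in_train])   # 49 "no", 21 "yes"
```

The returned indices play the same role as `inTrain` above: `all_ev[in_train, ]` and `all_ev[-in_train, ]`.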
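Model training using 5-fold cross-validation
rf_model <- train(y ~ ., data = train,
  method = "rf",
  trControl = trainControl(
    method = "cv",
    number = 5,
    classProbs = TRUE,                 # class probabilities are needed for ROC
    summaryFunction = twoClassSummary  # reports ROC, Sens and Spec
  ),
  metric = "ROC",
  tuneGrid = expand.grid(mtry = c(2, 4, 6, 8))
)
• The caret package's train() and trainControl() functions do the job
• y must be a two-level factor whose levels are valid R names (e.g. "yes"/"no") for classProbs = TRUE to work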
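What `trainControl(method = "cv", number = 5)` arranges can be sketched in base R: every row is held out exactly once, and the reported score is the mean over the five held-out folds. This is a simplified illustration; caret builds its folds with `createFolds()`, which also balances classes for a factor outcome, and `fit_fn`/`score_fn` are hypothetical placeholders for model fitting and AUC scoring:

```r
# Assign each row to one of k folds at random; fold sizes differ by at most one.
make_folds <- function(n, k = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(rep_len(seq_len(k), n))
}

# k-fold cross-validation: fit on k-1 folds, score on the held-out fold,
# and average the k held-out scores.
cv_score <- function(data, k, fit_fn, score_fn, seed = NULL) {
  folds <- make_folds(nrow(data), k, seed)
  scores <- vapply(seq_len(k), function(f) {
    model <- fit_fn(data[folds != f, , drop = FALSE])
    score_fn(model, data[folds == f, , drop = FALSE])
  }, numeric(1))
  mean(scores)
}
```

caret repeats this for each `mtry` in the tuning grid and keeps the value with the best mean ROC.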
Model evaluation
• model AUC on training data (the cross-validated AUC of the best mtry)
mtry <- rf_model$bestTune$mtry
train_auc <- rf_model$results$ROC[rf_model$results$mtry == mtry]
• model AUC on test data (hold-out set, ROCR package)
score <- predict(rf_model, newdata = test, type = "prob")
pred <- prediction(score[, 2], test[y])
test_auc <- performance(pred, "auc")@y.values[[1]]
• AUC on training data: 0.82..0.88
• AUC on test data: 0.83..0.88
• Benchmark (decision tree): AUC = 0.7
Daily training and prediction
• The model is trained daily
• Moving training data window (figure: timeline marking the model training date, -1 month and -2 months; the companies that started their 30-day trials in the marked interval are used for training)
• Predictions are calculated daily for all companies in trial
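A sketch of the moving window under an explicit assumption read from the timeline: training uses companies whose 30-day trial started between two months and one month before the training date, so every trial in the window has ended and its outcome is known. The function names are hypothetical:

```r
# Assumed window: [training_date - 2 months, training_date - 1 month).
training_window <- function(training_date) {
  training_date <- as.Date(training_date)
  marks <- seq(training_date, by = "-1 month", length.out = 3)
  list(start = marks[3], end = marks[2])
}

# TRUE for trials that started inside the assumed training window.
in_window <- function(trial_start, training_date) {
  w <- training_window(training_date)
  trial_start >= w$start & trial_start < w$end
}

in_window(as.Date(c("2016-08-20", "2016-09-20", "2016-08-01")),
          training_date = "2016-10-11")
# TRUE FALSE FALSE
```

Because the window is recomputed from each day's training date, it slides forward by one day at every daily retraining.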
Model and predictions traceability
• Use cases
• Monitoring processes
• Explaining prediction results
• Relevant data has to be saved for traceability
Model training traceability
• All model instances are saved in S3
• The following training metadata is saved in the database
• training timestamp
• model location
• mtry
• train_auc
• test_auc
• n_test
• n_train
• feature importances
• model training duration
Predictions traceability
• The following data is saved for every prediction
• prediction timestamp
• model id
• company id
• predicted trial success likelihood
• feature values used in prediction
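A base-R sketch of a prediction log with the fields listed above; `prediction_log` and its column names are assumptions. Appending the rows to the database could then be done with DBI's `dbWriteTable(con, name, value, append = TRUE)`, given an open connection `con`.

```r
# Hypothetical prediction log: one row per company per scoring run,
# holding the timestamp, model id, company id, predicted likelihood
# and the feature values the prediction was based on.
prediction_log <- function(model_id, company_id, likelihood, features,
                           timestamp = Sys.time()) {
  stopifnot(length(company_id) == length(likelihood),
            nrow(features) == length(company_id))
  data.frame(
    prediction_timestamp = timestamp,
    model_id = model_id,
    company_id = company_id,
    trial_success_likelihood = likelihood,
    features
  )
}

pred_log <- prediction_log(
  model_id = 42L,
  company_id = c(101L, 102L),
  likelihood = c(0.8, 0.1),
  features = data.frame(persons_add = c(3, 0), logged_in = c(5, 1))
)
```

Storing the feature values next to each prediction is what makes it possible to explain a single prediction later, even after the model has been retrained.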
Monitoring
• Re:dash (redash.io) is used for dashboards
• Dashboard charts:
  • number of companies in trial
  • training AUC vs test AUC
  • feature relative importances
  • modelling duration
  • mtry
• andres.kull@pipedrive.com
• @andres_kull
Q & A

Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 

Machine learning in action at Pipedrive

  • 1. Machine Learning in Action Andres Kull Product Analyst @ Pipedrive Machine Learning Estonia meetup, October 11, 2016
  • 2. About me • Pipedrive: • product analyst, from Feb 2016 • Funderbeam: • data scientist, 2013-2015 • Elvior: • CEO, 1992-2012 • software development, test automation • model-based testing (PhD, 2009)
  • 3. Topics • About Pipedrive • Predictive analytics in Pipedrive • A closer look at one predictive model • Thought process • Tools and methods used • Deployment • Monitoring
  • 4. About Pipedrive (2 min video)
  • 5. Some facts about Pipedrive • > 30k paying customers • from 155 countries • biggest markets: US and Brazil • > 220 employees • ~180 in Estonia (Tallinn, Tartu) • ~40 in New York • 20 different nationalities
  • 6. My tasks @Pipedrive • Provide product teams with data insights • Predictive analytics, working towards sales AI
  • 7. CRM companies are rushing to AI $75 mil
  • 8. CRM AI opportunities • Predictive lead scoring • Predict deal outcomes: likelihood to close, estimated close date • Recommend user actions: type of next action, email content, next action date/time • Teach users how to improve
  • 9. Predictive analytics solutions at Pipedrive • For users: Deals success prediction • predicts the value of the open deals pipeline • provides a means to adjust the selling process to meet sales goals • For marketing, sales and support: Churn prediction • identifies customers who are about to churn • provides a health score for each subscription • For marketing, sales and support: Trial conversion prediction • identifies inactive companies in trial
  • 10. My R toolbox (R packages) • Storage access: RPostgreSQL, aws.s3 • Dataframe operations: dplyr, tidyr (Hadley Wickham) • Machine learning: Boruta, randomForest, caret, ROCR, AUC • Visual data exploration: ggplot2 • IDE: RStudio
  • 11. R references • All you need to do is follow Hadley Wickham: http://hadley.nz/, @hadleywickham, #rstats • … and you are in good hands
  • 13. Business goal • increase the conversion rate of trial users • identify trial companies that need engagement triggers
  • 14. Initial questions from business development • what do converting trial users do differently from non-converting ones? • which actions must be done during the trial period for a company to convert? • actions: • add deal • add activity/reminder to deals • invite other users • …
  • 15. Actions of successful trial companies: percentage of successful trial companies that had performed each action by day 7, 14 and 30
  • 16. Actions split by successful and unsuccessful trials • percentage of companies that had performed each action by day 7, split into converting and non-converting companies
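The percentages on slides 15 and 16 amount to a group-by over an event log. Below is a minimal base R sketch, not the original analysis code; the events table (columns company_id, action, day = days since trial start) and the companies table (logical converted column) are hypothetical, with toy values.

```r
# Hypothetical inputs: one row per event, one row per trial company
events <- data.frame(
  company_id = c(1, 1, 2, 3, 3, 4),
  action     = c("add_deal", "user_joined", "add_deal", "add_deal", "add_activity", "add_activity"),
  day        = c(1, 3, 10, 2, 6, 5)
)
companies <- data.frame(company_id = 1:4, converted = c(TRUE, FALSE, TRUE, FALSE))

# Share of companies in each group (converting / non-converting)
# that performed each action at least once by the cutoff day
action_share_by_day <- function(events, companies, cutoff_day) {
  early <- unique(events[events$day <= cutoff_day, c("company_id", "action")])
  early <- merge(early, companies, by = "company_id")
  # rows are unique per (company, action), so length() counts distinct companies
  done <- aggregate(company_id ~ action + converted, data = early, FUN = length)
  names(done)[names(done) == "company_id"] <- "n_done"
  group_size <- table(companies$converted)
  done$pct <- 100 * done$n_done / as.numeric(group_size[as.character(done$converted)])
  done[order(done$action, done$converted), ]
}

res <- action_share_by_day(events, companies, cutoff_day = 7)
res
```

An action that nobody in a group performed by the cutoff simply has no row for that group (0%). Running the function with cutoff_day = 14 or 30 gives the other columns of slide 15.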
  • 17. Training a decision tree model (figure panels: selected features, resulting model)
  • 18. Decision tree model (resulting model):
      IF activities_add < 5.5 THEN
        IF user_joined < 0.5 THEN success = FALSE
        ELSE IF user_joined < 1.5 THEN success = FALSE
        ELSE success = TRUE
      ELSE
        IF user_joined < 0.5 THEN
          IF activities_add < 13.5 THEN success = FALSE
          ELSE IF activities_add >= 179 THEN success = FALSE
          ELSE success = TRUE
        ELSE success = TRUE
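Read as code, the tree on slide 18 is a pair of nested threshold tests. A base R transcription, assuming the second IF block is the ELSE branch of the activities_add < 5.5 split (the flattened slide text does not show that ELSE) and that both inputs are action counts from the trial period:

```r
# Hedged transcription of the slide 18 decision tree (vectorised over companies)
predict_trial_success <- function(activities_add, user_joined) {
  ifelse(
    activities_add < 5.5,
    # few activities: success only if at least two users joined
    user_joined >= 1.5,
    ifelse(
      user_joined < 0.5,
      # no other users joined: success only for a moderate activity count
      activities_add >= 13.5 & activities_add < 179,
      TRUE
    )
  )
}

predict_trial_success(activities_add = c(3, 3, 50, 200), user_joined = c(0, 2, 0, 0))
```

The two adjacent FALSE leaves in each branch collapse into single comparisons (user_joined >= 1.5, and 13.5 <= activities_add < 179), which makes the rules easier to read than the raw tree.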
  • 19. ROC curve of the decision tree model (figure: true positive rate vs. false positive rate; area under the ROC curve (AUC) = 0.7)
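An AUC of 0.7 can be read as: a randomly picked converting company receives a higher score than a randomly picked non-converting one 70% of the time. That equivalence (the Mann-Whitney view of AUC) gives a short base R check; the ROCR and AUC packages compute the same quantity from the ROC curve. The scores below are toy values.

```r
# AUC as the probability that a randomly chosen positive (converting) company
# scores higher than a randomly chosen negative one; ties count as half
auc_rank <- function(scores, labels) {
  pos <- scores[labels]
  neg <- scores[!labels]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Toy scores: one positive (0.35) is ranked below one negative (0.6)
scores <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
auc_rank(scores, labels)  # 8 of 9 pairs ordered correctly: 0.8888889
```

The pairwise comparison is O(n_pos × n_neg) in memory, which is fine for a quick check; for large hold-out sets a rank-based formula or ROCR is the better choice.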
  • 20. Can we do any better? • Sure! • better feature selection • a better ML algorithm (random forest) • better model evaluation with cross-validation during training
  • 21. Let's revisit the model's prediction goals • act before most users are done • predict trial success using actions from the first 5 days
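Predicting from the first 5 days means each company's features are action counts restricted to that period. A base R sketch, assuming a hypothetical event log where day 0 is the first trial day (so "first 5 days" means days 0 to 4):

```r
# Hypothetical event log; day = days since trial start (day 0 = first trial day)
events <- data.frame(
  company_id = c(1, 1, 1, 2, 2, 3),
  action     = c("persons_add", "persons_add", "deals_edit", "persons_add", "logged_in", "logged_in"),
  day        = c(0, 4, 6, 2, 3, 1)
)

# Count each action per company within the first n_days of the trial
first_days_features <- function(events, n_days = 5) {
  early <- events[events$day < n_days, ]
  # keep every company and every action as a level, so missing counts become 0
  company <- factor(early$company_id, levels = sort(unique(events$company_id)))
  action  <- factor(early$action, levels = sort(unique(events$action)))
  as.data.frame.matrix(table(company, action))
}

features <- first_days_features(events)
features
```

The deals_edit event on day 6 falls outside the window, so company 1 gets a 0 for it. Companies with no events at all would still need to be added from the list of trial companies.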
  • 22. Model development workflow (diagram: feature selection → model training → model evaluation → good enough? If no, back to feature selection; if yes, model deployment) • Iteratively: • remove less important features • add new features • evaluate whether the added or removed features increased or decreased model accuracy • continue until satisfied
  • 23. Feature selection • Start with all the relevant features you can imagine • Let the ML algorithm do the work: • filter out irrelevant features • order features by importance (diagram: all relevant features I can imagine → selected features)
  • 24. Filter out irrelevant features • The R Boruta package was used • bor <- Boruta(y ~ ., data = train) • bor <- TentativeRoughFix(bor) # resolves features still marked Tentative • bor$finalDecision # Confirmed / Rejected decision for each feature • Only confirmed features are passed to the model training phase
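Passing only confirmed features to training can be done by building the model formula from the final decisions. A base R sketch, assuming bor$finalDecision is a factor named by feature (final_decision below is a hand-made stand-in, not real Boruta output); Boruta also provides getSelectedAttributes() for the selection step.

```r
# Hand-made stand-in for bor$finalDecision: one decision per feature
final_decision <- factor(
  c(persons_add = "Confirmed", logged_in = "Confirmed", edited_stage = "Rejected"),
  levels = c("Tentative", "Confirmed", "Rejected")
)

# Keep only confirmed features and build the training formula from them
confirmed <- names(final_decision)[final_decision == "Confirmed"]
model_formula <- reformulate(confirmed, response = "y")
model_formula  # y ~ persons_add + logged_in
```

Building the formula explicitly keeps the training call unchanged when the set of confirmed features changes between daily runs.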
  • 25. List of features • activities_edit • deals_edit • organizations_add • persons_add • added_activity • changed_deal_stage • clicked_import_from_other_crm • clicked_taf_facebook_invite_button • clicked_taf_invites_send_button • clicked_taf_twitter_invite_button • completed_activity • edited_stage • enabled_google_calendar_sync • enabled_google_contacts_sync • feature_deal_probability • feature_products • invite_friend • invited_existing_user • invited_new_user • logged_in • lost_a_deal • user_joined • won_a_deal
  • 26. Order features by importance • A trained randomForest model object includes feature importances • First, train the model: • rf_model <- randomForest(y ~ ., data = train) • … then access the relative feature importances: • imp_var <- importance(rf_model) # randomForest::importance(); for a model fitted with caret::train(), varImp(rf_model)$importance returns importances scaled to 0-100
  • 27. Features ordered by relative importance
      Rank  Feature                    Relative importance
       1    persons_add                100.000000
       2    organizations_add           88.070291
       3    added_deal                  85.828879
       4    logged_in                   84.296198
       5    deals_edit                  74.448121
       6    added_activity              69.044263
       7    changed_deal_stage          61.072545
       8    activities_edit             51.355769
       9    user_joined                 28.947384
      10    invited_new_user            28.329157
      11    completed_activity          21.877124
      12    won_a_deal                  17.906090
      13    lost_a_deal                 12.477377
      14    feature_products             9.518529
      15    feature_deal_probability     8.309032
      16    invited_existing_user        3.781910
      17    edited_stage                 0.000000
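The top feature scores exactly 100 and the last exactly 0, which is consistent with min-max rescaling of the raw importances to a 0-100 range (caret's varImp() does this by default). A base R sketch with made-up raw values, for illustration only:

```r
# Made-up raw importances (e.g. mean decrease in Gini), one per feature
raw_imp <- c(persons_add = 41.2, logged_in = 35.0, edited_stage = 2.0, user_joined = 13.8)

# Min-max rescaling: least important feature -> 0, most important -> 100
scale_importance <- function(x) {
  100 * (x - min(x)) / (max(x) - min(x))
}

scaled <- sort(scale_importance(raw_imp), decreasing = TRUE)
round(scaled, 2)
```

Because the scale is relative, a value of 0 only means "least important among the features kept", not "useless"; the raw importances are what Boruta uses to decide relevance.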
  • 28. Split data into training and test sets • inTrain <- createDataPartition(y = all_ev[y][,1], p = 0.7, list = FALSE) • train <- all_ev[inTrain, ] • test <- all_ev[-inTrain, ] • training set: 70% of companies • hold-out test set: 30% of companies • the R caret package's createDataPartition() function was used to split the data; for a factor outcome it samples within each class, so both sets keep roughly the same class balance
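Stratified sampling is the key property of createDataPartition(). A base R sketch of the same idea (not caret's implementation; its rounding may differ), using a hypothetical outcome for 20 companies:

```r
set.seed(42)
# Hypothetical outcome for 20 trial companies: 6 converted, 14 did not
outcome <- factor(rep(c("yes", "no"), times = c(6, 14)))

# Sample floor(p * n) row indices within each outcome class
stratified_split <- function(outcome, p = 0.7) {
  idx_by_class <- split(seq_along(outcome), outcome)
  in_train <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = floor(p * length(idx)))]
  }), use.names = FALSE)
  sort(in_train)
}

in_train <- stratified_split(outcome)
in_test <- setdiff(seq_along(outcome), in_train)
table(outcome[in_train])
```

Indexing with sample.int() rather than sample(idx) avoids R's surprise when a class has a single row (sample(5) draws from 1:5, not from c(5)).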
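  • 29. Model training using 5-fold cross-validation rf_model <- train(y ~ ., data = train, method = "rf", trControl = trainControl(method = "cv", number = 5, classProbs = TRUE, summaryFunction = twoClassSummary), metric = "ROC", tuneGrid = expand.grid(mtry = c(2, 4, 6, 8))) • The R caret package's train() and trainControl() functions do the job; metric = "ROC" needs classProbs = TRUE and summaryFunction = twoClassSummary, and y must be a two-level factor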
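What trainControl(method = "cv", number = 5) does for each mtry candidate can be shown without caret: split the rows into 5 folds, fit on 4, score the held-out fold, and average the AUC. This sketch swaps in a logistic regression (glm) for the random forest so it runs in base R; the data are simulated and the column names hypothetical.

```r
set.seed(1)
# Simulated data: two activity counts and a binary trial outcome
n <- 200
d <- data.frame(persons_add = rpois(n, 5), logged_in = rpois(n, 3))
d$y <- rbinom(n, 1, plogis(-2 + 0.3 * d$persons_add + 0.2 * d$logged_in))

# Rank-based AUC (probability a positive outscores a negative; ties count half)
auc <- function(score, label) {
  pos <- score[label == 1]
  neg <- score[label == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# 5-fold cross-validation: each row lands in exactly one held-out fold
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(d)))
cv_auc <- sapply(seq_len(k), function(i) {
  fit <- glm(y ~ ., data = d[folds != i, ], family = binomial)
  score <- predict(fit, newdata = d[folds == i, ], type = "response")
  auc(score, d$y[folds == i])
})
mean(cv_auc)
```

caret repeats this loop for every value in tuneGrid and keeps the mtry with the best mean ROC, which is what rf_model$bestTune reports.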
  • 30. Model evaluation • model AUC on training data (cross-validated, for the best mtry): mtry <- rf_model$bestTune$mtry; train_auc <- rf_model$results$ROC[as.numeric(rownames(rf_model$bestTune))] • model AUC on the hold-out test data: score <- predict(rf_model, newdata = test, type = "prob"); pred <- prediction(score[, 2], test[[y]]); test_auc <- performance(pred, "auc")@y.values[[1]] • AUC on training data: 0.82 to 0.88 • AUC on test data: 0.83 to 0.88 • Benchmark (decision tree): AUC = 0.7
  • 31. Daily training and prediction • The model is retrained daily on a moving training-data window (figure: timeline marking the model training date and the points 1 and 2 months before it; the window covers companies that started 30-day trials between those two points) • Predictions are calculated daily for all companies in trial
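A sketch of the moving window in base R, assuming (from the timeline figure) that each daily run trains on companies whose trials started between 2 months and 1 month before the training date, so their 30-day trials have roughly ended and the outcome is known. The trials table and its columns are hypothetical.

```r
# Hypothetical trial start dates per company
trials <- data.frame(
  company_id  = 1:4,
  trial_start = as.Date(c("2016-07-20", "2016-08-15", "2016-09-05", "2016-09-20"))
)

# Trials that started in [training_date - 2 months, training_date - 1 month)
training_window <- function(trials, training_date) {
  bounds <- seq(training_date, by = "-1 month", length.out = 3)  # date, -1 month, -2 months
  from <- bounds[3]
  to   <- bounds[2]
  trials[trials$trial_start >= from & trials$trial_start < to, ]
}

window <- training_window(trials, as.Date("2016-10-11"))
window$company_id
```

For a training date of 2016-10-11 the window is 2016-08-11 to 2016-09-10, so company 4 (started 2016-09-20, trial still running) is left out and becomes a prediction target instead.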
  • 32. Model and prediction traceability • Use cases: • monitoring processes • explaining prediction results • Relevant data has to be saved for traceability
  • 33. Model training traceability • All model instances are saved in S3 • The following training metadata is saved in the database: • training timestamp • model location • mtry • train_auc • test_auc • n_test • n_train • feature importances • model training duration
  • 34. Predictions traceability • The following data is saved • prediction timestamp • model id • company id • predicted trial success likelihood • feature values used in prediction
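The traceability records on slides 33 and 34 map naturally onto three tables keyed by a model id. A base R sketch of that layout, with a local .rds file standing in for the S3 upload; all table and column names are hypothetical, and NA marks values that would come from the real training run.

```r
# Hypothetical stand-in for a trained model object
model <- list(kind = "toy model", features = c("persons_add", "logged_in"))

# Persist the model (local file as a stand-in for an S3 object) and check it round-trips
model_path <- file.path(tempdir(), "trial_model_20161011.rds")
saveRDS(model, model_path)
stopifnot(identical(readRDS(model_path), model))

# One row per training run
training_runs <- data.frame(
  model_id       = 1L,
  trained_at     = as.POSIXct("2016-10-11 03:00:00", tz = "UTC"),
  model_location = model_path,
  mtry           = NA_integer_,  # e.g. rf_model$bestTune$mtry
  train_auc      = NA_real_,
  test_auc       = NA_real_,
  n_train        = NA_integer_,
  n_test         = NA_integer_,
  duration_sec   = NA_real_,
  stringsAsFactors = FALSE
)

# Feature importances in long format, one row per (model, feature)
feature_importance <- data.frame(
  model_id   = 1L,
  feature    = model$features,
  importance = NA_real_
)

# One row per prediction, keeping the feature values the model saw
predictions <- data.frame(
  predicted_at = as.POSIXct("2016-10-11 04:00:00", tz = "UTC"),
  model_id     = 1L,
  company_id   = c(101L, 102L),
  success_prob = NA_real_,
  persons_add  = c(3L, 0L),
  logged_in    = c(5L, 1L)
)
```

Storing importances in long format keeps the training table stable when the set of confirmed features changes from one daily run to the next.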
  • 35. Monitoring • Re:dash (redash.io) is used for dashboards
  • 37. training AUC vs test AUC
  • 40. mtry