Recruiting Solutions 
My Three Ex’s: A Data Science Approach for Applied Machine Learning
Dedicated to three of my favorite ex-co-workers.
First, a disclosure. 
This isn’t a talk about machine learning. 
It’s a talk about applying machine learning. 
What’s the difference? 
Let’s talk about something else for a moment. 
Hash Tables 
What you (need to) know about hash tables. 
Theory vs. Application
Class HashMap<K,V>
java.lang.Object
  java.util.AbstractMap<K,V>
    java.util.HashMap<K,V>
Type Parameters:
  K - the type of keys maintained by this map
  V - the type of mapped values
All Implemented Interfaces:
  Serializable, Cloneable, Map<K,V>
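The application side of hash tables mostly comes down to the contract: keys must implement `equals` and `hashCode` consistently, and in return `put` and `get` take constant time on average. A minimal Java sketch (Java 16+ for records); the `QueryKey` record and the query-counting task are hypothetical illustrations, not part of the talk.

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapApplication {
    // Hypothetical composite key. Records generate equals() and hashCode()
    // from their components, which is exactly the contract HashMap relies on.
    record QueryKey(String searcherType, String query) {}

    // Counts how often each (searcher type, query) pair occurs in a log.
    static Map<QueryKey, Integer> countQueries(String[][] log) {
        Map<QueryKey, Integer> counts = new HashMap<>();
        for (String[] entry : log) {
            // merge() stores 1 for a new key, or adds 1 to the existing count.
            counts.merge(new QueryKey(entry[0], entry[1]), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] log = {
            {"recruiter", "software engineer"},
            {"job seeker", "software engineer"},
            {"recruiter", "software engineer"},
        };
        Map<QueryKey, Integer> counts = countQueries(log);
        // A freshly built but equal key finds the same entry.
        System.out.println(counts.get(new QueryKey("recruiter", "software engineer"))); // 2
        System.out.println(counts.size()); // 2
    }
}
```

You never touch buckets, load factors, or collision handling directly; knowing the contract is enough to use the structure correctly.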
Now let’s get back to machine learning! 
Please allow me to introduce my three ex’s. 
Express. 
Explain. 
Experiment. 
Embrace the data science mindset. 
Express 
Understand your utility and inputs. 
Explain 
Understand your models and metrics. 
Experiment 
Optimize for the speed of learning. 
Express. 
How to train your machine learning model. 
1. Define your objective function. 
2. Collect training data. 
3. Build models. 
4. Profit! 
You can only improve what you measure. 
Clicks? 
Actions? 
Outcomes?
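One way to see why the choice matters: the same search log can be scored against three different definitions of success. A minimal sketch (Java 16+ for records); the `Session` record and its three flags are hypothetical stand-ins for whatever your logs actually record.

```java
import java.util.List;
import java.util.function.Predicate;

public class CandidateObjectives {
    // Hypothetical session log entry: did the searcher click a result,
    // take an action on it (e.g. send a message), and reach an outcome (e.g. a hire)?
    record Session(boolean clicked, boolean acted, boolean outcome) {}

    // Fraction of sessions that satisfy the given success criterion.
    static double rate(List<Session> sessions, Predicate<Session> success) {
        if (sessions.isEmpty()) return 0.0;
        long hits = sessions.stream().filter(success).count();
        return (double) hits / sessions.size();
    }

    public static void main(String[] args) {
        List<Session> log = List.of(
            new Session(true, true, true),
            new Session(true, true, false),
            new Session(true, false, false),
            new Session(false, false, false));
        // The same log yields very different "success" numbers
        // depending on which objective you decide to measure.
        System.out.println(rate(log, Session::clicked)); // 0.75
        System.out.println(rate(log, Session::acted));   // 0.5
        System.out.println(rate(log, Session::outcome)); // 0.25
    }
}
```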
Be careful how you define precision… 
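Precision@k is a common ranking metric, but its value depends entirely on what you count as relevant. In this sketch, a hypothetical ranking is scored against two hypothetical relevance sets (clicked vs. contacted), and the same top 4 results score 0.75 or 0.25.

```java
import java.util.List;
import java.util.Set;

public class PrecisionAtK {
    // Precision@k: the fraction of the top k positions holding a relevant result.
    // What "relevant" means is the judgment call the slide warns about.
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        int hits = 0;
        // Positions beyond the end of the list count as misses.
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        return (double) hits / k;
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("ann", "bob", "cat", "dan", "eve");
        // Two hypothetical relevance definitions for the same ranking.
        Set<String> clicked = Set.of("ann", "bob", "dan");
        Set<String> contacted = Set.of("bob");
        System.out.println(precisionAtK(ranked, clicked, 4));   // 0.75
        System.out.println(precisionAtK(ranked, contacted, 4)); // 0.25
    }
}
```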
Account for non-uniform inputs and costs. 
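A minimal sketch of a cost-weighted error, assuming each example carries an importance weight (for instance, query frequency) and that false positives and false negatives have different, known costs. The `Example` record and the cost values are hypothetical (Java 16+ for records).

```java
public class CostWeightedError {
    // Hypothetical labeled prediction: the true label, the model's prediction,
    // and a weight reflecting how much this input matters (e.g. query traffic).
    record Example(boolean actual, boolean predicted, double weight) {}

    // Weighted average cost, where false positives and false negatives
    // can cost different amounts. Correct predictions cost nothing.
    static double weightedCost(Example[] examples, double falsePositiveCost, double falseNegativeCost) {
        double totalCost = 0.0;
        double totalWeight = 0.0;
        for (Example e : examples) {
            double cost = 0.0;
            if (e.predicted() && !e.actual()) cost = falsePositiveCost;
            else if (!e.predicted() && e.actual()) cost = falseNegativeCost;
            totalCost += e.weight() * cost;
            totalWeight += e.weight();
        }
        return totalWeight == 0.0 ? 0.0 : totalCost / totalWeight;
    }

    public static void main(String[] args) {
        Example[] examples = {
            new Example(true, true, 10.0),  // frequent input, correct
            new Example(false, true, 1.0),  // rare input, false positive
            new Example(true, false, 1.0),  // rare input, false negative
        };
        // Two of three examples are wrong, but the frequent input is right,
        // so weighting by importance and cost changes the picture.
        System.out.println(weightedCost(examples, 1.0, 5.0)); // 0.5
    }
}
```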
Stratified sampling is your friend. 
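A minimal stratified sampler: group items by stratum, then draw up to a fixed number from each, so that rare segments show up in a labeled sample at all. The per-stratum cap, the seed, and the `name:`/`title:` query log in `main` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

public class StratifiedSampler {
    // Draws up to perStratum items from each stratum, so rare segments are
    // represented instead of being swamped by the most common ones.
    static <T, S> Map<S, List<T>> sample(List<T> items, Function<T, S> stratumOf,
                                         int perStratum, long seed) {
        // Group items by stratum, preserving the first-seen order of strata.
        Map<S, List<T>> byStratum = new LinkedHashMap<>();
        for (T item : items) {
            byStratum.computeIfAbsent(stratumOf.apply(item), s -> new ArrayList<>()).add(item);
        }
        Random random = new Random(seed);
        Map<S, List<T>> result = new LinkedHashMap<>();
        for (Map.Entry<S, List<T>> entry : byStratum.entrySet()) {
            List<T> group = new ArrayList<>(entry.getValue());
            Collections.shuffle(group, random); // uniform sample within the stratum
            result.put(entry.getKey(), group.subList(0, Math.min(perStratum, group.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical query log: name searches vastly outnumber title searches.
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 1000; i++) queries.add("name:" + i);
        for (int i = 0; i < 10; i++) queries.add("title:" + i);
        Map<String, List<String>> drawn =
            sample(queries, q -> q.substring(0, q.indexOf(':')), 5, 42L);
        // Each query type contributes 5 items despite the 100:1 imbalance.
        drawn.forEach((type, picked) -> System.out.println(type + " -> " + picked.size()));
    }
}
```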
An example of segmenting models. 
• Searcher: Recruiter; Query: Person Name
• Searcher: Job Seeker; Query: Person Name
• Searcher: Recruiter; Query: Job Title
• Searcher: Job Seeker; Query: Job Title
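The four segments above can be expressed as a lookup from (searcher, query type) to a separate model. A sketch under stated assumptions: the `Model` interface and its linear weights are hypothetical placeholders for trained models, and the routing-by-map design is one option among several (Java 16+ for records).

```java
import java.util.HashMap;
import java.util.Map;

public class SegmentedRanker {
    enum Searcher { RECRUITER, JOB_SEEKER }
    enum QueryType { PERSON_NAME, JOB_TITLE }

    // One segment per (searcher, query type) combination, as on the slide.
    record Segment(Searcher searcher, QueryType queryType) {}

    // Hypothetical stand-in for a trained model: scores a feature vector.
    interface Model { double score(double[] features); }

    private final Map<Segment, Model> models = new HashMap<>();

    void register(Searcher s, QueryType q, Model m) {
        models.put(new Segment(s, q), m);
    }

    // Routes each request to the model trained for its segment.
    double score(Searcher s, QueryType q, double[] features) {
        Model m = models.get(new Segment(s, q));
        if (m == null) throw new IllegalStateException("no model for " + s + "/" + q);
        return m.score(features);
    }

    public static void main(String[] args) {
        SegmentedRanker ranker = new SegmentedRanker();
        // Hypothetical weights: each segment can weigh the same features differently.
        ranker.register(Searcher.RECRUITER, QueryType.PERSON_NAME, f -> 0.9 * f[0] + 0.1 * f[1]);
        ranker.register(Searcher.JOB_SEEKER, QueryType.PERSON_NAME, f -> 0.5 * f[0] + 0.5 * f[1]);
        ranker.register(Searcher.RECRUITER, QueryType.JOB_TITLE, f -> 0.2 * f[0] + 0.8 * f[1]);
        ranker.register(Searcher.JOB_SEEKER, QueryType.JOB_TITLE, f -> 0.3 * f[0] + 0.7 * f[1]);
        double[] features = {1.0, 0.0};
        System.out.println(ranker.score(Searcher.RECRUITER, QueryType.PERSON_NAME, features)); // 0.9
    }
}
```

Failing loudly on an unregistered segment keeps a missing model from silently falling back to the wrong one.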
Express yourself in your feature vectors. 
Express: Summary. 
• Choose an objective function that models utility.
• Be careful how you define precision.
• Account for non-uniform inputs and costs.
• Stratified sampling is your friend.
• Express yourself in your feature vectors.
Explain. 
With apologies to The Little Prince.
Everyone is talking about Deep Learning. 
But accuracy isn’t everything. 
Explainable models, explainable features. 
• Less is more when it comes to explainability.
• Algorithms can protect you from overfitting, but they can’t protect you from the biases you introduce.
• Introspection into your models and features makes it easier for you and others to debug them.
• This matters especially if you don’t completely trust your objective function or the representativeness of your training data.
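Introspection is easiest when a score decomposes into parts you can read. As a hedged illustration, a linear model's score is its bias plus one weight-times-value term per feature, so each prediction can be explained term by term. The feature names and weights below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExplainableLinearModel {
    private final double bias;
    private final Map<String, Double> weights;

    ExplainableLinearModel(double bias, Map<String, Double> weights) {
        this.bias = bias;
        this.weights = weights;
    }

    // Breaks a score into per-feature contributions (weight * value), so you can
    // see which features drove a particular prediction. Missing features count as 0.
    Map<String, Double> contributions(Map<String, Double> features) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> w : weights.entrySet()) {
            double value = features.getOrDefault(w.getKey(), 0.0);
            out.put(w.getKey(), w.getValue() * value);
        }
        return out;
    }

    // The score is just the bias plus the sum of the contributions.
    double score(Map<String, Double> features) {
        double total = bias;
        for (double c : contributions(features).values()) total += c;
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical feature names and weights for a people-search ranker.
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("nameMatch", 2.0);
        weights.put("sharedConnections", 0.5);
        weights.put("sameLocation", 1.0);
        ExplainableLinearModel model = new ExplainableLinearModel(-1.0, weights);

        Map<String, Double> candidate = Map.of("nameMatch", 1.0, "sharedConnections", 4.0);
        // Prints each feature's contribution, then the total score.
        model.contributions(candidate).forEach((name, c) -> System.out.println(name + ": " + c));
        System.out.println("score: " + model.score(candidate));
    }
}
```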
Linear regression? Decision trees? 
• Linear regression and decision trees favor explainability over accuracy, compared to more sophisticated models.
• But size matters. If you have too many features or too deep a decision tree, you lose explainability.
• You can always upgrade to a more sophisticated model when you trust your objective function and training data.
• Building a machine learning model is an iterative process. Optimize for the speed of your own learning.
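To make "start simple" concrete, here is a one-feature linear regression fit by ordinary least squares in closed form; the whole model is explained by a slope and an intercept. A minimal sketch with made-up data, not a production fitting routine.

```java
public class SimpleLinearRegression {
    final double slope;
    final double intercept;

    // Ordinary least squares for a single feature, using the closed form:
    // slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
    SimpleLinearRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired observations");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double cov = 0, varX = 0;
        for (int i = 0; i < x.length; i++) {
            cov += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
        }
        if (varX == 0) throw new IllegalArgumentException("x has no variance");
        slope = cov / varX;
        intercept = meanY - slope * meanX;
    }

    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Hypothetical data lying exactly on y = 2x + 1.
        double[] x = {0, 1, 2, 3};
        double[] y = {1, 3, 5, 7};
        SimpleLinearRegression model = new SimpleLinearRegression(x, y);
        // Two numbers fully explain the model.
        System.out.println("slope=" + model.slope + " intercept=" + model.intercept); // slope=2.0 intercept=1.0
    }
}
```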
Explain: Summary. 
• Accuracy isn’t everything.
• Less is more when it comes to explainability.
• Don’t knock linear models and decision trees!
• Start with simple models, then upgrade.
Experiment. 
Why experiments matter. 
“You have to kiss a lot of frogs to find one prince. So how can you find your prince faster? By finding more frogs and kissing them faster and faster.”
-- Mike Moran
Life in the age of big data. 
Yesterday: Experiments are expensive, choose hypotheses wisely.
Today: Experiments are cheap, do as many as you can!
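When experiments are cheap, you still need a consistent way to decide whether a difference is real. One standard option, swapped in here for illustration rather than taken from the talk, is a two-proportion z-test on a rate such as click-through. The counts in `main` are hypothetical.

```java
public class TwoProportionZTest {
    // z statistic for the difference between two rates (e.g. click-through rate
    // in control A vs. treatment B), using the pooled proportion under the null
    // hypothesis that both variants share the same underlying rate.
    static double zStatistic(long successesA, long trialsA, long successesB, long trialsB) {
        if (trialsA <= 0 || trialsB <= 0) {
            throw new IllegalArgumentException("trials must be positive");
        }
        double pA = (double) successesA / trialsA;
        double pB = (double) successesB / trialsB;
        double pooled = (double) (successesA + successesB) / (trialsA + trialsB);
        double standardError = Math.sqrt(pooled * (1 - pooled) * (1.0 / trialsA + 1.0 / trialsB));
        if (standardError == 0.0) {
            throw new IllegalArgumentException("no variance: every trial succeeded or every trial failed");
        }
        return (pB - pA) / standardError;
    }

    public static void main(String[] args) {
        // Hypothetical counts: control 1000 clicks in 10000 sessions, treatment 1100 in 10000.
        double z = zStatistic(1000, 10000, 1100, 10000);
        // |z| > 1.96 corresponds to p < 0.05 for a two-sided test.
        System.out.printf("z = %.2f, significant at 5%%: %b%n", z, Math.abs(z) > 1.96);
    }
}
```

Running many cheap tests against a fixed 5% threshold will also produce false positives, which is one more reason to test hypotheses in good faith rather than flipping coins.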
So should we just test everything? 
Optimize for the speed of learning. 
Be disciplined: test one variable at a time. 
• Autocomplete 
• Entity Tagging 
• Vertical Intent 
• # of Suggestions 
• Suggestion Order 
• Language 
• Query Construction 
• Ranking Model 
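Testing one variable at a time is easier when each experiment splits users consistently and independently of other experiments. A sketch of one common approach, assumed here rather than described in the talk: hash the experiment name together with the user ID and compare the result to a threshold. The experiment name and user IDs are hypothetical; MD5 is used only because every Java platform is required to provide it, not for security.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ExperimentBucketing {
    // Deterministically assigns a user to treatment (true) or control (false) for
    // one experiment. Hashing the experiment name with the user ID keeps each user's
    // assignment stable across sessions, while different experiments split independently.
    static boolean inTreatment(String experiment, String userId, double treatmentFraction) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest((experiment + ":" + userId).getBytes(StandardCharsets.UTF_8));
            // Use the first four bytes as an unsigned 32-bit bucket number.
            long bucket = ((digest[0] & 0xFFL) << 24) | ((digest[1] & 0xFFL) << 16)
                        | ((digest[2] & 0xFFL) << 8) | (digest[3] & 0xFFL);
            return bucket < (long) (treatmentFraction * 4294967296.0); // 2^32 buckets
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical experiment that changes only the ranking model.
        int treated = 0;
        for (int i = 0; i < 10_000; i++) {
            if (inTreatment("ranking-model-v2", "user" + i, 0.5)) treated++;
        }
        System.out.println("treatment share: " + treated / 10_000.0);
    }
}
```

Because each experiment has its own salt, changing autocomplete and changing the ranking model can run as separate experiments without one assignment determining the other.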
Experiment: Summary. 
• Kiss lots of frogs: experiments are cheap.
• But test in good faith – don’t just flip coins.
• Optimize for the speed of learning.
• Be disciplined: test one variable at a time.
Bringing it all together. 
Express 
Understand your utility and inputs. 
Explain 
Understand your models and metrics. 
Experiment 
Optimize for the speed of learning. 
Daniel Tunkelang 
dtunkelang@linkedin.com 
https://linkedin.com/in/dtunkelang

 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 

My Three Ex’s: A Data Science Approach for Applied Machine Learning

  • 1. Recruiting Solutions. My Three Ex’s: A Data Science Approach for Applied Machine Learning
  • 2. Dedicated to 3 of my favorite ex co-workers.
  • 3. First, a disclosure. This isn’t a talk about machine learning. It’s a talk about applying machine learning. What’s the difference?
  • 4. Let’s talk about something else for a moment: hash tables.
  • 5. What you (need to) know about hash tables. Theory vs. Application.
      Class HashMap<K,V>
      java.lang.Object
        java.util.AbstractMap<K,V>
          java.util.HashMap<K,V>
      Type Parameters:
        K - the type of keys maintained by this map
        V - the type of mapped values
      All Implemented Interfaces:
        Serializable, Cloneable, Map<K,V>
  • 6. Now let’s get back to machine learning!
  • 7. Please allow me to introduce my three ex’s: Express. Explain. Experiment.
  • 8. Embrace the data science mindset.
      • Express: Understand your utility and inputs.
      • Explain: Understand your models and metrics.
      • Experiment: Optimize for the speed of learning.
  • 10. How to train your machine learning model.
      1. Define your objective function.
      2. Collect training data.
      3. Build models.
      4. Profit!
  • 11. You can only improve what you measure. Clicks? Actions? Outcomes?
  • 12. Be careful how you define precision…
  • 13. Account for non-uniform inputs and costs.
  • 14. Stratified sampling is your friend.
  • 15. An example of segmenting models:
      • Searcher: Recruiter; Query: Person Name
      • Searcher: Job Seeker; Query: Person Name
      • Searcher: Recruiter; Query: Job Title
      • Searcher: Job Seeker; Query: Job Title
  • 16. Express yourself in your feature vectors.
  • 17. Express: Summary.
      • Choose an objective function that models utility.
      • Be careful how you define precision.
      • Account for non-uniform inputs and costs.
      • Stratified sampling is your friend.
      • Express yourself in your feature vectors.
  • 19. With apologies to The Little Prince.
  • 20. Everyone is talking about Deep Learning.
  • 21. But accuracy isn’t everything.
  • 22. Explainable models, explainable features.
      • Less is more when it comes to explainability.
      • Algorithms can protect you from overfitting, but they can’t protect you from the biases you introduce.
      • Introspection into your models and features makes it easier for you and others to debug them.
      • Especially if you don’t completely trust your objective function or the representativeness of your training data.
  • 23. Linear regression? Decision trees?
      • Compared to more sophisticated models, linear regression and decision trees favor explainability over accuracy.
      • But size matters. If you have too many features or too deep a decision tree, you lose explainability.
      • You can always upgrade to a more sophisticated model when you trust your objective function and training data.
      • Building a machine learning model is an iterative process. Optimize for the speed of your own learning.
  • 24. Explain: Summary.
      • Accuracy isn’t everything.
      • Less is more when it comes to explainability.
      • Don’t knock linear models and decision trees!
      • Start with simple models, then upgrade.
  • 26. Why experiments matter. “You have to kiss a lot of frogs to find one prince. So how can you find your prince faster? By finding more frogs and kissing them faster and faster.” -- Mike Moran
  • 27. Life in the age of big data.
      • Yesterday: Experiments are expensive; choose hypotheses wisely.
      • Today: Experiments are cheap; do as many as you can!
  • 28. So should we just test everything?
  • 29. Optimize for the speed of learning.
  • 30. Be disciplined: test one variable at a time.
      • Autocomplete
      • Entity Tagging
      • Vertical Intent
      • # of Suggestions
      • Suggestion Order
      • Language
      • Query Construction
      • Ranking Model
  • 31. Experiment: Summary.
      • Kiss lots of frogs: experiments are cheap.
      • But test in good faith – don’t just flip coins.
      • Optimize for the speed of learning.
      • Be disciplined: test one variable at a time.
  • 32. Bringing it all together.
      • Express: Understand your utility and inputs.
      • Explain: Understand your models and metrics.
      • Experiment: Optimize for the speed of learning.
  • 33. Daniel Tunkelang, dtunkelang@linkedin.com, https://linkedin.com/in/dtunkelang
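Slides 11 through 13 warn that the metric you pick defines what you end up optimizing. As a minimal sketch (not code from the talk), the Python below computes precision@k with the relevance definition passed in as a parameter, so the same ranked list can be scored against clicks or against a downstream outcome. It then averages per-query scores with per-query weights to reflect non-uniform costs. The result IDs, the "hired" outcome label, and the weights are hypothetical.

```python
from typing import Callable, Sequence


def precision_at_k(results: Sequence[str], is_relevant: Callable[[str], bool], k: int) -> float:
    """Fraction of the top-k positions holding a relevant result.

    Positions beyond the end of a short result list count as misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in results[:k] if is_relevant(r)) / k


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Average per-query scores, weighting each query by its (assumed) cost or value."""
    if len(values) != len(weights) or sum(weights) <= 0:
        raise ValueError("need matching lengths and a positive total weight")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    ranked = ["a", "b", "d"]   # hypothetical top-3 results for one query
    clicked = {"a", "b", "c"}  # relevance defined as "was clicked"
    hired = {"b"}              # relevance defined as "led to an outcome" (hypothetical)
    print(precision_at_k(ranked, clicked.__contains__, 3))  # 2 of 3 were clicked
    print(precision_at_k(ranked, hired.__contains__, 3))    # 1 of 3 led to an outcome
    # Two queries scoring 1.0 and 0.0: equal weights vs. the second query costing 9x more.
    print(weighted_mean([1.0, 0.0], [1, 1]))  # 0.5
    print(weighted_mean([1.0, 0.0], [1, 9]))  # 0.1
```

The same ranking scores 2/3 under a click-based definition and 1/3 under an outcome-based one, and weighting by cost moves the aggregate from 0.5 to 0.1.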
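Slide 14’s advice, “Stratified sampling is your friend,” can be sketched as follows. This assumes each logged record carries a segment label (the hypothetical `segment` field) and that you want the same number of labeled examples from every segment, rather than a uniform sample dominated by the most common segment. Equal allocation per stratum is one choice among several; allocating by cost or importance is equally valid.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def stratified_sample(
    records: Sequence[T],
    stratum_of: Callable[[T], Hashable],
    per_stratum: int,
    seed: int = 0,
) -> List[T]:
    """Draw up to `per_stratum` records from each stratum, without replacement.

    A uniform sample mirrors traffic proportions, so a rare but important
    segment can end up with almost no examples; sampling each stratum
    separately guarantees every segment is represented.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    buckets = defaultdict(list)
    for record in records:
        buckets[stratum_of(record)].append(record)
    sample: List[T] = []
    for group in buckets.values():  # strata in first-seen order
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


if __name__ == "__main__":
    # Hypothetical log: 95 job-title queries and only 5 person-name queries.
    log = [{"id": i, "segment": "job_title"} for i in range(95)]
    log += [{"id": 95 + i, "segment": "person_name"} for i in range(5)]
    picked = stratified_sample(log, lambda r: r["segment"], per_stratum=5)
    print(dict(Counter(r["segment"] for r in picked)))  # {'job_title': 5, 'person_name': 5}
```

If a stratified sample is used to estimate an overall metric, weight each stratum’s result by its share of real traffic; otherwise the estimate reflects the sample’s proportions rather than production’s.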
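Slide 15 splits search traffic by who is searching (recruiter or job seeker) and by the kind of query (person name or job title). One way to sketch this is a small router keyed on (searcher, query type). The sketch assumes each segment gets its own separately trained model, plus a fallback for combinations without one. The feature names (`name_match`, `title_match`, `network_distance`) and all weights are hypothetical placeholders, not values from the talk.

```python
from typing import Callable, Dict, Mapping, Tuple

Features = Mapping[str, float]
Scorer = Callable[[Features], float]
Segment = Tuple[str, str]  # (searcher type, query type), as on slide 15


def linear_scorer(weights: Mapping[str, float], bias: float = 0.0) -> Scorer:
    """Build a scorer computing bias + sum(weight * feature); missing features count as 0."""
    def score(x: Features) -> float:
        return bias + sum(w * x.get(name, 0.0) for name, w in weights.items())
    return score


class SegmentedRanker:
    """Route each request to the model for its segment, or to a fallback model."""

    def __init__(self, models: Dict[Segment, Scorer], fallback: Scorer) -> None:
        self.models = models
        self.fallback = fallback

    def score(self, searcher: str, query_type: str, x: Features) -> float:
        model = self.models.get((searcher, query_type), self.fallback)
        return model(x)


if __name__ == "__main__":
    # Hypothetical weights; in practice each segment's model is trained on that segment's data.
    ranker = SegmentedRanker(
        models={
            ("recruiter", "person_name"): linear_scorer({"name_match": 2.0}),
            ("job_seeker", "job_title"): linear_scorer({"title_match": 1.5, "network_distance": -0.5}),
        },
        fallback=linear_scorer({"name_match": 1.0, "title_match": 1.0}),
    )
    print(ranker.score("recruiter", "person_name", {"name_match": 1.0}))  # 2.0
    print(ranker.score("recruiter", "job_title", {"title_match": 1.0}))   # 1.0, via the fallback
```

The usual alternative is a single model with segment indicator features. That model shares data across segments, while separate models keep each segment’s behavior easy to inspect. Which works better depends on how much training data each segment has.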
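Slides 23 and 24 recommend starting with linear models because you can read their parameters directly. The sketch below is a minimal illustration that assumes NumPy is available. It fits ordinary least squares with an intercept via `numpy.linalg.lstsq` and lists the features by coefficient magnitude. The two feature names and the noise-free synthetic data are assumptions made for illustration. Comparing coefficient magnitudes only makes sense when the features are on comparable scales, so in practice you would standardize them first.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np


def fit_linear(X, y, feature_names: Sequence[str]) -> Dict[str, float]:
    """Ordinary least squares with an intercept; returns {name: coefficient}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for the intercept
    coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    return dict(zip(["(intercept)", *feature_names], coef.tolist()))


def explain(coefs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Features sorted by absolute weight, largest first (assumes comparable feature scales)."""
    terms = [(name, w) for name, w in coefs.items() if name != "(intercept)"]
    return sorted(terms, key=lambda t: abs(t[1]), reverse=True)


if __name__ == "__main__":
    # Synthetic, noise-free data generated from y = 1 + 3*title_match - 0.5*distance.
    X = [[0, 1], [1, 1], [0, 3], [1, 2], [1, 4]]
    y = [0.5, 3.5, -0.5, 3.0, 2.0]
    coefs = fit_linear(X, y, ["title_match", "distance"])
    for name, weight in explain(coefs):
        print(f"{name:>12s} {weight:+.2f}")
```

Because the data are noise-free, the fit recovers the generating weights. With two features, a reader can check each weight’s sign and size against intuition. That check becomes impractical once a model has hundreds of features.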
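Slides 27 through 31 argue for running many cheap experiments while testing in good faith, one variable at a time. The talk does not name a statistical method. A common way to read out a single-variable A/B test on a rate such as click-through is the textbook two-proportion z-test, sketched below using only the standard library. The counts and the suggestion-count variant in the usage example are hypothetical. Fixing the significance level before the experiment starts is one concrete way to avoid “just flipping coins.”

```python
import math
from typing import Tuple


def two_proportion_z_test(
    successes_a: int, trials_a: int, successes_b: int, trials_b: int
) -> Tuple[float, float]:
    """Two-sided z-test for a difference between two proportions.

    Uses the pooled rate under the null hypothesis that both arms share one
    rate, and the normal approximation, so each arm needs a reasonably large
    number of trials. Returns (z, p_value); z > 0 means arm B's rate is higher.
    """
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if se == 0:
        raise ValueError("both arms have a rate of 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    alpha = 0.05  # chosen before the experiment starts
    # Hypothetical counts: control shows 5 suggestions, treatment shows 8; nothing else differs.
    z, p = two_proportion_z_test(1000, 20000, 1100, 20000)
    print(f"z = {z:.2f}, p = {p:.3f}, significant at {alpha}: {p < alpha}")
```

With these made-up counts (a 5.0% vs. 5.5% click rate over 20,000 impressions each), z comes out at roughly 2.24 and p at roughly 0.025. Since only one variable changed between the arms, you can attribute the difference to that variable.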