Machine Learning for Recommender Systems in the Job Market
hamburg.ai, May 2017
Fabian Abel
Challenge
Given a user, the goal is to recommend job postings…
1. that the user may be interested in and
2. for which the user is an appropriate candidate.
[Figure: a job recommender connects a user ("Scala Dev, Hamburg") with job postings ("Scala Dev (m/w)", "Scala Engineer") from companies and recruiters. Figures shown on the slide: 19M and 750k-1M.]
Goals / Triangle of contradiction
• User (e.g. Scala Dev, Hamburg): relevant recos, no spam
• Companies: relevant candidates, high reach
• Happy customers, high revenue (e.g. many clicks on paid content)
Job recommendations (screenshots, including mobile and email)
Job Recommender REST Service
GET /rest/recommendations/jobs/user/42
//response:
{
"total": 20,
"collection":[
{"item_id": 7263, "score": 0.87, "reason": [..],..},
{"item_id": 6526, "score": 0.81, "reason": [..],..},
...
]
}
Infrastructure for recommenders
[Diagram components: XING sources / XING services (MySQL, NoSQL); live updates; search indices; batch updates; batch processing; recommender REST service; XING products; deployment infrastructure]
Key properties of a job posting
• Title
• Company
• Employment type and career level
• Full-text description
Key sources for understanding user demands
• Social network: explicit and implicit connections
• Profile (e.g. Fabian Abel, Data Scientist): haves, interests (web science, big data, hadoop), skills & co.
• Interactions: clicks, bookmarks, ratings, shown (example tags: data, web, social media, big data, kununu)
• Interactions of similar users (example tags: hadoop, scala)
Relevance Estimation
The same sources (social network, profile, interactions, interactions of similar users) yield four feature groups:
• Content-based features
• Collaborative features
• Social features
• Usage behavior features
These feed the core RecSys engines (regression model).
Logistic Regression:
$$P(\text{relevant} \mid x) = \frac{1}{1 + e^{-\left(b_0 + \sum_{i=1}^{n} b_i x_i\right)}}$$
where x is the feature vector and b_i is the impact of feature x_i.
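The logistic regression formula on this slide can be sketched in a few lines of Scala. This is only an illustration: the object name `RelevanceSketch`, the function names, and the weights and feature values in `main` are assumptions made up for the example, not values from the talk.

```scala
object RelevanceSketch {
  /** Logistic function: maps any real-valued score into (0, 1). */
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** P(relevant | x) = sigmoid(b0 + sum over i of b(i) * x(i)). */
  def relevance(b0: Double, b: Seq[Double], x: Seq[Double]): Double = {
    require(b.size == x.size, "one weight per feature")
    val z = b0 + b.zip(x).map { case (bi, xi) => bi * xi }.sum
    sigmoid(z)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical weights and feature values, chosen only for illustration.
    val weights  = Seq(1.5, 0.8, -0.4)
    val features = Seq(0.9, 0.5, 1.0)
    println(f"P(relevant | x) = ${relevance(-1.0, weights, features)}%.3f")
  }
}
```

A positive weight pushes the probability up as its feature grows, a negative weight pushes it down, which is what "impact of feature x_i" refers to.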
Relevance Estimation + Additional Filters
Core RecSys engines (regression model), fed by content-based, collaborative, social, and usage behavior features, produce scored candidates (e.g. 0.92, 0.8, 0.76, …).
Filtering & Diversification:
• Location-based filtering
• Frequently-shown filtering
• Monetary-based diversification
• Career level filtering
• ...
4 core sub-recommender engines and 19 filters together analyze and exploit around 200 features (relevance criteria).
Collaborative filtering
Theory: User-based and Item-based CF
User-Item-Rating Matrix
        Java D.  SAP Co  Data En  Data Sc  BI Dev
Anna       3       -        4        -       2
Julia      2       -        5        4       1
Tim        4       3        -        5       1
John       -       4        5        4       -
User-based CF:
• Compare users based on their ratings (e.g. cosine sim.)
• Use the n most similar users to predict a rating on an item
Item-based CF:
• Compare items based on their ratings (e.g. cosine sim.)
• Use the n most similar items to predict a rating from a user
(simple weighted average)
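As a minimal sketch of the user-based variant, the following Scala code uses the rating matrix from this slide. Two choices are assumptions of the sketch rather than something the slide states: cosine similarity is computed only over items both users rated, and the prediction is a similarity-weighted average over the n most similar users who rated the item. The object and function names are hypothetical.

```scala
object UserBasedCfSketch {
  type Ratings = Map[String, Map[String, Double]] // user -> (item -> rating)

  // The matrix from the slide; "-" entries are simply absent.
  val ratings: Ratings = Map(
    "Anna"  -> Map("Java D." -> 3.0, "Data En" -> 4.0, "BI Dev" -> 2.0),
    "Julia" -> Map("Java D." -> 2.0, "Data En" -> 5.0, "Data Sc" -> 4.0, "BI Dev" -> 1.0),
    "Tim"   -> Map("Java D." -> 4.0, "SAP Co" -> 3.0, "Data Sc" -> 5.0, "BI Dev" -> 1.0),
    "John"  -> Map("SAP Co" -> 4.0, "Data En" -> 5.0, "Data Sc" -> 4.0)
  )

  /** Cosine similarity restricted to the items both users rated (assumption). */
  def cosineSim(a: Map[String, Double], b: Map[String, Double]): Double = {
    val shared = a.keySet.intersect(b.keySet).toSeq
    if (shared.isEmpty) 0.0
    else {
      val dot   = shared.map(i => a(i) * b(i)).sum
      val normA = math.sqrt(shared.map(i => a(i) * a(i)).sum)
      val normB = math.sqrt(shared.map(i => b(i) * b(i)).sum)
      if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
    }
  }

  /** Weighted average over the n most similar users who rated `item`. */
  def predict(user: String, item: String, n: Int): Option[Double] = {
    val mine = ratings.getOrElse(user, Map.empty[String, Double])
    val neighbours = ratings.toSeq
      .collect { case (other, rs) if other != user && rs.contains(item) =>
        (cosineSim(mine, rs), rs(item)) // (similarity, neighbour's rating)
      }
      .sortBy(-_._1)
      .take(n)
    val weightSum = neighbours.map(_._1).sum
    if (weightSum == 0.0) None
    else Some(neighbours.map { case (sim, r) => sim * r }.sum / weightSum)
  }

  def main(args: Array[String]): Unit =
    println(predict("Anna", "Data Sc", n = 2))
}
```

Note that a neighbour who shares a single rated item with the user always gets similarity 1.0 under this definition, a small-scale hint at why sparse data is a problem for this approach.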
Collaborative filtering
Reality: Ultra sparse User-Item Matrix and primarily implicit feedback
        Java D.  SAP Co  Data En  Data Sc  BI Dev
Anna       -       -        1        -       -
Julia      -       -        -        -       -
Tim        -       -        -        -       -
John       1       -        -        -       -
High level of sparsity: classical collaborative filtering (or matrix factorization) does not work
Collaborative filtering
Reality: Ultra sparse User-Item Matrix and primarily implicit feedback
          Java D.  SAP Co  Data En  Data Sc  BI Dev
Anna         -       -        1        -       -
Data Sci     -       -       32       18       -
Tim        524       3        1        -       -
John         -       -        2        4       -
Cluster labels shown on the slide: Data Scientists, Skilled in Java, BI Dev
Pseudo CF:
• Cluster users based on...
  • job role
  • skills
  • field of study
• Recommend items that similar users (= clusters) interacted with
New item problem remains...
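The pseudo-CF idea can be sketched as follows, assuming the simplest possible clustering: all users with the same job role form one cluster. The case classes, function names, and the choice to rank by raw interaction counts are assumptions of this sketch; the slide also mentions skills and field of study as clustering criteria.

```scala
object PseudoCfSketch {
  final case class User(name: String, jobRole: String)
  final case class Interaction(user: String, item: String)

  /** Interaction counts per cluster, where a cluster is all users sharing a job role. */
  def clusterCounts(users: Seq[User], log: Seq[Interaction]): Map[String, Map[String, Int]] = {
    val roleOf = users.map(u => u.name -> u.jobRole).toMap
    log
      .flatMap(i => roleOf.get(i.user).map(role => (role, i.item)))
      .groupBy(_._1)
      .map { case (role, pairs) =>
        role -> pairs.groupBy(_._2).map { case (item, ps) => item -> ps.size }
      }
  }

  /** Top-k items of the user's cluster that the user has not interacted with yet. */
  def recommend(user: User, users: Seq[User], log: Seq[Interaction], k: Int): Seq[String] = {
    val seen = log.filter(_.user == user.name).map(_.item).toSet
    clusterCounts(users, log)
      .getOrElse(user.jobRole, Map.empty[String, Int])
      .toSeq
      .filterNot { case (item, _) => seen(item) }
      .sortBy { case (item, count) => (-count, item) } // most interactions first
      .take(k)
      .map(_._1)
  }
}
```

An item nobody has interacted with never appears in any cluster's counts, which is the "new item problem" the slide says remains.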
Content-based filtering
Example: semantic search
Raw profile: Fabian Abel, Data Mining Expert; Haves / Interests: ML, j2ee, Hadoop; Education: Computer Sci.
Ontology-based query (TFxIDF):
• [jobrole] Data Scientist (synonyms: Data Mining Expert, Data Mining Specialist, …)
• [skills] JEE (synonyms: J2EE, Java Enterprise, …)
• [skills] Hadoop (synonyms: Apache Hadoop, …)
• [skills] Machine Learning (synonyms: Maschinelles Lernen, …)
• [field of studies] Computer Science (synonyms: Informatik, Comp. Sci., CS, …)
Numeric labels shown next to the concepts: 6940, 263, 370, 162, 473
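The step from raw profile terms to ontology concepts could look roughly like the sketch below. The synonym lists are copied from the slide, where they are truncated, so terms such as "ML" are deliberately left unmapped here. The lookup strategy (case-insensitive exact match) and all names are assumptions of the sketch.

```scala
object OntologySketch {
  // Canonical concept -> synonyms, copied from the slide (the lists there end in "…").
  val synonyms: Map[String, Seq[String]] = Map(
    "Data Scientist"   -> Seq("Data Mining Expert", "Data Mining Specialist"),
    "JEE"              -> Seq("J2EE", "Java Enterprise"),
    "Computer Science" -> Seq("Informatik", "Comp. Sci.", "CS"),
    "Hadoop"           -> Seq("Apache Hadoop"),
    "Machine Learning" -> Seq("Maschinelles Lernen")
  )

  // Reverse index: lower-cased surface form -> canonical concept.
  private val canonical: Map[String, String] =
    synonyms.flatMap { case (concept, syns) =>
      (concept +: syns).map(s => s.toLowerCase -> concept)
    }

  /** Map raw profile terms to concepts; unknown terms are kept unchanged. */
  def normalize(terms: Seq[String]): Seq[String] =
    terms.map(t => canonical.getOrElse(t.trim.toLowerCase, t.trim)).distinct
}
```

The normalized concepts, rather than the raw strings, would then form the query, so that a posting asking for "Java Enterprise" can match a profile that says "j2ee".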
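Content-based filtering
Example: more-like-this component
Anna: bookmarked, rated and applied-to job postings B = (1, 2, 3)
Recommending similar items: q = trans(1 2 3), run as a TFxIDF query over the candidates 7, 8, 9, returns R = (8, 9, 7)
Topic model vector representations (LSI or Word2Vec): B′ for B, R′ for R
Re-rank by similarity of topic model vectors:
  // rPrime = R′ (candidate vectors), bPrime = B′ (vectors of Anna's items)
  rPrime.map { r =>
    val sims = bPrime.map(b => cosineSim(r, b)) // similarity to each of Anna's items
    r -> sims.sum / sims.size                   // mean similarity for this candidate
  }.sortBy(-_._2)                               // highest mean similarity first
Result after re-ranking: 8, 7, 9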
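A self-contained version of the re-ranking step might look like this. It keeps the logic of the snippet on the slide (mean cosine similarity to the user's items, sorted descending) and adds a plain `cosineSim` implementation. The object name, the `Vec` alias, and keying candidates by item id are assumptions of the sketch.

```scala
object ReRankSketch {
  type Vec = Vector[Double]

  /** Cosine similarity of two dense vectors; 0.0 if either is the zero vector. */
  def cosineSim(a: Vec, b: Vec): Double = {
    val dot  = a.zip(b).map { case (x, y) => x * y }.sum
    val norm = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }

  /** Order candidates by mean cosine similarity to the user's items, best first. */
  def reRank(candidates: Map[Int, Vec], userItems: Seq[Vec]): Seq[(Int, Double)] = {
    require(userItems.nonEmpty, "need at least one user item")
    candidates.toSeq
      .map { case (id, r) =>
        val sims = userItems.map(b => cosineSim(r, b))
        id -> sims.sum / sims.size
      }
      .sortBy(-_._2)
  }
}
```

For example, with made-up two-dimensional vectors `Map(7 -> Vector(0.6, 0.8), 8 -> Vector(1.0, 0.1), 9 -> Vector(0.0, 1.0))` as candidates and `Seq(Vector(1.0, 0.0), Vector(1.0, 0.2))` as the user's items, the candidate closest to the user's items moves to the front.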
Content-based filtering
Example: more-like-this component
CTR compared with plain TFxIDF:
• LSI-based re-ranking: +3.2%
• Word2Vec-based re-ranking: +3.1%
Challenges
Issues that we have to fight with…
Profiles vs. people’s wishes for their future
A profile describes a user’s past/current position(s), not future wishes.
Recruiter-John
What John writes… → and what he means…
• International Sales Manager → Call Center Agent (10 EUR per hour)
• Sales Manager → Sales Manager for B2B customers (80K EUR per year)
• Data Scientist skilled in Hadoop, Scala, Elasticsearch, … with PhD in … → Data Analyst (skilled in SAS or Excel)
Paul, the Candidate
What Paul says he is… → and what he means…
• CEO → Network Engineer (currently unemployed)
• Data Scientist with 100+ skills → BI Engineer (skilled in old-school ETL)
• Sales Manager → Shopman (in a kiosk)
Understanding the meaning of things that recruiters write in job postings and users write in their profiles is not trivial…
People freak out if we recommend something wrong!
→ Try to eliminate freakommendations (outliers)
Outlier Filtering
Pipeline: core RecSys engines (scores, e.g. 0.92, 0.8, 0.76, …) → Filtering & Diversification (Location Filter, Career level Filter, ..., Outlier filter)
Estimate how a user would rate the item… (training: 750k explicit ratings)
1. predictRating(user, item)
   = predict(toFeatureVec(user, item))
   = r // rating between 1 and 5
2. Filter:
   if (r > threshold) keep
   else drop
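The two steps on this slide (predict a rating, then keep or drop) can be sketched as below. The trained model is not part of the sketch: `predictRating` is passed in as a plain function, and the `Candidate` type and all names are assumptions made for illustration.

```scala
object OutlierFilterSketch {
  final case class Candidate(itemId: Int, features: Vector[Double])

  /** Keep only candidates whose predicted rating is strictly above the threshold. */
  def filterOutliers(
      candidates: Seq[Candidate],
      predictRating: Vector[Double] => Double, // stand-in for the trained rating model
      threshold: Double
  ): Seq[Candidate] =
    candidates.filter(c => predictRating(c.features) > threshold)

  def main(args: Array[String]): Unit = {
    // Toy stand-in model: the single feature is already the rating, clamped to [1, 5].
    val toyModel: Vector[Double] => Double = fs => math.max(1.0, math.min(5.0, fs.sum))
    val recos = Seq(Candidate(7263, Vector(4.0)), Candidate(6526, Vector(1.5)))
    println(filterOutliers(recos, toyModel, threshold = 2.5).map(_.itemId))
  }
}
```

Passing the model in as a function keeps the filter independent of how ratings are predicted, so the threshold can be tuned without touching the model.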
Outlier Filtering
The “filter onion”: trade-off between killing bad recos and keeping good ones
[Chart: percentage of filtered user-job posting pairs by rating threshold, one curve for good recos and one for bad recos]
Example: with a threshold of 2.5 we kill 86% of the bad and 18% of the good recos
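The trade-off curve can be reproduced in principle by counting, for each threshold, the share of good and bad pairs whose predicted rating falls at or below it. The sketch below shows only the computation; the ratings in `main` are toy values invented for illustration and are not the data behind the 86% / 18% figures.

```scala
object FilterTradeOffSketch {
  /** Share of pairs a threshold would drop, i.e. predicted rating <= threshold. */
  def killRate(predicted: Seq[Double], threshold: Double): Double =
    if (predicted.isEmpty) 0.0
    else predicted.count(_ <= threshold).toDouble / predicted.size

  def main(args: Array[String]): Unit = {
    // Toy predicted ratings, invented for illustration only.
    val bad  = Seq(1.2, 1.8, 2.1, 2.4, 3.0)
    val good = Seq(2.2, 3.1, 3.8, 4.2, 4.6)
    for (t <- Seq(2.0, 2.5, 3.0))
      println(f"threshold $t%.1f: bad killed ${killRate(bad, t) * 100}%.0f%%, good killed ${killRate(good, t) * 100}%.0f%%")
  }
}
```

Raising the threshold increases both rates, so the choice comes down to how many good recos one is willing to lose for each bad one removed.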
Outlier Filtering
• xgboost-based model
• Example features (137 features in total):
  • Matching & weighting: job role, skills, discipline, industry, ...
  • Distance: home location / job seeker location
  • Transitions: job role → job role, field of study → job role
  • ...
Outlier Filtering
Some A/B test results: user success
• Users with recos: -10.9% (filtering vs. no filtering)
• Users who clicked on recos: +7.4% (filtering vs. no filtering)
Fewer people get recommendations, but more users click!
→ Stricter filtering pays off!
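One rough way to combine the two numbers, assuming both are relative changes against the no-filtering group and that both test groups were the same size (the slide states neither), is to look at the share of users with recos who click:

```latex
\frac{\text{clickers}_{\text{filtering}} / \text{users with recos}_{\text{filtering}}}
     {\text{clickers}_{\text{no filtering}} / \text{users with recos}_{\text{no filtering}}}
= \frac{1 + 0.074}{1 - 0.109}
= \frac{1.074}{0.891}
\approx 1.205
```

Under those assumptions, a user who still receives recommendations is roughly 20% more likely to click on one.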
ACM RecSys Challenge
http://recsyschallenge.com
• Task: push recommendations (new items, paid vs. non-paid, premium vs. basic users)
• Started at the beginning of March (ca. 240 teams so far), ends in June
• Offline & online evaluation
• Still possible to sign up for the offline evaluation…
Thank you
http://2017.recsyschallenge.com
@fabianabel

Editor's notes

  1. Personalised recommendations based on the user's behaviour