The document discusses machine learning techniques for recommender systems in job markets. It describes the goal of recommending jobs that users are both interested in and qualified for, and outlines challenges such as correctly understanding user profiles and job descriptions. It then explains the recommender system architecture, which combines content-based, collaborative, social, and usage-behavior features in 4 core sub-recommender engines, followed by 19 filters; together they analyze around 200 features of users and jobs. A/B test results show that outlier filtering improves recommendation quality: fewer users receive recommendations, but more users click on them.
2. Challenge
Given a user, the goal is to recommend job postings…
1. that the user may be interested in and
2. for which the user is an appropriate candidate.
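[Figure: the job recommender matches users (19M) with job postings (750k-1M) published by companies and recruiters; e.g. a user profiled as "Scala Dev, Hamburg" should be shown postings such as "Scala Dev (m/w)" or "Scala Engineer".]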
3. Goals / Triangle of contradiction
User:
• Relevant recos
• No spam
Companies:
• Relevant candidates
• High reach
Platform:
• Happy customers
• High revenue (e.g. many clicks on paid content)
13. Key sources for understanding user demands
• Social network: explicit and implicit connections
• Profile: e.g. "Fabian Abel, Data Scientist", with haves and interests (web science, big data, hadoop), skills & co.
• Interactions: clicks, bookmarks, ratings, shown items (e.g. on topics such as data, web, social media, big data, kununu)
• Interactions of similar users (e.g. hadoop, scala)
14. Relevance Estimation
The sources from slide 13 (social network, profile, interactions, interactions of similar users) are turned into content-based, collaborative, social, and usage-behavior features, which feed the core RecSys engines (regression model).
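Logistic regression:
$$P(\text{relevant} \mid \mathbf{x}) = \frac{1}{1 + e^{-\left(b_0 + \sum_{i=1}^{n} b_i x_i\right)}}$$
where x = (x_1, …, x_n) is the feature vector and b_i is the impact of feature x_i.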
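The logistic regression above can be evaluated directly once the coefficients are known. A minimal sketch, assuming a dense feature vector; the coefficients and feature values are hypothetical, hand-picked numbers, not trained values from the system:

```scala
object RelevanceModel {
  // P(relevant | x) = 1 / (1 + e^{-(b0 + sum_i b_i * x_i)})
  def probRelevant(b0: Double, b: Vector[Double], x: Vector[Double]): Double = {
    require(b.length == x.length, "expected one coefficient per feature")
    val z = b0 + b.zip(x).map { case (bi, xi) => bi * xi }.sum
    1.0 / (1.0 + math.exp(-z))
  }
}

// Hypothetical coefficients and feature values, for illustration only
val b0 = -1.0
val b = Vector(2.0, 1.5, 0.5)
val x = Vector(0.8, 1.0, 0.2)
val p = RelevanceModel.probRelevant(b0, b, x)
println(f"P(relevant | x) = $p%.3f") // z = 2.2, so p is about 0.900
```

With all weights at zero the model returns 0.5; each positive b_i x_i term pushes the estimate towards 1.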
15. Relevance Estimation + Additional Filters
Content-based, collaborative, social, and usage-behavior features → core RecSys engines (regression model) → filtering & diversification:
• Location-based filtering
• Frequently shown filtering
• Monetary-based diversification
• Career level filtering
• …
Result: a ranked list of recommendations with scores (e.g. 0.92, 0.8, 0.76, …).
In total, 4 core sub-recommender engines and 19 filters together analyze and exploit around 200 features (relevance criteria).
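A minimal sketch of the engines-then-filters architecture, assuming candidates arrive already scored by the core engines. The `Candidate` fields and the three filter rules (at most 100 km away, shown fewer than 5 times, career-level gap of at most 1) are hypothetical stand-ins for three of the 19 filters; monetary-based diversification is not sketched:

```scala
object FilterPipeline {
  // Hypothetical candidate as produced by the core RecSys engines
  final case class Candidate(jobId: Int, score: Double, distanceKm: Double,
                             timesShown: Int, careerLevelGap: Int)

  // Every filter takes a ranked list and returns a (shorter) ranked list
  type Filter = Seq[Candidate] => Seq[Candidate]

  // Hypothetical filter rules
  val locationFilter: Filter = _.filter(_.distanceKm <= 100.0)
  val frequentlyShownFilter: Filter = _.filter(_.timesShown < 5)
  val careerLevelFilter: Filter = _.filter(c => math.abs(c.careerLevelGap) <= 1)

  // Rank by score first; filters preserve the order, so they can be chained freely
  def recommend(candidates: Seq[Candidate], filters: Seq[Filter]): Seq[Candidate] =
    filters.foldLeft(candidates.sortBy(-_.score))((acc, f) => f(acc))
}

import FilterPipeline._

val candidates = Seq(
  Candidate(1, 0.92, 10.0, 0, 0),
  Candidate(2, 0.80, 500.0, 0, 0), // too far away
  Candidate(3, 0.76, 20.0, 7, 1),  // shown too often
  Candidate(4, 0.70, 5.0, 1, 3),   // career level mismatch
  Candidate(5, 0.85, 30.0, 2, -1)
)
val recs = recommend(candidates, Seq(locationFilter, frequentlyShownFilter, careerLevelFilter))
println(recs.map(_.jobId))
```

Modelling each filter as a plain `Seq => Seq` function keeps them independent, so adding or reordering filters does not touch the core engines.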
16. Collaborative filtering
Theory: User-based and Item-based CF
User-Item-Rating Matrix (- = no rating):
          Java D.  SAP Co  Data En  Data Sc  BI Dev
Anna         3       -        4        -       2
Julia        2       -        5        4       1
Tim          4       3        -        5       1
John         -       4        5        4       -
User-based CF: compare users based on their ratings (e.g. cosine similarity), then use the n most similar users to predict a rating on an item.
Item-based CF: compare items based on their ratings (e.g. cosine similarity), then use the n most similar items to predict a rating from a user (simple weighted average).
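A minimal sketch of both CF variants on the toy matrix above. It assumes cosine similarity with missing ratings treated as 0 (one of several common conventions) and n = 2 neighbours; item-based CF is computed as user-based CF on the transposed matrix:

```scala
object CollaborativeFiltering {
  type Matrix = Vector[Vector[Option[Double]]] // rows x columns, None = no rating

  // Cosine similarity of two rows, treating missing ratings as 0
  def cosine(a: Vector[Option[Double]], b: Vector[Option[Double]]): Double = {
    val x = a.map(_.getOrElse(0.0))
    val y = b.map(_.getOrElse(0.0))
    val dot = x.zip(y).map { case (p, q) => p * q }.sum
    val norm = math.sqrt(x.map(v => v * v).sum) * math.sqrt(y.map(v => v * v).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }

  // User-based CF: similarity-weighted average over the n rows most similar
  // to `row` that have a value in column `col`
  def predict(m: Matrix, row: Int, col: Int, n: Int): Option[Double] = {
    val neighbours = m.indices
      .filter(r => r != row && m(r)(col).isDefined)
      .map(r => (cosine(m(row), m(r)), m(r)(col).get))
      .sortBy(-_._1)
      .take(n)
    val weightSum = neighbours.map(_._1).sum
    if (weightSum <= 0.0) None
    else Some(neighbours.map { case (s, r) => s * r }.sum / weightSum)
  }

  // Item-based CF: the same computation with items as rows
  def predictItemBased(m: Matrix, user: Int, item: Int, n: Int): Option[Double] =
    predict(m.transpose, item, user, n)
}

import CollaborativeFiltering._

// Columns: Java D., SAP Co, Data En, Data Sc, BI Dev
val ratings: Matrix = Vector(
  Vector(Some(3.0), None, Some(4.0), None, Some(2.0)),      // Anna
  Vector(Some(2.0), None, Some(5.0), Some(4.0), Some(1.0)), // Julia
  Vector(Some(4.0), Some(3.0), None, Some(5.0), Some(1.0)), // Tim
  Vector(None, Some(4.0), Some(5.0), Some(4.0), None)       // John
)

val userBased = predict(ratings, row = 0, col = 3, n = 2)         // Anna, Data Sc
val itemBased = predictItemBased(ratings, user = 0, item = 3, n = 2)
println(s"user-based: $userBased, item-based: $itemBased")
```

With these choices, Anna's predicted rating for Data Sc is 4.0 via user-based CF (her two nearest neighbours, Julia and John, both rated it 4) and about 3.5 via item-based CF.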
17. Collaborative filtering
Reality: Ultra sparse User-Item Matrix and primarily implicit feedback
          Java D.  SAP Co  Data En  Data Sc  BI Dev
Anna         -       -        1        -       -
Julia        -       -        -        -       -
Tim          -       -        -        -       -
John         1       -        -        -       -
High level of sparsity: classical collaborative filtering (or matrix factorization) does not work.
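The sparsity problem can be quantified on the toy matrix above: only 2 of 20 cells carry feedback, and no two users share a single item, so there is no co-rated item from which a user-user similarity could be computed. A minimal sketch (the helper names are illustrative):

```scala
object Sparsity {
  // Fraction of user-item cells that carry no feedback at all
  def sparsity(m: Vector[Vector[Option[Double]]]): Double = {
    val cells = m.map(_.size).sum
    val filled = m.map(_.count(_.isDefined)).sum
    1.0 - filled.toDouble / cells
  }

  // Number of items on which both users gave feedback
  def overlap(a: Vector[Option[Double]], b: Vector[Option[Double]]): Int =
    a.zip(b).count { case (x, y) => x.isDefined && y.isDefined }
}

// Implicit feedback (columns: Java D., SAP Co, Data En, Data Sc, BI Dev)
val feedback: Vector[Vector[Option[Double]]] = Vector(
  Vector(None, None, Some(1.0), None, None), // Anna
  Vector.fill(5)(Option.empty[Double]),      // Julia
  Vector.fill(5)(Option.empty[Double]),      // Tim
  Vector(Some(1.0), None, None, None, None)  // John
)

// All user pairs that share at least one item
val userPairsWithOverlap = for {
  i <- feedback.indices
  j <- feedback.indices
  if i < j && Sparsity.overlap(feedback(i), feedback(j)) > 0
} yield (i, j)

println(f"sparsity = ${Sparsity.sparsity(feedback)}%.2f") // 2 of 20 cells filled
println(s"user pairs sharing at least one item: $userPairsWithOverlap")
```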
18. Collaborative filtering
Reality: Ultra sparse User-Item Matrix and primarily implicit feedback
User-item matrix with users grouped into clusters (implicit feedback counts):
                   Java D.  SAP Co  Data En  Data Sc  BI Dev
Anna                  -       -        1        -       -
Data Scientists       -       -       32       18       -
Skilled in Java     524       3        1        -       -
BI Dev                -       -        2        4       -
Pseudo CF: cluster users based on jobrole, skills, and field of study, then recommend items that similar users (= clusters) interacted with.
The new item problem remains...
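A minimal sketch of pseudo CF, assuming clusters are formed by exact jobrole match only (the slide also mentions skills and field of study); the users, job IDs, and interactions are made up:

```scala
object PseudoCF {
  final case class User(id: String, jobrole: String)
  final case class Interaction(userId: String, jobId: String)

  // Recommend the jobs most interacted with by the user's cluster
  // (= all users with the same jobrole), excluding jobs the user already saw
  def recommend(user: User, users: Seq[User], interactions: Seq[Interaction], k: Int): Seq[String] = {
    val clusterIds = users.filter(_.jobrole == user.jobrole).map(_.id).toSet
    val seen = interactions.filter(_.userId == user.id).map(_.jobId).toSet
    interactions
      .filter(i => clusterIds.contains(i.userId) && !seen.contains(i.jobId))
      .groupBy(_.jobId)
      .map { case (job, hits) => (job, hits.size) }
      .toSeq
      .sortBy { case (job, count) => (-count, job) } // most interactions first
      .take(k)
      .map(_._1)
  }
}

import PseudoCF._

val users = Seq(
  User("anna", "Data Scientist"), User("ben", "Data Scientist"),
  User("cara", "Data Scientist"), User("dan", "Java Developer")
)
val interactions = Seq(
  Interaction("ben", "j1"), Interaction("ben", "j2"), Interaction("cara", "j1"),
  Interaction("cara", "j3"), Interaction("anna", "j2"),
  Interaction("dan", "j4"), Interaction("dan", "j1")
)
val recs = recommend(users.head, users, interactions, k = 2)
println(recs) // j1 (2 interactions in anna's cluster), then j3 (1)
```

Job j4 is never recommended to anna because only a user outside her cluster interacted with it, and a brand-new posting without any interactions cannot be recommended at all.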
19. Content-based filtering
Example: semantic search
Raw profile: Fabian Abel, Data Mining Expert; haves/interests: ML, j2ee, Hadoop; education: Computer Sci.
Ontology-based profile (normalized entities with synonyms):
[jobrole] Data Scientist (synonyms: Data Mining Expert, Data Mining Specialist, …)
[skills] JEE (synonyms: J2EE, Java Enterprise, …); Hadoop (synonyms: Apache Hadoop, …); Machine Learning (synonyms: Maschinelles Lernen, …)
[field of studies] Computer Science (synonyms: Informatik, Comp. Sci., CS, …)
The normalized profile forms the query, which is matched against job postings using TF×IDF.
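A minimal sketch of ontology-based normalization followed by TF×IDF matching. The synonym entries come from the example above; the job postings, the assumption that texts are already split into terms, and the log-based IDF variant are illustrative choices:

```scala
object SemanticSearch {
  // Hypothetical mini-ontology: lower-cased surface form -> canonical entity
  val ontology: Map[String, String] = Map(
    "data scientist" -> "data scientist",
    "data mining expert" -> "data scientist",
    "data mining specialist" -> "data scientist",
    "jee" -> "jee", "j2ee" -> "jee", "java enterprise" -> "jee",
    "hadoop" -> "hadoop", "apache hadoop" -> "hadoop",
    "machine learning" -> "machine learning", "ml" -> "machine learning",
    "maschinelles lernen" -> "machine learning",
    "computer science" -> "computer science", "computer sci." -> "computer science",
    "informatik" -> "computer science", "comp. sci." -> "computer science",
    "cs" -> "computer science"
  )

  // Map each term to its canonical entity; unknown terms are kept as-is
  def normalize(terms: Seq[String]): Seq[String] =
    terms.map(_.trim.toLowerCase).map(t => ontology.getOrElse(t, t))

  // Sum of TF x IDF weights of the distinct query terms in one document
  def tfIdfScore(query: Seq[String], doc: Seq[String], corpus: Seq[Seq[String]]): Double =
    query.distinct.map { term =>
      val tf = doc.count(_ == term).toDouble
      val df = corpus.count(_.contains(term))
      val idf = if (df == 0) 0.0 else math.log(corpus.size.toDouble / df)
      tf * idf
    }.sum
}

import SemanticSearch._

// Raw profile terms from the example (jobrole, skills, field of study)
val rawProfile = Seq("Data Mining Expert", "ML", "j2ee", "Hadoop", "Computer Sci.")
// Hypothetical job postings, already split into terms
val postings = Seq(
  "p1" -> Seq("Data Scientist", "Apache Hadoop", "Maschinelles Lernen"),
  "p2" -> Seq("Java Enterprise", "Informatik"),
  "p3" -> Seq("Sales Manager", "B2B")
)

val corpus = postings.map { case (_, terms) => normalize(terms) }
val query = normalize(rawProfile)
val ranking = postings.zip(corpus)
  .map { case ((id, _), doc) => id -> tfIdfScore(query, doc, corpus) }
  .sortBy(-_._2)
println(ranking)

// Without the ontology, "Data Mining Expert" never matches "Data Scientist"
val rawCorpus = postings.map { case (_, terms) => terms.map(_.toLowerCase) }
val rawScoreP1 = tfIdfScore(rawProfile.map(_.toLowerCase), rawCorpus.head, rawCorpus)
```

Posting p1 ranks first only after normalization; on the raw strings its score is 0 because none of its terms literally match the profile.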
20. Content-based filtering
Example: more-like-this component
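Anna's bookmarked, rated, and applied-to job postings (1, 2, 3) are combined into a query: q = trans(1 2 3).
Recommending similar items: a TF×IDF search with q returns R = (8, 9, 7).
Re-rank by similarity of topic model vectors (re-ranking options: LSI, Word2Vec). B′ holds the topic model vector representations of Anna's postings B = (1, 2, 3), R′ those of the candidates in R (cosineSim = cosine similarity of two vectors):
rPrime map { case (id, r) =>   // id -> topic vector of a candidate
  val x = bPrime map { b =>     // similarity to each of Anna's postings
    cosineSim(r, b)
  }
  id -> x.sum / x.size          // mean similarity
} sortBy(-_._2)                 // highest mean similarity first
Result after re-ranking: R′ = (8, 7, 9).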
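A self-contained, runnable version of the re-ranking step above. The 2-dimensional topic vectors are hypothetical values (in practice they would come from e.g. LSI or Word2Vec), chosen so that the TF×IDF order 8, 9, 7 changes after re-ranking:

```scala
object MoreLikeThis {
  // Cosine similarity of two topic model vectors (0 if either is all zeros)
  def cosineSim(a: Vector[Double], b: Vector[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norm = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }

  // Re-rank TFxIDF candidates (id -> topic vector) by their mean cosine
  // similarity to the topic vectors of the user's own items
  def rerank(candidates: Seq[(Int, Vector[Double])], userItems: Seq[Vector[Double]]): Seq[(Int, Double)] =
    candidates.map { case (id, r) =>
      val sims = userItems.map(b => cosineSim(r, b))
      id -> sims.sum / sims.size
    }.sortBy(-_._2)
}

// Hypothetical topic vectors
val bPrime = Seq(Vector(1.0, 0.0), Vector(0.9, 0.1), Vector(0.8, 0.2)) // Anna's postings 1, 2, 3
val rPrime = Seq( // candidates in TFxIDF order: 8, 9, 7
  8 -> Vector(1.0, 0.1),
  9 -> Vector(0.1, 1.0),
  7 -> Vector(0.6, 0.4)
)
val reranked = MoreLikeThis.rerank(rPrime, bPrime)
println(reranked.map(_._1)) // new order: 8, 7, 9
```

Averaging over all of the user's items favours candidates that resemble her overall history rather than a single posting; `userItems` must be non-empty, otherwise the mean is NaN.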
23. Profiles vs. people's wishes for their future
A profile describes a user's past/current position(s), not their future wishes.
24. What John writes… and what he means…
Recruiter-John:
• "International Sales Manager" → Call Center Agent (10 EUR per hour)
• "Sales Manager" → Sales Manager for B2B customers (80K EUR per year)
• "Data Scientist skilled in Hadoop, Scala, Elasticsearch, … with PhD in …" → Data Analyst (skilled in SAS or Excel)
25. What Paul says he is… and what he means…
Paul, the Candidate:
• "CEO" → Network Engineer (currently unemployed)
• "Data Scientist with 100+ skills" → BI Engineer (skilled in old-school ETL)
• "Sales Manager" → Shopman (in a kiosk)
26. Understanding the meaning of what recruiters write in job postings and what users write in their profiles is not trivial…
27. People freak out if we recommend something wrong!
Try to eliminate freakommendations (outliers).
28. Outlier Filtering
Core RecSys engines → Filtering & diversification (Location filter, Outlier filter, Career level filter, …) → ranked recommendations (0.92, 0.8, 0.76, …)
1. Estimate how a user would rate the item (training: 750k explicit ratings):
predictRating(user, job) = predict(toFeatureVec(user, job)) = r // rating between 1 and 5
2. Filter:
if (r > threshold) keep
else drop
30. Outlier Filtering
Example features (137 features in total)
• xgboost-based model
• Example features:
  • Matching & weighting: jobrole, skills, discipline, industry, ...
  • Distance: home location / job seeker location
  • Transitions: job role → job role, field of study → job role
  • ...
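The slide names the feature groups but not their definitions. A minimal sketch of one plausible feature per group, assuming haversine great-circle distance for the location feature, an empirical transition frequency for jobrole transitions, and Jaccard overlap for skill matching; none of these is confirmed as the system's actual formulation, and the transition data is made up:

```scala
object OutlierFeatures {
  // Distance feature: great-circle distance in km (haversine formula)
  def distanceKm(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    val earthRadiusKm = 6371.0
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val a = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * earthRadiusKm * math.asin(math.sqrt(a))
  }

  // Transition feature: observed share of moves from `from` that went to `to`
  def transitionProb(transitions: Seq[(String, String)], from: String, to: String): Double = {
    val fromCount = transitions.count(_._1 == from)
    if (fromCount == 0) 0.0
    else transitions.count(t => t._1 == from && t._2 == to).toDouble / fromCount
  }

  // Matching feature: Jaccard overlap of user and job skills
  def skillOverlap(userSkills: Set[String], jobSkills: Set[String]): Double =
    if (userSkills.isEmpty && jobSkills.isEmpty) 0.0
    else (userSkills intersect jobSkills).size.toDouble / (userSkills union jobSkills).size
}

val hamburgBerlin = OutlierFeatures.distanceKm(53.55, 9.99, 52.52, 13.405) // roughly 255 km
val transitions = Seq(
  "Data Analyst" -> "Data Scientist", "Data Analyst" -> "Data Scientist",
  "Data Analyst" -> "BI Developer", "Sales Manager" -> "Sales Manager"
)
val pTransition = OutlierFeatures.transitionProb(transitions, "Data Analyst", "Data Scientist") // 2/3
val overlap = OutlierFeatures.skillOverlap(Set("scala", "hadoop"), Set("scala", "spark"))     // 1/3
val features = Vector(hamburgBerlin, pTransition, overlap)
println(features)
```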
31. Outlier Filtering
Some A/B test results: user success (filtering vs. no filtering)
• Users with recos: -10.9%
• Users who clicked on recos: +7.4%
Fewer people get recommendations, but more users click! Stricter filtering pays off!
32. ACM RecSys Challenge
http://recsyschallenge.com
Task: push recommendations (new items, paid vs. non-paid, premium vs. basic users)
Started at the beginning of March (ca. 240 teams so far), ends in June
Offline & online evaluation
Still possible to sign up for the offline evaluation…