Injustice - Developers Among Us (SciFiDevCon 2024)
Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation
1. Benchmarking of a Novel POS
Tagging Based Semantic Similarity
Approach for Job Description
Similarity Computation
Authors: Joydeep Mondal, Sarthak Ahuja, Kushal Mukherjee, Sudhanshu
Shekhar Singh, Gyana Parija
IBM Research Lab, India
Presenter: Joydeep
5. • IBM Watson Recruitment (IWR) : https://www.ibm.com/talent-
management/hr-solutions/recruiting-software
Mapping requisition jobs to the available job
taxonomy without using sate of the art document
similarity methods.
Reason: Almost all state of the art
document similarity methods use
data to train their model. Clients
generally are hesitated to share
their data for training purposes.
7. Represent each job
description (Job
Responsibility) into
object-action-
attribute triplet set
by using Stanford
POS Tagger
dependency tree
Calculate semantic
similarity between
each of the
elements of these
categories of triplet
sets of Doc1 with
that of Doc2 and
create a similarity
matrix (symmetric
strongly connected
graph)
Map each category
of one job
responsibility to
those of the other
job responsibility
Calculate the
overall similarity
11. System Diagram
SENTENCE TOKENIZER
d
[s1,s2,s3…]
WORD TOKENIZER + POS TAGGER
SEMANTIC SIMILARITY SCORE CALCULATON
[v1, n1, a1]
[v1, n2, a1]
s1
s2
.
.
Triplet Formation
MULTILEVEL IMBALANCED CLASSICAL
ASSIGNMENT PROBLEM
1
2
3
4
[v2, n3, a2]
T
T T’
uses precomputed
memorized scores
0.767
Similarity Score
PART 1
PART 2
11
12.
13. Experiment Setup and Evaluation
– F = {F1, F2, ..., Fn} be the set of all job families in the test set.
– Ji = {Ji,1, Ji,2, ..., Ji,ni } be the set of all jobs in family Fi.
– Intrai be the average similarity between all pairs of jobs within Fi
– Interi be the average similarity between all pairs (A, B) of jobs such that A ∈ Fi
And B∈Fj for all j ≠ i
– Ri = Intrai / Interi
S = 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 ∗ 𝑅𝑖 / 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 )
• N1 = 56, N2 = 129 and N3 = 430 documents used for training.
• POSDC does not require any training corpus, therefore the corpus varying experiments are valid
only for doc2vec and LDA.
• test set consisted of 500 randomly chosen jobs out of the 2344 available in IBM Kenexa talent
frameworks
16. Core Novelty
• A system to represent a Job Description Document as a collection of
action, object and attribute triplets.
• A method to find similarity score between two such triplets
representations by hierarchically matching of triplets across
documents. It is done as an imbalanced assignment problem to find
the best match (non-greedy, to maximize the overall match score) of
all triplets in the two documents.