Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

Benchmarking of a Novel POS
Tagging Based Semantic Similarity
Approach for Job Description
Similarity Computation
Authors: Joydeep Mondal, Sarthak Ahuja, Kushal Mukherjee, Sudhanshu
Shekhar Singh, Gyana Parija
IBM Research Lab, India
Presenter: Joydeep

What exactly it is trying to solve?

Job Description document1 Job Description document2
Semantic Similarity Score

Business Application (Where & Why
it is needed?)

• IBM Watson Recruitment (IWR) : https://www.ibm.com/talent-
management/hr-solutions/recruiting-software
Mapping requisition jobs to the available job
taxonomy without using sate of the art document
similarity methods.
Reason: Almost all state of the art
document similarity methods use
data to train their model. Clients
generally are hesitated to share
their data for training purposes.

How the problem has been solved?

Represent each job
description (Job
Responsibility) into
object-action-
attribute triplet set
by using Stanford
POS Tagger
dependency tree
Calculate semantic
similarity between
each of the
elements of these
categories of triplet
sets of Doc1 with
that of Doc2 and
create a similarity
matrix (symmetric
strongly connected
graph)
Map each category
of one job
responsibility to
those of the other
job responsibility
Calculate the
overall similarity

Example
• Job responsibility1:
• Determines operational feasibility by evaluating analysis, problem definition, requirements, solution
development, and proposed solutions
• Job responsibility2:
• Regulate operational viability. Evaluates survey, requirements and resolution development.
• Representation of Job responsibility1 :
• action: determines, object: feasibility, attributes: [operational ]
• action: evaluating , object: analysis, attributes: []
• action: evaluating , object: problem definition, attributes: []
• action: evaluating , object: requirements, attributes: []
• action: evaluating , object: solution development, attributes: []
• action: evaluating , object: solutions, attributes: [proposed]
• Representation of Job responsibility1 :
• action: regulate , object: viability, attributes: [operational ]
• action: evaluates, object: survey, attributes: []
• action: evaluates, object: requirements, attributes: []
• action: evaluates, object: resolution development, attributes: []
8
VERB NOUN ADJECTIVE
DETERMINE FEASIBIITY OPERATIONAL
VERB NOUN ADJECTIVE
EVALUATING
ANALYSIS
PROBLEM DEFINITION
REQUIREMENTS
SOLUTION DEVELOPMENT
SOLUTIONS PROPOSED
STANFORD POS
TAGGER
DEPENDECY
GRAPH PARSING

Example (Continue…)
• action Similarity:
• action Similarity Matrix :
• action Assignments: (VA) : Hungarian method to solve imbalanced assignment problem
• Regulate -> determine = 0.30
• evaluate-> evaluate = 1.0
• object Similarity:
• object similarity matrices:
Determine evaluate
regulate 0.30 0.25
evaluate 0.31 1.0
feasibility
viability 0.60
9
survey requirements Resolution
development
analysis 0.36 0.30 0.15
Problem
definition
0.16 0.13 0.26
requirements 0.24 1.0 0.15
Solution
development
0.17 0.15 0.65
Solution 0.34 0.31 0.15

Example (Continue…)
• object Assignments: (NA ) Hungarian method to solve imbalanced assignment problem
• feasibility -> viability = 0.60
• analysis -> survey = 0.36
• requirements -> requirements = 1.0
• solution development -> resolution development = 0.65
• attribute Similarity:
• attribute similarity Matrix:
• attribute Assignment: (AA) Hungarian method to solve imbalanced assignment problem
• Operational -> operational ….. (1) = 1.0
• Total similarity score :
• ((VA1 ( 1+ NA1 (1+AA1) /3)+( VA2 (1+ (NA2 + NA3 + NA4 ) /3) )/2)/2 = ((0.30 (1+0.60 * (1+1.0)/3 )) + (1.0
*(1+ (0.36 + 1.0 + 0.65)/3)))/2 =0.5275
Operational
operational 1.0
10Importance order: Action, Object, Attribute

System Diagram
SENTENCE TOKENIZER
d
[s1,s2,s3…]
WORD TOKENIZER + POS TAGGER
SEMANTIC SIMILARITY SCORE CALCULATON
[v1, n1, a1]
[v1, n2, a1]
s1
s2
.
.
Triplet Formation
MULTILEVEL IMBALANCED CLASSICAL
ASSIGNMENT PROBLEM
1
2
3
4
[v2, n3, a2]
T
T T’
uses precomputed
memorized scores
0.767
Similarity Score
PART 1
PART 2
11

Experiment Setup and Evaluation
– F = {F1, F2, ..., Fn} be the set of all job families in the test set.
– Ji = {Ji,1, Ji,2, ..., Ji,ni } be the set of all jobs in family Fi.
– Intrai be the average similarity between all pairs of jobs within Fi
– Interi be the average similarity between all pairs (A, B) of jobs such that A ∈ Fi
And B∈Fj for all j ≠ i
– Ri = Intrai / Interi
S = 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 ∗ 𝑅𝑖 / 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 )
• N1 = 56, N2 = 129 and N3 = 430 documents used for training.
• POSDC does not require any training corpus, therefore the corpus varying experiments are valid
only for doc2vec and LDA.
• test set consisted of 500 randomly chosen jobs out of the 2344 available in IBM Kenexa talent
frameworks

Results
Comparison of S value Across Methods

Core Novelty
• A system to represent a Job Description Document as a collection of
action, object and attribute triplets.
• A method to find similarity score between two such triplets
representations by hierarchically matching of triplets across
documents. It is done as an imbalanced assignment problem to find
the best match (non-greedy, to maximize the overall match score) of
all triplets in the two documents.

Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

Recomendados

Recomendados

Más contenido relacionado

Similar a Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

Similar a Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation (20)

Último

Último (20)

Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation