SlideShare una empresa de Scribd logo
1 de 18
Benchmarking of a Novel POS
Tagging Based Semantic Similarity
Approach for Job Description
Similarity Computation
Authors: Joydeep Mondal, Sarthak Ahuja, Kushal Mukherjee, Sudhanshu
Shekhar Singh, Gyana Parija
IBM Research Lab, India
Presenter: Joydeep
What exactly it is trying to solve?
Job Description document1 Job Description document2
Semantic Similarity Score
Business Application (Where & Why
it is needed?)
• IBM Watson Recruitment (IWR) : https://www.ibm.com/talent-
management/hr-solutions/recruiting-software
Mapping requisition jobs to the available job
taxonomy without using sate of the art document
similarity methods.
Reason: Almost all state of the art
document similarity methods use
data to train their model. Clients
generally are hesitated to share
their data for training purposes.
How the problem has been solved?
Represent each job
description (Job
Responsibility) into
object-action-
attribute triplet set
by using Stanford
POS Tagger
dependency tree
Calculate semantic
similarity between
each of the
elements of these
categories of triplet
sets of Doc1 with
that of Doc2 and
create a similarity
matrix (symmetric
strongly connected
graph)
Map each category
of one job
responsibility to
those of the other
job responsibility
Calculate the
overall similarity
Example
• Job responsibility1:
• Determines operational feasibility by evaluating analysis, problem definition, requirements, solution
development, and proposed solutions
• Job responsibility2:
• Regulate operational viability. Evaluates survey, requirements and resolution development.
• Representation of Job responsibility1 :
• action: determines, object: feasibility, attributes: [operational ]
• action: evaluating , object: analysis, attributes: []
• action: evaluating , object: problem definition, attributes: []
• action: evaluating , object: requirements, attributes: []
• action: evaluating , object: solution development, attributes: []
• action: evaluating , object: solutions, attributes: [proposed]
• Representation of Job responsibility1 :
• action: regulate , object: viability, attributes: [operational ]
• action: evaluates, object: survey, attributes: []
• action: evaluates, object: requirements, attributes: []
• action: evaluates, object: resolution development, attributes: []
8
VERB NOUN ADJECTIVE
DETERMINE FEASIBIITY OPERATIONAL
VERB NOUN ADJECTIVE
EVALUATING
ANALYSIS
PROBLEM DEFINITION
REQUIREMENTS
SOLUTION DEVELOPMENT
SOLUTIONS PROPOSED
STANFORD POS
TAGGER
DEPENDECY
GRAPH PARSING
Example (Continue…)
• action Similarity:
• action Similarity Matrix :
• action Assignments: (VA) : Hungarian method to solve imbalanced assignment problem
• Regulate -> determine = 0.30
• evaluate-> evaluate = 1.0
• object Similarity:
• object similarity matrices:
Determine evaluate
regulate 0.30 0.25
evaluate 0.31 1.0
feasibility
viability 0.60
9
survey requirements Resolution
development
analysis 0.36 0.30 0.15
Problem
definition
0.16 0.13 0.26
requirements 0.24 1.0 0.15
Solution
development
0.17 0.15 0.65
Solution 0.34 0.31 0.15
Example (Continue…)
• object Assignments: (NA ) Hungarian method to solve imbalanced assignment problem
• feasibility -> viability = 0.60
• analysis -> survey = 0.36
• requirements -> requirements = 1.0
• solution development -> resolution development = 0.65
• attribute Similarity:
• attribute similarity Matrix:
• attribute Assignment: (AA) Hungarian method to solve imbalanced assignment problem
• Operational -> operational ….. (1) = 1.0
• Total similarity score :
• ((VA1 ( 1+ NA1 (1+AA1) /3)+( VA2 (1+ (NA2 + NA3 + NA4 ) /3) )/2)/2 = ((0.30 (1+0.60 * (1+1.0)/3 )) + (1.0
*(1+ (0.36 + 1.0 + 0.65)/3)))/2 =0.5275
Operational
operational 1.0
10Importance order: Action, Object, Attribute
System Diagram
SENTENCE TOKENIZER
d
[s1,s2,s3…]
WORD TOKENIZER + POS TAGGER
SEMANTIC SIMILARITY SCORE CALCULATON
[v1, n1, a1]
[v1, n2, a1]
s1
s2
.
.
Triplet Formation
MULTILEVEL IMBALANCED CLASSICAL
ASSIGNMENT PROBLEM
1
2
3
4
[v2, n3, a2]
T
T T’
uses precomputed
memorized scores
0.767
Similarity Score
PART 1
PART 2
11
Experiment Setup and Evaluation
– F = {F1, F2, ..., Fn} be the set of all job families in the test set.
– Ji = {Ji,1, Ji,2, ..., Ji,ni } be the set of all jobs in family Fi.
– Intrai be the average similarity between all pairs of jobs within Fi
– Interi be the average similarity between all pairs (A, B) of jobs such that A ∈ Fi
And B∈Fj for all j ≠ i
– Ri = Intrai / Interi
S = 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 ∗ 𝑅𝑖 / 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 )
• N1 = 56, N2 = 129 and N3 = 430 documents used for training.
• POSDC does not require any training corpus, therefore the corpus varying experiments are valid
only for doc2vec and LDA.
• test set consisted of 500 randomly chosen jobs out of the 2344 available in IBM Kenexa talent
frameworks
Results
Comparison of S value Across Methods
Results
Core Novelty
• A system to represent a Job Description Document as a collection of
action, object and attribute triplets.
• A method to find similarity score between two such triplets
representations by hierarchically matching of triplets across
documents. It is done as an imbalanced assignment problem to find
the best match (non-greedy, to maximize the overall match score) of
all triplets in the two documents.
Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation
Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

Más contenido relacionado

Similar a Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

Analytics Boot Camp - Slides
Analytics Boot Camp - SlidesAnalytics Boot Camp - Slides
Analytics Boot Camp - SlidesAditya Joshi
 
powerpoint
powerpointpowerpoint
powerpointbutest
 
Recommending job ads to people
Recommending job ads to peopleRecommending job ads to people
Recommending job ads to peopleFabian Abel
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxRuby Shrestha
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsAlejandro Bellogin
 
Learning with Relative Attributes
Learning with Relative AttributesLearning with Relative Attributes
Learning with Relative AttributesVikas Jain
 
Matching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web SearchMatching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web Searchceya
 
ClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureMLClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureMLGeorge Simov
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET Journal
 
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docxDIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docxlynettearnold46882
 
Part 1
Part 1Part 1
Part 1butest
 
316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasex316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasexabhaysonone0
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee AttritionShruti Mohan
 
Learning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search TasksLearning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search TasksFranco Maria Nardini
 
Empirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptxEmpirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptxghufranullah25
 

Similar a Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation (20)

Analytics Boot Camp - Slides
Analytics Boot Camp - SlidesAnalytics Boot Camp - Slides
Analytics Boot Camp - Slides
 
powerpoint
powerpointpowerpoint
powerpoint
 
Recommending job ads to people
Recommending job ads to peopleRecommending job ads to people
Recommending job ads to people
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptx
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
 
Learning with Relative Attributes
Learning with Relative AttributesLearning with Relative Attributes
Learning with Relative Attributes
 
Learning how to learn
Learning how to learnLearning how to learn
Learning how to learn
 
Matching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web SearchMatching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web Search
 
Nl201609
Nl201609Nl201609
Nl201609
 
Job evaluation
Job evaluationJob evaluation
Job evaluation
 
ClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureMLClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureML
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
assia2015sakai
assia2015sakaiassia2015sakai
assia2015sakai
 
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
 
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docxDIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
 
Part 1
Part 1Part 1
Part 1
 
316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasex316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasex
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
 
Learning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search TasksLearning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search Tasks
 
Empirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptxEmpirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptx
 

Último

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Último (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

  • 1. Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation Authors: Joydeep Mondal, Sarthak Ahuja, Kushal Mukherjee, Sudhanshu Shekhar Singh, Gyana Parija IBM Research Lab, India Presenter: Joydeep
  • 2. What exactly it is trying to solve?
  • 3. Job Description document1 Job Description document2 Semantic Similarity Score
  • 4. Business Application (Where & Why it is needed?)
  • 5. • IBM Watson Recruitment (IWR) : https://www.ibm.com/talent- management/hr-solutions/recruiting-software Mapping requisition jobs to the available job taxonomy without using sate of the art document similarity methods. Reason: Almost all state of the art document similarity methods use data to train their model. Clients generally are hesitated to share their data for training purposes.
  • 6. How the problem has been solved?
  • 7. Represent each job description (Job Responsibility) into object-action- attribute triplet set by using Stanford POS Tagger dependency tree Calculate semantic similarity between each of the elements of these categories of triplet sets of Doc1 with that of Doc2 and create a similarity matrix (symmetric strongly connected graph) Map each category of one job responsibility to those of the other job responsibility Calculate the overall similarity
  • 8. Example • Job responsibility1: • Determines operational feasibility by evaluating analysis, problem definition, requirements, solution development, and proposed solutions • Job responsibility2: • Regulate operational viability. Evaluates survey, requirements and resolution development. • Representation of Job responsibility1 : • action: determines, object: feasibility, attributes: [operational ] • action: evaluating , object: analysis, attributes: [] • action: evaluating , object: problem definition, attributes: [] • action: evaluating , object: requirements, attributes: [] • action: evaluating , object: solution development, attributes: [] • action: evaluating , object: solutions, attributes: [proposed] • Representation of Job responsibility1 : • action: regulate , object: viability, attributes: [operational ] • action: evaluates, object: survey, attributes: [] • action: evaluates, object: requirements, attributes: [] • action: evaluates, object: resolution development, attributes: [] 8 VERB NOUN ADJECTIVE DETERMINE FEASIBIITY OPERATIONAL VERB NOUN ADJECTIVE EVALUATING ANALYSIS PROBLEM DEFINITION REQUIREMENTS SOLUTION DEVELOPMENT SOLUTIONS PROPOSED STANFORD POS TAGGER DEPENDECY GRAPH PARSING
  • 9. Example (Continue…) • action Similarity: • action Similarity Matrix : • action Assignments: (VA) : Hungarian method to solve imbalanced assignment problem • Regulate -> determine = 0.30 • evaluate-> evaluate = 1.0 • object Similarity: • object similarity matrices: Determine evaluate regulate 0.30 0.25 evaluate 0.31 1.0 feasibility viability 0.60 9 survey requirements Resolution development analysis 0.36 0.30 0.15 Problem definition 0.16 0.13 0.26 requirements 0.24 1.0 0.15 Solution development 0.17 0.15 0.65 Solution 0.34 0.31 0.15
  • 10. Example (Continue…) • object Assignments: (NA ) Hungarian method to solve imbalanced assignment problem • feasibility -> viability = 0.60 • analysis -> survey = 0.36 • requirements -> requirements = 1.0 • solution development -> resolution development = 0.65 • attribute Similarity: • attribute similarity Matrix: • attribute Assignment: (AA) Hungarian method to solve imbalanced assignment problem • Operational -> operational ….. (1) = 1.0 • Total similarity score : • ((VA1 ( 1+ NA1 (1+AA1) /3)+( VA2 (1+ (NA2 + NA3 + NA4 ) /3) )/2)/2 = ((0.30 (1+0.60 * (1+1.0)/3 )) + (1.0 *(1+ (0.36 + 1.0 + 0.65)/3)))/2 =0.5275 Operational operational 1.0 10Importance order: Action, Object, Attribute
  • 11. System Diagram SENTENCE TOKENIZER d [s1,s2,s3…] WORD TOKENIZER + POS TAGGER SEMANTIC SIMILARITY SCORE CALCULATON [v1, n1, a1] [v1, n2, a1] s1 s2 . . Triplet Formation MULTILEVEL IMBALANCED CLASSICAL ASSIGNMENT PROBLEM 1 2 3 4 [v2, n3, a2] T T T’ uses precomputed memorized scores 0.767 Similarity Score PART 1 PART 2 11
  • 12.
  • 13. Experiment Setup and Evaluation – F = {F1, F2, ..., Fn} be the set of all job families in the test set. – Ji = {Ji,1, Ji,2, ..., Ji,ni } be the set of all jobs in family Fi. – Intrai be the average similarity between all pairs of jobs within Fi – Interi be the average similarity between all pairs (A, B) of jobs such that A ∈ Fi And B∈Fj for all j ≠ i – Ri = Intrai / Interi S = 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 ∗ 𝑅𝑖 / 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 ) • N1 = 56, N2 = 129 and N3 = 430 documents used for training. • POSDC does not require any training corpus, therefore the corpus varying experiments are valid only for doc2vec and LDA. • test set consisted of 500 randomly chosen jobs out of the 2344 available in IBM Kenexa talent frameworks
  • 14. Results Comparison of S value Across Methods
  • 16. Core Novelty • A system to represent a Job Description Document as a collection of action, object and attribute triplets. • A method to find similarity score between two such triplets representations by hierarchically matching of triplets across documents. It is done as an imbalanced assignment problem to find the best match (non-greedy, to maximize the overall match score) of all triplets in the two documents.