A Real-time Heuristic-based Name Disambiguation Method for Digital Libraries
Muhammad Imran, Syed Zeeshan Haider Gillani, Maurizio Marchese
Outline 
• Name Disambiguation problem 
• Mixed and Split Citations 
• Related work 
• Our approach 
• Experiments & results 
• Conclusion
Name Disambiguation
• Multiple authors share the same name: Author-1, Author-2, Author-3 and Author-4 are all named Muhammad Imran
• One author has multiple name variations: Muhammad Imran, M. Imran, Imran Muhammad
Name Disambiguation Types
• Mixed citations: citation records of different authors (Muhammad Imran, Malik Imran, Mehar Imran) are mixed together in the digital library (DL) under one name, M. Imran
Name Disambiguation Types
• Split citations: the citation records of one author (Muhammad Imran) are split in the DL across what appear to be different authors (Author-1, Author-2, Author-3)
Related Work
• Supervised approaches
• Generative (naïve Bayes)
• Discriminative (support vector machines)
• Labor-intensive, with high training cost
• Unsupervised approaches
• Mostly fail to handle name variations
• No user intervention
Our Contributions 
• An end-to-end system 
• Retrieval -> pre-processing -> disambiguation 
• A generic disambiguation approach 
• Unsupervised 
• Heuristic-based
• Involves users' feedback
Our Approach
[Figure: layered disambiguation pipeline (Layer-1 to Layer-4). Input: citation records (CR) containing both mixed and split citations.]
• Discipline-based clustering, followed by user selection of a principal cluster
• Co-author-based split of the selected cluster and building of a list of candidate principal authors (cp)
• Affiliation- and candidate-author-based merge
• Title- and homepage-based merge, using a title-based vector, yielding the records of the principal author (pa)
Hierarchical Clustering & Feature Representation
• Approaches
• Agglomerative
• Divisive
• Feature matrix $X$ of size $N \times D$
• $N$ (rows) = number of citation records
• $D$ (columns) = number of features
• $X_{i,j}$ = $j$th feature of the $i$th citation record
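Agglomerative clustering over the $N \times D$ feature matrix can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes binary features (e.g. one column per co-author or venue), Jaccard distance, single linkage, and a hypothetical stopping threshold `max_dist`, none of which the slides specify. Only the agglomerative variant is shown.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two binary feature rows (lists of 0/1)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if union == 0 else 1.0 - inter / union


def agglomerative(X, max_dist=0.5):
    """Single-linkage agglomerative clustering of the rows of X.

    X is an N x D list of rows: X[i][j] is the jth feature of the ith
    citation record. Starts with one cluster per record and repeatedly
    merges the closest pair of clusters until no pair is within max_dist.
    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best = None
        for (p, cp), (q, cq) in combinations(enumerate(clusters), 2):
            # Single linkage: distance between the closest members.
            d = min(jaccard_distance(X[i], X[j]) for i in cp for j in cq)
            if best is None or d < best[0]:
                best = (d, p, q)
        d, p, q = best
        if d > max_dist:
            break  # no remaining pair is close enough to merge
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]  # q > p, so index p is unaffected
    return clusters
```

For example, with rows `[1,1,0,0]`, `[1,1,1,0]`, `[0,0,1,1]`, `[0,0,0,1]` and `max_dist=0.5`, the first two and the last two records end up in separate clusters. Raising or lowering the threshold trades merging (fewer split citations) against separating (fewer mixed citations).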
Features: co-authorship 
• Joint authors of a book, article … 
• Available across DLs 
• We use it as: 
• Principal author 
• Co-authors 
• Example: in the citation record {author-1, author-2, author-3, author-4, author-5}, author-1 is the principal author and author-2 to author-5 are co-authors
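The split of a citation record into a principal author and co-authors can be represented with a small record type. This is a hypothetical sketch: the class and field names are assumptions, and the principal author is taken to be the author string that equals (ignoring case) the ambiguous name being disambiguated. Name variants are not handled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CitationRecord:
    """One bibliographic record, e.g. a single entry retrieved from a DL."""
    title: str
    authors: List[str] = field(default_factory=list)
    venue: str = ""
    affiliation: str = ""

    def split_authors(self, principal_name: str) -> Tuple[Optional[str], List[str]]:
        """Return (principal_author, co_authors) for the queried name.

        The principal author is the first author string equal (ignoring case)
        to the ambiguous name; all other authors are co-authors. If the name
        does not occur, the principal author is None.
        """
        key = principal_name.casefold()
        for i, author in enumerate(self.authors):
            if author.casefold() == key:
                return author, self.authors[:i] + self.authors[i + 1:]
        return None, list(self.authors)
```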
Features: co-authorship 
• Heuristics 
“If a co-author appears in two different publications with the same principal author, then most likely both publications belong to that principal author.”
• Example: IF citation record-1 {author-1, author-2, ...} and citation record-2 {author-1, author-2, ...} have the same principal author (author-1) and the same co-author (author-2), THEN both records belong to principal author-1
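The co-author heuristic can be sketched as a union-find pass over records that all carry the same principal-author name: any two records sharing at least one co-author are placed in the same group. This is a minimal illustration, assuming each record is given as a set of already-normalized co-author names (principal author removed); applying the rule transitively is a design choice of this sketch, not something the slides state.

```python
def group_by_shared_coauthors(records):
    """Group records (sets of co-author names) that share a co-author.

    Uses union-find: records sharing any co-author end up in the same group,
    and the relation is applied transitively (A~B and B~C puts A, B, C
    together). Returns a list of groups, each a sorted list of record indices.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen = {}  # co-author name -> first record index containing it
    for idx, coauthors in enumerate(records):
        for name in coauthors:
            if name in first_seen:
                union(first_seen[name], idx)
            else:
                first_seen[name] = idx

    groups = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())
```

With records `{author-2, author-3}`, `{author-2, author-5}`, `{author-7}`, `{author-5}`, records 0, 1 and 3 are grouped (via author-2 and author-5) and record 2 stays alone.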
Features: Conference Venue 
• The venue is the name of the event or outlet, e.g., a conference, workshop, or journal.
• Available across DLs.
• Heuristics
“The venue information of two researchers with the same name can differentiate one from the other, by examining the disciplines and sub-disciplines of each researcher's interests.”
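The venue heuristic can be sketched by mapping venues to disciplines and checking whether two same-name researchers publish in disjoint disciplines. The `VENUE_DISCIPLINE` table below is a hypothetical, hand-written example; the slides do not describe how venues are mapped to disciplines, and sub-disciplines are not modeled here.

```python
# Hypothetical venue -> discipline lookup (illustrative only); a real system
# would need a curated mapping or an external classification.
VENUE_DISCIPLINE = {
    "SIGIR": "information retrieval",
    "ECIR": "information retrieval",
    "JCDL": "digital libraries",
    "TPDL": "digital libraries",
    "CVPR": "computer vision",
}


def discipline_of(venue):
    """Map a venue name to a discipline label, or None if unknown."""
    return VENUE_DISCIPLINE.get(venue.strip().upper())


def venues_suggest_different_authors(venues_a, venues_b):
    """Return True when two same-name researchers publish in disjoint disciplines.

    Unknown venues are ignored; if either side has no known venue the
    heuristic gives no evidence and returns False.
    """
    da = {d for d in map(discipline_of, venues_a) if d}
    db = {d for d in map(discipline_of, venues_b) if d}
    return bool(da and db) and da.isdisjoint(db)
```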
Features: Author’s Affiliation 
• An author's affiliation with an institute, university, organization, etc.
• Available across DLs.
• Heuristics
“If two publications with the same principal author name also share the same affiliation information, then both publications are considered to belong to the same author.”
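The affiliation heuristic can be sketched as a pairwise check on (principal name, affiliation) pairs. This is a minimal illustration assuming a simple normalization (case-folding, dropping punctuation, collapsing whitespace); real affiliation strings usually need fuzzier matching, which this sketch does not attempt.

```python
import re


def normalize_affiliation(text):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def same_author_by_affiliation(rec_a, rec_b):
    """Apply the affiliation heuristic to two (principal_name, affiliation) pairs.

    Returns True only when the principal names match (ignoring case) and
    both records carry the same non-empty normalized affiliation.
    """
    name_a, aff_a = rec_a
    name_b, aff_b = rec_b
    if name_a.casefold() != name_b.casefold():
        return False
    na, nb = normalize_affiliation(aff_a), normalize_affiliation(aff_b)
    return bool(na) and na == nb
```

For instance, "Univ. of Example" and "Univ of Example" normalize to the same string, so two "Muhammad Imran" records carrying them would be merged.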
Features: Author Names
• An author's name can have multiple variations.
• For example, Muhammad Imran may also appear as:
• M. Imran
• Imran Muhammad
• Muhammad I.
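Whether two name strings could be variations of one another can be sketched with a token-level comparison that allows initials and reversed order. This is a hypothetical rule written for illustration; the slides do not say how name variations are matched.

```python
def name_tokens(name):
    """Split a name into lower-case tokens, treating dots and commas as spaces."""
    return name.replace(".", " ").replace(",", " ").casefold().split()


def _token_match(a, b):
    """Tokens match if equal, or if one is a single-letter initial of the other."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def is_name_variant(name, variant):
    """Heuristically decide whether `variant` could be a form of `name`.

    Both names must have the same number of tokens, and the tokens must
    match (allowing initials) either in the same or in reversed order.
    """
    a, b = name_tokens(name), name_tokens(variant)
    if not a or len(a) != len(b):
        return False
    same = all(_token_match(x, y) for x, y in zip(a, b))
    rev = all(_token_match(x, y) for x, y in zip(a, reversed(b)))
    return same or rev
```

Note that "M. Imran" is also accepted as a variant of "Malik Imran": name matching alone cannot resolve mixed citations, which is why the other features are needed.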
Features: Publication Titles
• Title as a string literal
• We maintain a vector of important keywords from titles
• The vector represents the author's interests
• A similarity measure between a given citation record's title and this vector can help decide whether the record belongs to the author
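The title feature can be sketched as a keyword-count vector built from an author's known titles and compared with a new title by cosine similarity. This is an assumption-laden illustration: raw counts, a tiny stop-word list and ASCII-only tokenization are choices of this sketch, and the slides do not name the similarity measure actually used.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "to", "with"}


def keywords(title):
    """Lower-case ASCII word tokens of a title, minus stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", title.casefold()) if w not in STOP_WORDS]


def build_interest_vector(titles):
    """Aggregate keyword counts over an author's known publication titles."""
    vec = Counter()
    for t in titles:
        vec.update(keywords(t))
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(c * v[w] for w, c in u.items())  # missing keys count as 0
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)
```

Usage: build a profile from titles already attributed to the author, then score a candidate record with `cosine(profile, Counter(keywords(candidate_title)))`; higher scores suggest a closer match to the author's interests.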
Features: Principal Author’s Homepage 
• The URL of the principal author's homepage.
Disambiguation System in Action 
• Formation of clusters based on inter-related disciplines
• Co-author-based split
• Affiliation-based agglomerative merge
• Pursuit of the remaining bits
Formation of Clusters Based on Inter-related Disciplines
• Exploits venue/discipline information
• Forms relatively large clusters
• Involves users, who select among the resulting clusters
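Discipline-based cluster formation with a user-selected principal cluster can be sketched as follows. The `DISCIPLINE_FAMILY` table grouping inter-related disciplines is hypothetical, and the user's choice is passed in as a parameter rather than collected interactively; the slides do not describe either mechanism in detail.

```python
from collections import defaultdict

# Hypothetical grouping of inter-related disciplines into broader families;
# the slides do not say how disciplines are related or grouped.
DISCIPLINE_FAMILY = {
    "information retrieval": "information systems",
    "digital libraries": "information systems",
    "databases": "information systems",
    "computer vision": "ai & perception",
    "machine learning": "ai & perception",
}


def form_discipline_clusters(records):
    """Group (record_id, discipline) pairs into clusters of related disciplines.

    Records with an unknown discipline go to an 'unassigned' cluster so that
    later stages can still handle them.
    """
    clusters = defaultdict(list)
    for record_id, discipline in records:
        family = DISCIPLINE_FAMILY.get(discipline, "unassigned")
        clusters[family].append(record_id)
    return dict(clusters)


def select_principal_cluster(clusters, user_choice):
    """Return the records of the cluster the user picked (the principal cluster)."""
    if user_choice not in clusters:
        raise KeyError(f"unknown cluster: {user_choice!r}")
    return clusters[user_choice]
```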
Formation of Clusters Based on Inter-related Disciplines
Co-author Based Split 
• Using k-means clustering
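A co-author-based split with k-means can be sketched by turning each record into a binary vector over all co-author names and running Lloyd's algorithm. This is a plain-Python illustration under stated assumptions: the number of clusters `k` is given by the caller (the slides do not say how it is chosen), and centroids are seeded with the first `k` points for determinism rather than with k-means++ or random restarts.

```python
def coauthor_vectors(records):
    """One binary vector per record over the sorted vocabulary of co-authors."""
    vocab = sorted({name for rec in records for name in rec})
    return [[1.0 if name in rec else 0.0 for name in vocab] for rec in records]


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Centroids start at the first k points (deterministic, for illustration
    only; k-means++ seeding or several random restarts are the usual choice).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With co-author sets `{A, B}`, `{A, B, C}`, `{X, Y}`, `{X, Z}` and `k=2`, the first two and the last two records receive different labels.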
Experiment & Evaluation 
Dataset 
• 50 most ambiguous researchers 
• Manually annotated a gold-standard dataset
• Used DBLP as the data source
• Used ADANA as the baseline approach
• Used precision, recall and F1 as performance measures
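Precision, recall and F1 for a disambiguation result can be computed in several ways; the sketch below uses the pairwise variant common in name-disambiguation work. This is an assumption: the slides do not state which variant was used in the evaluation against the manually annotated dataset.

```python
from itertools import combinations


def pairwise_prf(predicted, gold):
    """Pairwise precision, recall and F1 for a clustering of citation records.

    `predicted` and `gold` map each record id to a cluster label. A pair of
    records is positive when both records are in the same cluster.
    """
    ids = sorted(gold)
    pred_pairs = {(a, b) for a, b in combinations(ids, 2) if predicted[a] == predicted[b]}
    gold_pairs = {(a, b) for a, b in combinations(ids, 2) if gold[a] == gold[b]}
    tp = len(pred_pairs & gold_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(gold_pairs) if gold_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For gold labels `{r1: A, r2: A, r3: A, r4: B}` and predicted clusters `{r1: 1, r2: 1, r3: 2, r4: 2}`, there are 2 predicted pairs and 3 gold pairs with 1 in common, giving precision 0.5, recall 1/3 and F1 0.4.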
Experiment & Evaluation
Thank you! 
Muhammad Imran 
mimran@qf.org.qa
