1/32
An Analysis of the Microsoft Academic Graph
Drahomira Herrmannova (@robodasha)
&
Petr Knoth (@petrknoth)
KMi, The Open University
2/32
Introduction
• Goal: to understand the strengths and limitations of
the Microsoft Academic Graph (MAG) when
applying it to scholarly communication tasks
• We study the characteristics of the dataset
and perform a correlation analysis with other
similar datasets
3/32
Questions
• How complete/sparse are the data?
• How many of the graph entities have all
associated metadata fields populated, and how
reliable are those fields?
• How well are the data
conflated/disambiguated?
4/32
Dataset
• Heterogeneous graph comprising more than
120 million publications and the related
authors, venues, organizations, and fields of
study
• The largest publicly available dataset of
scholarly publications
• The largest dataset of open citation data
5/32
Dataset size
Papers 126,909,021
Authors 114,698,044
Institutions 19,843
Journals 23,404
Conferences 1,283
Conference instances 50,202
Fields of study 50,266
6/32
External datasets used
• CORE (Connecting Repositories)
• Mendeley
• Webometrics Ranking of World Universities
• Scimago Journal and Country Rank
7/32
Publication age
8/32
Publication age
• Publication dates from MAG compared with
CORE and Mendeley data
• Intersection found using DOI
Unique DOIs in the MAG 35,569,305
Unique DOIs in CORE 2,673,592
Intersection MAG/CORE 1,690,668
Intersection MAG/CORE/Mendeley 1,314,854
Intersection without missing data 1,258,611
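
A minimal sketch of how an intersection on DOI could be computed. The slides do not say how DOIs were normalised, so the `normalise_doi` helper and the sample DOIs below are assumptions for illustration only; the sketch relies on the fact that DOIs are case-insensitive.

```python
def normalise_doi(raw):
    """Lowercase a DOI and drop a leading resolver URL or 'doi:' prefix."""
    doi = raw.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


# Made-up DOIs; the real sets hold millions of entries.
mag_dois = {"10.1000/ABC123", "10.1000/xyz789", "10.1000/only-in-mag"}
core_dois = {"https://doi.org/10.1000/abc123", "doi:10.1000/XYZ789", "10.1000/only-in-core"}
mendeley_dois = {"10.1000/abc123"}

mag = {normalise_doi(d) for d in mag_dois}
core = {normalise_doi(d) for d in core_dois}
mendeley = {normalise_doi(d) for d in mendeley_dois}

mag_core = mag & core            # papers present in both MAG and CORE
all_three = mag_core & mendeley  # papers present in all three datasets
```

Normalising before intersecting matters because the same DOI can appear with different casing or as a full resolver URL, which a plain string comparison would treat as different identifiers.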
9/32
Publication age
• Compared using two methods
– Spearman's rho correlation coefficient
– Cumulative distribution function of the difference
between the publication years in the different
datasets
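
The two comparison methods can be sketched in plain Python as follows. This is an illustration, not the authors' code: the publication years are made up, and the CDF is taken over the absolute year difference, which is an assumption about how the difference was measured.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


def year_difference_cdf(years_a, years_b):
    """Map each absolute difference d to the share of pairs with difference <= d."""
    counts = Counter(abs(a - b) for a, b in zip(years_a, years_b))
    total = sum(counts.values())
    cdf, running = {}, 0
    for diff in sorted(counts):
        running += counts[diff]
        cdf[diff] = running / total
    return cdf


# Made-up publication years for five papers matched by DOI.
mag_years = [2001, 2005, 2010, 2012, 2015]
core_years = [2001, 2006, 2010, 2012, 2013]

rho = spearman_rho(mag_years, core_years)
cdf = year_difference_cdf(mag_years, core_years)
```

The two measures are complementary: rho only says whether the datasets order the papers the same way, while `cdf[0]` gives the share of papers whose years agree exactly.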
10/32
Publication age
Spearman’s rho MAG CORE Mendeley
MAG - 0.9555 0.9656
CORE 0.9555 - 0.9743
Mendeley 0.9656 0.9743 -
11/32
Publication age
12/32
Authors and affiliations
• Publications linked to author and affiliation
entities
• All publications are linked to one or more authors;
however, 105,980,107 (~83%) publications are not
linked to any affiliation
13/32
Authors and affiliations
Mean number of authors per paper 2.66
Max authors per paper 6,530
Mean number of papers per author 2.94
Max number of papers per author 153,915
Mean number of collaborators 116.93
Max number of collaborators 3,661,912
Number of papers with affiliation 20,928,914
Mean number of affiliations per paper 0.23
Max number of affiliations per paper 181
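
Statistics of this kind can be derived from a table of paper-author links. The sketch below uses a tiny made-up link list; the variable names and the data layout are assumptions, not the MAG schema.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (paper_id, author_id) links.
links = [
    ("p1", "a1"), ("p1", "a2"),
    ("p2", "a1"),
    ("p3", "a1"), ("p3", "a2"), ("p3", "a3"),
]

authors_of = defaultdict(set)  # paper -> set of authors
papers_of = defaultdict(set)   # author -> set of papers
for paper, author in links:
    authors_of[paper].add(author)
    papers_of[author].add(paper)

# Two authors are collaborators if they share at least one paper.
collaborators = defaultdict(set)
for team in authors_of.values():
    for first, second in combinations(sorted(team), 2):
        collaborators[first].add(second)
        collaborators[second].add(first)

mean_authors_per_paper = sum(len(t) for t in authors_of.values()) / len(authors_of)
max_authors_per_paper = max(len(t) for t in authors_of.values())
mean_papers_per_author = sum(len(p) for p in papers_of.values()) / len(papers_of)
mean_collaborators = sum(len(collaborators.get(a, ())) for a in papers_of) / len(papers_of)
```

Because every pair of co-authors on a paper counts as a collaboration, a single paper with thousands of listed authors adds thousands of collaborators to each of them, so means computed this way are sensitive to outliers.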
14/32
Authors and affiliations
• Paper with most authors: "Sunday, 26 August
2012"
• Author with most papers: "united vertical
media gmbh"
15/32
Journals and conferences
• Papers linked to publication venues
• Of all papers in MAG (over 126 million), more
than 51 million (~40%) are linked to a journal
and 1.7 million to a conference entity
16/32
Fields of study
• FoS in MAG organised hierarchically into four
levels (0-3)
– 47,989 at level 3
– 1,966 at level 2
– 293 at level 1
– 18 at level 0
• Over 41 million papers are linked to one or
more fields of study (~33%)
17/32
Fields of study
18/32
Fields of study – Mendeley
19/32
Citation network
• We study the network by
– looking at the citation distribution, to see whether
it is consistent with previous studies
– comparing the citations received by two types of
entities in the graph (universities and journals) with
citations from external datasets
• Why?
– To understand the quality of the citation data (not
to rank universities or journals)
20/32
Citation network
• 528,682,289 internal citations
• Significant portion of papers disconnected
from the graph
Total number of papers 126,909,021
Papers with zero references 96,850,699
Papers with zero citations 89,647,949
Papers with zero references and citations 80,166,717
Mean citations per paper 4.17
Mean citations per "connected" paper 11.31
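
The two means can be reproduced from the counts on this slide. The sketch assumes that a "connected" paper is one with at least one reference or one citation, i.e. any paper not counted in the "zero references and citations" row.

```python
total_papers = 126_909_021
internal_citations = 528_682_289
isolated_papers = 80_166_717  # zero references and zero citations

# Assumption: every paper that is not isolated counts as "connected".
connected_papers = total_papers - isolated_papers

mean_all = internal_citations / total_papers
mean_connected = internal_citations / connected_papers
isolated_share = isolated_papers / total_papers

print(f"connected papers:         {connected_papers:,}")
print(f"mean citations (all):     {mean_all:.2f}")
print(f"mean citations (connected): {mean_connected:.2f}")
print(f"share of isolated papers: {isolated_share:.0%}")
```

Under that assumption about 63% of the papers take no part in the citation network, which is why the mean rises from 4.17 to 11.31 once they are excluded.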
21/32
Citation network
• Comparison of university and journal citation
data found in MAG with the Ranking Web of
Universities (RWoU) and the Scimago Journal
& Country Rank (SJCR) citation data
• Two comparison methods
– Size of overlap of the top university/journal lists
– Pearson’s and Spearman’s correlation (calculated
on matching items)
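
A sketch of the first comparison method, the overlap of top-k lists computed on matched items. How entities were matched between datasets is not described on the slides, so the naive name-based `match_by_name` helper and the sample citation counts are assumptions.

```python
def normalise(name):
    """Lowercase and collapse whitespace; a deliberately naive matching key."""
    return " ".join(name.lower().split())


def match_by_name(mag_counts, external_counts):
    """Return {key: (mag_citations, external_citations)} for names in both inputs."""
    mag = {normalise(n): c for n, c in mag_counts.items()}
    ext = {normalise(n): c for n, c in external_counts.items()}
    return {key: (mag[key], ext[key]) for key in mag.keys() & ext.keys()}


def top_k_overlap(matched, k):
    """Count the items that appear in the top k of both citation rankings."""
    top_mag = set(sorted(matched, key=lambda n: matched[n][0], reverse=True)[:k])
    top_ext = set(sorted(matched, key=lambda n: matched[n][1], reverse=True)[:k])
    return len(top_mag & top_ext)


# Made-up citation counts for a handful of journals.
mag_counts = {"Journal A": 900, "Journal B": 800, "journal  c": 100, "Journal D": 50}
external_counts = {"journal a": 500, "Journal C": 700, "Journal B": 300, "Journal E": 10}

matched = match_by_name(mag_counts, external_counts)
overlap_top_2 = top_k_overlap(matched, 2)
```

Restricting both rankings to matched items keeps the comparison fair: an entity that exists in only one dataset cannot appear in the other dataset's top list at all.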
22/32
Citation network
• Matched 1,255 universities between MAG and
RWoU (2,105 in total), and 13,050 journals
between MAG and SJCR (22,878 in total)
• 4 journals in common among the top 10
• 54 among the top 100
• 677 among the top 1,000 and 1,407 among the
top 2,000
23/32
Citation network – top 10 universities
24/32
Citation network – top 10 journals
25/32
Citation network
• To quantify how much the lists differ, we
created histograms of the differences between
the ranks in the MAG and in the external lists
• To produce the histograms
– Sorted the data by number of citations found in
the external dataset
– For top 100/1000 universities/journals created a
histogram of absolute difference between rank in
MAG and in external dataset
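
The histogram procedure above can be sketched as follows. The function names, the bin width, and the sample citation counts are illustrative assumptions; both inputs are expected to contain the same (already matched) entities.

```python
from collections import Counter


def rank_by_citations(counts):
    """Return {name: rank}, where rank 1 is the most cited entity."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def rank_difference_histogram(mag_counts, external_counts, top_n, bin_width):
    """Bin |rank in MAG - rank in external| for the external top_n entities."""
    mag_rank = rank_by_citations(mag_counts)
    ext_rank = rank_by_citations(external_counts)
    # Sort by the external dataset and keep its top_n entities.
    top_external = sorted(ext_rank, key=ext_rank.get)[:top_n]
    histogram = Counter()
    for name in top_external:
        diff = abs(mag_rank[name] - ext_rank[name])
        histogram[(diff // bin_width) * bin_width] += 1  # key = lower bin edge
    return dict(sorted(histogram.items()))


# Made-up citation counts for five matched universities.
mag_counts = {"U1": 100, "U2": 90, "U3": 80, "U4": 70, "U5": 60}
external_counts = {"U1": 50, "U2": 20, "U3": 40, "U4": 10, "U5": 30}

histogram = rank_difference_histogram(mag_counts, external_counts, top_n=4, bin_width=2)
```

Each key is the lower edge of a bin, so with `bin_width=2` the key `0` counts rank differences of 0 or 1 and the key `2` counts differences of 2 or 3.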
26/32
Rank difference – top 100 universities
27/32
Rank difference – top 100 universities
• University citation rank in the MAG differs by
more than 200 positions for about 20% of
universities in the top 100 of the Ranking Web
of Universities list
• The university citation rank differs by less than
25 positions for less than 40% of universities
across these two datasets
28/32
Rank difference – top 1000 universities
29/32
Rank difference – top 100 journals
30/32
Rank difference – top 1000 journals
31/32
Citation network
• Ranks of top universities differ on average by
163, with standard deviation of 185
• Ranks of top journals differ on average by
1,203 with standard deviation of 1,211
• Correlations calculated on matching items
                 Universities          Journals
Pearson’s r      0.8773 (p -> 0.0)     0.8246 (p -> 0.0)
Spearman’s rho   0.8266 (p -> 0.0)     0.8973 (p -> 0.0)
32/32
Conclusions
• MAG data correlate well with external datasets
• We have identified certain limitations as to the
completeness of links from publications to other
entities
• Existing university and journal rankings
(proprietary data) produce substantially different
results
– MAG is open and transparent at the level of individual
citations, so it is possible to verify and better interpret
the citation data
• Currently the most comprehensive publicly
available dataset of its kind

SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

An Analysis of the Microsoft Academic Graph

  • 1. An Analysis of the Microsoft Academic Graph. Drahomira Herrmannova (@robodasha) and Petr Knoth (@petrknoth), KMi, The Open University
  • 2. Introduction • To understand the strengths and limitations of the Microsoft Academic Graph (MAG) for applying it to scholarly communication tasks • We study the characteristics of the dataset and perform a correlation analysis with other similar datasets
  • 3. Questions • How complete/sparse are the data? • How many of the graph entities have all associated metadata fields populated, and how reliable are those fields? • How well are the data conflated/disambiguated?
  • 4. Dataset • Heterogeneous graph comprising more than 120 million publications and the related authors, venues, organizations, and fields of study • The largest publicly available dataset of scholarly publications • The largest dataset of open citation data
  • 5. Dataset size • Papers: 126,909,021 • Authors: 114,698,044 • Institutions: 19,843 • Journals: 23,404 • Conferences: 1,283 • Conference instances: 50,202 • Fields of study: 50,266
  • 6. External datasets used • CORE (Connecting Repositories) • Mendeley • Webometrics Ranking of World Universities • Scimago Journal and Country Rank
  • 8. Publication age • Publication dates from MAG compared with CORE and Mendeley data • Intersection found using DOI • Unique DOIs in the MAG: 35,569,305 • Unique DOIs in CORE: 2,673,592 • Intersection MAG/CORE: 1,690,668 • Intersection MAG/CORE/Mendeley: 1,314,854 • Intersection without missing data: 1,258,611
  • 9. Publication age • Compared using two methods – Spearman's rho correlation coefficient – Cumulative distribution function of the difference between the publication years in the different datasets
  • 10. Publication age • Spearman's rho between publication years: MAG/CORE 0.9555; MAG/Mendeley 0.9656; CORE/Mendeley 0.9743
  • 12. Authors and affiliations • Publications linked to author and affiliation entities • All publications are linked to one or more authors; however, 105,980,107 (~83%) publications are not linked to any affiliation
  • 13. Authors and affiliations • Mean number of authors per paper: 2.66 • Max authors per paper: 6,530 • Mean number of papers per author: 2.94 • Max number of papers per author: 153,915 • Mean number of collaborators: 116.93 • Max number of collaborators: 3,661,912 • Number of papers with affiliation: 20,928,914 • Mean number of affiliations per paper: 0.23 • Max number of affiliations per paper: 181
  • 14. Authors and affiliations • Paper with most authors: "Sunday, 26 August 2012" • Author with most papers: "united vertical media gmbh"
  • 15. Journals and conferences • Papers linked to publication venues • Of all papers in MAG (over 126 million), more than 51 million (~40%) are linked to a journal and 1.7 million to a conference entity
  • 16. Fields of study • FoS in MAG organised hierarchically into four levels (0-3) – 47,989 at level 3 – 1,966 at level 2 – 293 at level 1 – 18 at level 0 • Over 41 million papers are linked to one or more fields of study (~33%)
  • 18. Fields of study – Mendeley
  • 19. Citation network • We study the network by – looking at the citation distribution, to see whether it is consistent with previous studies – comparing the citations received by two types of entities in the graph with citations from external datasets • Why? – To understand the quality of the citation data (not to rank universities or journals)
  • 20. Citation network • 528,682,289 internal citations • Significant portion of papers disconnected from the graph • Total number of papers: 126,909,021 • Papers with zero references: 96,850,699 • Papers with zero citations: 89,647,949 • Papers with zero references and citations: 80,166,717 • Mean citations per paper: 4.17 • Mean citations per "connected" paper: 11.31
  • 21. Citation network • Comparison of university and journal citation data found in MAG with the Ranking Web of Universities (RWoU) and the Scimago Journal & Country Rank (SJCR) citation data • Two comparison methods – Size of overlap of the top university/journal lists – Pearson's and Spearman's correlation (calculated on matching items)
  • 22. Citation network • Matched 1,255 universities between MAG and RWoU (2,105 in total), and 13,050 journals between MAG and SJCR (22,878 in total) • 4 common journals among the top 10 • 54 among the top 100 • 677 among the top 1,000 and 1,407 among the top 2,000
  • 23. Citation network – top 10 universities
  • 24. Citation network – top 10 journals
  • 25. Citation network • To quantify how much the lists differ, we created histograms of the differences between the ranks in the MAG and in the external lists • To produce the histograms – Sorted the data by number of citations found in the external dataset – For the top 100/1,000 universities/journals, created a histogram of the absolute difference between the rank in MAG and the rank in the external dataset
  • 26. Rank difference – top 100 universities
  • 27. Rank difference – top 100 universities • University citation rank in the MAG differs by more than 200 positions for about 20% of universities in the top 100 of the Ranking Web of Universities list • University citation rank differs by fewer than 25 positions for less than 40% of universities across these two datasets
  • 28. Rank difference – top 1,000 universities
  • 29. Rank difference – top 100 journals
  • 30. Rank difference – top 1,000 journals
  • 31. Citation network • Ranks of top universities differ on average by 163, with a standard deviation of 185 • Ranks of top journals differ on average by 1,203, with a standard deviation of 1,211 • Correlations calculated on matching items – Pearson's r: universities 0.8773, journals 0.8246 – Spearman's rho: universities 0.8266, journals 0.8973 – p -> 0.0 in all four cases
  • 32. Conclusions • MAG data correlate well with external datasets • We have identified certain limitations as to the completeness of links from publications to other entities • Existing university and journal rankings (proprietary data) produce substantially different results – MAG is open and transparent at the level of individual citations, so it is possible to verify and better interpret the citation data • Currently the most comprehensive publicly available dataset of its kind
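
The two publication-age comparisons named on slide 9 (Spearman's rho and the cumulative distribution of year differences) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' code: the function names and the toy year lists are assumptions, and the slides do not say whether signed or absolute differences were used, so absolute differences are assumed here.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def year_difference_cdf(years_a, years_b):
    """Map each absolute year difference d to the share of pairs with difference <= d."""
    counts = Counter(abs(a - b) for a, b in zip(years_a, years_b))
    total = sum(counts.values())
    cdf, running = {}, 0
    for diff in sorted(counts):
        running += counts[diff]
        cdf[diff] = running / total
    return cdf


# Hypothetical publication years for six papers matched by DOI (toy data).
mag_years = [2001, 2005, 2010, 2012, 2015, 1999]
core_years = [2001, 2006, 2010, 2012, 2014, 2003]

rho = spearman_rho(mag_years, core_years)
cdf = year_difference_cdf(mag_years, core_years)
```

Reading the CDF at a given difference tells what share of matched papers agree on the publication year to within that many years, which complements the single correlation figure.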
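
The list comparisons on slides 21 and 25 (overlap of the top-k lists, and a histogram of absolute rank differences for the external top k) might look like the sketch below. Everything here is illustrative: the entity names, citation counts, bin width, and tie-breaking rule are assumptions, and ranks are computed over matched items only, which the slides imply but do not spell out.

```python
def rank_by_citations(citations):
    """Return {name: rank}, where rank 1 is the most cited entity.

    Ties are broken alphabetically so the result is deterministic
    (an assumption; the slides do not describe tie handling).
    """
    ordered = sorted(citations, key=lambda name: (-citations[name], name))
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def top_k_overlap(ranks_a, ranks_b, k):
    """Count entities that appear in the top k of both rankings."""
    top_a = {name for name, rank in ranks_a.items() if rank <= k}
    top_b = {name for name, rank in ranks_b.items() if rank <= k}
    return len(top_a & top_b)


def rank_difference_histogram(ranks_external, ranks_mag, k, bin_width):
    """Histogram of |rank in MAG - rank in external list| for the external top k.

    Keys are the lower edges of bins of width bin_width.
    """
    histogram = {}
    for name, external_rank in ranks_external.items():
        if external_rank > k or name not in ranks_mag:
            continue
        diff = abs(ranks_mag[name] - external_rank)
        bin_start = (diff // bin_width) * bin_width
        histogram[bin_start] = histogram.get(bin_start, 0) + 1
    return dict(sorted(histogram.items()))


# Hypothetical citation counts for five matched entities (toy data).
mag_citations = {"A": 900, "B": 500, "C": 700, "D": 100, "E": 300}
external_citations = {"A": 80, "B": 90, "C": 20, "D": 60, "E": 10}

mag_ranks = rank_by_citations(mag_citations)
external_ranks = rank_by_citations(external_citations)

overlap_top2 = top_k_overlap(mag_ranks, external_ranks, k=2)
histogram = rank_difference_histogram(external_ranks, mag_ranks, k=5, bin_width=2)
```

Sorting by the external dataset first, as slide 25 describes, means the histogram answers the question "how far do the externally top-ranked entities move when ranked by MAG citations".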
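
The two mean-citation figures on slide 20 can be reproduced from the other numbers on that slide. The sketch below assumes that a "connected" paper is one with at least one reference or at least one citation, that is, any paper not counted under "zero references and citations"; the slides do not define the term, but this reading matches the reported values.

```python
# Figures taken from slide 20.
total_papers = 126_909_021
internal_citations = 528_682_289
papers_without_refs_or_citations = 80_166_717

# Assumed definition: connected = has at least one reference or citation.
connected_papers = total_papers - papers_without_refs_or_citations

mean_citations_all = internal_citations / total_papers
mean_citations_connected = internal_citations / connected_papers

print(f"connected papers: {connected_papers:,}")
print(f"mean citations per paper: {mean_citations_all:.2f}")
print(f"mean citations per connected paper: {mean_citations_connected:.2f}")
```

Under this reading, only about 37% of the papers in the graph take part in the citation network at all, which is why the per-paper mean rises so sharply once isolated papers are excluded.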