Xiaodan Zhu and Peter Turney
National Research Council Canada
Daniel Lemire
TELUQ, Université du Québec Montréal
Andre Vellino
School of Information Studies,
University of Ottawa, Ottawa
Measuring Academic Influence:
Not All Citations Are Equal
Overview
—  Some background in Citation Analysis
—  What we tried to do and why
—  How we did it
—  What the results were
—  What the implications are
What is Citation Analysis
Citation analysis refers to the collection of methods for measuring
the importance of scholars, journals and institutions by counting
citations in a graph of references in the published literature.
Why Do Citation Analysis?
—  Reason #1: Because it generates measurable quantities!
“Since we can’t really measure what
interests us, we begin to be interested
in what we can measure”
Joel Westheimer
Professor of Education
University of Ottawa
Uses for Citation Measures
—  For Readers
    —  To evaluate the quality of articles / journals
—  For Universities
    —  To evaluate the productivity of academics
    —  To help in tenure and promotion decisions
—  For Journals
    —  To attract authors to publish
—  For Libraries
    —  To make collections / acquisition decisions
    —  To make automated recommendations to users
How Are Citations Counted?
—  Add 1 for every new occurrence of a cited article
—  Sum the results
—  Average per article and/or count the total number of citations
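The counting steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular citation index: the `references` mapping and the names `count_citations` and `summarize` are made up for the example.

```python
from collections import Counter


def count_citations(citing_to_cited):
    """Add 1 to a cited article for every citing article that lists it."""
    counts = Counter()
    for cited_ids in citing_to_cited.values():
        # A reference list names each article at most once, so deduplicate.
        counts.update(set(cited_ids))
    return counts


def summarize(counts, n_articles):
    """Return the total number of citations and the average per article."""
    total = sum(counts.values())
    return total, total / n_articles


# Hypothetical data: three citing papers and the references they list.
references = {
    "C1": ["R1", "R2"],
    "C2": ["R1"],
    "C3": ["R1", "R3"],
}
counts = count_citations(references)
total, average = summarize(counts, n_articles=3)
print(counts["R1"], total, round(average, 2))
```

Note that every citation adds exactly 1, however often or however prominently the citing paper discusses the cited article.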
Problems
—  Self citations!
—  No measure of quality of citing source
—  May be skewed by a small number of highly cited items
—  Easy to “game” by tricking Google Scholar
    —  viz. Ike Antkare h-index = 94 vs. Einstein h-index = 84
h-index
—  Jorge Hirsch (PNAS, 2005) defined the h-index:
—  Attempts to measure both the productivity and impact of the
author’s published work
—  An author has index h if h of their N papers have at least h citations
each, and the other (N − h) papers have at most h citations each.
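As a small illustration of the definition, the h-index can be computed by sorting an author's citation counts in decreasing order and finding the last rank at which the count is still at least the rank. The function name and the sample counts are assumptions for the example.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical author with five papers.
print(h_index([10, 8, 5, 4, 3]))
```

For the five hypothetical papers, four have at least 4 citations, but there are not five with at least 5, so the index is 4.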
Some Criticisms of the h-index
—  The h-index does not account for the number of authors or the order of
the authors of a paper.
—  Cannot use the h-index to compare authors in different fields
—  Young researchers with as yet short careers are at a built-in disadvantage
compared with older researchers
—  Constrained by the total number of publications
    —  10 papers w/ 100 citations each = 10 papers w/ 10 citations each (h-index of 10 in both cases)
“[h-index] captures a small amount of information
about the distribution of a scientist's citations [and] loses crucial
information that is essential for the assessment of research.” 
Adler, R., Ewing, J., Taylor, P. Citation statistics.
A report from the International Mathematical Union.
http://www.mathunion.org/fileadmin/IMU/Report/CitationStatistics.pdf
Journal Impact factor (IF)
—  Invented by Eugene Garfield in 1955 to identify journals for
Science Citation Index
—  Definition:
JIF = Total Citations (2 preceding years) / Total Articles (2 preceding years)
i.e. the impact factor of a journal is the average number
of citations to those papers that were published during
the two preceding years
    —  e.g. the number of times articles published in 2001 and 2002
    were cited by indexed journals during 2003, divided by the total
    number of items published in 2001 and 2002
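The arithmetic of the definition can be written out as follows. The figures are invented purely for illustration and do not describe any real journal.

```python
def impact_factor(citations_to_prior_two_years, items_in_prior_two_years):
    """Citations received this year to items from the two preceding years,
    divided by the number of items published in those two years."""
    if items_in_prior_two_years == 0:
        raise ValueError("no items published in the two preceding years")
    return citations_to_prior_two_years / items_in_prior_two_years


# Hypothetical journal: 300 items published in 2001 and 2002,
# cited 450 times by indexed journals during 2003.
jif_2003 = impact_factor(450, 300)
print(jif_2003)
```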
Some Criticisms of Impact Factor
—  Letters or editorials in some journals (e.g. Nature) are often cited
(and counted) in “Total Citations” (numerator) but not counted in “Total
Articles” (denominator)
—  2-year window not applicable in many fields (e.g. in Math 90% of
citations fall outside the 2-year window)
—  IF varies considerably across disciplines (Math has an average of
0.9 citation per article, Life Sciences have an average of 6.2)
“Using the impact factor alone to judge a journal is
like using weight alone to judge a person's health.” 
Adler, R., Ewing, J., Taylor, P. Citation statistics.
A report from the International Mathematical Union.
http://www.mathunion.org/fileadmin/IMU/Report/CitationStatistics.pdf
What We Did and Why
—  As early as 1965 Garfield identified 15 different reasons for
citing, including
    —  giving credit for related work
    —  correcting a work
    —  criticizing previous work
—  Many attempts since to categorize citations
One Big Assumption
All citations should count equally!
Citation Typing Ontology (CiTO)
Here are the first 21 of the 91 citation types in CiTO
http://imageweb.zoo.ox.ac.uk/pub/2008/plospaper/latest/#refs
Example of a semantically annotated article using CiTO:
Our Objective
—  Solve a binary classification problem:
Given a Paper-Reference (P-R) pair, does
P-R belong to the class “R is highly
influential for P” or not?
Our Method
—  Apply Machine Learning methods to train a computer to
recognize “Highly Influential Reference” from examples
Step 1 – Data Collection
We believe that most papers are based on 1, 2, 3 or 4
essential references. By an essential reference, we mean a
reference that was highly influential or inspirational for the
core ideas in your paper; that is, a reference that inspired or
strongly influenced your new algorithm, your experimental
design, or your choice of a research problem. Other
references merely support the work.
We asked for
—  Title of your paper (research papers only; no surveys)
—  The essential references on which your paper builds
We got
—  100 papers
—  322 “influential” references
—  i.e. 3.2 “influential references” per article
—  Each paper
    —  Contained ~ 31 references in the References section
    —  Cited ~ 54 references in the body of the paper
    —  i.e. each reference was cited an average of 1.7 times per paper
The Problem
—  The 100 papers yield 3143 paper-reference pairs
—  The authors have selected ~320 paper-reference pairs
—  Algorithmically: accurately select those ~320 from the 3143
Paper – Reference Analysis
—  OpenNLP used to detect sentence boundaries and tokenize.
—  ParsCit to parse the papers.
—  ParsCit is an open-source package for parsing references and
document structure in scientific papers.
—  Regular expressions to capture citation occurrences in paper
bodies that were not detected by ParsCit.
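The slides do not give the regular expressions that were used. As a rough sketch of the idea, assuming numeric bracketed citations such as “[4]” or “[3,4,5]”, a pattern like the hypothetical `BRACKET_CITATION` below finds each occurrence and the reference numbers it contains. Author-year citations and ranges such as “[3-5]” would need further patterns.

```python
import re

# Hypothetical pattern: one or more reference numbers inside square brackets.
BRACKET_CITATION = re.compile(r"\[(\d+(?:\s*[,;]\s*\d+)*)\]")


def find_citations(text):
    """Return (character offset, reference numbers) for each bracketed citation."""
    found = []
    for match in BRACKET_CITATION.finditer(text):
        numbers = [int(n) for n in re.split(r"\s*[,;]\s*", match.group(1))]
        found.append((match.start(), numbers))
    return found


sample = "Prior work [4] counts citations; others [3,4,5] classify them."
for offset, numbers in find_citations(sample):
    print(offset, numbers)
```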
Characteristics of Corpus
We Looked at 5 Classes of Features
1.  Count-based features
2.  Similarity-based features
3.  Context-based features
4.  Position-based features
5.  Miscellaneous features
Count Based Features
—  Total number of times a paper is referenced in the citing paper
—  The number of different sections in which a given reference appears
—  Number of times a paper is referenced in the
    —  “Related” section
    —  “Introduction” section
    —  “Core” sections (all sections excluding “Related”, “Introduction”,
    “Acknowledgements”, “Conclusion” and “Future Work”)
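The count-based features above can be derived from a list of the sections in which one reference is cited. The sketch below assumes section names have already been normalized to the labels on the slide; the function and feature names are illustrative, not the authors' own.

```python
from collections import Counter

# Sections that do not count as "Core", following the slide.
EXCLUDED_FROM_CORE = {
    "Related", "Introduction", "Acknowledgements", "Conclusion", "Future Work",
}


def count_features(occurrence_sections):
    """occurrence_sections: section name of every in-text citation of one reference."""
    per_section = Counter(occurrence_sections)
    return {
        "count_in_paper": len(occurrence_sections),
        "count_sections": len(per_section),
        "count_related": per_section["Related"],
        "count_intro": per_section["Introduction"],
        "count_core": sum(
            n for section, n in per_section.items()
            if section not in EXCLUDED_FROM_CORE
        ),
    }


# Hypothetical reference cited once in the introduction and three times later.
features = count_features(["Introduction", "Method", "Method", "Experiments"])
print(features)
```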
Content-Similarity Based Features
Citing article vs. referenced articles:
—  Title-Title
—  Title-Abstract
—  Title-Conclusion
—  Title-Introduction
—  Title-Core
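The slides do not say which similarity measure was used. One common choice, assumed here purely for illustration, is cosine similarity between bag-of-words vectors of the two text segments (for example, a title and an abstract).

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts; a deliberately simple tokenizer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two word-count vectors (0.0 if either is empty)."""
    a, b = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical title and abstract.
title = "Measuring academic influence"
abstract = "We propose a measure of academic influence based on citation counts."
print(round(cosine_similarity(title, abstract), 3))
```

A real system would likely add stop-word removal, stemming, or TF-IDF weighting. Those are omitted to keep the sketch short.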
Citing Context
—  When an article is cited, the linguistic context in which the
article is cited is considered as saying something about the
cited article.
e.g.
“Like Moravcsik and Murugesan (1975), we are concerned
about the side effects of counting insignificant references”
Context-Similarity Based Features
Citing Article: Title, Abstract, Introduction, Conclusion
Other Context Based Features
—  Authors explicitly mentioned in citation context?
—  Citation alone [4] or with others [3,4,5]
—  If “with others”, is it first? (e.g. “[3]” is first in “[3,4,5]”)
Using predefined word lists, is the lexical content of a citation
—  “relevant” [likewise, influential, inspiring, useful, …]
—  “new” [recently, latest, current, improved, …]
—  “extreme” [greatly, intensely, acutely, almighty, awfully]
—  “comparative” [easy, easier, easiest, strong, stronger, …]
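A few of these context features are simple enough to sketch directly. The word list below contains only the words visible on the slide (the full lists are not given), and the function names are hypothetical.

```python
import re

# Only the words shown on the slide; the complete list is not reproduced there.
RELEVANT_WORDS = {"likewise", "influential", "inspiring", "useful"}


def group_features(group, target):
    """group: reference numbers in one bracket, e.g. [3, 4, 5]; target: the reference of interest."""
    return {
        "cited_alone": group == [target],
        "first_in_group": len(group) > 1 and group[0] == target,
    }


def mentions_any(context, word_list):
    """True if the citation context contains at least one word from the list."""
    tokens = set(re.findall(r"[a-z]+", context.lower()))
    return bool(tokens & word_list)


print(group_features([3, 4, 5], target=3))
print(mentions_any("This influential paper inspired our design [3].", RELEVANT_WORDS))
```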
Lexical Context Features
Using a lexicon of 114,271 words obtained from the General
Inquirer Lexicon (11,788 words) extended with WordNet and the
Turney and Littman algorithm,
—  Count the number of words labeled
    —  “Strong”
    —  “Positive”
    —  “Evaluative”
Also, sentiment analysis with a different lexicon gave us
—  Presence / absence of “Emotion” (Joy, Sadness, Anger, Fear, etc.)
—  “Positive” / “Negative”
Position Based Features
Where does the citation occur?
—  Citation appears at the beginning of a sentence? (Y/N)
—  Citation appears at the end of a sentence? (Y/N)
—  Where are the sentence(s) in which the citation(s) occur(s)? e.g.
    —  0 (First sentence) to 1 (Last sentence)
    —  distance from the mean of occurrences of all citations
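As an illustration of the first three position features, the sketch below tests whether a citation marker opens or closes its sentence and maps a sentence index onto the 0 to 1 scale. The exact normalization used in the study is not stated on the slides, so the formula here is an assumption, as are the function names.

```python
def position_features(sentence, marker):
    """Does the citation marker open or close the sentence?"""
    stripped = sentence.strip()
    # Ignore the final punctuation mark when testing the end of the sentence.
    without_final_punctuation = stripped.rstrip(".!?").rstrip()
    return {
        "at_start": stripped.startswith(marker),
        "at_end": without_final_punctuation.endswith(marker),
    }


def normalized_position(sentence_index, n_sentences):
    """0.0 for the first sentence of the paper, 1.0 for the last."""
    if n_sentences < 2:
        return 0.0
    return sentence_index / (n_sentences - 1)


print(position_features("This was shown earlier [4].", "[4]"))
print(normalized_position(5, 11))
```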
[Figure: features grouped as Count Based, Similarity Based, Context Based, Position Based and Misc. Features]
Top 7 Features: 4 “counts”, 3 “similarity”
—  Counts in Paper
—  Counts in Sections
—  Counts in Core Section
—  Title-Abstract Similarity
—  Counts in Intro Section
—  Title-Core Similarity
—  Title-Intro Similarity
Conventional Measures on Citation Graph
[Figure: citation graph in which the edge from citing paper C to reference R has weight 1]
Influence Primed Measures
[Figure: citation graph in which the edge from citing paper C to reference R has weight X]
where X = (number of times C cites R)^2
hip-index
—  Each occurrence of a citation of paper R by paper C = 1
—  The hip-index (h-influence-primed index) for an author is the
largest number h such that at least h of the author's papers
have an influence-primed citation count of at least h.
Examples
Example 1: hip-index = 5, h-index = 2
—  R1 – cited 3 times by C1 = 9, cited 2 times by C2 = 4; total 13
—  R2 – cited 2 times by C3 = 4, cited 2 times by C4 = 4; total 8
—  R3 – cited 3 times by C5 = 9
—  R4 – cited 3 times by C6 = 9
—  R5 – cited 3 times by C7 = 9
—  R6 – cited 2 times by C8 = 4
—  R7 – cited 1 time by C9 = 1
Example 2: hip-index = 3, h-index = 2
—  R1 – cited 2 times by C1 = 4, cited 1 time by C2 = 1; total 5
—  R2 – cited 2 times by C3 = 4, cited 1 time by C4 = 1; total 5
—  R3 – cited 2 times by C5 = 4
—  R4 – cited 1 time by C6 = 1
—  R5 – cited 1 time by C7 = 1
—  R6 – cited 1 time by C8 = 1
—  R7 – cited 1 time by C9 = 1
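The two examples above can be checked with a short script. Each of an author's papers is represented by the list of in-text citation counts it receives from each citing paper. The conventional count is the number of citing papers, and the influence-primed count is the sum of the squared in-text counts. The data layout and function names are choices made for this sketch.

```python
def h_index(values):
    """Largest h such that at least h of the values are at least h."""
    ranked = sorted(values, reverse=True)
    return sum(1 for rank, value in enumerate(ranked, start=1) if value >= rank)


def conventional_count(times_cited_by_each_paper):
    """Every citing paper contributes 1."""
    return len(times_cited_by_each_paper)


def influence_primed_count(times_cited_by_each_paper):
    """Every citing paper contributes (number of in-text citations) squared."""
    return sum(n ** 2 for n in times_cited_by_each_paper)


def indices(papers):
    """Return (h-index, hip-index) for one author's papers."""
    h = h_index(conventional_count(c) for c in papers.values())
    hip = h_index(influence_primed_count(c) for c in papers.values())
    return h, hip


example_1 = {"R1": [3, 2], "R2": [2, 2], "R3": [3], "R4": [3],
             "R5": [3], "R6": [2], "R7": [1]}
example_2 = {"R1": [2, 1], "R2": [2, 1], "R3": [2], "R4": [1],
             "R5": [1], "R6": [1], "R7": [1]}

print(indices(example_1))  # (2, 5)
print(indices(example_2))  # (2, 3)
```

Both authors have the same h-index of 2, but the first is cited more intensively within each citing paper, which the hip-index reflects.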
Using hip-index to Predict ACL Fellows
—  Used the citation network constructed from
    —  ~ 20,000 papers in the Association for Computational Linguistics
    Anthology
—  Calculated the h-index of ACL Fellows
—  Calculated the hip-index of ACL Fellows
—  Compared the precision of h-index and hip-index
    —  the number of ACL Fellows in the top N divided by N
[Figure: precision of h-index and hip-index; values shown: 1/2, 2/3, 1/4, 2/6, 3/10, 3/9, 4/11, 4/10, 5/11, 5/12]
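The precision measure described above (the number of Fellows in the top N divided by N) can be sketched as follows. The author names, scores and set of Fellows are invented for illustration and are not the study's data.

```python
def rank_by(scores):
    """Authors ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def precision_at_n(ranked_authors, fellows, n):
    """Number of Fellows among the top n authors, divided by n."""
    top = ranked_authors[:n]
    return sum(1 for author in top if author in fellows) / n


# Hypothetical index values for four authors, two of whom are Fellows.
scores = {"A": 12, "B": 9, "C": 7, "D": 3}
fellows = {"A", "C"}
ranking = rank_by(scores)
print(precision_at_n(ranking, fellows, 2))
print(precision_at_n(ranking, fellows, 3))
```

Running the same function on rankings produced by the h-index and by the hip-index gives two precision values per N that can be compared directly.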
Conclusions
—  We can throw away h-index, Impact Factor, etc. completely
OR we can try to improve them by counting citations more
relevantly
—  A measure of the academic influence of a citation is possible
—  It is easy to compute to a first approximation: merely count
how often each reference is cited in the citing paper
—  Apply the influence-primed weights on citation graphs to
compute
    —  Influence-primed Impact Factor, g-index etc.
Thanks!

 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 

Measuring Academic Influence: Not All Citations Are Equal

  • 1. Xiaodan Zhu and Peter Turney, National Research Council Canada; Daniel Lemire, TELUQ, Université du Québec, Montréal; Andre Vellino, School of Information Studies, University of Ottawa, Ottawa. Measuring Academic Influence: Not All Citations Are Equal
  • 2. Overview —  Some background in Citation Analysis —  What we tried to do and why —  How we did it —  What the results were —  What the implications are
  • 3. What is Citation Analysis? Citation analysis refers to the collection of methods for measuring the importance of scholars, journals and institutions by counting citations in a graph of references in the published literature.
  • 4. Why Do Citation Analysis? —  Reason #1: Because it generates measurable quantities! “Since we can’t really measure what interests us, we begin to be interested in what we can measure” Joel Westheimer, Professor of Education, University of Ottawa
  • 5. Uses for Citation Measures —  For Readers —  To evaluate the quality of articles / journals —  For Universities —  To evaluate the productivity of academics —  To help in tenure and promotion decisions —  For Journals —  To attract authors to publish —  For Libraries —  To make collections / acquisition decisions —  To make automated recommendations to users
  • 6. How Are Citations Counted? —  Add 1 for every new occurrence of a cited article —  Sum the results —  Average per article and/or count the total # of citations Problems —  Self citations! —  No measure of quality of citing source —  May be skewed by a small number of highly cited items —  Easy to “game” by tricking Google Scholar —  viz. Ike Antkare h-index = 94 vs. Einstein h-index = 84
  • 7. h-index —  Jorge Hirsch (PNAS, 2005) defined the h-index: —  Attempts to measure both the productivity and impact of the author’s published work —  An author has index h if h of their N papers have at least h citations each, and the other (N − h) papers have at most h citations each.
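The definition on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk; the function name `h_index` and the sample counts are assumptions made for the example.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only get smaller from here on
    return h


# Hypothetical author with five papers.
example_counts = [10, 8, 5, 4, 3]
example_h = h_index(example_counts)
```

With these made-up counts, four papers have at least four citations each, but there are not five papers with at least five, so the index is 4.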
  • 8. Some Criticisms of the h-index —  The h-index does not account for the number of authors or the order of the authors of a paper. —  Cannot use the h-index to compare authors in different fields —  Young researchers with as yet short careers are at a built-in disadvantage compared with older researchers —  Constrained by the total number of publications —  10 papers w/ 100 citations each yield the same h-index as 10 papers w/ 10 citations each “[h-index] captures a small amount of information about the distribution of a scientist's citations [and] loses crucial information that is essential for the assessment of research.”  Adler, R., Ewing, J., Taylor, P. Citation statistics. A report from the International Mathematical Union. http://www.mathunion.org/fileadmin/IMU/Report/CitationStatistics.pdf
  • 9. Journal Impact Factor (IF) —  Invented by Eugene Garfield in 1955 to identify journals for Science Citation Index —  Definition: $\mathrm{JIF} = \dfrac{\text{Total Citations (2 preceding years)}}{\text{Total Articles (2 preceding years)}}$ i.e. the impact factor of a journal is the average number of citations to those papers that were published during the two preceding years —  e.g. the number of times articles published in 2001 and 2002 were cited by indexed journals during 2003 / the total number of items published in 2001 and 2002
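As a rough illustration of the ratio defined on this slide, the sketch below computes a 2003 impact factor from per-year tallies. The function name `impact_factor` and all of the numbers are hypothetical; they are not data from the talk.

```python
def impact_factor(citations_to_window, items_in_window):
    """Citations received this year by items from the two preceding years,
    divided by the number of items published in those two years."""
    if items_in_window == 0:
        raise ValueError("no items were published in the two-year window")
    return citations_to_window / items_in_window


# Hypothetical journal: citations counted during 2003, split by the
# publication year of the cited item, and items published per year.
citations_in_2003 = {2001: 240, 2002: 210}
items_published = {2001: 160, 2002: 140}

jif_2003 = impact_factor(sum(citations_in_2003.values()),
                         sum(items_published.values()))
```

Here 450 citations to 300 items give an impact factor of 1.5.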
  • 10. Some Criticisms of Impact Factor —  Letters or editorials in some journals (e.g. Nature) are often cited (and counted) in “Total Citations” (numerator) but not in “Total Articles” (denominator) —  2-year window not applicable in many fields (e.g. in Math 90% of citations fall outside the 2-year window) —  IF varies considerably across disciplines (Math has an average of 0.9 citations per article, Life Sciences have an average of 6.2) “Using the impact factor alone to judge a journal is like using weight alone to judge a person's health.”  Adler, R., Ewing, J., Taylor, P. Citation statistics. A report from the International Mathematical Union. http://www.mathunion.org/fileadmin/IMU/Report/CitationStatistics.pdf
  • 11. What We Did and Why
  • 12. One Big Assumption: All citations should count equally! —  As early as 1965 Garfield identified 15 different reasons for citing, e.g. —  giving credit for related work —  correcting a work —  criticizing previous work —  Many attempts since to categorize citations
  • 13. Citation Typing Ontology (CiTO) Here are the first 21 of the 91 citation types in CiTO. Example of a semantically annotated article using CiTO: http://imageweb.zoo.ox.ac.uk/pub/2008/plospaper/latest/#refs
  • 14. Our Objective —  Solve a binary classification problem: Given a Paper-Reference (P-R) pair, does P-R belong to the class “R is highly influential for P” or not? Our Method —  Apply Machine Learning methods to train a computer to recognize a “Highly Influential Reference” from examples
  • 15. Step 1 – Data Collection We believe that most papers are based on 1, 2, 3 or 4 essential references. By an essential reference, we mean a reference that was highly influential or inspirational for the core ideas in your paper; that is, a reference that inspired or strongly influenced your new algorithm, your experimental design, or your choice of a research problem. Other references merely support the work.
  • 16. We asked for —  Title of your paper (research papers only; no surveys) —  The essential references on which your paper builds We got —  100 papers —  322 “influential” references —  i.e. 3.2 “influential references” per article —  Each paper —  Contained ~ 31 references in the References section —  Cited ~ 54 references in the body of the paper —  i.e. each reference was cited an average of 1.7 times per paper
  • 17. The Problem —  The 100 papers yield 3143 paper-reference pairs —  The authors have selected ~320 paper-reference pairs —  The task: algorithmically and accurately select those ~320 pairs from the 3143
  • 18. Paper – Reference Analysis —  OpenNLP used to detect sentence boundaries and tokenize. —  ParsCit to parse the papers. —  ParsCit is an open-source package for parsing references and document structure in scientific papers. —  Regular expressions to capture citation occurrences in paper bodies that were not detected by ParsCit.
  • 20. We Looked at 5 Classes of Features 1.  Count-based features 2.  Similarity-based features 3.  Context-based features 4.  Position-based features 5.  Miscellaneous features
  • 21. Count Based Features —  Total number of times a paper is referenced in the citing paper —  The number of different sections in which a given reference appears —  Number of times a paper is referenced in the —  “Related” section —  “Introduction” section —  “Core” sections (all sections excluding “Related”, “Introduction”, “Acknowledgements”, “Conclusion” and “Future Work”)
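The count-based features on this slide could be computed roughly as follows. This is a sketch under several assumptions that are not in the talk: citations are numeric bracket markers such as `[4]` or `[3,4,5]`, the paper has already been split into `(section_name, text)` pairs, and section names have been normalized to the labels used on the slide. The names `cited_numbers` and `count_features` are made up for the example; the authors' pipeline used ParsCit plus their own regular expressions.

```python
import re
from collections import Counter

# Matches numeric bracket citations such as [4] or [3, 4, 5] (assumed style).
BRACKET = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")
NON_CORE = {"related", "introduction", "acknowledgements",
            "conclusion", "future work"}


def cited_numbers(text):
    """Return every reference number cited in text, in order of appearance."""
    numbers = []
    for match in BRACKET.finditer(text):
        numbers.extend(int(n) for n in match.group(1).split(","))
    return numbers


def count_features(sections, ref_number):
    """Count how often, and in which sections, one reference is cited."""
    per_section = Counter()
    for name, text in sections:
        per_section[name.lower()] += cited_numbers(text).count(ref_number)
    return {
        "total": sum(per_section.values()),
        "sections": sum(1 for n in per_section.values() if n > 0),
        "related": per_section["related"],
        "introduction": per_section["introduction"],
        "core": sum(n for name, n in per_section.items()
                    if name not in NON_CORE),
    }


# Hypothetical three-section paper.
paper = [
    ("Introduction", "Prior work [1] and [2, 4] motivates us."),
    ("Method", "We extend [4]. Unlike [4], we avoid tuning."),
    ("Related", "See [3,4,5] for surveys."),
]
features_for_4 = count_features(paper, 4)
```

In this toy paper, reference 4 is cited four times across three sections, twice in a core section.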
  • 22. Content-Similarity Based Features (diagram: citing article compared with referenced articles) Similarities computed: Title-Title, Title-Abstract, Title-Conclusion, Title-Introduction, Title-Core
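The slide names the text pairs that are compared but not the similarity measure. As one plausible sketch, the code below uses cosine similarity over raw word counts; that choice, the tokenizer, and the names `bag_of_words`, `cosine_similarity` and `similarity_features` are all assumptions made for illustration. Which article supplies the title is left to the caller.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a deliberately simple tokenizer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    a, b = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similarity_features(title, other_parts):
    """Compare one title against each named part of the other article."""
    return {f"title-{name}": cosine_similarity(title, text)
            for name, text in other_parts.items()}


# Hypothetical texts.
features = similarity_features(
    "Measuring citation influence",
    {"title": "Measuring citation influence",
     "abstract": "We study protein folding."},
)
```

Identical texts score 1 (up to floating-point rounding) and texts with no shared words score 0.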
  • 23. Citing Context —  When an article is cited, the linguistic context in which the article is cited is considered as saying something about the cited article. e.g. “Like Moravcsik and Murugesan (1975), we are concerned about the side effects of counting insignificant references”
  • 24. Context-Similarity Based Features (diagram: citation context compared with parts of the citing article) Citing Article: Title, Abstract, Introduction, Conclusion
  • 25. Other Context Based Features —  Authors explicitly mentioned in citation context? —  Citation alone [4] or with others [3,4,5] —  If “with others” is it first? (e.g. “[3]” is first in “[3,4,5]”) Using pre-defined word-lists, is the lexical content of a citation —  “relevant” [likewise, influential, inspiring, useful, …] —  “new” [recently, latest, current, improved, …] —  “extreme” [greatly, intensely, acutely, almighty, awfully] —  “comparative” [easy, easier, easiest, strong, stronger, …]
  • 26. Lexical Context Features Using a lexicon of 114,271 words obtained from the General Inquirer Lexicon (11,788 words) extended w/ WordNet + the Turney and Littman algorithm, —  Count the number of words labeled —  “Strong” —  “Positive” —  “Evaluative” Also, sentiment analysis with a different lexicon gave us —  Presence / absence of “Emotion” (Joy, Sadness, Anger, Fear, etc.) —  “Positive” / “Negative”
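Some of the checks on this slide can be sketched as below. The sketch assumes numeric bracket citations, and its word list contains only the four "relevant" words visible on the slide, whose full list is elided. The name `context_features` is hypothetical.

```python
import re

GROUP = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")
# Only the words visible on the slide; the full list is not given there.
RELEVANT_WORDS = {"likewise", "influential", "inspiring", "useful"}


def context_features(sentence, ref_number):
    """Describe how one reference is cited inside a single sentence."""
    features = {"alone": False, "first_in_group": False,
                "relevant_word": False}
    for match in GROUP.finditer(sentence):
        numbers = [int(n) for n in match.group(1).split(",")]
        if ref_number in numbers:
            # Cited by itself, e.g. [4], or leading a group, e.g. [3,4,5].
            features["alone"] = len(numbers) == 1
            features["first_in_group"] = (len(numbers) > 1
                                          and numbers[0] == ref_number)
            break  # only the first citation of this reference is examined
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    features["relevant_word"] = bool(words & RELEVANT_WORDS)
    return features


grouped = context_features("This influential method [3,4,5] works.", 3)
single = context_features("See [4].", 4)
```

The same pattern extends to the "new", "extreme" and "comparative" lists by swapping in a different word set.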
  • 27. Position Based Features Where does the citation occur? —  Citation appears at the beginning of a sentence? (Y/N) —  Citation appears at the end of a sentence? (Y/N) —  Where in the paper are the sentence(s) in which the citation(s) occur(s)? e.g. —  from 0 (first sentence) to 1 (last sentence) —  distance from the mean of occurrences of all citations
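The first three position features might be computed as follows. The sketch assumes the paper is already split into sentences (the talk used OpenNLP for this) and that citations are numeric bracket markers; `position_features` is a hypothetical name, and the distance-from-mean feature is not covered.

```python
import re

CITATION = re.compile(r"\[\d+(?:\s*,\s*\d+)*\]")


def position_features(sentences, sentence_index):
    """Position features for the citing sentence at sentence_index."""
    sentence = sentences[sentence_index].strip()
    # Ignore the closing punctuation when testing for a sentence-final citation.
    body = sentence.rstrip(".!?").rstrip()
    last = len(sentences) - 1
    return {
        "at_start": CITATION.match(sentence) is not None,
        "at_end": any(m.end() == len(body)
                      for m in CITATION.finditer(body)),
        # 0.0 for the first sentence of the paper, 1.0 for the last.
        "relative_position": sentence_index / last if last > 0 else 0.0,
    }


# Hypothetical three-sentence paper.
paper_sentences = [
    "[1] introduced the idea.",
    "We follow it.",
    "It works well [2].",
]
first = position_features(paper_sentences, 0)
final = position_features(paper_sentences, 2)
```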
  • 28. Count Based Features Similarity Based Features Context Based Features Position Based Features Misc. Features
  • 29. Top 7 Features: 4 “counts”, 3 “similarity” —  Counts in Paper —  Counts in Sections —  Counts in Core Section —  Title-Abstract Similarity —  Counts in Intro Section —  Title-Core Similarity —  Title-Intro Similarity
  • 30. Conventional Measures on Citation Graph: each edge from a citing paper C to a reference R has weight 1
  • 31. Influence Primed Measures: each edge from a citing paper C to a reference R has weight $X$, where $X = (\text{number of times } C \text{ cites } R)^2$
  • 32. hip-index —  Each occurrence of a citation of paper R by paper C counts as 1 —  The hip-index (influence-primed h-index) for an author is the largest number h such that at least h of the author's papers have an influence-primed citation count of at least h.
  • 33. Examples
      Example 1: hip-index = 5, h-index = 2
        R1 – cited 3 times by C1 = 9, cited 2 times by C2 = 4; total 13
        R2 – cited 2 times by C3 = 4, cited 2 times by C4 = 4; total 8
        R3 – cited 3 times by C5 = 9
        R4 – cited 3 times by C6 = 9
        R5 – cited 3 times by C7 = 9
        R6 – cited 2 times by C8 = 4
        R7 – cited 1 time by C9 = 1
      Example 2: hip-index = 3, h-index = 2
        R1 – cited 2 times by C1 = 4, cited 1 time by C2 = 1; total 5
        R2 – cited 2 times by C3 = 4, cited 1 time by C4 = 1; total 5
        R3 – cited 2 times by C5 = 4
        R4 – cited 1 time by C6 = 1
        R5 – cited 1 time by C7 = 1
        R6 – cited 1 time by C8 = 1
        R7 – cited 1 time by C9 = 1
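The two examples on this slide can be checked with a short sketch. It assumes that R1 and R2 are each cited by two papers (as the totals 13 and 8, and 5 and 5, suggest) and that an influence-primed count is the sum of squared in-text citation counts over the citing papers. The names `h_from_scores` and `h_and_hip` are made up for illustration.

```python
def h_from_scores(scores):
    """Largest h such that at least h scores are >= h."""
    ranked = sorted(scores, reverse=True)
    # In a descending list the test `score >= rank` holds for a prefix only,
    # so counting the ranks that pass it gives h.
    return sum(1 for rank, score in enumerate(ranked, start=1)
               if score >= rank)


def h_and_hip(papers):
    """papers maps each of an author's papers to a list holding, for every
    citing paper, the number of times it cites that paper in its text."""
    plain = [len(counts) for counts in papers.values()]
    primed = [sum(n * n for n in counts) for counts in papers.values()]
    return h_from_scores(plain), h_from_scores(primed)


example_1 = {"R1": [3, 2], "R2": [2, 2], "R3": [3], "R4": [3],
             "R5": [3], "R6": [2], "R7": [1]}
example_2 = {"R1": [2, 1], "R2": [2, 1], "R3": [2], "R4": [1],
             "R5": [1], "R6": [1], "R7": [1]}

h_1, hip_1 = h_and_hip(example_1)
h_2, hip_2 = h_and_hip(example_2)
```

Both authors have the same conventional h-index of 2, while the influence-primed counts separate them: 5 for the first example and 3 for the second.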
  • 34. Using hip-index to Predict ACL Fellows —  Used the citation network constructed from —  ~ 20,000 papers in the Association for Computational Linguistics Anthology —  Calculated the h-index of ACL Fellows —  Calculated the hip-index of ACL Fellows —  Compared the precision of h-index and hip-index —  i.e. the number of ACL Fellows in the top N divided by N
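The precision measure described on this slide, Fellows in the top N divided by N, can be sketched as follows. The author names and scores are invented for the example, and `precision_at_n` is a hypothetical name; ties are broken by insertion order here, which is an arbitrary choice.

```python
def precision_at_n(scores, fellows, n):
    """Fraction of the n highest-scoring authors who are Fellows."""
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = sorted(scores, key=scores.get, reverse=True)[:n]
    return sum(1 for author in top_n if author in fellows) / n


# Hypothetical index values for four authors, two of whom are Fellows.
author_scores = {"A": 9, "B": 7, "C": 5, "D": 2}
fellow_set = {"A", "C"}

p_at_2 = precision_at_n(author_scores, fellow_set, 2)
p_at_3 = precision_at_n(author_scores, fellow_set, 3)
```

Running the same function once with h-index values and once with hip-index values, for the same set of Fellows, gives the comparison the slide describes.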
  • 36. Conclusions —  We can throw away h-index and Impact Factor etc. completely OR we can try to improve them by counting citations more relevantly —  A measure of academic influence for a citation is possible and —  It is easy to compute to a first approximation: merely count how often the citing paper cites each reference —  Apply the influence-primed weights on citation graphs to compute —  Influence-primed Impact Factor, g-index etc.