Simple Yet Effective Methods for Large-Scale Scholarly Publication Ranking
KMi and Mendeley (team BletchleyPark) at WSDM Cup 2016
1. Simple Yet Effective Methods for Large-Scale Scholarly Publication Ranking
KMi and Mendeley (team BletchleyPark) at WSDM Cup 2016
Drahomira Herrmannova & Petr Knoth
The Open University & Mendeley
WSDM Cup 2016, February 2016
2. Our approach
• Hypothesis
  • The importance of a publication can be determined by a mixture of factors evidencing its impact and the importance of the entities which participated in the publication's creation
3. Our approach
• Method
  1. Separately score each of the types of entities in the graph
  2. Use the separate scores to provide a publication score
  3. This yields several different scores for the publication entities
  4. The final score, which defines the publication's rank, is calculated as a linear combination of the scores
• Weights were obtained experimentally
• The final equation:

  score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}  (1)
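As a minimal sketch, the linear combination in equation (1) is just a weighted sum. The helper below is illustrative only; the dictionary shape and component-score values are assumptions, with the weights taken from equation (1):

```python
# Weights from equation (1), obtained experimentally by the authors.
WEIGHTS = {
    "pub": 2.5, "age": 0.1, "pr": 1.0,
    "auth": 1.0, "venue": 0.1, "inst": 0.01,
}

def final_score(scores):
    """Combine the six entity-based scores into one publication score.

    `scores` maps component name -> precomputed score for one publication
    (an assumed data shape, not the authors' actual pipeline).
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Example with hypothetical component scores for one publication.
s = final_score({"pub": 4.0, "age": 2015, "pr": 0.3,
                 "auth": 12.0, "venue": 300.0, "inst": 150.0})
```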
4. Publication-based scoring functions

  score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}
5. Publication-based scoring functions
• Scoring publication entities directly (without considering the importance of authors or venues)
• We experimented with several options for normalising and weighting publication citations:
  • Applying a time decay to citations
  • Applying a decay function to total citation counts
  • Using mean citation counts
• Final scoring function:

  s_{pub}(p) = \begin{cases} c(p)/|A_p|, & \text{for } c(p) \le t \\ t/|A_p|, & \text{for } c(p) > t \end{cases}  (2)

  where c(p) is the citation count of publication p, A_p the set of its authors, and t a maximum-citation threshold
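The piecewise function in equation (2) reduces to a capped citation count divided by the number of authors. A minimal sketch, where the function name and argument names are assumptions for illustration:

```python
def s_pub(citations, n_authors, t):
    """Capped citation count normalised by number of authors (eq. 2).

    min(citations, t) expresses both branches of the piecewise form:
    below the threshold t the raw count is used, above it the cap t.
    """
    return min(citations, t) / n_authors

# A lightly cited paper with two authors, assumed threshold t=100.
low = s_pub(10, 2, t=100)    # 10/2 = 5.0
# A heavily cited paper hits the cap.
high = s_pub(500, 2, t=100)  # 100/2 = 50.0
```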
6. Publication-based scoring functions
• To account for publication age we added a score based on the publication year y_p:

  s_{age}(p) = y_p  (3)

• In the second phase of the challenge we included PageRank as an additional feature:

  s_{pr}(p) = PR(p)  (4)
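The PageRank feature in equation (4) can be approximated by plain power iteration over the citation graph. A self-contained sketch, assuming a toy adjacency-list graph and the common damping factor 0.85; the challenge's actual PageRank configuration is not specified here:

```python
def pagerank(links, d=0.85, iters=50):
    """Power iteration on a graph given as {node: [cited nodes]}."""
    nodes = list(links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}  # uniform start
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in nodes}
        for v, outs in links.items():
            if outs:
                share = pr[v] / len(outs)
                for w in outs:
                    new[w] += d * share
            else:
                # Dangling node: distribute its rank evenly.
                for w in nodes:
                    new[w] += d * pr[v] / n
        pr = new
    return pr

# Toy citation graph: papers a and b both cite c.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
```

In this sketch a paper cited by many others accumulates rank, which is the intuition behind using PR(p) as a score.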
7. Author-based score

  score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}  (5)
8. Author-based score
• We experimented with some commonly used methods for evaluating author performance (number of citations, h-index)
• We calculated the given value for each of the authors of a publication and tested scoring publications using the maximum, total and mean of these values
• The final scoring function uses the mean citation score per publication and author:

  s_{auth}(p) = \frac{1}{|A_p|} \sum_{a \in A_p} \frac{\sum_{x \in P_a} c(x)}{|P_a|}  (6)

  where A_p is the set of authors of p and P_a the set of publications of author a
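Equation (6) averages, over a publication's authors, each author's mean citations per publication. A sketch under the assumption that the citation counts arrive as one plain list per author:

```python
def s_auth(author_pub_citations):
    """Mean over authors of their mean citations per publication (eq. 6).

    `author_pub_citations` has one entry per author of p; each entry is
    the list of citation counts of that author's publications (an assumed
    input shape for illustration).
    """
    per_author = [sum(pubs) / len(pubs) for pubs in author_pub_citations]
    return sum(per_author) / len(per_author)

# Author 1 has two papers (10 and 20 citations), author 2 has one (5).
score = s_auth([[10, 20], [5]])  # (15 + 5) / 2 = 10.0
```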
9. Venue-based score

  score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}  (7)
10. Venue-based score
• The standard metric in this area is the JIF; alternatives include the SCImago Journal Rank and the Eigenfactor
• We experimented with a few simple scoring functions (JIF, total citation counts, ...)
• Final venue-based score:

  s_{venue}(p) = \sum_{x \in P_v, x \neq p} c(x)  (8)

  where P_v is the set of publications of p's venue v
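Equation (8) sums the citations of every other publication in the same venue. A sketch assuming venue citation counts are supplied as a dict keyed by publication id (an illustrative data shape, not the challenge's actual format):

```python
def s_venue(venue_citations, p):
    """Total citations of all other publications in p's venue (eq. 8).

    `venue_citations` maps publication id -> citation count for the
    publications of p's venue; p itself is excluded from the sum.
    """
    return sum(c for x, c in venue_citations.items() if x != p)

# Venue with three papers; scoring "p2" counts only p1 and p3.
score = s_venue({"p1": 5, "p2": 7, "p3": 1}, "p2")  # 5 + 1 = 6
```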
11. Institution-based score

  score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}  (9)
12. Institution-based score
• A simple approach similar to the author- and venue-based scores:

  s_{inst}(p) = \frac{1}{|I_p|} \sum_{i \in I_p} \sum_{x \in P_i, x \neq p} c(x)  (10)

  where I_p is the set of institutions of p's authors and P_i the set of publications of institution i
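Equation (10) averages a venue-style citation sum over the publication's institutions. A sketch assuming one citation-count dict per institution (again an illustrative input shape):

```python
def s_inst(inst_citations, p):
    """Mean over p's institutions of the citations of their other
    publications (eq. 10).

    `inst_citations` has one dict per institution of p, each mapping
    publication id -> citation count; p itself is excluded.
    """
    totals = [sum(c for x, c in pubs.items() if x != p)
              for pubs in inst_citations]
    return sum(totals) / len(totals)

# Two institutions; scoring "p" excludes p's own 3 citations each time.
score = s_inst([{"p": 3, "q": 4}, {"p": 3, "r": 6}], "p")  # (4 + 6) / 2 = 5.0
```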
13. Potential improvements
• Better utilisation of the citation network
• Inclusion of additional data sources
• Possibility to analyse the evaluation data and metric
• Revise the maximum citation threshold used in the s_{pub} score
14. What have we learned?
• We found simple citation counts to perform best, but (!):
• In order to develop a more optimal ranking method, it is crucial to better understand the evaluation data and method
• Citation counting does not account for many characteristics of citations (differences in their meaning, the popularity of certain topics and types of research papers, ...)
15. Alternative ranking methods
• We explored several external data sources
• Motivation: utilising new altmetric and webometric data sources
  • Early availability of the data compared to citations
  • Broader view of a publication's impact
16. Alternative ranking methods
• Our main interest is in full text and the set of metrics referred to as Semantometrics
• Semantometrics build on the premise that the manuscript of the publication is needed to assess its value (in contrast to utilising external data)
• The biggest problem is obtaining the full texts, due to copyright restrictions and paywalls
• We are experimenting with enriching the MAG (Microsoft Academic Graph) with the publication full texts
• Enriching the MAG with altmetric, webometric and semantometric data would enable developing and testing fundamentally new metrics
17. Thank you for listening!
• Sources
  • https://github.com/damirah/wsdm_cup
• Workshop on Mining Scientific Publications
  • http://wosp.core.ac.uk/jcdl2016/
  • Submission deadline: 17th April