Simple Yet Effective Methods for Large-Scale Scholarly Publication Ranking

  1. Simple Yet Effective Methods for Large-Scale Scholarly Publication Ranking
     KMi and Mendeley (team BletchleyPark) at WSDM Cup 2016
     Drahomira Herrmannova & Petr Knoth
     The Open University & Mendeley
     WSDM Cup 2016, February 2016
  2. Our approach
     • Hypothesis: the importance of a publication can be determined by a
       mixture of factors evidencing its impact and the importance of the
       entities which participated in the publication's creation
  3. Our approach
     • Method
       1. Separately score each type of entity in the graph
       2. Use these separate scores to produce a publication score
       3. This yields several different scores for each publication entity
       4. The final score, which defines the publication's rank, is
          calculated as a linear combination of these scores
     • Weights were obtained experimentally
     • The final equation (a sketch of this combination follows below):
       score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}  (1)
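A minimal sketch of how Eq. (1) combines the component scores; the dataclass fields and weight-table names are illustrative, not part of the original submission.

```python
from dataclasses import dataclass

@dataclass
class ComponentScores:
    """The six component scores for one publication (field names are illustrative)."""
    s_pub: float    # publication-based score, Eq. (2)
    s_age: float    # age-based score, Eq. (3)
    s_pr: float     # PageRank score, Eq. (4)
    s_auth: float   # author-based score, Eq. (6)
    s_venue: float  # venue-based score, Eq. (8)
    s_inst: float   # institution-based score, Eq. (10)

# Experimentally obtained weights, as given on the slide.
WEIGHTS = {"s_pub": 2.5, "s_age": 0.1, "s_pr": 1.0,
           "s_auth": 1.0, "s_venue": 0.1, "s_inst": 0.01}

def final_score(scores: ComponentScores) -> float:
    """Linear combination of the component scores, Eq. (1)."""
    return sum(w * getattr(scores, name) for name, w in WEIGHTS.items())
```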
  4. Publication-based scoring functions
     score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}
  5. Publication-based scoring functions
     • Scoring publication entities directly (without considering the
       importance of authors or venues)
     • We experimented with several options for normalising and weighting
       publication citations:
       • Applying a time decay to citations
       • Applying a decay function to total citation counts
       • Using mean citation counts
     • Final scoring function (a sketch follows below), where c(p) is the
       citation count of publication p, A_p is its set of authors and t is a
       maximum citation threshold:
       s_{pub}(p) = \begin{cases} c(p) / |A_p|, & \text{for } c(p) \le t \\ t / |A_p|, & \text{for } c(p) > t \end{cases}  (2)
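A minimal sketch of Eq. (2): the citation count is capped at the threshold t and normalised by the number of authors. The threshold value below is a hypothetical placeholder; the slides do not state the value of t used.

```python
# Hypothetical threshold t; the slides do not state the value used.
CITATION_THRESHOLD = 100

def s_pub(citation_count: int, num_authors: int,
          t: int = CITATION_THRESHOLD) -> float:
    """Author-normalised citation count, capped at the threshold t, Eq. (2)."""
    return min(citation_count, t) / num_authors
```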
  6. Publication-based scoring functions
     • To account for publication age we added a score based on the
       publication year y_p:
       s_{age}(p) = y_p  (3)
     • In the second phase of the challenge we included PageRank as an
       additional feature (a sketch of both follows below):
       s_{pr}(p) = PR(p)  (4)
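A minimal sketch of Eqs. (3) and (4), assuming PR(p) is PageRank computed over the paper citation graph; the use of networkx and the input format are assumptions, not stated in the slides.

```python
import networkx as nx

def s_age(publication_year: int) -> float:
    """Age score: simply the publication year y_p, Eq. (3)."""
    return float(publication_year)

def s_pr(references: dict[str, list[str]]) -> dict[str, float]:
    """PageRank over the citation graph, Eq. (4).

    `references` maps each paper id to the ids of the papers it cites;
    an edge p -> q means p cites q.
    """
    graph = nx.DiGraph()
    for paper, refs in references.items():
        graph.add_node(paper)  # keep papers with no outgoing references
        graph.add_edges_from((paper, ref) for ref in refs)
    return nx.pagerank(graph)
```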
  7. Author-based score
     score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}  (5)
  8. Author-based score
     • We experimented with some commonly used methods for evaluating author
       performance (number of citations, h-index)
     • We calculated the given value for each of the authors of a publication
       and tested scoring publications using the maximum, total and mean of
       these values
     • The final scoring function (a sketch follows below) uses the mean
       citation count per publication and author, where P_a is the set of
       publications of author a:
       s_{auth}(p) = \frac{1}{|A_p|} \sum_{a \in A_p} \frac{\sum_{x \in P_a} c(x)}{|P_a|}  (6)
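A minimal sketch of Eq. (6); the input dictionaries are illustrative assumptions about how citation counts and author bibliographies might be stored.

```python
def s_auth(author_ids: list[str],
           papers_by_author: dict[str, list[str]],
           citations: dict[str, int]) -> float:
    """Mean over p's authors of each author's mean citations per paper, Eq. (6)."""
    per_author_means = [
        sum(citations[x] for x in papers_by_author[a]) / len(papers_by_author[a])
        for a in author_ids
    ]
    return sum(per_author_means) / len(per_author_means)
```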
  9. Venue-based score
     score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}  (7)
  10. Venue-based score
      • The standard metric in this area is the JIF; alternatives include the
        SCImago Journal Rank and the Eigenfactor
      • We experimented with a few simple scoring functions (JIF, total
        citation counts, ...)
      • Final venue-based score (a sketch follows below), where P_v is the
        set of publications in the venue of p:
        s_{venue}(p) = \sum_{x \in P_v, x \ne p} c(x)  (8)
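A minimal sketch of Eq. (8); the list of the venue's papers and the citation-count dictionary are illustrative assumptions.

```python
def s_venue(paper_id: str,
            venue_papers: list[str],
            citations: dict[str, int]) -> int:
    """Total citations of all other papers in p's venue, Eq. (8)."""
    return sum(citations[x] for x in venue_papers if x != paper_id)
```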
  11. Institution-based score
      score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}  (9)
  12. Institution-based score
      • Simple approach similar to the author- and venue-based scores (a
        sketch follows below), where I_p is the set of institutions of the
        authors of p and P_i is the set of publications of institution i:
        s_{inst}(p) = \frac{1}{|I_p|} \sum_{i \in I_p} \sum_{x \in P_i, x \ne p} c(x)  (10)
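A minimal sketch of Eq. (10), mirroring the venue score but averaged over the publication's institutions; the data structures are again illustrative assumptions.

```python
def s_inst(paper_id: str,
           institution_ids: list[str],
           papers_by_institution: dict[str, list[str]],
           citations: dict[str, int]) -> float:
    """Mean over p's institutions of the total citations of each
    institution's other papers, Eq. (10)."""
    totals = [
        sum(citations[x] for x in papers_by_institution[i] if x != paper_id)
        for i in institution_ids
    ]
    return sum(totals) / len(totals)
```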
  13. Potential improvements
      • Better utilisation of the citation network
      • Inclusion of additional data sources
      • Possibility to analyse the evaluation data and metric
      • Revise the maximum citation threshold used in the s_pub score
  14. What have we learned?
      • We found simple citation counts to perform best, but (!):
      • In order to develop a better ranking method, it is crucial to better
        understand the evaluation data and method
      • Citation counting does not account for many characteristics of
        citations (differences in their meaning, the popularity of certain
        topics and types of research papers, ...)
  15. Alternative ranking methods
      • We explored several external data sources
      • Motivation: utilising new altmetric and webometric data sources
        • Earlier availability of the data compared to citations
        • Broader view of a publication's impact
  16. Alternative ranking methods
      • Our main interest is in full text and the set of metrics referred to
        as Semantometrics
      • Semantometrics build on the premise that the manuscript of the
        publication is needed to assess its value (in contrast to utilising
        external data)
      • The biggest problem is obtaining the full texts, due to copyright
        restrictions and paywalls
      • We are experimenting with enriching the MAG with publication
        full texts
      • Enriching the MAG with altmetric, webometric and semantometric data
        would enable developing and testing fundamentally new metrics
  17. Thank you for listening!
      • Sources: https://github.com/damirah/wsdm_cup
      • Workshop on Mining Scientific Publications
        • http://wosp.core.ac.uk/jcdl2016/
        • Submission deadline: 17th April
