The Codex of Business Writing Software for Real-World Solutions 2.pptx
Krichel·A Collaboration Graph for E-LIS
1. A collaboration graph for E-LIS
Thomas Krichel
Long Island University & Novosibirsk State
University & Open Library Society
3 November 2011
2. Introduction
• Thanks
– Ángel Sánchez Villegas for usage of the e-lis
domain.
– To Tomas Baiget, who has encouraged me to
present here.
• Warnings
– Data shown here were correct as of 1 November
2011.
– I am glossing over some technical details.
– Over 30 slides
3. overview
• Introduction to AuthorClaim
• Introduction to a co-authorship network
based on restricting AuthorClaim to E-LIS
documents
• Web interface and campaign
4. a known problem
• In publishing systems such as E-LIS, the
authors are usually entered by name.
• It is well known that the name of an author
does not identify a author
– multiple ways to express the name of the same
person
– multiple people sharing one expression of their
names
5. a tried solution
• One way to partially solve this problem is to
have a system where authors can
– claim papers that they have written
– disclaim papers written by their homonyms
• The first system of this kind was the RePEc
Author Service
– created by Thomas Krichel in 1999
– now has registered over 30000 economists
6. AuthorClaim
• AuthorClaim is an interdisciplinary version of
the RePEc Author Service.
• It was created by Thomas Krichel in 2008.
• Lives at http://authorclaim.org.
• Over 100000000 authorships of over
35000000 documents can be claimed.
• Among the documents are the E-LIS papers.
7. 445 E-LIS papers claimed …
• 72 Tomas Baiget
• 61 Ulrich Herb
• 43 Antonella De Robbio
• 39 Thomas Krichel
• 26 Andrea Marchitelli & fernanda peset,
• 20 Ross MacIntyre
• 16 Dirk Lewandowski
• 15 Bożena Bednarek-Michalska
• 14 Lidia Derfert-Wolf
• 11 Zeno Tajoli & Imma Subirats
8. by 36 authors
• 9 Derek Law & Emma McCulloch & Philipp Mayr
• 8 Jeffrey Beall
• 7 nuria Lloret Lloret Romero
• 6 Benjamin John Keele
• 5 Adrian Pohl & Maria Francisca Abad-Garcia
• 4 Walther Umstaetter
• 3 Andrea Scharnhorst & Jose Manuel Barrueco &
Thomas Hapke & Christian Hauschke & Klaus Graf
• 2 Frank Havemann & Eberhard R. Hilf & Bhojaraju
Gunjal & Chris L. Awre
• 1 Loet Leydesdorff & Peter Bolles Hirtle & Alexei
Botchkarev & Christina K. Pikas & Oliver Flimm &
Sridhar Gutam
9. so far so good
• I don’t really want to talk about AuthorClaim
but about a services that we can build when
we have identified authors.
• When we have this data, we can find out who
has been writing papers with whom.
• In other words we can study the co-authorship
network.
10. co-authorship
• When two registered author claim to have
authored the same paper, we say that they are
co-authors.
• The authorship relationship creates a link
between the two authors.
• The link is symmetric, meaning that the fact
that Thomas is a co-author of Imma means
that Imma is a co-author of Thomas.
11. 58 papers have been co-claimed …
• 12 fernanda peset
• 10 Tomas Baiget
• 8 Imma Subirats
• 6 Antonella De Robbio
• 4 nuria Lloret Lloret Romero
12. by 16 co-authors
• 2 Andrea Marchitelli & Ulrich Herb & Ross
MacIntyre & Bożena Bednarek-Michalska &
Thomas Krichel & Dirk Lewandowski & Lidia
Derfert-Wolf
• 1 Derek Law & Emma McCulloch & Sridhar
Gutam & Philipp Mayr
13. network and components
• When we start with one co-author, and we
move to her co-authors, what other authors
can be reach?
• We call the authors we can reach by starting
from any one of them by following co-
authorship relationships a component of the
network.
14. components in the network
• “Scottish”: Derek Law & Emma McCulloch
• “Polish”: Bożena Bednarek-Michalska & Lidia
Derfert-Wolf
• “German”: Dirk Lewandowski & Sridhar Gutam
& Philipp Mayr
• “Giant”: Andrea Marchitelli & Ulrich Herb &
Thomas Krichel & Antonella De Robbio &
fernanda peset & Imma Subirats & Ross
MacIntyre & nuria Lloret Lloret Romero &
Tomas Baiget
15. the giant component
• The size of the giant component is larger than
the combined size of all other component.
• It is very common, in real existing networks,
that there is a giant component.
• As the network grows, older small
components join the giant component and
new small components are created.
• We therefore study the giant component.
16. centrality
• Who is at the center of the E-LIS author
network, i.e. the most central author in E-LIS?
• The answer is that it depends on how we
measure centrality.
• Two measures are commonly used
– closeness centrality
– betweenness centrality
• Both depend on a measure of distance
17. distance
• To understand that we need a measure of
distance.
– We say that two authors have distance one if they
are co-authors.
– We say that two authors have distance two if they
are not co-authors, but have a common co-author.
– etc
18. distances for Imma Subirats
• Tomas Baiget 1
• Antonella De Robbio 1
• Ulrich Herb 2
• Thomas Krichel 1
• nuria Lloret Lloret Romero 2
• Andrea Marchitelli 2
• Ross MacIntyre 2
• fernanda peset 1
• Imma Subirats 0
19. distances for Ulrich Herb
• Tomas Baiget 1
• Antonella De Robbio 3
• Ulrich Herb 0
• Thomas Krichel 2
• nuria Lloret Lloret Romero 3
• Andrea Marchitelli 4
• Ross MacIntyre 4
• fernanda peset 2
• Imma Subirats 2
20. closeness centrality
• The average distance of Imma is much small
than the average distance of Ulrich.
• In fact, we can calculated to average distance
of the every author from all other authors.
• This is what we call closeness centrality of an
author.
21. shortest paths
• In order to find the distance between two
authors, we have to evaluate all possible paths
between them.
• We need to find shortest paths between.
There are well-known algorithms to find them.
• The distance is the length of the shortest path.
22. diameter
• When we have found all shortest paths, we
can find the length of the longest shortest
paths between any two authors.
• This is called the diameter.
• In our network the diameter is four.
• This much smaller than the number of authors
in the giant component (16).
• We say that our network has the “small
world” property.
23. shortest paths from Tomas Baiget
• → Thomas Krichel
• → fernanda peset → nuria Lloret Lloret Romero
• → fernanda peset
• → Imma Subirats → Antonella De Robbio → Ross
MacIntyre
• → Ulrich Herb
• → Imma Subirats → Antonella De Robbio
• → Imma Subirats → Antonella De Robbio → Andrea
Marchitelli
• → Imma Subirats
24. shortest paths from Antonella De Robbio
• → Imma Subirats → fernanda peset → nuria Lloret
Lloret Romero
• → Imma Subirats
• → Imma Subirats → Tomas Baiget → Ulrich Herb
• → Imma Subirats → Tomas Baiget
• → Imma Subirats → fernanda peset
• → Andrea Marchitelli
• → Ross MacIntyre
• → Thomas Krichel
25. shortest paths from Ross MacIntyre
• → Antonella De Robbio → Imma Subirats →
fernanda peset → nuria Lloret Lloret Romero
• → Antonella De Robbio → Imma Subirats →
fernanda peset
• → Antonella De Robbio → Imma Subirats → Tomas
Baiget → Ulrich Herb
• → Antonella De Robbio → Thomas Krichel
• → Antonella De Robbio → Imma Subirats → Tomas
Baiget
• → Antonella De Robbio → Imma Subirats
• → Antonella De Robbio
• → Andrea Marchitelli
26. what do the paths tell us?
• We find that some authors are appearing more
often as intermediaries than other authors.
• In fact, we can evaluate the number of times
an author appears as an intermediary in the
paths.
• This is what we call the betweenness centrality
of an author.
• A large number of authors have a
betweenness of zero. They are called marginal
authors.
27. summary
• We build a network.
• We find two ways to evaluate authors
– closeness
– betweenness
• Now let us look at the results.
28. ranking for closeness
rank name closeness
• 1 Imma Subirats 1.5
• 2 Antonella De Robbio 1.75
• 2 Tomas Baiget 1.75
• 2 Thomas Krichel 1.75
• 5 fernanda peset 1.875
• 6 Andrea Marchitelli 2.5
• 6 Ross MacIntyre 2.5
• 8 Ulrich Herb 2.625
• 9 nuria Lloret Lloret Romero 2.75
29. ranking for betweenness
rank name betweenness
• 1 Antonella De Robbio 2.7
• 1 Imma Subirats 2.7
• 3 Tomas Baiget 2.025
• 4 fernanda peset 1.575
• Andrea Marchitelli, Ross MacIntyre, nuria
Lloret Lloret Romero, Thomas Krichel, Ulrich
Herb are all marginal.
30. web service
• E-LIS and AuthorClaim data are readily
available in bulk.
• There is a software called icanis, developed by
yours truly, that can calculate and visualize
results. It is configurable via XSLT.
• Almost instantaneous updates are in principle
possible, but not implemented.
31. coll.e-lis.org
• This is a site that I have set up.
• I think we need a site in the rclis domain but I
am not sure what the name should be.
• coll.e-lis.org is a bad name too.
• So this is meant as a prototype.
32. features
• Rankings for closeness.
• Full path searching from author pages
– with support for partial name entry
– but within there no highlighting for parts
• Unclear documentation
33. ranking
• Ranking is the way forward with populating
scholarly communication services. RePEc has
shown this time and again.
• Co-authorship ranking is particularly
interesting because authors have to convince
their co-authors to publish papers in E-LIS and
to claim them in AuthorClaim.
34. campaign
• We need to do some work on the site.
• Then we can have campaign and award a cash
prize.
• I am thinking about donating $200 to the top
of each category or $300 to joint winner.
• The competition would be time-limited, say
about three months next Summer.
• During that time we would do frequent
updates of the site.
35. Thank you for your attention!
http://openlib.org/home/krichel
write to krichel@openlib.org