A collaboration graph for E-LIS

             Thomas Krichel
Long Island University & Novosibirsk State
    University & Open Library Society
            3 November 2011
Introduction
• Thanks
  – To Ángel Sánchez Villegas, for the use of the
    e-lis domain.
  – To Tomas Baiget, who encouraged me to present
    here.
• Warnings
  – Data shown here were correct as of 1 November
    2011.
  – I am glossing over some technical details.
  – Over 30 slides
overview
• Introduction to AuthorClaim
• Introduction to a co-authorship network
  based on restricting AuthorClaim to E-LIS
  documents
• Web interface and campaign
a known problem
• In publishing systems such as E-LIS, the
  authors are usually entered by name.
• It is well known that the name of an author
  does not identify an author
  – multiple ways to express the name of the same
    person
  – multiple people sharing one expression of their
    names
a tried solution
• One way to partially solve this problem is to
  have a system where authors can
  – claim papers that they have written
  – disclaim papers written by their homonyms
• The first system of this kind was the RePEc
  Author Service
  – created by Thomas Krichel in 1999
  – now has over 30,000 registered economists
AuthorClaim
• AuthorClaim is an interdisciplinary version of
  the RePEc Author Service.
• It was created by Thomas Krichel in 2008.
• Lives at http://authorclaim.org.
• Over 100,000,000 authorships of over
  35,000,000 documents can be claimed.
• Among the documents are the E-LIS papers.
445 E-LIS papers claimed …
•   72 Tomas Baiget
•   61 Ulrich Herb
•   43 Antonella De Robbio
•   39 Thomas Krichel
•   26 Andrea Marchitelli & fernanda peset
•   20 Ross MacIntyre
•   16 Dirk Lewandowski
•   15 Bożena Bednarek-Michalska
•   14 Lidia Derfert-Wolf
•   11 Zeno Tajoli & Imma Subirats
by 36 authors
• 9 Derek Law & Emma McCulloch & Philipp Mayr
• 8 Jeffrey Beall
• 7 nuria Lloret Lloret Romero
• 6 Benjamin John Keele
• 5 Adrian Pohl & Maria Francisca Abad-Garcia
• 4 Walther Umstaetter
• 3 Andrea Scharnhorst & Jose Manuel Barrueco &
  Thomas Hapke & Christian Hauschke & Klaus Graf
• 2 Frank Havemann & Eberhard R. Hilf & Bhojaraju
  Gunjal & Chris L. Awre
• 1 Loet Leydesdorff & Peter Bolles Hirtle & Alexei
  Botchkarev & Christina K. Pikas & Oliver Flimm &
  Sridhar Gutam
so far so good
• I don’t really want to talk about AuthorClaim
  but about services that we can build once
  we have identified authors.
• When we have this data, we can find out who
  has been writing papers with whom.
• In other words we can study the co-authorship
  network.
co-authorship
• When two registered authors claim to have
  authored the same paper, we say that they are
  co-authors.
• The co-authorship relationship creates a link
  between the two authors.
• The link is symmetric, meaning that the fact
  that Thomas is a co-author of Imma means
  that Imma is a co-author of Thomas.
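A minimal sketch of how such links could be derived from claims. The paper identifiers and the claim records below are made up for illustration; they are not AuthorClaim data, and this is not how AuthorClaim itself stores claims.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical claim records: paper identifier -> registered authors who claimed it.
claims = {
    "paper-a": ["Thomas", "Imma"],
    "paper-b": ["Imma", "Antonella", "Thomas"],
    "paper-c": ["Ulrich"],  # claimed by a single author, so it creates no link
}

def coauthor_links(claims):
    """Return a symmetric adjacency map built from co-claimed papers."""
    links = defaultdict(set)
    for authors in claims.values():
        for a, b in combinations(sorted(set(authors)), 2):
            links[a].add(b)  # a is a co-author of b ...
            links[b].add(a)  # ... and therefore b is a co-author of a
    return dict(links)

links = coauthor_links(claims)
for author in sorted(links):
    print(author, "->", sorted(links[author]))
```

Storing each link in both directions is what makes the relationship symmetric by construction.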
58 papers have been co-claimed …
•   12 fernanda peset
•   10 Tomas Baiget
•   8 Imma Subirats
•   6 Antonella De Robbio
•   4 nuria Lloret Lloret Romero
by 16 co-authors
• 2 Andrea Marchitelli & Ulrich Herb & Ross
  MacIntyre & Bożena Bednarek-Michalska &
  Thomas Krichel & Dirk Lewandowski & Lidia
  Derfert-Wolf
• 1 Derek Law & Emma McCulloch & Sridhar
  Gutam & Philipp Mayr
network and components
• When we start with one co-author and move to
  her co-authors, and then to their co-authors,
  which other authors can we reach?
• We call a set of authors who can all be reached
  from any one of them by following co-authorship
  relationships a component of the network.
components in the network
• “Scottish”: Derek Law & Emma McCulloch
• “Polish”: Bożena Bednarek-Michalska & Lidia
  Derfert-Wolf
• “German”: Dirk Lewandowski & Sridhar Gutam
  & Philipp Mayr
• “Giant”: Andrea Marchitelli & Ulrich Herb &
  Thomas Krichel & Antonella De Robbio &
  fernanda peset & Imma Subirats & Ross
  MacIntyre & nuria Lloret Lloret Romero &
  Tomas Baiget
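The components can be found with a breadth-first search. The sketch below is an illustration, not the icanis implementation. The links of the giant component are read off the distance-one pairs and shortest paths shown elsewhere in these slides; the two links inside the "German" component are an assumption, since the slides only list its members.

```python
from collections import deque

EDGES = [
    # giant component (links inferred from the slides)
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Imma Subirats", "fernanda peset"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Andrea Marchitelli", "Ross MacIntyre"),
    # two-author components
    ("Derek Law", "Emma McCulloch"),
    ("Bożena Bednarek-Michalska", "Lidia Derfert-Wolf"),
    # "German" component: which pairs are linked is an assumption
    ("Dirk Lewandowski", "Sridhar Gutam"),
    ("Dirk Lewandowski", "Philipp Mayr"),
]

def components(edges):
    """Group authors into connected components, largest first."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            author = queue.popleft()
            members.append(author)
            for other in graph[author]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        result.append(sorted(members))
    return sorted(result, key=len, reverse=True)

sizes = [len(c) for c in components(EDGES)]
print(sizes)  # [9, 3, 2, 2]
```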
the giant component
• The size of the giant component is larger than
  the combined size of all other components.
• It is very common in real-world networks
  that there is a giant component.
• As the network grows, older small
  components join the giant component and
  new small components are created.
• We therefore study the giant component.
centrality
• Who is at the center of the E-LIS author
  network, i.e. the most central author in E-LIS?
• The answer is that it depends on how we
  measure centrality.
• Two measures are commonly used
  – closeness centrality
  – betweenness centrality
• Both depend on a measure of distance
distance
• To measure centrality, we need a measure of
  distance.
  – We say that two authors have distance one if they
    are co-authors.
  – We say that two authors have distance two if they
    are not co-authors, but have a common co-author.
  – etc
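As a sketch, the distance from one author to all others can be computed with a breadth-first search. The link list is inferred from the distance-one pairs and shortest paths shown in these slides; it is not an official export of the data.

```python
from collections import deque

# Links of the giant component, inferred from the slides.
EDGES = [
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Imma Subirats", "fernanda peset"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Andrea Marchitelli", "Ross MacIntyre"),
]

def build_graph(edges):
    """Turn a list of symmetric links into an adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def distances_from(graph, source):
    """Number of co-authorship steps from source to every reachable author."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for other in graph[author]:
            if other not in dist:
                dist[other] = dist[author] + 1
                queue.append(other)
    return dist

graph = build_graph(EDGES)
imma = distances_from(graph, "Imma Subirats")
for name in sorted(imma):
    print(name, imma[name])
```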
distances for Imma Subirats
•   Tomas Baiget 1
•   Antonella De Robbio 1
•   Ulrich Herb 2
•   Thomas Krichel 1
•   nuria Lloret Lloret Romero 2
•   Andrea Marchitelli 2
•   Ross MacIntyre 2
•   fernanda peset 1
•   Imma Subirats 0
distances for Ulrich Herb
•   Tomas Baiget 1
•   Antonella De Robbio 3
•   Ulrich Herb 0
•   Thomas Krichel 2
•   nuria Lloret Lloret Romero 3
•   Andrea Marchitelli 4
•   Ross MacIntyre 4
•   fernanda peset 2
•   Imma Subirats 2
closeness centrality
• The average distance of Imma is much smaller
  than the average distance of Ulrich.
• In fact, we can calculate the average distance
  of every author from all other authors.
• This is what we call closeness centrality of an
  author.
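A sketch of the calculation, under the assumption that closeness is the plain average of an author's distances to the other authors of the giant component. The link list is inferred from the slides, not taken from an official data export.

```python
from collections import deque

# Links of the giant component, inferred from the slides.
EDGES = [
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Imma Subirats", "fernanda peset"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Andrea Marchitelli", "Ross MacIntyre"),
]

def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def distances_from(graph, source):
    """Breadth-first search distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for other in graph[author]:
            if other not in dist:
                dist[other] = dist[author] + 1
                queue.append(other)
    return dist

def closeness(graph, author):
    """Average distance from author to all other authors (lower is more central)."""
    dist = distances_from(graph, author)
    return sum(dist.values()) / (len(graph) - 1)

graph = build_graph(EDGES)
ranking = sorted(graph, key=lambda a: closeness(graph, a))
print(closeness(graph, "Imma Subirats"))  # 1.5
print(closeness(graph, "Ulrich Herb"))    # 2.625
```

With these links, Imma's distances sum to 12 over 8 other authors and Ulrich's to 21, which gives the two averages above.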
shortest paths
• In order to find the distance between two
  authors, we have to evaluate all possible paths
  between them.
• We need to find the shortest paths between them.
  There are well-known algorithms to find them.
• The distance is the length of the shortest path.
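Breadth-first search is one of the well-known algorithms for this when every link counts as one step. The sketch below records, for each author reached, the author it was reached from, and then walks back from the target. It is an illustration only; the link list is inferred from the slides.

```python
from collections import deque

# Links of the giant component, inferred from the slides.
EDGES = [
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Imma Subirats", "fernanda peset"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Andrea Marchitelli", "Ross MacIntyre"),
]

def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, source, target):
    """Return one shortest path as a list of authors, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        if author == target:
            break
        for other in sorted(graph[author]):  # sorted for a repeatable tie-break
            if other not in previous:
                previous[other] = author
                queue.append(other)
    if target not in previous:
        return None
    path = []
    node = target
    while node is not None:  # walk back from target to source
        path.append(node)
        node = previous[node]
    return path[::-1]

graph = build_graph(EDGES)
path = shortest_path(graph, "Ross MacIntyre", "nuria Lloret Lloret Romero")
print(" → ".join(path))
print("distance:", len(path) - 1)
```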
diameter
• When we have found all shortest paths, we
  can find the length of the longest of these
  shortest paths between any two authors.
• This is called the diameter.
• In our network the diameter is four.
• This is much smaller than the number of
  co-authors in the network (16), of whom 9 are
  in the giant component.
• We say that our network has the “small
  world” property.
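As a sketch, the diameter is the largest breadth-first distance found from any starting author. The link list is inferred from the slides and covers the giant component only.

```python
from collections import deque

# Links of the giant component, inferred from the slides.
EDGES = [
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Imma Subirats", "fernanda peset"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Andrea Marchitelli", "Ross MacIntyre"),
]

def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def distances_from(graph, source):
    """Breadth-first search distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for other in graph[author]:
            if other not in dist:
                dist[other] = dist[author] + 1
                queue.append(other)
    return dist

def diameter(graph):
    """Length of the longest shortest path in a connected graph."""
    return max(max(distances_from(graph, a).values()) for a in graph)

graph = build_graph(EDGES)
print(diameter(graph))  # 4
```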
shortest paths from Tomas Baiget
•   → Thomas Krichel
•   → fernanda peset → nuria Lloret Lloret Romero
•   → fernanda peset
•   → Imma Subirats → Antonella De Robbio → Ross
    MacIntyre
•   → Ulrich Herb
•   → Imma Subirats → Antonella De Robbio
•   → Imma Subirats → Antonella De Robbio → Andrea
    Marchitelli
•   → Imma Subirats
shortest paths from Antonella De Robbio
• → Imma Subirats → fernanda peset → nuria Lloret
  Lloret Romero
• → Imma Subirats
• → Imma Subirats → Tomas Baiget → Ulrich Herb
• → Imma Subirats → Tomas Baiget
• → Imma Subirats → fernanda peset
• → Andrea Marchitelli
• → Ross MacIntyre
• → Thomas Krichel
shortest paths from Ross MacIntyre
• → Antonella De Robbio → Imma Subirats →
  fernanda peset → nuria Lloret Lloret Romero
• → Antonella De Robbio → Imma Subirats →
  fernanda peset
• → Antonella De Robbio → Imma Subirats → Tomas
  Baiget → Ulrich Herb
• → Antonella De Robbio → Thomas Krichel
• → Antonella De Robbio → Imma Subirats → Tomas
  Baiget
• → Antonella De Robbio → Imma Subirats
• → Antonella De Robbio
• → Andrea Marchitelli
what do the paths tell us?
• We find that some authors appear more
  often as intermediaries than other authors.
• In fact, we can evaluate the number of times
  an author appears as an intermediary in the
  paths.
• This is what we call the betweenness centrality
  of an author.
• A large number of authors have a
  betweenness of zero. They are called marginal
  authors.
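A rough sketch of the counting idea. It takes one shortest path per ordered pair of authors and counts how often each author appears strictly inside a path. The tie-break between equally short paths and the absence of any scaling are assumptions, so the raw counts are not expected to match the betweenness figures icanis reports. The link list is inferred from the slides.

```python
from collections import deque
from itertools import permutations

# Links of the giant component, inferred from the slides.
EDGES = [
    ("Tomas Baiget", "Thomas Krichel"),
    ("Tomas Baiget", "fernanda peset"),
    ("Tomas Baiget", "Ulrich Herb"),
    ("Tomas Baiget", "Imma Subirats"),
    ("Imma Subirats", "Antonella De Robbio"),
    ("Imma Subirats", "Thomas Krichel"),
    ("Imma Subirats", "fernanda peset"),
    ("fernanda peset", "nuria Lloret Lloret Romero"),
    ("Antonella De Robbio", "Thomas Krichel"),
    ("Antonella De Robbio", "Andrea Marchitelli"),
    ("Antonella De Robbio", "Ross MacIntyre"),
    ("Andrea Marchitelli", "Ross MacIntyre"),
]

def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def one_shortest_path(graph, source, target):
    """One shortest path in a connected graph; ties broken alphabetically."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        if author == target:
            break
        for other in sorted(graph[author]):
            if other not in previous:
                previous[other] = author
                queue.append(other)
    path, node = [], target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]

def intermediary_counts(graph):
    """How often each author lies strictly inside a chosen shortest path."""
    counts = {author: 0 for author in graph}
    for source, target in permutations(sorted(graph), 2):
        for author in one_shortest_path(graph, source, target)[1:-1]:
            counts[author] += 1
    return counts

counts = intermediary_counts(build_graph(EDGES))
marginal = sorted(a for a, c in counts.items() if c == 0)
print(marginal)
```

Authors with only one co-author, such as Ulrich Herb, can never be intermediaries, so they are marginal whatever tie-break is used.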
summary
• We build a network.
• We find two ways to evaluate authors
  – closeness
  – betweenness
• Now let us look at the results.
ranking for closeness
  rank  name                          closeness
• 1     Imma Subirats                 1.5
• 2     Antonella De Robbio           1.75
• 2     Tomas Baiget                  1.75
• 2     Thomas Krichel                1.75
• 5     fernanda peset                1.875
• 6     Andrea Marchitelli            2.5
• 6     Ross MacIntyre                2.5
• 8     Ulrich Herb                   2.625
• 9     nuria Lloret Lloret Romero    2.75
ranking for betweenness
  rank  name                          betweenness
• 1     Antonella De Robbio           2.7
• 1     Imma Subirats                 2.7
• 3     Tomas Baiget                  2.025
• 4     fernanda peset                1.575
• Andrea Marchitelli, Ross MacIntyre, nuria
   Lloret Lloret Romero, Thomas Krichel and Ulrich
   Herb are all marginal.
web service
• E-LIS and AuthorClaim data are readily
  available in bulk.
• There is a software package called icanis,
  developed by yours truly, that can calculate
  and visualize results. It is configurable via XSLT.
• Almost instantaneous updates are in principle
  possible, but not implemented.
coll.e-lis.org
• This is a site that I have set up.
• I think we need a site in the rclis domain but I
  am not sure what the name should be.
• coll.e-lis.org is a bad name too.
• So this is meant as a prototype.
features
• Rankings for closeness.
• Full path searching from author pages
  – with support for partial name entry
  – but with no highlighting of the matched parts
• Unclear documentation
ranking
• Ranking is the way forward with populating
  scholarly communication services. RePEc has
  shown this time and again.
• Co-authorship ranking is particularly
  interesting because authors have to convince
  their co-authors to publish papers in E-LIS and
  to claim them in AuthorClaim.
campaign
• We need to do some work on the site.
• Then we can have a campaign and award a cash
  prize.
• I am thinking about donating $200 to the top
  author in each category, or $300 to a joint winner.
• The competition would be time-limited, say
  about three months next summer.
• During that time we would do frequent
  updates of the site.
Thank you for your attention!

http://openlib.org/home/krichel

 write to krichel@openlib.org
