WikiCite 2017
Vienna, Austria
23 May 2017
David Shotton (david.shotton@opencitations.net)
Oxford e-Research Centre, University of Oxford, UK
Silvio Peroni (silvio.peroni@opencitations.net)
Dept. Computer Science and Engineering, University of Bologna, Italy
© David Shotton and Silvio Peroni, 2017. Published under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Licence
What is a citation?
- The performative act of citing a published work that is relevant to the current work, typically made by including a reference in a reference list
Why are citations important?
- The act of bibliographic citation is central to scholarly communication – bibliographic references are the links that knit together independent scholarship
- Citations unify the whole world of scholarship into a giant citation network
- Citation networks reveal the development of academic disciplines
- Sir Isaac Newton: “If I have seen a little further, it is by standing on the shoulders of Giants”
How is the present situation imperfect?
- The present scholarly citation system inadequately exposes the knowledge networks that exist within the scholarly literature
- Citation data are hidden behind subscription firewalls of commercial companies
- Academics are not free to use their own citation data as they please
- In this Open Access age, it is a scandal that reference lists from journal articles, the core elements of the academic data cycle, are not freely available for use by the scholars who created them
- Citation data now need to be recognized as a part of the Commons – those works that are freely and legally available for sharing
- To address this issue, we have developed The OpenCitations Corpus
How this came about - 2009 adventures in semantic publishing
The SPAR (Semantic Publishing and Referencing) Ontologies
FaBiO, the FRBR-aligned Bibliographic Ontology - an ontology for describing bibliographic entities (books, articles, etc.)
CiTO, the Citation Typing Ontology - an ontology that enables the characterization of citations, both factually and rhetorically
BiRO, the Bibliographic Reference Ontology - an ontology to define bibliographic records and references, and their compilation into bibliographic collections and reference lists, respectively
C4O, the Citation Counting and Context Characterization Ontology
DoCO, the Document Components Ontology
PRO, the Publishing Roles Ontology
PSO, the Publishing Status Ontology
PWO, the Publishing Workflow Ontology
. . . and now others
http://www.sparontologies.net/
The OpenCitations Corpus
- The OpenCitations Corpus is a Linked Open Data repository of scholarly bibliographic citation data described using the SPAR ontologies
- Prototype created at Oxford in 2011 by Alex Dutton with JISC funding
- A new instantiation created by Silvio at the University of Bologna in late 2015
  - based on a revised metadata schema, with automated daily ingestion of citations from authoritative sources
- OCC now provides the largest RDF collection of open citation data on the Web
  - currently holds the references from ~150,000 citing bibliographic resources
  - providing ~6.7 million citation links to over 4 million cited resources
- These citations are encoded using the SPAR ontologies, and are freely available under a CC0 public domain waiver from http://opencitations.net/
- The OpenCitations Enhancement Project has just been funded by the Sloan Foundation, to enhance ingest rates and provide smart data visualization interfaces
Ingestion workflow
- We developed several scripts that implement the ingestion workflow that populates the OpenCitations Corpus
- All the software is available in the OpenCitations GitHub repository: https://github.com/essepuntato/opencitations
  - Released as open source code under the ISC License: https://opensource.org/licenses/ISC
- These scripts implement a live and iterative process
- Why live?
  - It is working while I’m speaking
  - It never sleeps
  - It is like a sentient, relentless, fast zombie – watch out!
- Why iterative?
  - The ingestion workflow continuously calls several external APIs to obtain new reference lists and clean metadata for the citing and cited papers
Reference lists from PubMed Central
- At present, all the reference lists are obtained by processing the XML sources of the papers in the PubMed Central Open Access subset
- We use the Europe PubMed Central API to retrieve the XML sources
  - We ask for the most recent papers first
  - Thus, as citing papers, the OCC mainly includes articles published in 2016 and 2017
- There are 1.58M OA articles available in PubMed Central, according to the Europe PubMed Central API: http://www.ebi.ac.uk/europepmc/webservices/rest/search?query=open_access:y
  - We have harvested 10% so far . . .
- The identifiers of all the citing papers that have already been processed by the ingestion workflow are stored locally, so as not to request the same XML source twice
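The "do not request the same XML source twice" step can be sketched as follows. This is a minimal illustration, not the actual OpenCitations scripts: the plain-text file of identifiers and the function names `load_processed`, `mark_processed` and `urls_to_fetch` are assumptions made for the example. Only the URL pattern comes from the slides (it is the "source" value in the JSON example). No network call is made.

```python
import os
import tempfile

# URL pattern copied from the "source" field of the JSON example in the slides.
FULLTEXT_URL = "http://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextXML"


def load_processed(path):
    """Return the set of identifiers already recorded in the local file."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def mark_processed(path, pmcid):
    """Append one identifier to the local file once its XML has been handled."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(pmcid + "\n")


def urls_to_fetch(candidate_pmcids, processed):
    """Keep the candidates' order, dropping identifiers that were seen before."""
    seen = set(processed)
    urls = []
    for pmcid in candidate_pmcids:
        if pmcid in seen:
            continue
        seen.add(pmcid)
        urls.append(FULLTEXT_URL.format(pmcid=pmcid))
    return urls


if __name__ == "__main__":
    # Hypothetical local store; the real workflow may persist identifiers differently.
    store = os.path.join(tempfile.mkdtemp(), "processed_ids.txt")
    mark_processed(store, "PMC4863913")
    for url in urls_to_fetch(["PMC4863913", "PMC2686615"], load_processed(store)):
        print(url)
```

Keeping the candidates in their original order preserves the "most recent papers first" behaviour, provided the candidate list arrives sorted that way.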
Metadata from Crossref and ORCID
- The reference lists extracted from citing papers are made available in JSON:
{
  "doi": "10.1007/s11892-016-0752-4",
  "pmid": "27168063",
  "pmcid": "PMC4863913",
  "localid": "MED-27168063",
  "curator": "BEE EuropeanPubMedCentralProcessor",
  "source": "http://www.ebi.ac.uk/europepmc/webservices/rest/PMC4863913/fullTextXML",
  "source_provider": "Europe PubMed Central",
  "references": [
    ...
    {
      "bibentry": "Chang, KY, Unanue, ER. Prediction of HLA-DQ8beta cell peptidome using a computational program and its relationship to autoreactive T cells, Int Immunol, 2009, 21, 6, 705, 13, DOI: 10.1093/intimm/dxp039, PMID: 19461125",
      "pmid": "19461125",
      "doi": "10.1093/intimm/dxp039",
      "pmcid": "PMC2686615",
      "process_entry": "True"
    },
    ...
  ]
}
- We then call the Crossref APIs to obtain additional information (title, authors, venues, etc.) about the citing paper and about those papers described in the reference list, and then call the ORCID APIs to obtain ORCIDs of the authors
Callouts on the JSON example: the top-level fields hold the citing paper's metadata and identifiers; each entry in "references" is a reference in the citing paper's reference list, with its own ids
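As a rough sketch of the hand-off from the JSON record to Crossref, the snippet below collects the DOIs to look up and builds the corresponding request URLs. It assumes the public Crossref REST route `https://api.crossref.org/works/{doi}`. The function names are hypothetical, and treating the string "True" in "process_entry" as "still to be processed" is a guess about that field's meaning. The record is a trimmed copy of the example from the slides. Nothing is fetched.

```python
import json
from urllib.parse import quote

# Public Crossref REST route for a single work, addressed by DOI.
CROSSREF_WORKS = "https://api.crossref.org/works/"

# Trimmed copy of the JSON example shown in the slides.
RECORD_JSON = """
{
  "doi": "10.1007/s11892-016-0752-4",
  "pmid": "27168063",
  "pmcid": "PMC4863913",
  "references": [
    {
      "pmid": "19461125",
      "doi": "10.1093/intimm/dxp039",
      "pmcid": "PMC2686615",
      "process_entry": "True"
    }
  ]
}
"""


def dois_to_look_up(record):
    """Return the citing paper's DOI followed by the DOIs of flagged references."""
    dois = []
    if record.get("doi"):
        dois.append(record["doi"])
    for ref in record.get("references", []):
        # Assumption: the string "True" marks an entry that should be processed.
        if ref.get("process_entry") == "True" and ref.get("doi"):
            dois.append(ref["doi"])
    return dois


def crossref_url(doi):
    """Build the Crossref lookup URL for one DOI (slashes are left as-is)."""
    return CROSSREF_WORKS + quote(doi)


if __name__ == "__main__":
    for doi in dois_to_look_up(json.loads(RECORD_JSON)):
        print(crossref_url(doi))
```

References without a DOI are skipped here. A fuller workflow would presumably fall back to the PubMed identifiers or to matching on the "bibentry" text.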
The OpenCitations Corpus data model
- Available at https://doi.org/10.6084/m9.figshare.3443876
- Implemented in the OpenCitations Ontology (OCO, https://w3id.org/oc/ontology)
  - It is not yet another bibliographic ontology, but rather simply a mechanism for grouping together existing complementary ontological entities from several other ontologies (e.g. SPAR and FOAF)
Resources included within the Corpus (as of 26 April 2017)
Entity type | What it describes | Count in the OCC
Bibliographic resource (br) | Conference papers, book chapters, journal articles, academic proceedings, books, journals, etc. | 5.1 million
Resource embodiment (re) | Digital vs. print, first and ending pages, etc. | 2.9 million
Bibliographic entry (be) | Textual content of a reference in a reference list | 6 million
Responsible agent (ra) | Given name, family name and ORCID of the agent involved | 15.8 million
Agent role (ar) | Author, publisher, etc. | 20 million
Identifier (id) | DOI, PubMed ID, PubMed Central ID, ORCID, ISSN, etc. | 10.4 million
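The two-letter short names in the table also appear in corpus URIs such as https://w3id.org/oc/corpus/br/1. The sketch below maps the short names to the entity types and builds a URI from a short name and a number. The slides only show the "br" form, so applying the same pattern to the other entity types is an assumption, and `corpus_uri` is a hypothetical helper.

```python
# Base taken from the example URI in the slides: https://w3id.org/oc/corpus/br/1
CORPUS_BASE = "https://w3id.org/oc/corpus/"

# Short names and labels as listed in the table of entity types.
ENTITY_TYPES = {
    "br": "Bibliographic resource",
    "re": "Resource embodiment",
    "be": "Bibliographic entry",
    "ra": "Responsible agent",
    "ar": "Agent role",
    "id": "Identifier",
}


def corpus_uri(short_name, number):
    """Build a corpus URI; only the "br" pattern is confirmed by the slides."""
    if short_name not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {short_name!r}")
    return f"{CORPUS_BASE}{short_name}/{number}"


if __name__ == "__main__":
    print(corpus_uri("br", 1))
```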
OpenCitations in the wild
- Twitter: https://twitter.com/opencitations
- Blog: https://opencitations.wordpress.com
- The data in the OpenCitations Corpus are available in three different ways:
  - Direct access to bibliographic resources by means of their HTTP URIs (via content negotiation, e.g. https://w3id.org/oc/corpus/br/1)
  - SPARQL endpoint: https://w3id.org/oc/sparql
  - Monthly dumps: http://opencitations.net/download (stored in Figshare)
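The first two access routes might be exercised as sketched below. The snippet only builds the requests and sends nothing. Two things are assumptions: that the corpus serves `text/turtle` through content negotiation, and that citation links are exposed through the CiTO property `cito:cites`. The function names are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "https://w3id.org/oc/sparql"
RESOURCE = "https://w3id.org/oc/corpus/br/1"


def negotiated_request(uri, media_type="text/turtle"):
    """Prepare a GET request that asks for an RDF serialization via Accept."""
    return Request(uri, headers={"Accept": media_type})


def cites_query(citing_uri, limit=10):
    """SPARQL query listing resources cited by citing_uri (assumes cito:cites)."""
    return (
        "PREFIX cito: <http://purl.org/spar/cito/>\n"
        f"SELECT ?cited WHERE {{ <{citing_uri}> cito:cites ?cited }} LIMIT {limit}"
    )


def sparql_get_url(query):
    """Encode the query as the 'query' parameter of a SPARQL protocol GET."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    request = negotiated_request(RESOURCE)
    print(request.full_url, request.get_header("Accept"))
    print(sparql_get_url(cites_query(RESOURCE)))
```

Passing either object to `urllib.request.urlopen` would perform the actual call. The media types the service honours should be checked against the OpenCitations documentation.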
Figshare statistics as of 8 May 2017
Third-party usage of OpenCitations
- Projects that use OpenCitations resources:
  - Wikidata
  - OpenAIRE
  - LOC-DB
  - Others? Please let us know!
- Accesses to the OpenCitations website and services:
  The pages relating to the available data (“corpus”) and the service for querying them (“sparql”) together account for 88% of all accesses, showing that the main reason people access the OpenCitations website is to explore and use the data in the OpenCitations Corpus
What happened in the past month
- Use of the OpenCitations social accounts (Twitter, Blog on Wordpress) increased markedly during the past month
- What happened?
Initiative for Open Citations (I4OC)
- The Initiative for Open Citations (I4OC, https://i4oc.org) is a collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data
- Founders: OpenCitations is one of the founders
- Aim: promote the availability of structured, separable, and open citation data
- How: asking publishers
  - to submit article metadata (including reference lists) to the Crossref Cited-by service
  - to allow Crossref to open the reference lists to the public
- Achievement: as of March 2017, the share of publications in Crossref with freely available open references has grown from 1% to more than 40%
The OpenCitations ingestion rate: an update
About 500,000 new citation links added per month at present, rising to about 500,000 per day with the new infrastructure
New infrastructure coming soon (thanks to the OpenCitations Enhancement Project just funded by the Sloan Foundation)
The OpenCitations Corpus will have ~190 million citation links after one year of processing with the new infrastructure
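The ~190 million figure can be checked with a little arithmetic. The sketch below reads the 500,000 figure as a per-day rate under the new infrastructure, which is an interpretation of the slide. It also assumes that rate holds for all 365 days and starts from the ~6.7 million links reported earlier in the talk.

```python
CURRENT_LINKS = 6_700_000  # "~6.7 million citation links" reported in the slides
LINKS_PER_DAY = 500_000    # assumed per-day rate with the new infrastructure
DAYS = 365                 # assumption: the rate holds every day for one year


def projected_links(current, per_day, days):
    """Linear projection: existing links plus a constant daily intake."""
    return current + per_day * days


if __name__ == "__main__":
    total = projected_links(CURRENT_LINKS, LINKS_PER_DAY, DAYS)
    print(f"{total / 1e6:.1f} million")  # prints "189.2 million"
```

Under these assumptions the projection lands at about 189 million, consistent with the ~190 million quoted.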
Thank you for your attention
Contacts
David Shotton: david.shotton@opencitations.net
Silvio Peroni: silvio.peroni@opencitations.net
Website: http://opencitations.net
Email: contact@opencitations.net
Twitter: @opencitations
Blog: https://opencitations.wordpress.com
Github: https://github.com/essepuntato/opencitations

 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 

Último (20)

Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Introduction to Viruses
Introduction to VirusesIntroduction to Viruses
Introduction to Viruses
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 

OpenCitations

  • 1. David Shotton (david.shotton@opencitations.net), Oxford e-Research Centre, University of Oxford, UK; Silvio Peroni (silvio.peroni@opencitations.net), Dept. Computer Science and Engineering, University of Bologna, Italy. WikiCite 2017, Vienna, Austria, 23 May 2017. © David Shotton and Silvio Peroni, 2017. Published under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Licence.
  • 2. What is a citation?
       - The performative act of citing a published work that is relevant to the current work, typically made by including a reference in a reference list
     Why are citations important?
       - The act of bibliographic citation is central to scholarly communication – bibliographic references are the links that knit together independent scholarship
       - Citations unify the whole world of scholarship into a giant citation network
       - Citation networks reveal the development of academic disciplines
       - Sir Isaac Newton: “If I have seen a little further, it is by standing on the shoulders of Giants”
  • 3. How is the present situation imperfect?
       - The present scholarly citation system inadequately exposes the knowledge networks that exist within the scholarly literature
       - Citation data are hidden behind subscription firewalls of commercial companies
       - Academics are not free to use their own citation data as they please
       - In this Open Access age, it is a scandal that reference lists from journal articles, the core elements of the academic data cycle, are not freely available for use by the scholars who created them
       - Citation data now need to be recognized as a part of the Commons – those works that are freely and legally available for sharing
       - To address this issue, we have developed The OpenCitations Corpus
  • 4. How this came about - 2009 adventures in semantic publishing
  • 5. The SPAR (Semantic Publishing and Referencing) Ontologies
       - FaBiO, the FRBR-aligned Bibliographic Ontology: an ontology for describing bibliographic entities (books, articles, etc.)
       - CiTO, the Citation Typing Ontology: enables the characterization of citations, both factually and rhetorically
       - BiRO, the Bibliographic Reference Ontology: an ontology to define bibliographic records and references, and their compilation into bibliographic collections and reference lists, respectively
       - C4O, the Citation Counting and Context Characterization Ontology
       - DoCO, the Document Components Ontology
       - PRO, the Publishing Roles Ontology
       - PSO, the Publishing Status Ontology
       - PWO, the Publishing Workflow Ontology
       - . . . and now others
     http://www.sparontologies.net/
  • 6. The OpenCitations Corpus
       - The OpenCitations Corpus (OCC) is a Linked Open Data repository of scholarly bibliographic citation data described using the SPAR ontologies
       - Prototype created at Oxford in 2011 by Alex Dutton with JISC funding
       - A new instantiation created by Silvio at the University of Bologna in late 2015
           - based on a revised metadata schema, with automated daily ingestion of citations from authoritative sources
       - OCC now provides the largest RDF collection of open citation data on the Web
           - currently holds the references from ~150,000 citing bibliographic resources
           - providing ~6.7 million citation links to over 4 million cited resources
       - These citations are encoded using the SPAR ontologies, and are freely available under a CC0 public domain waiver from http://opencitations.net/
       - The OpenCitations Enhancement Project has just been funded by the Sloan Foundation, to enhance ingest rates and provide smart data visualization interfaces
  • 7. Ingestion workflow
       - We developed several scripts that implement the ingestion workflow populating the OpenCitations Corpus
       - All the software is available in the OpenCitations GitHub repository: https://github.com/essepuntato/opencitations
           - Released as open source code under the ISC License: https://opensource.org/licenses/ISC
       - These scripts implement a live and iterative process
       - Why live?
           - It is working while I’m speaking
           - It never sleeps
           - It is like a sentient, relentless, fast zombie – watch out!
       - Why iterative?
           - The ingestion workflow continuously calls several external APIs to obtain new reference lists and clean metadata of the citing and cited papers
  • 8. Reference lists from PubMed Central
       - At present, all the reference lists are taken by processing the XML sources of the papers in the PubMed Central Open Access subset
       - We use the Europe PubMed Central API for retrieving the XML sources
           - We ask for the most recent papers first
           - Thus, as citing papers, the OCC mainly includes articles published in 2016 and 2017
       - There are 1.58M OA articles available, according to the Europe PubMed Central API: http://www.ebi.ac.uk/europepmc/webservices/rest/search?query=open_access:y
           - We have harvested 10% so far . . .
       - The identifiers of all the citing papers that have already been processed by the ingestion workflow are stored locally, so as not to request the same XML source twice
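
The local bookkeeping of processed identifiers can be sketched as follows. This is a minimal illustration and not the OpenCitations code: the class `ProcessedStore`, the function `harvest`, the file name used in the usage note, and the `fetch` callback that stands in for the real XML request are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List


class ProcessedStore:
    """Keeps the identifiers of already-processed citing papers in a local text file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load identifiers recorded by earlier runs, one per line.
        self.seen = set(path.read_text().split()) if path.exists() else set()

    def mark(self, paper_id: str) -> None:
        # Append immediately so that an interrupted run keeps its progress.
        if paper_id not in self.seen:
            self.seen.add(paper_id)
            with self.path.open("a") as handle:
                handle.write(paper_id + "\n")


def harvest(ids: Iterable[str], store: ProcessedStore,
            fetch: Callable[[str], object]) -> List[str]:
    """Calls fetch only for identifiers not seen before; returns those identifiers."""
    fetched = []
    for paper_id in ids:
        if paper_id in store.seen:
            continue  # this XML source was already requested
        fetch(paper_id)  # hypothetical stand-in for the real XML request
        store.mark(paper_id)
        fetched.append(paper_id)
    return fetched
```

A run would create `ProcessedStore(Path("processed_ids.txt"))` once and pass it to `harvest` together with the identifiers returned by a search; a later run that reopens the same file skips everything already recorded.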
  • 9. Metadata from Crossref and ORCID
       - The reference lists extracted from citing papers are made available in JSON:
         {
           "doi": "10.1007/s11892-016-0752-4",
           "pmid": "27168063",
           "pmcid": "PMC4863913",
           "localid": "MED-27168063",
           "curator": "BEE EuropeanPubMedCentralProcessor",
           "source": "http://www.ebi.ac.uk/europepmc/webservices/rest/PMC4863913/fullTextXML",
           "source_provider": "Europe PubMed Central",
           "references": [
             ...
             {
               "bibentry": "Chang, KY, Unanue, ER. Prediction of HLA-DQ8beta cell peptidome using a computational program and its relationship to autoreactive T cells, Int Immunol, 2009, 21, 6, 705, 13, DOI: 10.1093/intimm/dxp039, PMID: 19461125",
               "pmid": "19461125",
               "doi": "10.1093/intimm/dxp039",
               "pmcid": "PMC2686615",
               "process_entry": "True"
             },
             ...
           ]
         }
         The top-level fields hold the citing paper's metadata and identifiers; each item in "references" is a reference in the citing paper's reference list, with its own ids.
       - We then call the Crossref APIs to obtain additional information (title, authors, venues, etc.) about the citing paper and about those papers described in the reference list, and then call the ORCID APIs to obtain ORCIDs of the authors
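
As a rough sketch of how such a record could feed the Crossref step, the snippet below collects the DOIs of the citing paper and of its references and turns each into a lookup URL. The slides do not say which Crossref routes are called, so the use of the public `https://api.crossref.org/works/{doi}` route is an assumption, as is the helper name `crossref_urls`; the record is a trimmed copy of the one on the slide with the elided references left out.

```python
import json
from urllib.parse import quote

# Assumption: the public Crossref REST route for a single work.
CROSSREF_WORKS = "https://api.crossref.org/works/"

# Trimmed copy of the record shown on the slide.
record_json = """
{
  "doi": "10.1007/s11892-016-0752-4",
  "pmid": "27168063",
  "pmcid": "PMC4863913",
  "references": [
    {"pmid": "19461125", "doi": "10.1093/intimm/dxp039",
     "pmcid": "PMC2686615", "process_entry": "True"}
  ]
}
"""


def crossref_urls(record: dict) -> list:
    """Returns one lookup URL per DOI: the citing paper first, then its references."""
    entries = [record] + record.get("references", [])
    # Entries without a DOI are skipped; they would need another lookup strategy.
    return [CROSSREF_WORKS + quote(e["doi"], safe="/") for e in entries if e.get("doi")]


urls = crossref_urls(json.loads(record_json))
for url in urls:
    print(url)
```

Each URL could then be fetched with any HTTP client; the sketch stops before the network call so that it runs offline.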
  • 10. The OpenCitations Corpus data model
       - Available at https://doi.org/10.6084/m9.figshare.3443876
       - Implemented in the OpenCitations Ontology (OCO, https://w3id.org/oc/ontology)
           - It is not yet another bibliographic ontology, but rather simply a mechanism for grouping together existing complementary ontological entities from several other ontologies (e.g. SPAR and FOAF)
  • 11. Resources included within the Corpus (as of 26 April 2017)
       Entity type | What it describes | Count in the OCC
       Bibliographic resource (br) | Conference papers, book chapters, journal articles, academic proceedings, books, journals, etc. | 5.1 million
       Resource embodiment (re) | Digital vs. print, first and ending pages, etc. | 2.9 million
       Bibliographic entry (be) | Textual content of a reference in a reference list | 6 million
       Responsible agent (ra) | Given name, family name and ORCID of the agent involved | 15.8 million
       Agent role (ar) | Author, publisher, etc. | 20 million
       Identifier (id) | DOI, PubMed ID, PubMed Central ID, ORCID, ISSN, etc. | 10.4 million
  • 12. OpenCitations in the wild
       - Twitter: https://twitter.com/opencitations
       - Blog: https://opencitations.wordpress.com
       - The data in the OpenCitations Corpus are available in three different ways:
           - Direct access to bibliographic resources by means of their HTTP URIs (via content negotiation, e.g. https://w3id.org/oc/corpus/br/1)
           - SPARQL endpoint: https://w3id.org/oc/sparql
           - Monthly dumps: http://opencitations.net/download (stored in Figshare)
     [Figure: Figshare statistics as of 8 May 2017]
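
The first two access routes can be sketched with the Python standard library. This is an illustration under assumptions rather than documented usage: the query assumes citation links are stored with `cito:cites` from CiTO, the media types `application/sparql-results+json` and `text/turtle` are assumed to be supported by the service, and the helper names are hypothetical. The requests are only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "https://w3id.org/oc/sparql"  # URL given on the slide

# Assumption: citation links are expressed with cito:cites (CiTO ontology).
QUERY = """
PREFIX cito: <http://purl.org/spar/cito/>
SELECT (COUNT(*) AS ?links) WHERE { ?citing cito:cites ?cited }
"""


def sparql_request(query: str) -> Request:
    """Builds a SPARQL-protocol GET request asking for JSON results."""
    url = SPARQL_ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def resource_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Builds a content-negotiated request for a single corpus resource."""
    return Request(uri, headers={"Accept": media_type})


sparql_req = sparql_request(QUERY)
resource_req = resource_request("https://w3id.org/oc/corpus/br/1")
# To send either request: urllib.request.urlopen(sparql_req).read()
```

Changing the `Accept` header on `resource_request` is what content negotiation amounts to: the same URI is requested, and the server chooses the serialization.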
  • 13. Third-party usage of OpenCitations
       - Projects that use OpenCitations resources:
           - Wikidata
           - OpenAIRE
           - LOC-DB
           - Others? Please let us know!
       - Accesses to the OpenCitations website and services: the pages relating to the data available (“corpus”) and the service for querying them (“sparql”) have together gained 88% of the overall accesses, showing that the main reason people access the OpenCitations website is to explore and use the data in the OpenCitations Corpus
  • 14. What happened in the past month
       - Use of the OpenCitations social accounts (Twitter and the WordPress blog) increased markedly during the past month
       - What happened?
  • 15. Initiative for Open Citations (I4OC)
       - The Initiative for Open Citations (I4OC, https://i4oc.org) is a collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data
       - Founders: OpenCitations is one of the founders
       - Aim: promote the availability of structured, separable, and open citation data
       - How: asking publishers
           - to submit article metadata (including reference lists) to the Crossref Cited-by service
           - to allow Crossref to open the reference lists to the public
       - Achievement: as of March 2017, the proportion of publications with open references freely available in Crossref has grown from 1% to more than 40%
  • 16. The OpenCitations ingestion rate: an update
       - About 500,000 new citation links are currently added per month; the new infrastructure raises this to about 500,000 per day
       - New infrastructure coming soon (thanks to the OpenCitations Enhancement Project just funded by the Sloan Foundation)
       - The OpenCitations Corpus will have ~190 million citation links after one year of processing with the new infrastructure
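
A quick check of the one-year figure, assuming that 500,000 links are added per day once the new infrastructure is running and that the ~6.7 million links already in the corpus are the starting point. Both are readings of the slides, not figures the slides state in this form.

```python
current_links = 6_700_000   # ~6.7 million citation links already in the corpus
links_per_day = 500_000     # assumed daily rate with the new infrastructure
days = 365                  # one year of processing

projected = current_links + links_per_day * days
print(f"{projected / 1e6:.1f} million citation links")
```

Under these assumptions the total comes to roughly 189 million, in line with the ~190 million quoted.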
  • 17. Thank you for your attention
       - David Shotton: david.shotton@opencitations.net
       - Silvio Peroni: silvio.peroni@opencitations.net
     Contacts
       - Website: http://opencitations.net
       - Email: contact@opencitations.net
       - Twitter: @opencitations
       - Blog: https://opencitations.wordpress.com
       - GitHub: https://github.com/essepuntato/opencitations