Citations are the way that researchers communicate how
their work builds on and relates to the work of others and
they can be used to trace how a discovery spreads and is
used by researchers in different disciplines and countries.
Creating a truly comprehensive map of scholarship,
however, relies on having a curated machine-readable
database of citation information, where the provenance of
every citation is clear and reusable. The Initiative for Open
Citations (I4OC), a campaign launched on 6 April 2017,
sought to make publisher members of Crossref aware that
they could open up the citation metadata they already deposit
there simply by asking Crossref. With the support of
major publishers and the endorsement of funders and other
organisations, more than 50% of citation data in Crossref
is now freely available, up from less than 1% before the
campaign. This provides the foundation of a well-structured,
open database of literally millions of datapoints that anyone
can query, mine, consume and explore. The presenter will
discuss the aims of the campaign, the new innovative
services that are already using the data, what more still
needs to be done and how you can support the initiative.
Catriona J MacCallum, Hindawi
UKSG 2018 Breakout - Setting your cites to open I4OC - Maccallum
1. Setting your cites on open
The Initiative for Open Citations (I4OC)
what it is, why it matters and how you can get involved
Catriona J. MacCallum (Hindawi)
Mark Patterson (eLife) • Dario Taraborelli (Wikimedia Foundation)
UKSG • Glasgow, April 2018
2. Open Access since 2007
~18,000 peer-reviewed articles a year
Science, Technology & Medicine
A founding member of OASPA
Free access – no charge to access
No embargos – immediately available
Reuse – Creative Commons Attribution License
(CC BY) - use with proper attribution
6. The Initiative for Open Citations
What it is
Why it matters
Knowledge Discovery
Evaluation
Beyond the article…
How open citations are being re-used
How you can get involved
10. Aim of I4OC
To promote the availability of data on citations that are structured,
separable, and open.
Structured - the data representing each publication and each citation instance
are expressed in common, machine-readable formats.
Separable - citations can be accessed and analyzed without the need to access
the source bibliographic products (such as journal articles and books).
Open - the data are freely accessible and reusable.
11. Why?
Establish a global public web of linked scholarly citation data to enhance
the discoverability of published content, both subscription access and open
access.
This will particularly benefit individuals who are not members of academic institutions
with subscriptions to commercial citation databases.
Build new services for the benefit of publishers, researchers, funding
agencies, academic institutions and the general public
And enhance existing services.
Create a public citation graph to explore connections between knowledge
fields, and to follow the evolution of ideas and scholarly disciplines.
12. Image: Andy Lamb, CC BY https://www.flickr.com/photos/speedoflife/8273922515/in/photostream/
14. How it came
together
The starting point
Most publishers already deposit
their citation data with Crossref
The default state for the data is not
open
The challenge
Could we persuade a group of
influential publishers to release
their data all at once?
15. Making the case
It’s easy and doesn’t cost anything
All you need to do is to send an email to support@crossref.org
Publishers also benefit
Better discovery tools mean that content will be found and
used more
The goal cannot be achieved alone
A comprehensive network of all scholarship can only be
achieved if data is pooled
16. Making it happen
Agree a deadline
Everyone has time to prepare their comms and to be part of a
big splash
Focus on publishers depositing the most data
Contacted the top-20 publishers asking for agreement in
principle and permission to share their decision
Leverage the early adopters
As soon as we had a few publishers on board, others quickly
followed
18. Why are publishers joining I4OC?
“If you’re not looking to
monetize references in some
way, why wouldn’t you?”
“We believe there is great benefit
in supporting sustainable and
standardized infrastructure.
Opening up our reference
metadata cost us no more than
the time required to write one
simple email.” Liz Ferguson,
Wiley
“At Taylor & Francis we are
working to make it as easy as
possible for the communities we
serve to achieve their open aims.
I4OC sits well with this, and was a
very quick and easy process to
implement.”
“Although we charge for metadata
feeds, those are service- rather
than content-based charges. We
didn’t identify any commercial
downside of supporting I4OC as
we are highly unlikely to develop
significant revenue streams from
just our own references.”
19. “References have long been a path to
serendipitous discovery. Making citation
data open and machine readable will
only accelerate that discovery
process for researchers.”
Why are publishers joining I4OC?
“One of the key purposes of a
publisher is to assist in the
development of networks of
scholarship to aid the cross
fertilization of research. Freeing
up the reference data is an
extremely powerful way of doing
that.”
“One of the most exciting
benefits is the potential to
expose networks of research
that might otherwise take
years to discover.”
“It will make our customers’ lives
easier by helping data scientists to
mine a large body of references in
one go. Currently we see little threat
to our business as this aligns perfectly
with our aims to go beyond open
access to research, by using open
approaches and utilizing our own data
to advance discovery.” Steven
Inchcoombe, Springer Nature
20. The Initiative for Open
Citations
We built a coalition of major funders,
technology platforms, open data
organizations and publishers supporting
the unrestricted availability of scholarly
citation data.
STAKEHOLDERS OF THE INITIATIVE FOR OPEN CITATIONS • http://i4oc.org
25. LONDON UNDERGROUND MAP FROM 1908 (Public Domain) • https://commons.wikimedia.org/wiki/File:Tube_map_1908-2.jp
Can also explore
how the map of
scholarship has
evolved
26. One year on…
The fraction of open citation data has surpassed 50%
The number of participating publishers has risen to 490.
There are over 500 million references now openly available.
There are almost 50 stakeholder organisations who have joined
I4OC to help advocate and promote reuse of open citations.
The initiative has attracted commentary and media coverage
across the world.
27. Of the top-20 biggest
publishers with citation
data, all but five now
make these data open via
Crossref.
Three represent Scholarly
Societies…
29. Crossref was founded to enable a
shared reciprocal linking and
metadata exchange, removing the
need for bilateral agreements
between publishers and other
service providers.
30. Extracting data via the
Crossref API
~41% Crossref records
have citation data
~47% of those have
public citation data
ACKNOWLEDGEMENT: DANIEL ECER, DATA SCIENTIST, ELIFE • https://elifesci.org/crossref-data-notebook
Exploring the data from Crossref
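The two percentages on this slide can be recomputed from any sample of Crossref records. The sketch below is a minimal illustration, not the code from the notebook acknowledged above. It assumes each record is a dict shaped like the `message` object returned by the Crossref REST API (`https://api.crossref.org/works/{DOI}`), that `reference-count` reports how many references were deposited, and that the `reference` list is present only when those references are public. The function name and the sample records are hypothetical.

```python
from typing import Iterable, Mapping, Tuple


def reference_coverage(records: Iterable[Mapping]) -> Tuple[float, float]:
    """Return (share of records with references, share of those with public references)."""
    total = with_refs = with_public_refs = 0
    for record in records:
        total += 1
        # Assumption: `reference-count` > 0 means references were deposited.
        if record.get("reference-count", 0) > 0:
            with_refs += 1
            # Assumption: the `reference` list only appears when references are public.
            if record.get("reference"):
                with_public_refs += 1
    share_with_refs = with_refs / total if total else 0.0
    share_public = with_public_refs / with_refs if with_refs else 0.0
    return share_with_refs, share_public


# Hypothetical sample records, not real Crossref data.
sample = [
    {"DOI": "10.0000/a", "reference-count": 2, "reference": [{"key": "r1"}, {"key": "r2"}]},
    {"DOI": "10.0000/b", "reference-count": 5},
    {"DOI": "10.0000/c", "reference-count": 0},
    {"DOI": "10.0000/d"},
]
share_with_refs, share_public = reference_coverage(sample)
```

The second share is computed against records that have references, not against all records, which matches the "~47% of those" wording on the slide.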
31. >1billion citations
49% are open
53% have DOIs (and
can be linked to
another record)
Some cleanup
required
Exploring the data from Crossref
ACKNOWLEDGEMENT: DANIEL ECER, DATA SCIENTIST, ELIFE • https://elifesci.org/crossref-data-notebook
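The "53% have DOIs" figure is a count over individual reference entries rather than over records. As a rough sketch, assuming each entry is a dict like those in the `reference` list of a Crossref REST API record, with an optional `DOI` key, the share can be computed as below. The function name and the sample entries are hypothetical.

```python
from typing import Iterable, Mapping


def doi_share(references: Iterable[Mapping]) -> float:
    """Return the fraction of reference entries that carry a DOI."""
    refs = list(references)
    if not refs:
        return 0.0
    # Assumption: an entry is linkable to another record only if it has a non-empty `DOI`.
    linked = sum(1 for ref in refs if ref.get("DOI"))
    return linked / len(refs)


# Hypothetical reference entries, not real Crossref data.
refs = [
    {"key": "ref1", "DOI": "10.0000/example.1"},
    {"key": "ref2", "unstructured": "Smith J. (1999) An example without a DOI."},
    {"key": "ref3", "DOI": "10.0000/example.2"},
    {"key": "ref4", "unstructured": "Another free-text reference."},
]
share = doi_share(refs)
```

Entries without a DOI are the ones the slide flags as needing cleanup before they can be linked.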
32. Why do we need open citations?
The ability to undertake large-scale and generalizable
bibliometric research … is limited to a few well-funded centers
that can afford to pay for full access to the raw data of Web of
Science or Scopus.
…scientometricians need a data source that is freely available
and comprehensive. This is a matter of scientific integrity,
scientific progress, and equity
Scientometrics is widely used to support science policy and
research evaluation, with consequences for the entire scientific
community. There is a need for specialized organizations, both
commercial and non-commercial, that offer scientometric
services.
...to guarantee full transparency and reproducibility of
scientometric analyses, these analyses need to be based on open
data sources.
advocating for open references is critical to ensure replicable and
equitable research practices.
We should use our relationships with journals—as authors,
reviewers, and editorial board members—to advocate for
openness and should expect scientometric journals to be leaders
in this respect.
“References are a product of scholarly work and represent the
backbone of science—demonstrating the origin and advancement
of knowledge—and provide essential information for studying
science and making decisions about the future of research.
References are generated by the academic community and should
be freely available to this community.”
Dec 2017
34. Who cares about
measuring research
impact?
Researchers (authors and readers)
Publishers
Funders
The public
Policy makers
Institutions
35. Impact factors mask huge variation in citations - if you use
it you are dishonest and statistically illiterate
@Stephen_Curry #COASP 2015
COASP7 ‘Research and researcher evaluation’ (2015),
Stephen Curry (Imperial College London) – available soon
from OASPA website
36. The Acta Crystallographica
Section A effect. The plot shows
that this journal had a JIF of
2.051 in 2008 which jumped to
49.926 in 2009 due to a single
highly-cited paper. Did every
other paper in this journal
suddenly get amazingly awesome
and highly-cited for this period?
Of course not.
Steve Royle. “Wrong Number: A Closer Look at Impact Factors.” Quantixed, May 2015. https://quantixed.wordpress.com/2015/05/05/wrong-number-a-closer-look-at-impact-factors/
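The journal impact factor is an average: citations received in a given year to items published in the previous two years, divided by the number of citable items from those years. The sketch below uses synthetic numbers (not the Acta Crystallographica data) to show how one heavily cited paper moves that average while the typical paper is unchanged.

```python
from statistics import mean, median

# Synthetic citation counts for a hypothetical journal with 100 citable papers;
# these are illustrative numbers, not data from any real journal.
typical_papers = [2] * 100
with_outlier = [2] * 99 + [5000]

mean_before = mean(typical_papers)   # every paper has 2 citations
mean_after = mean(with_outlier)      # one paper now has 5000 citations
median_after = median(with_outlier)  # the typical paper is unaffected
```

Here the mean rises from 2 to about 52 while the median stays at 2, which is why a single number built on the mean says little about any individual paper.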
37. Imperfect Impact
Stuart Cantrill. “Imperfect impact.” Chemical connections, January 23, 2016.
https://stuartcantrill.com/2016/01/23/imperfect-impact/
38. Citation Bias
Steven A Greenberg. How citation distortions create
unfounded authority: analysis of a citation network.
BMJ 2009;339:bmj.b2680 (CC BY NC)
http://www.bmj.com/content/339/bmj.b2680
• Citations to papers
supporting rationale for
overproduction of β amyloid
precursor protein mRNA as a
valid model of inclusion body
myositis.
• The supportive papers
received 94% of the 214
citations to these primary
data, whereas the six papers
containing data that
weakened or refuted the
claim received only 6% of
these citations
39. Fig 1. Citation distributions of 11 different science journals.
Citations are to ‘citable documents’ as classified by Thomson
Reuters, which include standard research articles and reviews. The
distributions contain citations accumulated in 2015 to citable
documents published in 2013 and 2014 in order to be comparable
to the 2015 JIFs published by Thomson Reuters. To facilitate direct
comparison, distributions are plotted with the same range of
citations (0-100) in each plot; articles with more than 100 citations
are shown as a single bar at the right of each plot.
[Figure: 11 histogram panels, one per journal: eLife, EMBO J., J. Informetrics, Nature, Nature Comm., PLOS Biol., PLOS Genet., PLOS ONE, Proc. R. Soc. B, Science, Sci. Rep. x-axis: Number of citations (0 to 100+); y-axis: Number of papers.]
A simple proposal for the publication of journal citation distributions
Vincent Larivière, Véronique Kiermer, Catriona J. MacCallum, Marcia McNutt, Mark Patterson, Bernd
Pulverer, Sowmya Swaminathan, Stuart Taylor, Stephen Curry
Published in bioRxiv, 2016 : http://biorxiv.org/content/early/2016/07/05/062109 CC BY
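A citation distribution of the kind described in the figure caption can be tabulated from a plain list of per-paper citation counts. The following is a minimal sketch, not the authors' code: the function name, the `>100` bin label and the sample counts are assumptions made for illustration. It counts papers at each citation value from 0 to 100 and pools papers with more than 100 citations into a single bin.

```python
from collections import Counter
from typing import Dict, Iterable


def citation_distribution(citations: Iterable[int], cap: int = 100) -> Dict[str, int]:
    """Count papers per citation value, pooling values above `cap` into one bin."""
    overflow_label = f">{cap}"
    counts: Counter = Counter()
    for value in citations:
        if value < 0:
            raise ValueError("citation counts cannot be negative")
        counts[str(value) if value <= cap else overflow_label] += 1
    # Emit every bin in order so that empty bars stay visible in a plot.
    distribution = {str(i): counts.get(str(i), 0) for i in range(cap + 1)}
    distribution[overflow_label] = counts.get(overflow_label, 0)
    return distribution


# Hypothetical citation counts for nine papers.
sample = [0, 0, 1, 3, 3, 3, 100, 101, 250]
dist = citation_distribution(sample)
```

Using the same bins for every journal is what allows the panels to be compared directly.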
40. Can Scientists Assess Merit or Predict
Impact?
Analysed subjective rankings of papers from two different data
sets over five years
• Faculty of 1000
• Wellcome Trust (data from Allen et al. of 2 assessor
rankings within 6 months of publication)
• In relation to citations and impact factor
Eyre-Walker A, Stoletzki N (2013) The Assessment of Science:
The Relative Merits of Post-Publication Review, the Impact
Factor, and the Number of Citations. PLoS Biol 11(10):
e1001675. doi:10.1371/journal.pbio.1001675
http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001675
41. Subjective assessments of science are poor:
Very weak correlation between assessors
Strongly biased by the journal in which the paper was published
Number of citations or the impact factor exaggerates differences between papers
Scientists are also poor at predicting the future impact:
Because they are not good at assessing merit
Similar articles accumulate citations essentially by chance.
“What this paper shows is that whatever merit might be, scientists can't be
doing a good job of evaluating it when they rank the importance or quality of
papers. From the (lack of) correlation among assessor scores, most of the
variation in ranking has to be due to ‘error’ rather than actual quality
differences.”
Carl Bergstrom, 2013
Eisen JA, MacCallum CJ, Neylon C (2013) Expert Failure: Re-evaluating Research Assessment. PLoS Biol 11(10): e1001677.
doi:10.1371/journal.pbio.1001677
42. What is Quality?
Context dependent
Discipline
Stage of your career
Different levels
Individual
Project
Institutional (rankings…)
National and International
Cannot be distilled into a single number or proxy
Multi-variate
Metrics need to be qualitative as well as quantitative
46. References are data…
Data about the network of information
Between scholars, fields and science & society
A source with which to validate a scholarly work
Data sharing is on the agenda…
OECD
EU Open Science
AGU Enabling Fair Data Project
Belmont Forum
NIH, NSF
RDA, CoData, FORCE11 & many others
Data citation is a prerequisite as a first class research object
e.g. DataCite DOIs in the reference list…
References are data
One of the most expertly curated sources of scholarly recommendations…
47. We need to apply the scientific
method to the process of
scholarly communication itself
48. Open Science?
Open Science = (Open Infrastructure + Open Outputs) × Culture (change)
Open Infrastructure + Open Outputs: access, reuse & discoverability
Culture (change): evaluation & researcher behaviour
How
Jeff Rouder
@JeffRouder
What is Open Science? It is endeavoring to preserve the rights of others to
reach independent conclusions about your data and work.
8:47 PM - 5 Dec 2017
Why
49. most of the data needed to support Open Science is
controlled by commercial companies, both big and
small. This growing reliance on a handful of companies
to provide proprietary analytics and decision tools for
research funders and universities poses serious risks for
the future
Open Source
• prevents monopolistic control
• requires an active community of users and service
providers to develop and maintain infrastructure
Open Data
• metadata about the research process itself, such as
funding data, publication and citation data, and
“altmetrics” data
Open Integrations
• standard metadata formats and open APIs
Open Contracts
• completely open (public) and no lock-in (e.g. Non-
Disclosure Agreements, multi-year contract terms, and
privately negotiated prices)
50. PARTIAL CITATION GRAPH FOR ULRICH K. LAEMMLI (1970) • http://tinyurl.com/kbzdxwh
How is data from the I4OC being reused?
The Wikidata
Citation Graph
36 million citation links
using the cites (P2860)
property in Wikidata
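Citation links stored with the cites (P2860) property can be retrieved through the Wikidata Query Service at `https://query.wikidata.org/sparql`. The sketch below only builds the SPARQL text for "works that cite a given item" and makes no network request. The function name is hypothetical, and the item identifier in the usage line is a placeholder, not the identifier of the Laemmli (1970) paper.

```python
import re


def citing_works_query(cited_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query listing Wikidata items that cite `cited_qid` via P2860."""
    # Accept only well-formed item identifiers such as "Q123".
    if not re.fullmatch(r"Q[1-9]\d*", cited_qid):
        raise ValueError(f"not a Wikidata item identifier: {cited_qid!r}")
    return (
        "SELECT ?citing ?citingLabel WHERE {\n"
        f"  ?citing wdt:P2860 wd:{cited_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Placeholder identifier, used only to show the shape of the query.
query = citing_works_query("Q1234567", limit=10)
```

The resulting text can be pasted into the query service's web interface or sent to the endpoint by any HTTP client.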
53. How is data from the I4OC being reused?
Tools to create profiles
Scholia uses data from Wikidata
sourced from Crossref and other
Metadata providers
PROFILE INFORMATION FOR EGON WILLIGHAGEN • https://tools.wmflabs.org/scholia/author/Q20895241
56. How is data from the I4OC being reused?
Integration of cited
by data by
ScienceOpen
SEARCH RESULTS FROM SCIENCEOPEN SHOWING CITED-BY DATA • http://www.scienceopen.com/
57. How is data from the I4OC being reused?
The Open Citations
Corpus
A broad and open collection of
citation information from many
sources
David Shotton and Silvio Peroni
PROGRESS OF THE INITIATIVE FOR OPEN CITATIONS • http://i4oc.org
58. Towards a fully open scholarly graph
“The visualization shows a
structure of science that is well
known from earlier large-scale
bibliometric visualizations,
which were based on Web of
Science or Scopus data.”
VISUALIZING FREELY AVAILABLE CITATION DATA USING VOSVIEWER • https://www.cwts.nl/blog?article=n-r2r294
61. The Initiative for Open Citations • I4OC
Making tens of millions of machine-readable citation
metadata openly available to everyone, with no copyright
restriction.
PROGRESS OF THE INITIATIVE FOR OPEN CITATIONS • http://i4oc.org
62. The road to 100%
CROSSREF MEMBERS WITH OPEN REFERENCES • https://www.crossref.org/reports/members-with-
open-references/
A list of all Crossref members
with open references and
statistics on their open
reference coverage
63. Getting involved
If you are a publisher and deposit
references, email
support@crossref.org
A CALL TO ACTION TO THE I4OC STAKEHOLDERS • https://twitter.com/i4oc_org/status/894934190625402880
64. I am a scholarly publisher already depositing
references to Crossref. How do I publicly release
them?
If you are already submitting article metadata to Crossref as a
participant in their Cited-by service:
1. Contact Crossref support directly by e-mail, asking them to turn on
reference distribution for all of the relevant DOI prefixes
OR
2. Set the <reference_distribution_opt> metadata element to "any" for
each DOI deposit for which you want to make references openly
available.
65. How you can help (1)
• Publishers who aren't making their references public yet - send an email to
Crossref before the end of the month requesting them to make your references
open. It's that simple!
• Publishers who don't yet deposit references with Crossref - contact Crossref to
find out how to do this.
• Editors and editorial board members – if references in your journal are not yet
made public - contact your publisher and request this. Use this list to see whether
your publisher is already making references open.
• Funders, institutions, companies, researchers, and all other users of open
citation data - write a short piece about your work and the benefits of open
citation data for the I4OC website. Please contact info@i4oc.org.
66. How you can help (2)
• If you have a story about open citation data and why they matter to
your organization and community, share a link, tag it as
#OpenCitationsMonth. We’ll retweet it to our followers.
• Please keep an eye on the #OpenCitationsMonth tag, and help us to
amplify the message.
70. Thank you
C.J. MacCallum, M. Patterson, D. Taraborelli (2017) Setting your cites on open:
what it is, why it matters and how you can help. UKSG 2018 [CC BY 4.0]*
Acknowledgments
The I4OC founders: OpenCitations, Wikimedia Foundation, PLOS, eLife, DataCite, the
Center for Culture and Technology at Curtin University.
The I4OC instigators: Jonathan Dugan, Martin Fenner, Jan Gerlach, Catriona MacCallum,
Daniel Mietchen, Cameron Neylon, Mark Patterson, Michelle Paulson, Silvio Peroni, David
Shotton, Dario Taraborelli
The I4OC stakeholders (i4oc.org/#stakeholders) and participating publishers
(i4oc.org/#publishers)
Editor's notes
[mention the 3 principles]
[introduce briefly what the initiative is about]
[COASP 2016: realize this data existed but was not exposed by default]
[Building a truly open (CC0) scholarly graph: moonshot]
To track the progress of the initiative Crossref has released a directory with statistics about all publishers with open references
There is still a long way to 100% coverage
If you’re a journal editor, a researcher, or an organization producing or consuming scholarly metadata, we hope you’ll join us in helping promote the free availability