1. Publishing, Open Data, & Open Access:
How Open is Open?
Emma Ganley, Chief Editor, PLOS Biology
eganley@plos.org @GanleyEmma
Dec 2015
2. PLOS is a non-profit publisher and
advocacy organization with a mission to
accelerate progress in science and medicine by
leading a transformation in research
communication.
The Mission…
4. ELIMINATE UNNECESSARY BARRIERS
to immediate availability, access and use of research.
PURSUE A PUBLISHING STRATEGY
that drives openness, quality and integrity.
DEVELOP INNOVATIVE APPROACHES
to the assessment, organization and reuse of ideas and data.
PLOS and its authors choose to make
scientific and medical research articles openly
available for the advancement of science and the
greater public good.
...to enable those uses that we can’t yet imagine
5. What is Open Access?
Free Availability and Unrestricted Use
Free access – no charge to access
No embargoes – immediately available
Reuse – Creative Commons Attribution
License (CC BY) - use with proper
attribution
8. The Journal
“Journals form a core part of the process of scholarly
communication and are an integral part of scientific research
itself. Journals do not just disseminate information, they also
provide a mechanism for the registration of the author’s
precedence; maintain quality through peer review and provide
a fixed archival version for future reference.”
The STM Report, Fourth Edition, 2015
9. • >28,000 peer-reviewed English-language journals (2014)
• 10,900 in Journal Citation Reports
• 2.5 million articles a year
• 500-10,000 journal publishers
• 7-9 million researchers
• Most publishers have >90% content available online
• Annual revenues from English-language STM journal
publishing estimated at about $10 billion in 2013
(up from $8 billion in 2008)
• broader STM information publishing market worth some
$25.2 billion [in 2013].
The STM Report, Fourth Edition, 2015
10. % of scholarly journal articles that are OA varies from ~20% to 50%, depending on the
source
(Stephen Pinfield (2015), "Making Open Access Work", Online Information Review, Vol.
39 Iss 5. http://dx.doi.org/10.1108/OIR-05-2015-0167)
From OASPA: http://oaspa.org/growth-of-oa-only-journals-using-a-cc-by-license/
[Chart: No. of Open Access articles published per year by OASPA members
(Open Access Scholarly Publishers Association), 2000–2014; y-axis 0–160,000]
11. Dissemination by journals
• Many publishers (including societies) remain resistant to OA
• Few conversions of established journals
• ‘Hybrid’ OA very expensive
• Details of methods and results relegated to supplementary
material
• Numerous rounds of unnecessary rejection and re-review
• Many won’t publish negative or confirmatory results
• Content not reusable or discoverable:
• Many publishers don’t permit text and data mining for research
purposes
17. Science & Publishing today
Corrections, Mega-Corrections & Retractions
“In the early 2000s, only about 30 retraction notices appeared
annually; in the last five years, that number has jumped to around 500
(Nature, 478:26-28, 2011). And since the launch of Retraction Watch in
August 2010, there has been a growing interest among publishers,
editors, authors, and the press in manuscripts that are pulled from the
literature. Previously, retractions had no visibility, as they were published
without any warning. Retraction Watch provides a central location where
researchers can keep track of flawed papers and often learn a little about
the stories underlying their retractions.”
Source: Explaining Retractions
Editors and publishers should use a standardized form to detail why they are
pulling papers from the scientific literature.
By Hervé Maisonneuve, Evelyne Decullier | The Scientist, December 1, 2015
18. Reporting / Standards / Reproducibility
• Publication bias (negative studies not published)
• Selective reporting (p-hacking) 1
• Animal research reporting / Cell line authentication
• Poor Study Design (underpowered - small N, lack of
randomisation, blinding and controls)
• Poorly reported methods and results 2
• Data / Metadata not available to assess
• Lack of reproducibility 3
1Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD (2015) The Extent and Consequences of
P-Hacking in Science. PLoS Biol 13(3): e1002106. doi:10.1371/journal.pbio.1002106
2Landis SC, et al. (2012) A call for transparent reporting to optimize the predictive value of preclinical
research. Nature 490(7419): 187–191.
3Freedman LP, Cockburn IM, Simcoe TS (2015) The Economics of Reproducibility in Preclinical
Research. PLoS Biol 13(6): e1002165. doi: 10.1371/journal.pbio.1002165
VALUE of the research $£€
19. TOP Guidelines
Transparency & Openness Promotion Guidelines (Center for Open Science)
Transparency, open sharing, and reproducibility are core features of science, but not
always part of daily practice. Journals can increase transparency and reproducibility
of research by adopting the TOP Guidelines. TOP includes eight modular standards,
each with three levels of increasing stringency. Journals select which of the eight
transparency standards they wish to adopt for their journal, and select a level of
implementation for the selected standards. These features provide flexibility for adoption
depending on disciplinary variation, but simultaneously establish community standards.
https://cos.io/top/
20. TOP Guidelines
• Citation standards
• Data Transparency
• Analytic Methods (Code) Transparency
• Research Materials Transparency
• Design & Analysis Transparency
• Preregistration of Studies
• Preregistration of Analysis Plans
• Replication
Science 26 June 2015:
Vol. 348 no. 6242 pp. 1422-1425
DOI: 10.1126/science.aab2374
22. Meta-Research in PLOS Biology
Recognizing the importance of research about research to
increase transparency in the biosciences
23. Will publish / officially launch 4th Jan 2016:
2 new papers + a collection of relevant papers we have already published
24. Meta-Research in PLOS Biology
The new meta-research section of PLOS Biology will be data-driven and feature experimental,
observational, modelling, and meta-analytic research that addresses issues related to the
design, methods, reporting, verification, and evaluation of research. It will also
encompass research into the systems that evaluate and reward individual scientists
and institutions. We welcome both exploratory and confirmatory research that has the potential to
drive change in research and evaluation practices in the life sciences and beyond. The themes
include, but are not limited to, transparency, established and novel methodological
standards, sources of bias (conflicts of interest, selection, inflation, funding, etc.),
data sharing, evaluation metrics, assessment, reward, and funding structures.
Kousta S, Ferguson C, Ganley E (2016) Meta-Research: Broadening the Scope of PLOS
Biology. PLoS Biol 14(1): e1002334.
DOI:10.1371/journal.pbio.1002334 January 4, 2016
25. Meta-Research in PLOS Biology
New Editorial Board Members:
Lisa Bero (University of Sydney);
Isabelle Boutron (Université Paris Descartes);
Ulrich Dirnagl (Charité—Universitätsmedizin Berlin);
John PA Ioannidis (Stanford University);
Jonathan Kimmelman (McGill University);
Malcolm Macleod (University of Edinburgh);
David Vaux (Walter and Eliza Hall Institute of Medical Research);
Eric-Jan Wagenmakers (University of Amsterdam)
27. Registered Reports in PLOS Biology
Coming in 2016… We're collaborating with
the Open Science Framework (OSF) &
Center for Open Science to implement Registered
Reports as a new format
See https://osf.io/8mpji/wiki/home/ & https://cos.io/
“Registered Reports eliminates the bias against negative results in publishing because the
results are not known at the time of review” said Daniel Simons, Professor at University of Illinois,
Urbana-Champaign and co-Editor of Registered Replication Reports at Perspectives on Psychological
Science. Chris Chambers, Professor at Cardiff University, section editor at Cortex and Royal Society
Open Science, and chair of the Registered Reports Committee supported by the Center for Open
Science (COS) adds, “Because the study is accepted in advance, the incentives for authors
change from producing the most beautiful story to producing the most accurate one.”
28. Registered Reports in PLOS Biology
What are Registered Reports?
A departure from regular peer review
“scientists state at least part of what they’re going to
do before they do it, registration gently but firmly compels
us to stick to the scientific method.” – Chris Chambers
Not ideal for exploratory science
Very good for well-defined protocols, e.g. trials/statistical
analysis/animal research trials/field studies/… etc.
http://www.theguardian.com/science/head-quarters/2014/may/20/psychology-registration-revolution
30. Data Availability Declines Over Time
ALMOST ALL DATA LOST
10-15 YRS AFTER
PUBLICATION
Source: How Does the Availability of Research Data Change With
Time Since Publication?
Timothy H. Vines and colleagues, Abstract (podium), Peer
Review Congress, 2013
31. Phylogenetics Data
7500 papers studied from 2000-2012:
• data deposited for only 1/6…
• available on request from the
original authors in a further 1/6…
• 2/3 of trees only available as figure
panels in the original paper.
• potentially irretrievable loss of the
bulk of the data on which this field
rests.
Source: Drew et al., PLOS Biology 2013, DOI: 10.1371/journal.pbio.1001636
32. Challenges and Opportunities
Science 11 February 2011:
vol. 331 no. 6018 692-693
CREDIT: M. TWOMBLY/SCIENCE;
SOURCE: SCIENCE ONLINE SURVEY
2011 Survey in Science
33. Old to New Policy – March 2014
OLD: Authors should make all relevant data immediately
available without restrictions (excl. patient confidentiality)
NEW: Require authors to make all data underlying the
findings described in their manuscript fully available
without restriction, with rare exception.
Authors must provide a Data Availability Statement (DAS)
describing compliance with PLOS’ policy.
*No change to WHAT data needs to be shared, but the focus
was placed on WHERE it is housed, WHEN it is shared, and
HOW authors provide access for those who want it*
VALUE OF DATA
Replication/validation;
New analysis;
Better interpretation;
Inclusion in meta-studies;
Facilitating reproducibility;
Post-publication scrutiny;
$$$ Better return on research investment.
34. DAS
NB The DAS is openly available, and machine-readable as part of the
PLOS search API
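As a sketch of what machine-readable access to the DAS could look like, the snippet below builds a query URL for the PLOS Search API and parses a Solr-style JSON response. The `data_availability` field name and the response shape are assumptions for illustration; consult the API documentation for the exact fields exposed:

```python
import json
from urllib.parse import urlencode

def build_das_query(doi, api_key="YOUR_API_KEY"):
    """Return a PLOS Search API URL requesting the DAS for one DOI.

    The "data_availability" return field is a hypothetical name used
    here for illustration, not a confirmed field of the API.
    """
    params = {
        "q": f'id:"{doi}"',
        "fl": "id,title,data_availability",  # fields to return
        "wt": "json",
        "api_key": api_key,
    }
    return "http://api.plos.org/search?" + urlencode(params)

def extract_das(response_text):
    """Map article id -> Data Availability Statement from a Solr-style JSON reply."""
    docs = json.loads(response_text)["response"]["docs"]
    return {d["id"]: d.get("data_availability", "") for d in docs}

url = build_das_query("10.1371/journal.pbio.1001636")

# A mock response, shaped like a Solr result, to demonstrate the parsing step:
mock = json.dumps({"response": {"docs": [
    {"id": "10.1371/journal.pbio.1001636",
     "data_availability": "All relevant data are within the paper."}]}})
das = extract_das(mock)
```

The mock response stands in for a live HTTP call so the parsing logic can be seen end to end.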
36. New Policy – What Data & How?
The policy applies to the dataset used to reach the conclusions drawn in the manuscript
with related metadata and methods, and any additional data required to replicate the
reported study findings in their entirety. You need not submit your entire dataset, or all raw
data collected during an investigation, but you must provide the portion that is relevant
to the specific study.
DATA-SHARING METHODS
Data deposition (strongly recommended; must include
DOIs or accession numbers/codes).
Data in supporting information files (in a format
from which data can be efficiently extracted, e.g. .csv or
Excel, NOT .pdf)
EXCEPTIONS
Ethical or legal (e.g. patient privacy). Data access
overseen via ethics oversight committee (Data Access
Committee), anonymized appropriately*
Data from a third party (data must be available from
third party in the same way it was accessed by the
authors.)
* Guidance on confidentiality & anonymity in data about individuals - see BMJ 2010;340:c181; Preparing raw clinical data for
publication: guidance for journal editors, authors, and peer reviewers
37. “… make an honest effort to make the data
accessible and useful to others, and chances are
you’re probably good to go.”
- Matt MacManes, http://genomebio.org/
39. Data Policy – Unknowns
QUESTIONS WE DON’T KNOW ANSWERS TO YET
• How long should people store data? How much data is needed to replicate?
• Licensing & attribution;
• Treatment of software/code;
• How should materials sharing differ?
• What to do with big data?
• Do we need better/more aligned consenting for patient studies?
• Best practices for data access committees?
• How to fund data access committees?
• Preservation of obsolete formats?
• How to cite data & credit data reuse?
Very Useful reading: Michael Carroll. PLOS Biology 2015. Sharing Research Data and Intellectual Property Law: A Primer
Also, coming soon: An editorial from the EiCs of PLOS Genetics on the policy & data sharing standards in PLOS Genetics.
40. Compliance? Data on Data @PLOS
Audit on ~11k PLOS articles published post new policy
% papers with full access to data / % restrictions 88% / 12%
% papers with data deposited in repositories 11%
% papers w/ all data in the MS/SI files 66%
% papers w/ data held by 3rd party 2%
% papers with data containing sensitive info 2%
Restrictions by subject area/subfields:
Human & clinical data; fMRI & MRI images; trajectories & simulation
data, human or government data requiring access; fields where data
are too large to submit; geo/satellite; new & emerging
fields/techniques (NTDs)
42. Anecdotes & Interpretation
Source: ‘Confusion over publisher’s pioneering
open-data rules’ Nature 515, 478 (27 November
2014) doi:10.1038/515478a
‘Mandated data archiving greatly improves access
to research data’ T. H. Vines et al. FASEB J 27,
1304–1308; Jan 2013
50 fMRI studies in PLOS ONE
38 had shared the data
12 had not shared the data
(completely anecdotal)
An increase in data sharing:
- from 12% to 40%
- even up to as much as 76%
Not seeing full compliance but we are
seeing a MASSIVE improvement
43. Improving & Measuring Reuse?
http://plos.figshare.com/statistics
Analytics on PLOS
Figs, tables, SI files
● usage
○ views
○ downloads
● demographics
○ country
○ institution
Available by journal or for entire PLOS corpus
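The kind of per-article usage analytics listed above (views and downloads broken down by country) can be illustrated with a small aggregation sketch. The record layout below is invented for illustration and is not the actual figshare or PLOS schema:

```python
from collections import defaultdict

# Hypothetical per-article usage records: one row per (article, country).
records = [
    {"doi": "10.1371/journal.pbio.1002106", "country": "US", "views": 120, "downloads": 30},
    {"doi": "10.1371/journal.pbio.1002106", "country": "GB", "views": 45,  "downloads": 10},
    {"doi": "10.1371/journal.pbio.1002165", "country": "US", "views": 80,  "downloads": 22},
]

def usage_by_country(recs):
    """Roll per-article usage records up into views/downloads per country."""
    totals = defaultdict(lambda: {"views": 0, "downloads": 0})
    for r in recs:
        totals[r["country"]]["views"] += r["views"]
        totals[r["country"]]["downloads"] += r["downloads"]
    return dict(totals)

stats = usage_by_country(records)
```

The same roll-up could be keyed on journal or institution instead of country to produce the other breakdowns mentioned.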
44. Publishers in Data Access & Sharing
Lin J, Strasser C (2014) Recommendations for the Role of Publishers in Access to Data.
PLoS Biol 12(10): e1001975. doi:10.1371/journal.pbio.1001975
NSF Funded project:
Make Data Count.
To explore and test
data-level metrics.
Meeting co-organized
by PLOS & California
Digital Library
45. Carrots vs Sticks
Investigating how to:
- Work with Biocurators to assess data & reporting
- Incentivize better reporting standards & scientific practices
- Assess whether we can ‘badge’ good reporting as an incentive
- Improve on Data Citation Standards
- Better recognize Author Contributions
- Implement ORCID iD requirement
- Incentivize Openness (for review process, early preprint posting etc.)
- Change an Ingrained Mindset
- Educate the masses…
46. Other Lofty Goals
- Move the World away from Impact Factor as single
metric for evaluation (DORA)
- Individual Article & Data Metrics
- Impact how research assessment is performed
- Experiment with Open Peer Review
- Encouraging Early Online Posting
- Move Towards More Dynamic Publications with
post-publication review and versioning of papers
47. A Changing Currency in Science?
Scientific papers in
physical journals
Scientific papers
online & in print
Online only papers
Online Publication +
data
Data + analysis
(publication?)
[Vertical axis label: Openness, increasing]
48. A Sea Change
"Tsunami by hokusai 19th century" by Katsushika Hokusai (葛飾北斎) - Metropolitan
Museumphoto of the artwork. Licensed under Public Domain via Commons
ACADEMIC EDITORS:
Making the data available (at least to the editors) at
the time of submission is a logical and reasonable
requirement.
I also think that when the authors of a paper published in
a PLOS journal refuse to provide the data upon request
that the paper should be withdrawn from publication (i.e.,
the journal should retract the paper).
Refusing to allow other scientists to evaluate the
data (particularly if there are concerns regarding the
analyses and conclusions) is hardly in the spirit of
the PLOS open access model.
AUTHORS:
“I apologise sincerely for the errors that were included in previous versions of the
manuscript.
This is the first time that I have been required to provide a data spreadsheet
alongside a manuscript, and the value of doing this, which meant that I ended
up double-checking the values in our figures, figure legends and
spreadsheets, has been illustrated amply. In the future (whether required to by
the journal or not!) I will prepare a similar file prior to initial submission in
order to prevent this from ever happening again. We appreciate the precision
with which your staff has edited our manuscript, as its quality has improved because
of it.”
Profound & Notable Transformation
We want speed, we want early online posting, we want post-publication peer review, we want lots of things. But we also want quality, and there is an expectation that very rigorous checks will have been performed. And the world is only too happy to point out errors. So there is a tension between what is wanted and how much it costs…
Why is data availability important?
In a study by Tim Vines and colleagues of morphological data from plants, animals, or other organisms, 37% of the data from papers published in 2011 still existed, but this fell to 18% for 2001 and 7% for 1991
The odds of receiving the data decreased by about 7% per year.
For papers where they heard about the status of the data, the proportion of authors reporting it lost or on inaccessible hardware rose gradually from 0 of 30 in 2011 to 2 of 9 (22%) in 1997, and then increased to 7 of 8 in 1993 (87%) and 4 of 6 (66%) in 1991.
The end result was that almost all research data was lost 10 to 15 years after publication.
Data are not available for replication, reanalysis, meta-analysis or even if a question arises about the veracity of the research.
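The roughly 7%-per-year decline in the odds of obtaining data, quoted above from the Vines study, can be turned into a rough survival curve. The starting odds of 1:1 (a 50% chance at publication) are an assumed value for illustration only:

```python
def data_odds(initial_odds, years, annual_decline=0.07):
    """Odds of still obtaining a paper's data after `years` years,
    assuming the odds fall by `annual_decline` each year."""
    return initial_odds * (1 - annual_decline) ** years

def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability."""
    return odds / (1 + odds)

# Starting from assumed even odds (1:1) at publication:
for t in (0, 5, 10, 15, 20):
    p = odds_to_probability(data_odds(1.0, t))
    print(f"{t:2d} years after publication: {p:.0%} chance of obtaining the data")
```

Even under this simple model the chance of obtaining the data erodes steadily, consistent with the pattern of near-total loss over one to two decades described above.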
Science Magazine conducted a survey on the availability and use of data, with 1,700 responses across research areas and geographic regions, published in the February 2011 issue. Here we see that 77% of researchers have asked colleagues for data, and have had difficulty getting it 50% of the time. All of these figures attest to a recognized demand for researchers to access others' data; in fact, I believe this view undercounts the real need.
If there is proven "demand" for data, why aren't they available? Let's flip to the other hat that researchers wear and examine the situation as one doing the research (generating data as part of a research project). While there are many underlying reasons, which would take all day to enumerate, this one figure highlights a piece of the puzzle in a stark way. Here we see where data end up: not stored, stored in labs, on university servers, in a community repository, or "other." (I think that's what you mark when it's stuffed in a shoebox at home.)
Aside from community repositories, which account for 7.6% of the data, it's just not available to others.
We actually didn’t change WHAT data needs to be shared, rather the focus was placed on WHERE it is housed, WHEN it is shared, and HOW authors provide access for those who want it
Development of new data access policy over > 1 year
Steering committee included staff plus journal editors who are also active researchers
Data policy committee involving individuals from, and communicating with, all levels of journal review and production
Presentations at conferences and meetings
Consultations with editorial boards
Review of policy iterations and communications before public release
Pre-implementation questions from researchers
What to do with massive datasets?
What if the researcher plans to publish additional studies using the data?
What if competitors take advantage?
In cases of “data available on request,” what if no data access committee exists and the IRB is not willing to take on the responsibility?
Concerns over privacy for human data
Pavel Tomancak uses SPIM microscopy – if he ran it at max capacity for 24 hrs, he would produce 138 TB of data (compared with CERN, which produces a paltry 82 TB/day).
Another researcher noted that he had 8 TB of data that he condensed to produce a single movie for his talk.
What would we do with this?
Practical solutions are needed at the institute level too.
NSF funded project – Make Data Count – to explore and test data-level metrics that capture activity surrounding research data
Meeting co-org by PLOS and California Digital Library
Organizers
● Jennifer Lin, Senior Product Manager, PLOS
● Cameron Neylon, Advocacy Director, PLOS
● Carly Strasser, Data Curation Specialist, California Digital Library
Participants
● Stephen Abrams, Associate Director of UC Curation Center, California Digital Library
● Rachel Bruce, Director, Technology Innovation, Jisc
● Eleni Castro, Research Coordinator, IQSS, Harvard University
● John Chodacki, Director of Product Development, PLOS
● Patricia Cruse, Director of UC Curation Center, California Digital Library
● Ingrid Dillo, Head Policy Communication Development, DANS
● Alex Garnett, Data Curation & Digital Preservation Specialist, Simon Fraser University
● Jennifer Green, Director of Research Data Services, University of Michigan
● Simon Hodson, Executive Director, CODATA
● Eric Kansa, Technology Director, Open Context
● Belinda Norman, Research Data Manager, University of Sydney
● Mark Parsons, Secretary General, Research Data Alliance
● Jonathan Tedds, Senior Research Fellow, University of Leicester
● Todd Vision, Principal Investigator, Dryad; Associate Director for Informatics, National Evolutionary Synthesis Center