Reproducibility of Published
Scientific and Medical Findings in
Top Journals in an Era of Big Data
by
Shannon Bohle, BA, MLIS, CDS (Cantab), FRAS, AHIP
.org 2014 Tech Conference
The Honourable Robert Boyle 1627–1691, Experimental Philosopher
(Image credit: Wellcome Library, London). Published with written permission.
The Sceptical Chymist: or Chymico-Physical Doubts & Paradoxes, Touching the Spagyrist's
Principles Commonly call'd Hypostatical; As they are wont to be Propos'd and Defended by the
Generality of Alchymists. Whereunto is præmis'd Part of another Discourse relating to the same
Subject, written by Robert Boyle and first published in 1661, is the source of the name of the
modern field of 'chemistry.' Image credit: Project Gutenberg.
Yunda Huang and Raphael Gottardo. Comparability and reproducibility of biomedical data. Brief Bioinform. Jul 2013; 14(4): 391–401.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3713713/#!po=71.0526. doi: 10.1093/bib/bbs078.
“Stodden’s talk reminded us that the idea of open data is not a new one; indeed, when studying the
history and philosophy of science, Robert Boyle is credited with stressing the concepts of skepticism,
transparency, and reproducibility for independent verification in scholarly publishing in the 1660s.
The scientific method later was divided into two major branches, deductive and empirical
approaches, she noted. Today, a theoretical revision in the scientific method should include a new
branch, Stodden advocated, that of the computational approach, where like the other two methods,
all of the computational steps by which scientists draw conclusions are revealed. This is because
within the last 20 years, people have been grappling with how to handle changes in high performance
computing and simulation. What is often referred to as “big data” has revolutionized science. Some
examples of this include the Large Hadron Collider (LHC) at CERN which generates around 780
terabytes per year, the Sloan Digital Sky Survey that recently released 60 terabytes, and
computational biology, bioinformatics, and genomics which are also highly data intensive modern
fields of science, she concluded.”
Source: http://www.scilogs.com/scientific_and_medical_libraries/what-is-e-science-and-how-should-it-be-managed
8 Different Standards for Rating the Quality of Open Data
“When working with government data it may be helpful to keep a few key guidelines in mind. The problem is, there are many
guidelines. A working group within OpenGovData.org developed ‘8 Principles of Open Government Data’ which are: ‘1. Data Must Be
Complete... 2. Data Must Be Primary... 3. Data Must Be Timely... 4. Data Must Be Accessible... 5. Data Must Be Machine processable...
6. Access Must Be Non-Discriminatory... 7. Data Formats Must Be Non-Proprietary... 8. Data Must Be License-free.’ This is very similar
to the Sunlight Foundation's ‘Ten Principles for Opening Up Government Information’: ‘1. Completeness... 2. Primacy... 3.
Timeliness... 4. Ease of Physical and Electronic Access... 5. Machine readability... 6. Non-discrimination... 7. Use of Commonly
Owned Standards... 8. Licensing... 9. Permanence... 10. Usage Costs.’ Open government data initiatives could also be held up to a
5-star rating method, which has been proposed by Tim Berners-Lee, the British computer scientist credited with inventing the
World Wide Web:
★ Available on the web (whatever format), but with an open licence to be Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of image scan of a table)
★★★ As (2), plus non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus use W3C open standards (RDF and SPARQL)
★★★★★ All the above, plus link your data to other people’s data to provide context
The Open Data Institute has created an ‘Open Data Certificate’ for data and rates it against a checklist. Certificates awarded grade
data as ‘Raw: A great start at the basics of publishing open data, Pilot: Data users receive extra support from, and can provide
feedback to the publisher, Standard: Regularly published open data with robust support that people can rely on, and Expert: An
exceptional example of information infrastructure.’ In 2009, The White House created a scorecard by which open data can be
evaluated according to 10 criteria: ‘high value data, data integrity, open webpage, public consultation, overall plan, formulating the
plan, transparency, participation, collaboration, and flagship initiative.’ The U.S. government's simple stoplight-like rating system was
as follows: green for data that ‘meets expectations,’ yellow for data that demonstrates ‘progress toward expectations,’ and red for
data that ‘fails to meet expectations.’ At the other end of the spectrum, there is an exceptionally complex checklist offered by
OPQUAST. On May 9, 2013, President Obama issued an Executive Order ‘Making Open and Machine Readable the New Default for
Government Information’ wherein a new ‘Open Data Policy’ has just been established and being newly implemented through
‘Project Open Data’ in which there are seven key principles: ‘public, accessible, described, reusable, complete, timely, and managed
post-release.’ There does not seem to be an associated rating system, however, to evaluate how well the data complies with the
principles. Finally, Nature has set up three criteria for data: firstly, ‘experimental rigor and technical data quality,’ secondly,
‘completeness of the description,’ and lastly, ‘integrity of the data files and repository record.’”
(Source: http://www.scilogs.com/scientific_and_medical_libraries/open-data-tools-turning-data-into-actionable-intelligence/#comment-493809)
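The 5-star scheme quoted above is cumulative: each star presupposes all the earlier ones. A minimal Python sketch of that logic follows. The `Dataset` class and its field names are hypothetical, invented here for illustration, and are not part of any official rating tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags, one per star in the quoted 5-star scheme.
    on_web_with_open_licence: bool   # 1 star
    machine_readable: bool           # 2 stars (e.g. Excel, not a scanned image)
    non_proprietary_format: bool     # 3 stars (e.g. CSV instead of Excel)
    uses_w3c_standards: bool         # 4 stars (RDF and SPARQL)
    linked_to_other_data: bool       # 5 stars


def star_rating(dataset: Dataset) -> int:
    """Count stars in order, stopping at the first unmet criterion."""
    criteria = [
        dataset.on_web_with_open_licence,
        dataset.machine_readable,
        dataset.non_proprietary_format,
        dataset.uses_w3c_standards,
        dataset.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV file published under an open licence, without RDF or links.
csv_release = Dataset(True, True, True, False, False)
print(star_rating(csv_release))
```

Because the count stops at the first unmet criterion, a data set that links to other data but is not openly licensed still earns no stars under this reading of the scheme.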
EVALUATING P VALUES, R², AND SAMPLE SIZES
IN TOP-TIER SCIENCE JOURNALS
Access to the raw or curated data sets that accompany published articles adds
transparency, and it may catch obvious errors or fraud. However, it can only
show that a study was done as well as it could be given a small sample
size.
In addition to replication of experiments and reproduction based on data sets
and methodology, as we move into an era of “Big Data,” from small sample
sizes to very large ones, some arguments in previously published articles, in
BOTH top-tier and lower-tier journals, will probably be disproved, while others
will be proved nearly conclusively.
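The effect of sample size on certainty can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a method taken from the slides: it assumes a one-sample z-test with known variance and a hypothetical true effect of 0.2 standard deviations, and the function name `two_sided_p_value` is invented for this example.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(effect_size_sd: float, sample_size: int) -> float:
    """Two-sided p value of a one-sample z-test (assumed known variance).

    effect_size_sd is the observed mean shift in standard-deviation units.
    """
    z = effect_size_sd * sqrt(sample_size)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


ASSUMED_EFFECT = 0.2  # hypothetical effect size, chosen only for illustration

for n in (10, 100, 1000):
    print(n, two_sided_p_value(ASSUMED_EFFECT, n))
```

Under these assumptions the same modest effect is statistically indistinguishable from chance at n = 10 and overwhelmingly significant at n = 1000, which is one way to read the claim that large samples will settle arguments that small ones could not.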
“Gaming” of the p value (probability value), a mathematical description of how
likely a result at least as extreme as the one observed would be by chance
alone, manipulation of sample size, and “cherry picking” of successful results
are key ways that fraud happens. For example, 100 mice might be tested, only 10
show the results that were expected, and only those 10 are reported on. Big data
makes it less likely that this cherry picking will work. Is there a way to test for
gaming of the data? Yes, there are at least two ways: 1) examine lab notebooks to
see how many experiments were done correctly but “thrown out” because the
results did not fit the hypothesis, and 2) use “Big Data,” which opens the door to
much greater levels of certainty, that is, predictable, reproducible p values.
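The mouse example can be made concrete with an exact binomial calculation. The Python sketch below is only an illustration: the 10% baseline response rate (`NULL_RATE`) is an assumption introduced here, not a figure from the slides, and `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, null_rate: float) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, null_rate)."""
    return sum(
        comb(trials, k) * null_rate**k * (1.0 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


NULL_RATE = 0.10  # assumed chance of a mouse "responding" with no real effect

# Cherry-picked report: only the 10 responders are mentioned (10 of 10).
p_cherry_picked = binomial_upper_tail(10, 10, NULL_RATE)

# Honest report: all 100 tested mice are counted (10 of 100).
p_full_cohort = binomial_upper_tail(10, 100, NULL_RATE)

print("cherry-picked p value:", p_cherry_picked)
print("full-cohort p value:  ", p_full_cohort)
```

Under the assumed baseline, the cherry-picked report looks like an extraordinarily strong result, while the full cohort is entirely consistent with chance. Checking lab notebooks for discarded animals is what recovers the true denominator.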
Figure 1. Breakdown of journal policies for public deposition of certain data types, sharing of materials and/or
protocols, and whether this is a condition for publication and percentage of papers with fully deposited data.
Alsheikh-Ali AA, Qureshi W, Al-Mallah MH, Ioannidis JPA (2011) Public Availability of Published Research Data in High-
Impact Journals. PLoS ONE 6(9): e24357. doi:10.1371/journal.pone.0024357.
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0024357.
Table 1. Economic Terms and Analogies in Scientific Publication
Young NS, Ioannidis JPA, Al-Ubaydli O (2008) Why Current Publication Practices May
Distort Science. PLoS Med 5(10): e201. doi:10.1371/journal.pmed.0050201
http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0050201
Regular link: http://www.nature.com/nature/journal/v506/n7487/full/506159e.html
Free, full-text: http://www.readcube.com/articles/10.1038/506159e
Petition 1
Petition 2
http://www.causes.com/campaigns/73992-improve-reproducibility-through-open-notebook-science
Selected Bibliography
King G (1995) Replication, replication. Political Science and Politics 28: 443–499.
http://gking.harvard.edu/files/abs/replication-abs.shtml (last accessed May 8, 2013).
Mesirov J (2010) Accessible reproducible research. Science 327: 415.
http://www.sciencemag.org/content/327/5964/415 (last accessed May 8, 2013).
Alsheikh-Ali AA, Qureshi W, Al-Mallah MH, Ioannidis JPA (2011) Public availability of published research data in
high-impact journals. PLoS ONE 6(9): e24357.
http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0024357 (last accessed May 8, 2013).
Stodden V, Guo P, Ma Z (2013) Toward reproducible computational research: An empirical analysis of data and code
policy adoption by journals. PLoS ONE 8(6): e67111. Published online Jun 21, 2013.
doi: 10.1371/journal.pone.0067111. PMCID: PMC3689732.
Reproducible Research. Special Issue, Computing in Science and Engineering 14(4): 11–56.
http://ieeexplore.ieee.org/xpl/tocresult.jsp?reload=true&isnumber=6241356&punumber=5992 (last accessed May 8, 2013).
Stodden V, Mitchell I, LeVeque R (2012) Reproducible research for scientific computing: Tools and strategies for
changing the culture. Computing in Science and Engineering 14(4): 13–17.
http://www.computer.org/csdl/mags/cs/2012/04/mcs2012040013-abs.html (last accessed May 8, 2013).
Young NS, Ioannidis JPA, Al-Ubaydli O (2008) Why current publication practices may distort science. PLoS Med 5(10): e201.
Ioannidis JPA, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, Tibshirani R (2014) Increasing
value and reducing waste in research design, conduct, and analysis. The Lancet 383(9912): 166–175 (11 January 2014).
doi: 10.1016/S0140-6736(13)62227-8.
Wood J, Freemantle N, King M, Nazareth I (2014) Trap of trends to statistical significance: likelihood of near
significant P value becoming more significant with extra data. BMJ 348: g2215 (published 31 March 2014).
doi: http://dx.doi.org/10.1136/bmj.g2215.

 
SciLands Best Practices in Education Panel and Discussion
SciLands Best Practices in Education Panel and DiscussionSciLands Best Practices in Education Panel and Discussion
SciLands Best Practices in Education Panel and Discussion
 
Archives In Second Life
Archives In Second LifeArchives In Second Life
Archives In Second Life
 
Archivopedia
ArchivopediaArchivopedia
Archivopedia
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Reproducibility

  • 1. Reproducibility of Published Scientific and Medical Findings in Top Journals in an Era of Big Data by Shannon Bohle, BA, MLIS, CDS (Cantab), FRAS, AHIP .org 2014 Tech Conference
  • 2. The Honourable Robert Boyle 1627–1691, Experimental Philosopher (Image credit: Wellcome Library, London). Published with written permission. First published in 1661, The Sceptical Chymist: or Chymico-Physical Doubts & Paradoxes, Touching the Spagyrist's Principles Commonly call'd Hypostatical; As they are wont to be Propos'd and Defended by the Generality of Alchymists. Whereunto is præmis'd Part of another Discourse relating to the same Subject was written by Robert Boyle and is the source of the name of the modern field of 'chemistry.' Image credit: Project Gutenberg.
  • 3. Yunda Huang and Raphael Gottardo. Comparability and reproducibility of biomedical data. Brief Bioinform. Jul 2013; 14(4): 391–401. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3713713/#!po=71.0526. doi: 10.1093/bib/bbs078.
  • 4.
  • 5. “Stodden’s talk reminded us that the idea of open data is not a new one; indeed, when studying the history and philosophy of science, Robert Boyle is credited with stressing the concepts of skepticism, transparency, and reproducibility for independent verification in scholarly publishing in the 1660s. The scientific method later was divided into two major branches, deductive and empirical approaches, she noted. Today, a theoretical revision in the scientific method should include a new branch, Stodden advocated, that of the computational approach, where like the other two methods, all of the computational steps by which scientists draw conclusions are revealed. This is because within the last 20 years, people have been grappling with how to handle changes in high performance computing and simulation. What is often referred to as “big data” has revolutionized science. Some examples of this include the Large Hadron Collider (LHC) at CERN which generates around 780 terabytes per year, the Sloan Digital Sky Survey that recently released 60 terabytes, and computational biology, bioinformatics, and genomics which are also highly data intensive modern fields of science, she concluded.” Source: http://www.scilogs.com/scientific_and_medical_libraries/what-is-e-science-and-how-should-it-be-managed
  • 6. 8 Different Standards for Rating the Quality of Open Data “When working with government data it may be helpful to keep a few key guidelines in mind. The problem is, there are many guidelines. A working group within OpenGovData.org developed ‘8 Principles of Open Government Data’ which are: ‘1. Data Must Be Complete... 2. Data Must Be Primary... 3. Data Must Be Timely... 4. Data Must Be Accessible... 5. Data Must Be Machine processable... 6. Access Must Be Non-Discriminatory... 7. Data Formats Must Be Non-Proprietary... 8. Data Must Be License-free.’ This is very similar to the Sunlight Foundation's ‘Ten Principles for Opening Up Government Information’— ‘1. Completeness... 2. Primacy... 3. Timeliness 4. Ease of Physical and Electronic Access... 5. Machine readability... 6. Non-discrimination... 7. Use of Commonly Owned Standards...8. Licensing... 9. Permanence... 10. Usage Costs.’ Open government data initiatives could also be held up to a 5-star rating method, which has been proposed by Tim Berners-Lee, the British computer scientist credited with inventing the World Wide Web: ★ Available on the web (whatever format), but with an open licence to be Open Data ★★ Available as machine-readable structured data (e.g. Excel instead of image scan of a table) ★★★ As (2), plus non-proprietary format (e.g. CSV instead of Excel) ★★★★ All the above, plus use W3C open standards (RDF and SPARQL) ★★★★★ All the above, plus link your data to other people’s data to provide context The Open Data Institute has created an ‘Open Data Certificate’ for data and rates it against a checklist. 
Certificates awarded grade data as ‘Raw: A great start at the basics of publishing open data, Pilot: Data users receive extra support from, and can provide feedback to the publisher, Standard: Regularly published open data with robust support that people can rely on, and Expert: An exceptional example of information infrastructure.’ In 2009, The White House created a scorecard by which open data can be evaluated according to 10 criteria: ‘high value data, data integrity, open webpage, public consultation, overall plan, formulating the plan, transparency, participation, collaboration, and flagship initiative.’ The U.S. government's simple stoplight-like rating system was as follows: green for data that ‘meets expectations,’ yellow for data that demonstrates ‘progress toward expectations,’ and red for data that ‘fails to meet expectations.’ At the other end of the spectrum, there is an exceptionally complex checklist offered by OPQUAST. On May 9, 2013, President Obama issued an Executive Order ‘Making Open and Machine Readable the New Default for Government Information’ wherein a new ‘Open Data Policy’ has just been established and is being newly implemented through ‘Project Open Data’ in which there are seven key principles: ‘public, accessible, described, reusable, complete, timely, and managed post-release.’ There does not seem to be an associated rating system, however, to evaluate how well the data complies with the principles. Finally, Nature has set up three criteria for data: firstly, ‘experimental rigor and technical data quality,’ secondly, ‘completeness of the description,’ and lastly, ‘integrity of the data files and repository record.’” (Source: http://www.scilogs.com/scientific_and_medical_libraries/open-data-tools-turning-data-into-actionable-intelligence/#comment-493809)
  • 7. EVALUATING P VALUES, R², AND SAMPLE SIZES IN TOP-TIER SCIENCE JOURNALS Access to the raw or curated data sets that accompany published articles adds transparency and may catch obvious errors or fraud. However, it can only show that a study was done as well as its small sample size allowed. In addition to replication of experiments and reproduction based on data sets and methodology, as we move into an era of “Big Data,” from small sample sizes to very large ones, some arguments in previously published articles, in BOTH top-tier and lower-tier journals, will probably be disproved while others will be proved nearly conclusively. “Gaming” of the p value (probability value), of the mathematical description, or of the sample size, and “cherry picking” of successful results, are key ways that fraud happens. For example, if 100 mice were tested and only 10 showed the expected results, a fraudulent report might describe only those 10. Big data makes it less likely that this cherry picking will work. Is there a way to test for gaming of the data? Yes, there are at least two ways: 1) Examine lab notebooks to see how many experiments were done correctly but “thrown out” because the results did not fit the hypothesis, and 2) “Big Data” opens the door for much greater levels of certainty, that is, predictable, reproducible p values.
  • 8. Figure 1. Breakdown of journal policies for public deposition of certain data types, sharing of materials and/or protocols, and whether this is a condition for publication and percentage of papers with fully deposited data. Alsheikh-Ali AA, Qureshi W, Al-Mallah MH, Ioannidis JPA (2011) Public Availability of Published Research Data in High-Impact Journals. PLoS ONE 6(9): e24357. doi:10.1371/journal.pone.0024357. http://www.plosone.org/article/info:doi/10.1371/journal.pone.0024357.
  • 9. Table 1. Economic Terms and Analogies in Scientific Publication Young NS, Ioannidis JPA, Al-Ubaydli O (2008) Why Current Publication Practices May Distort Science. PLoS Med 5(10): e201. doi:10.1371/journal.pmed.0050201 http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0050201
  • 10. Regular link: http://www.nature.com/nature/journal/v506/n7487/full/506159e.html Free, full-text: http://www.readcube.com/articles/10.1038/506159e
  • 13. Selected Bibliography
King G (1995) Replication, replication. Political Science and Politics 28: 443–499. http://gking.harvard.edu/files/abs/replication-abs.shtml (last accessed May 8, 2013).
Mesirov J (2010) Accessible reproducible research. Science 327: 964. http://www.sciencemag.org/content/327/5964/415 (last accessed May 8, 2013).
Alsheikh-Ali A, Qureshi W, Al-Mallah M, Ioannidis JPA (2011) Public availability of published research data in high-impact journals. PLoS ONE 6: 9. http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0024357 (last accessed May 8, 2013).
Stodden V, Guo P, Ma Z (2013) Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals. PLoS ONE 8(6): e67111. Published online Jun 21, 2013. doi: 10.1371/journal.pone.0067111. PMCID: PMC3689732.
Reproducible Research. Special Issue, Computing in Science and Engineering 14: 4 11–56. http://ieeexplore.ieee.org/xpl/tocresult.jsp?reload=true&isnumber=6241356&punumber=5992 (last accessed May 8, 2013).
Stodden V, Mitchell I, LeVeque R (2012) Reproducible research for scientific computing: Tools and strategies for changing the culture. Computing in Science and Engineering 14: 4 13–17. http://www.computer.org/csdl/mags/cs/2012/04/mcs2012040013-abs.html (last accessed May 8, 2013).
Young NS, Ioannidis JP, Al-Ubaydli O (2008) Why current publication practices may distort science. PLoS Med 5: e201.
Ioannidis JPA, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, Tibshirani R (2014) Increasing value and reducing waste in research design, conduct, and analysis. The Lancet 383(9912): 166–175 (11 January 2014). DOI: 10.1016/S0140-6736(13)62227-8.
Wood J, Freemantle N, King M, Nazareth I (2014) Trap of trends to statistical significance: likelihood of near significant P value becoming more significant with extra data. BMJ 348: g2215 (published 31 March 2014). doi: http://dx.doi.org/10.1136/bmj.g2215.
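
The cherry-picking scenario on slide 7 (100 mice tested, only 10 reported) can be illustrated with a small simulation. This is a minimal sketch, not a method from the slides: it assumes the measured response is pure noise (mean 0, known standard deviation 1, so there is no real effect), and it uses a simple two-sided z-test as a stand-in for whatever test a real study would use. The function names `z_test_p_value` and `simulate_cherry_picking` are hypothetical.

```python
import math
import random
import statistics


def z_test_p_value(values):
    """Two-sided p value for H0: mean == 0, assuming a known SD of 1 (illustrative assumption)."""
    n = len(values)
    z = statistics.fmean(values) * math.sqrt(n)
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_cherry_picking(n_mice=100, n_reported=10, seed=0):
    """Return (honest_p, cherry_picked_p) for data that contain no true effect."""
    rng = random.Random(seed)
    # Every mouse's response is noise around zero: the treatment does nothing.
    responses = [rng.gauss(0.0, 1.0) for _ in range(n_mice)]
    honest_p = z_test_p_value(responses)
    # Cherry picking: keep only the mice that "showed the expected result".
    best = sorted(responses, reverse=True)[:n_reported]
    cherry_picked_p = z_test_p_value(best)
    return honest_p, cherry_picked_p


if __name__ == "__main__":
    honest_p, cherry_picked_p = simulate_cherry_picking()
    print(f"p value using all 100 mice:      {honest_p:.4f}")
    print(f"p value using only the best 10:  {cherry_picked_p:.2e}")
```

Under these assumptions, reporting only the 10 most favourable animals produces a very small p value even though the data contain no effect, which is why access to the full data set or the lab notebooks matters when checking a published result.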