How many solutions does it take to change the face of research data?	


Todd Vision	

Dryad Digital Repository	

University of North Carolina
at Chapel Hill	


KE Workshop	

14-15 November 2011	

Bonn, Germany
“Es sollte nur ein Magazin der Kunst in der Welt sein wo der Künstler seine
Kunstwerke nur hinzugeben hätte um zu nehmen was er brauchte”	


“There ought to be in the world a repository of art, to which the artist need
only bring his artworks in order to take what he needed”	


        	

Beethoven, letter to publisher F.A. Hoffmeister, 15 January 1801
Open dissection of research:
     the Beethoven Repository 	

http://rockethub.com/projects/3755-open-dissection-of-research
[Figure: survey results, n=3824]	


Source: Publishing Research Consortium, http://publishingresearch.net
[Figure: information content (y-axis) over time (x-axis), with labels: time of publication, specific details, general details, retirement or career change, accident, death (Michener et al. 1997)]
Henry Oldenburg
Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow,
Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
Transparency
Failure of peer-to-peer data sharing
                                           	


   Wicherts and colleagues requested data from
     141 articles in American Psychological
     Association journals.	

   “6 months later, after … 400 emails, [sending]
     detailed descriptions of our study aims, approvals
     of our ethical committee, signed assurances not
     to share data with others, and even our full
     resumes…” only 27% of authors complied 	


Wicherts, J.M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability
of psychological research data for reanalysis. American Psychologist, 61, 726-728.
News alert: scientists are human
                                         	

“We related the reluctance to share research data for reanalysis to 1148
  statistically significant results reported in 49 papers published in two
  major psychology journals. We found the reluctance to share data to be
  associated with weaker evidence (against the null hypothesis of no effect)
  and a higher prevalence of apparent errors in the reporting of statistical
  results. The unwillingness to share data was particularly clear when
  reporting errors had a bearing on statistical significance”.	





[Figure: Not shared vs. Shared]



         Wicherts et al. (2011) doi:10.1371/journal.pone.0026828
Lang GI, Botstein D (2011) PLoS ONE
doi:10.1371/journal.pone.0025290




                 101 pages!
Joint Data Archiving Policy (JDAP)	

Data are important products of the scientific enterprise, and
  they should be preserved and usable for decades in the
  future. 	

As a condition for publication, data supporting the results in
  the article should be deposited in an appropriate public
  archive.	

Authors may elect to embargo access to the data for a
  period of up to a year after publication. 	

Exceptions may be granted at the discretion of the editor,
  especially for sensitive information.	



 Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J.
 Moore. 2010. Data Archiving. American Naturalist. 175(2):145-146.
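The embargo clause is the one part of the JDAP text that reduces to a date rule. The sketch below is a minimal illustration, not any journal's or repository's actual check. It assumes that "up to a year" means the same calendar date one year after publication, with a 29 February publication date mapping to 28 February. The function names `embargo_limit` and `embargo_ok` are hypothetical.

```python
from datetime import date


def embargo_limit(published: date) -> date:
    """Latest allowed data release date under a one-year embargo.

    Assumption: "up to a year" is read as one calendar year; a 29 February
    publication date maps to 28 February of the following year.
    """
    try:
        return published.replace(year=published.year + 1)
    except ValueError:  # 29 February has no counterpart in a non-leap year
        return published.replace(year=published.year + 1, day=28)


def embargo_ok(published: date, release: date) -> bool:
    """True if the requested data release is no later than the embargo limit.

    A release on or before publication means no embargo and is always allowed.
    """
    return release <= embargo_limit(published)


pub = date(2011, 11, 14)
print(embargo_limit(pub))                 # 2012-11-14
print(embargo_ok(pub, date(2012, 6, 1)))  # True
print(embargo_ok(pub, date(2013, 1, 1)))  # False
```

Requests that fall beyond the limit would go to the editor, who may grant exceptions under the policy's last clause.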
Infrastructure
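[Figure: journal-Dryad submission workflow]

   1. The author prepares the manuscript and related data files and submits the manuscript to the JOURNAL.
   2. The editor oversees manuscript review. If the manuscript is not accepted, the process ends.
   3. If it is accepted, the journal sends an article description to DRYAD, and the author uploads the data, forming a Dryad data package.
   4. A data curator curates the package, and Dryad sends the data identifier (DOI) to the journal.
   5. Outcome: a published article (with data citation) and published data (with article citation).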
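The handoff between journal and repository can be read as a small state machine. The sketch below is a simplified model of the steps in the workflow figure, not Dryad's actual software or API. The `Manuscript` class, the function names, the DOI `10.1234/example.1`, and the "Data from:" title format are all hypothetical, and curation is reduced to two sanity checks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Manuscript:
    """Hypothetical record of one manuscript moving through the journal-Dryad handoff."""
    title: str
    data_files: List[str] = field(default_factory=list)
    accepted: bool = False
    data_doi: Optional[str] = None  # identifier returned to the journal


def review(ms: Manuscript, accept: bool) -> None:
    """Editor's decision after manuscript review."""
    ms.accepted = accept


def deposit_and_curate(ms: Manuscript, doi: str) -> None:
    """Journal sends the article description, the author uploads data,
    a curator checks the package, and the repository returns a DOI."""
    if not ms.accepted:
        raise ValueError("no data package is created for a rejected manuscript")
    if not ms.data_files:
        raise ValueError("author has not uploaded any data files")
    # Curation is reduced to the checks above; real curation does much more.
    ms.data_doi = doi


def publish(ms: Manuscript) -> dict:
    """Article and data package are published with reciprocal citations."""
    if ms.data_doi is None:
        raise ValueError("data identifier missing")
    return {
        "article": f"{ms.title} (data: doi:{ms.data_doi})",  # data citation
        "data_package": f"Data from: {ms.title}",            # article citation
    }


ms = Manuscript("Example study", data_files=["measurements.csv"])
review(ms, accept=True)
deposit_and_curate(ms, doi="10.1234/example.1")  # hypothetical DOI
print(publish(ms))
```

The point the model makes explicit is ordering: the identifier exists before publication, so both the article and the data package can cite each other from day one.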
See poster from Brian Hole
Heather Piwowar
Survey of authors	

What are the policies of your funder as they apply to online public archiving? (n=983)	

•  1% Forbids
•  21% Recommends
•  9% Requires
•  40% No policy
•  26% I don’t know
•  3% Other
Data policies among bioscience journals	

[Figure: journal data-sharing policies; impact factors shown: IF=6.0, IF=4.5, IF=3.6; n=70]	


Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research
data. Presented at ELPUB2008, Nature Precedings hdl:10101/npre.2008.1700.1
Reuse
Tracking data reuse
                                       	





Piwowar, Carlson, Vision, unpublished
H. Piwowar, J. Carlson, T. Vision, unpubl.
H. Piwowar, unpubl.
Incentives
Does sharing need to be altruistic?
                                             	

•  For a set of 85 cancer microarray
   clinical trials	

     48% had publicly available data	

     These received 85% of the article citations	

     Independent of journal impact factor,
      publication date, author nationality	





      Piwowar H, et al. (2007) Sharing Detailed Research Data Is
      Associated with Increased Citation Rate. PLoS ONE 2(3): e308.
Taxonomy of data archiving benefits	

Direct
  •  Verification of published research
  •  Preserving accessibility to data
  •  Allowing reuse and repurposing of data
  •  Discoverability of data

Indirect (costs avoided)
  •  Redundant data collection
  •  Inefficient legacy data curation
  •  Burden of sharing-upon-request
  •  Opportunity cost of science not done

Near term
  •  Protection against personnel turnover
  •  Availability for review and validation

Long term
  •  Secure long-term stewardship
  •  Increased impact per publication

Private
  •  Increased citations
  •  New collaborations
  •  New research opportunities
  •  Fulfilling funding mandates

Public
  •  More efficient use of research dollars
  •  Public trust in science
  •  Educational opportunities
  •  Improved methodologies
  •  More informed policy

Modified from Beagrie et al. (2010) Keeping Research Data Safe 2	
Galilei, Galileo (1638) Discorsi e dimostrazioni matematiche, intorno à due
nuove scienze Attenenti alla Mecanica & i Movimenti Locali. Elsevier
Funding
Costs
                                    	

•  Moderate economies of scale are required	

     At 10K packages/yr, $50 per deposit, depending on the level of curation	

•  What are the costs for supplementary online material (SOM)?	

     Journal of Clinical Investigation: $300 flat fee	

     Ecological Archives: $250 for 10 MB, with further fees beyond that	

     FASEB: $100 per file	





       Beagrie N, Eakin-Richards L, Vision TJ (2010) Business models
       and cost estimation: Dryad repository case study. iPRES 2010
What is the return on investment?
                                          	

•  A rigorous framework is lacking	

      But we can look at comparators	

•  Marginal cost of data archiving	

      $50/article is 2% of publication costs ($2.5K)	

      And 0.2% of grant costs/article (~$25K)	

•  Is the data worth 2% of the research investment?	

      Using DNA microarray data in GEO as a model	

      2,711 submissions in 2007	

      Data reused by 3rd parties in 1,150 articles	




  Vision TJ (2010) Open data and the social contract of scientific publishing. BioScience 60(5):330–331
  Piwowar H, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473:285
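The percentages on this slide follow from simple division of the slide's own figures. The short Python check below reproduces them. The reuse ratio at the end is only a crude indicator obtained by dividing the two GEO figures; the slide does not say how the 1,150 reusing articles map onto individual 2007 submissions.

```python
# Worked arithmetic for the return-on-investment comparison.
# All inputs are the slide's rough, per-article estimates (USD).
ARCHIVING_COST = 50        # marginal cost of archiving one data package
PUBLICATION_COST = 2_500   # approximate publication cost per article
GRANT_COST = 25_000        # approximate research (grant) cost per article


def share_of(cost: float, base: float) -> float:
    """Return cost as a percentage of base."""
    return 100 * cost / base


pub_pct = share_of(ARCHIVING_COST, PUBLICATION_COST)
grant_pct = share_of(ARCHIVING_COST, GRANT_COST)

# GEO microarray example from the slide.
GEO_SUBMISSIONS_2007 = 2_711
GEO_REUSE_ARTICLES = 1_150
reuse_ratio = GEO_REUSE_ARTICLES / GEO_SUBMISSIONS_2007  # crude indicator only

print(f"Archiving vs. publication cost: {pub_pct:.1f}%")    # 2.0%
print(f"Archiving vs. grant cost: {grant_pct:.1f}%")        # 0.2%
print(f"Reuse articles per 2007 submission: {reuse_ratio:.2f}")  # 0.42
```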
Training
Building
solutions
DataONE network	

Three major components for flexibility, scalability and sustainability	

Member Nodes
•  diverse institutions
•  serve local community
•  provide resources for managing their data
•  retain copies of data

Coordinating Nodes
•  retain complete metadata catalog
•  indexing for search
•  network-wide services
•  ensure content availability (preservation)
•  replication services

Investigator Toolkit
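The division of labor between node types can be illustrated with a toy model. The sketch below is not DataONE's software or API. The class names, the substring "search", and the `min_copies` replication target are hypothetical stand-ins. Member nodes hold data; a coordinating node keeps the network-wide metadata catalog and copies objects until each is held by enough member nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MemberNode:
    """A participating institution that holds copies of data objects."""
    name: str
    objects: Set[str] = field(default_factory=set)


@dataclass
class CoordinatingNode:
    """Keeps a network-wide metadata catalog and triggers replication."""
    catalog: Dict[str, str] = field(default_factory=dict)  # object id -> metadata

    def register(self, node: MemberNode, obj_id: str, metadata: str) -> None:
        """Record an object deposited at a member node in the catalog."""
        node.objects.add(obj_id)
        self.catalog[obj_id] = metadata

    def search(self, term: str) -> List[str]:
        """Naive case-insensitive substring search (stand-in for real indexing)."""
        return sorted(i for i, md in self.catalog.items() if term.lower() in md.lower())

    def replicate(self, nodes: List[MemberNode], min_copies: int) -> None:
        """Copy each catalogued object until at least min_copies nodes hold it."""
        for obj_id in self.catalog:
            holders = [n for n in nodes if obj_id in n.objects]
            for n in nodes:
                if len(holders) >= min_copies:
                    break
                if obj_id not in n.objects:
                    n.objects.add(obj_id)
                    holders.append(n)


nodes = [MemberNode("A"), MemberNode("B"), MemberNode("C")]
cn = CoordinatingNode()
cn.register(nodes[0], "obj1", "Bird survival measurements")
cn.replicate(nodes, min_copies=2)
print(cn.search("bird"), [sorted(n.objects) for n in nodes])
```

Keeping the catalog separate from the data is what lets preservation (replication) and discovery (search) run as network-wide services while each member node continues to serve its local community.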
Concluding thoughts
                           	

•  Archiving is essential	

•  Journals and learned societies will be at
   least as important as institutions	

•  Funders cannot be shy about policy, and
   must drive the marketplace	

•  We can leverage for data lots of things that
   work well for traditional publications	

•  International cooperation is a must
•    http://datadryad.org	

•    http://blog.datadryad.org	

•    http://datadryad.org/wiki	

•    http://code.google.com/p/dryad	

•    dryad-users@nescent.org	

•       @datadryad	

•        Dryad
Images and sources
                                               	

1.© Yael Fitzpatrick and AAAS, http://www.sciencemag.org/site/special/data/ScienceData-hi.pdf	

2. Beethoven mit der Missa solemnis, by Joseph Stieler; photo CC BY-NC-SA 2.0 Taran
       Rampersad	

  Letter from Beethoven to Franz Anton Hoffmeister, © Beethoven-Haus Bonn 	

3. The Wikipedia Lesson of Dr Nicolaes Tulp, by Alasdair Forrest,
       http://alasdairforrest.posterous.com/the-curious-case-of-the-changing-citation	

4. © National Evolutionary Synthesis Center (http://nescent.org)	

5. © Publishing Research Consortium, source: http://publishingresearch.net	

6. After Michener et al. (1997) Ecological Applications 7(1):330–342.	

7. Title page of Philosophical Transactions of the Royal Society, Vol. 1, 1665, public domain;
       portrait of Henry Oldenburg, source:
       http://en.wikipedia.org/wiki/File:Henry_Oldenburg.jpg, public domain	

8. source: Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced
       Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory:
       209-226, public domain.	

9. CC BY-NC-ND 2.0 by lebatihem, source: http://www.flickr.com/photos/lebatihem/2154686107/	

11. CC-BY Wicherts JM, Bakker M, Molenaar D source: Willingness to Share Research Data Is
       Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results.
       PLoS ONE 6(11): e26828, 2011. doi:10.1371/journal.pone.0026828
12. CC-BY, Lang GI, Botstein D source: A Test of the Coordinated Expression Hypothesis for
       the Origin and Maintenance of the GAL Cluster in Yeast. PLoS ONE 6(9): e25290, 2011.
       doi:10.1371/journal.pone.0025290	

13. CC BY-SA 2.0 avlxyz, source: http://www.flickr.com/photos/avlxyz/4589977933/	

16. courtesy of Peggy Schaeffer	

20. CC-BY Piwowar HA, Chapman WW source: A review of journal policies for sharing
       research data, Nature Precedings, hdl:10101/npre.2008.1700.1	

21. CC BY 2.0 sashafatcat, source: http://www.flickr.com/photos/sashafatcat/2381412445	

23, 24. CC-BY H Piwowar, J Carlson, T Vision, unpublished	

25. CC-BY H Piwowar, source: http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/	

26. CC BY-ND 2.0 Sivaprakash Kannan, source: http://www.flickr.com/photos/sivaprakash/294755142/	

28. After: Beagrie N, Lavoie B, Woollard M (2010) Keeping Research Data Safe 2,
       http://www.jisc.ac.uk/media/documents/publications/reports/2010/keepingresearchdatasafe2.pdf	

29. Galilei, Galileo (1638) Discorsi e dimostrazioni matematiche, intorno à due nuove scienze
       Attenenti alla Mecanica & i Movimenti Locali. Elsevier. Source: original unknown.	

30. CC BY-NC-SA 2.0 Coralie Mercer, source: http://www.flickr.com/photos/koalie/394934841/	

33. CC BY-NC-ND 2.0 by www.english.school.nz, source: http://www.flickr.com/photos/iei/2904115612/	

34. Liberty ship under construction, source:
       http://en.wikipedia.org/wiki/File:Liberty_ship_construction_04_bottom.jpg, public domain

Más contenido relacionado

La actualidad más candente

Laurie Goodman: Overcoming Hurdles to Data Publication
Laurie Goodman: Overcoming Hurdles to Data PublicationLaurie Goodman: Overcoming Hurdles to Data Publication
Laurie Goodman: Overcoming Hurdles to Data PublicationGigaScience, BGI Hong Kong
 
dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET
 
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental MetadataMaking it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental MetadataMichel Dumontier
 
The DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceThe DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceMerce Crosas
 
Why study Data Sharing? (+ why share your data)
Why study Data Sharing?  (+ why share your data)Why study Data Sharing?  (+ why share your data)
Why study Data Sharing? (+ why share your data)Heather Piwowar
 
Building an NIH Data Catalog: Bit by Bit
Building an NIH Data Catalog: Bit by BitBuilding an NIH Data Catalog: Bit by Bit
Building an NIH Data Catalog: Bit by Bitreadkev
 
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Merce Crosas
 
dkNET Poster ENDO 2019
dkNET Poster ENDO 2019dkNET Poster ENDO 2019
dkNET Poster ENDO 2019dkNET
 
Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...
Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...
Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...GigaScience, BGI Hong Kong
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps. Richard Layton
 
GigaScience: a new resource for the big-data community.
GigaScience: a new resource for the big-data community.GigaScience: a new resource for the big-data community.
GigaScience: a new resource for the big-data community.GigaScience, BGI Hong Kong
 
2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
2016 ACS Semantic Approaches for Biochemical Knowledge Discovery2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
2016 ACS Semantic Approaches for Biochemical Knowledge DiscoveryMichel Dumontier
 
Gaining credit for sharing research data
Gaining credit for sharing research dataGaining credit for sharing research data
Gaining credit for sharing research dataVarsha Khodiyar
 
W3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesW3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesMichel Dumontier
 

La actualidad más candente (19)

Laurie Goodman: Overcoming Hurdles to Data Publication
Laurie Goodman: Overcoming Hurdles to Data PublicationLaurie Goodman: Overcoming Hurdles to Data Publication
Laurie Goodman: Overcoming Hurdles to Data Publication
 
dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019
 
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental MetadataMaking it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
 
The DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceThe DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with Confidence
 
Why study Data Sharing? (+ why share your data)
Why study Data Sharing?  (+ why share your data)Why study Data Sharing?  (+ why share your data)
Why study Data Sharing? (+ why share your data)
 
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
 
FAIR data and the Etsin service
FAIR data and the Etsin serviceFAIR data and the Etsin service
FAIR data and the Etsin service
 
Building an NIH Data Catalog: Bit by Bit
Building an NIH Data Catalog: Bit by BitBuilding an NIH Data Catalog: Bit by Bit
Building an NIH Data Catalog: Bit by Bit
 
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
 
dkNET Poster ENDO 2019
dkNET Poster ENDO 2019dkNET Poster ENDO 2019
dkNET Poster ENDO 2019
 
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
 
Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...
Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...
Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
 
GigaScience: a new resource for the big-data community.
GigaScience: a new resource for the big-data community.GigaScience: a new resource for the big-data community.
GigaScience: a new resource for the big-data community.
 
2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
2016 ACS Semantic Approaches for Biochemical Knowledge Discovery2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
 
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
 
Gaining credit for sharing research data
Gaining credit for sharing research dataGaining credit for sharing research data
Gaining credit for sharing research data
 
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific ExperimentsAn Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
 
W3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesW3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description Guidelines
 

Similar a Knowledge Exchange, Nov 2011, Bonn

2021-01-27--biodiversity-informatics-gbif-(52slides)
2021-01-27--biodiversity-informatics-gbif-(52slides)2021-01-27--biodiversity-informatics-gbif-(52slides)
2021-01-27--biodiversity-informatics-gbif-(52slides)Dag Endresen
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
 
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...GigaScience, BGI Hong Kong
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingGigaScience, BGI Hong Kong
 
Research Data Sharing LERU
Research Data Sharing LERU Research Data Sharing LERU
Research Data Sharing LERU LIBER Europe
 
Public Data Archiving in Ecology and Evolution: How well are we doing?
Public Data Archiving in Ecology and Evolution: How well are we doing?Public Data Archiving in Ecology and Evolution: How well are we doing?
Public Data Archiving in Ecology and Evolution: How well are we doing?Sandra Binning
 
A research passport: library requirements
A research passport: library requirementsA research passport: library requirements
A research passport: library requirementsLIBER Europe
 
Rewarding data publication: ipt.biodiversity.aq
Rewarding data publication: ipt.biodiversity.aqRewarding data publication: ipt.biodiversity.aq
Rewarding data publication: ipt.biodiversity.aqAnton Van de Putte
 
Scott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeScott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeGigaScience, BGI Hong Kong
 
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...GigaScience, BGI Hong Kong
 
Disciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curationDisciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curationMichael Day
 
The Dryad Digital Repository: Published data as part of the greater data ecos...
The Dryad Digital Repository: Published data as part of the greater data ecos...The Dryad Digital Repository: Published data as part of the greater data ecos...
The Dryad Digital Repository: Published data as part of the greater data ecos...Hilmar Lapp
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identificationguest453b14
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identificationguest453b14
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identificationguest453b14
 
Dataset citation and identification
Dataset citation and identificationDataset citation and identification
Dataset citation and identificationAdam Farquhar
 
Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Ra...
Laurie Goodman at #SSPBoston: Article+Data+ToolsReproducibility, Reuse, & Ra...Laurie Goodman at #SSPBoston: Article+Data+ToolsReproducibility, Reuse, & Ra...
Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Ra...GigaScience, BGI Hong Kong
 
Where is the opportunity for libraries in the collaborative data infrastructure?
Where is the opportunity for libraries in the collaborative data infrastructure?Where is the opportunity for libraries in the collaborative data infrastructure?
Where is the opportunity for libraries in the collaborative data infrastructure?LIBER Europe
 

Similar a Knowledge Exchange, Nov 2011, Bonn (20)

2021-01-27--biodiversity-informatics-gbif-(52slides)
2021-01-27--biodiversity-informatics-gbif-(52slides)2021-01-27--biodiversity-informatics-gbif-(52slides)
2021-01-27--biodiversity-informatics-gbif-(52slides)
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
 
B4OS-2012
B4OS-2012B4OS-2012
B4OS-2012
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
 
Research Data Sharing LERU
Research Data Sharing LERU Research Data Sharing LERU
Research Data Sharing LERU
 
Public Data Archiving in Ecology and Evolution: How well are we doing?
Public Data Archiving in Ecology and Evolution: How well are we doing?Public Data Archiving in Ecology and Evolution: How well are we doing?
Public Data Archiving in Ecology and Evolution: How well are we doing?
 
A research passport: library requirements
A research passport: library requirementsA research passport: library requirements
A research passport: library requirements
 
Rewarding data publication: ipt.biodiversity.aq
Rewarding data publication: ipt.biodiversity.aqRewarding data publication: ipt.biodiversity.aq
Rewarding data publication: ipt.biodiversity.aq
 
Scott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeScott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data deluge
 
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
 
Disciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curationDisciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curation
 
The Dryad Digital Repository: Published data as part of the greater data ecos...
The Dryad Digital Repository: Published data as part of the greater data ecos...The Dryad Digital Repository: Published data as part of the greater data ecos...
The Dryad Digital Repository: Published data as part of the greater data ecos...
 
Engaging the Researcher in RDM
Engaging the Researcher in RDMEngaging the Researcher in RDM
Engaging the Researcher in RDM
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identification
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identification
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identification
 
Dataset citation and identification
Dataset citation and identificationDataset citation and identification
Dataset citation and identification
 
Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Ra...
Laurie Goodman at #SSPBoston: Article+Data+ToolsReproducibility, Reuse, & Ra...Laurie Goodman at #SSPBoston: Article+Data+ToolsReproducibility, Reuse, & Ra...
Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Ra...
 
Where is the opportunity for libraries in the collaborative data infrastructure?
Where is the opportunity for libraries in the collaborative data infrastructure?Where is the opportunity for libraries in the collaborative data infrastructure?
Where is the opportunity for libraries in the collaborative data infrastructure?
 


Knowledge Exchange, Nov 2011, Bonn

  • 1. How many solutions does it take to change the face of research data? Todd Vision Dryad Digital Repository University of North Carolina at Chapel Hill KE Workshop 14-15 November 2011 Bonn, Germany
  • 2. “Es sollte nur ein Magazin der Kunst in der Welt sein wo der Künstler seine Kunstwerke nur hinzugeben hätte um zu nehmen was er brauchte” “There ought to be in the world a repository of art, to which the artist need only bring his artworks in order to take what he needed” Beethoven, letter to publisher F.A. Hoffmeister, 15 January 1801
  • 3. Open dissection of research: the Beethoven Repository http://rockethub.com/projects/3755-open-dissection-of-research
  • 4.
  • 5. n=3824 Source: Publishing Research Consortium, http://publishingresearch.net
  • 6. [Figure: information content of data versus time (Michener et al. 1997). Labels: time of publication; specific details; general details; retirement or career change; accident; death.]
  • 8. Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
  • 10. Failure of peer-to-peer data sharing Wicherts and colleagues requested data from 141 articles in American Psychological Association journals. “6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied Wicherts, J.M., Borsboom, D., Kats, J., Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61, 726-728.
  • 11. News alert: scientists are human “We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance”. Not shared Shared Wicherts et al. (2011) doi:10.1371/journal.pone.0026828
  • 12. Lang GI, Botstein D (2011) PLoS ONE doi:10.1371/journal.pone.0025290 101 pages!
  • 13. Joint Data Archiving Policy (JDAP) Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive. Authors may elect to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information. Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J. Moore. 2010. Data Archiving. American Naturalist. 175(2):145-146.
  • 15.
  • 16. Diagram: Journal-Dryad submission workflow. Author: prepare manuscript and related data files. JOURNAL: submit manuscript → manuscript review → editor: accepted? → send article description to Dryad. DRYAD: upload data → data package → accepted? (no / yes) → send data identifier (DOI) → curation by data curator. Outputs: published article (with data citation); published data (with article citation).
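The workflow in the diagram can be sketched as a small Python model. This is a minimal sketch under stated assumptions: the names `Article`, `DataPackage`, `process_submission` and `mint_doi` are hypothetical, it is not Dryad's or any journal's actual software, and curation is reduced to a flag.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Article:
    """Hypothetical journal-side record for a submitted manuscript."""
    manuscript_id: str
    title: str
    data_citation: Optional[str] = None  # filled in once the data has a DOI


@dataclass
class DataPackage:
    """Hypothetical repository-side record for the uploaded data files."""
    manuscript_id: str
    files: List[str] = field(default_factory=list)
    doi: Optional[str] = None
    curated: bool = False
    article_citation: Optional[str] = None  # filled in on publication


def process_submission(
    article: Article,
    package: DataPackage,
    accepted: bool,
    mint_doi: Callable[[str], str],
) -> Optional[Tuple[Article, DataPackage]]:
    """Follow the diagram: editor decision -> DOI -> curation -> cross-cited publication."""
    if article.manuscript_id != package.manuscript_id:
        raise ValueError("data package does not belong to this manuscript")
    if not package.files:
        raise ValueError("no data files uploaded")
    if not accepted:
        return None  # rejected manuscript: neither article nor data is published
    package.doi = mint_doi(package.manuscript_id)  # identifier sent back to the journal
    package.curated = True  # stand-in for the data curator's checks
    article.data_citation = package.doi  # the article cites the data
    package.article_citation = article.title  # the data cites the article
    return article, package
```

For example, `process_submission(Article("MS-1", "Sparrow survival"), DataPackage("MS-1", ["measurements.csv"]), True, lambda m: "10.0000/example." + m)` returns both records, each pointing at the other. The `10.0000` prefix is a made-up placeholder, not a registered DOI prefix.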
  • 17. See poster from Brian Hole
  • 19. Survey of authors What are the policies of your funder as they apply to online public archiving? (n=983) 1% Forbids 21% Recommends 9% Requires 40% No policy 26% I don’t know 3% Other
  • 20. Data policies among bioscience journals IF=3.6 IF=6.0 IF=4.5 n=70 Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Presented at ELPUB2008, Nature Precedings hdl:10101/npre.2008.1700.1
  • 21. Reuse
  • 22.
  • 23. Tracking data reuse Piwowar, Carlson, Vision, unpublished
  • 24. H. Piwowar, J. Carlson, T. Vision, unpubl.
  • 27. Does sharing have to be altruistic? •  For a set of 85 cancer microarray clinical trials   48% had publicly available data   These received 85% of the article citations   Independent of journal impact factor, publication date, author nationality Piwowar H, et al. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.
  • 28. Taxonomy of data archiving benefits, along three dimensions. Direct: verification of published research; preserving accessibility to data; allowing reuse and repurposing of data; discoverability of data. Indirect (costs avoided): redundant data collection; inefficient legacy data curation; burden of sharing-upon-request; opportunity cost of science not done. Near term: protection against personnel turnover; availability for review and validation. Long term: secure long-term stewardship; increased impact per publication. Private: increased citations; new collaborations; new research opportunities; fulfilling funding mandates. Public: more efficient use of research dollars; public trust in science; educational opportunities; improved methodologies; more informed policy. Modified from Beagrie et al. (2010) Keeping Research Data Safe 2
  • 29. Galilei, Galileo (1638) Discorsi e dimostrazioni matematiche, intorno à due nuove scienze Attenenti alla Mecanica I Movimenti Locali. Elsevier
  • 31. Costs •  Moderate economies of scale are required   At 10K packages/yr, $50/deposit, depending on curation •  What are the costs for SOM (supplementary online material)?   Journal of Clinical Investigation: $300 flat fee   Ecological Archives: $250 for up to 10 MB, more fees beyond that   FASEB: $100 per file Beagrie N, Eakin-Richards L, Vision TJ (2009) Business models and cost estimation: Dryad repository case study. iPRES 2010
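The fee schedules quoted on this slide can be compared for a single package with a short Python sketch. Assumptions: the scheme labels are hypothetical keys, the Dryad figure is the slide's estimated cost per deposit at about 10K packages a year rather than a published price, and the Ecological Archives charges above 10 MB are not given on the slide, so the function refuses to guess.

```python
def deposit_fee(scheme: str, n_files: int, size_mb: float) -> int:
    """Return the per-package cost in US dollars under the figures quoted on the slide."""
    if n_files < 1 or size_mb < 0:
        raise ValueError("need at least one file and a non-negative size")
    if scheme == "dryad":
        return 50  # estimated cost per deposit at ~10K packages/yr (not a price)
    if scheme == "jci":
        return 300  # Journal of Clinical Investigation: flat fee
    if scheme == "ecological_archives":
        if size_mb > 10:
            raise NotImplementedError("fees beyond 10 MB are not specified")
        return 250  # Ecological Archives: up to 10 MB
    if scheme == "faseb":
        return 100 * n_files  # FASEB: charged per file
    raise KeyError(f"unknown scheme: {scheme}")


# A hypothetical package of 3 files totalling 5 MB.
for name in ("dryad", "jci", "ecological_archives", "faseb"):
    print(name, deposit_fee(name, n_files=3, size_mb=5.0))
```

For this hypothetical package the loop prints 50, 300, 250 and 300 respectively; the per-file FASEB scheme grows with the number of files, while the others do not.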
  • 32. What is the return on investment? •  A rigorous framework is lacking   But we can look at comparators •  Marginal cost of data archiving   $50/article is 2% of publication costs ($2.5K)   And 0.2% of grant costs/article (~$25K) •  Is the data worth 2% of the research investment?   Using DNA microarray data in GEO as a model   2,711 submissions in 2007   Data reused by 3rd parties in 1,150 articles Vision (2011) Open data and social contract of scientific publishing. BioScience, 60(5):330-330 Piwowar H, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473:285
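The cost comparisons on this slide are simple ratios; the Python sketch below reproduces them from the slide's own round numbers. The constant names are hypothetical, and the GEO figure is only a crude ratio of the two counts given, since the reuse articles may have appeared after 2007 and are not weighted by value.

```python
ARCHIVING_COST_PER_ARTICLE = 50.0       # marginal cost of archiving, USD
PUBLICATION_COST_PER_ARTICLE = 2_500.0  # approximate cost of publishing one article, USD
GRANT_COST_PER_ARTICLE = 25_000.0       # approximate research (grant) cost per article, USD


def cost_share(cost: float, base: float) -> float:
    """Fraction of `base` represented by `cost`."""
    if base <= 0:
        raise ValueError("base must be positive")
    return cost / base


pub_share = cost_share(ARCHIVING_COST_PER_ARTICLE, PUBLICATION_COST_PER_ARTICLE)
grant_share = cost_share(ARCHIVING_COST_PER_ARTICLE, GRANT_COST_PER_ARTICLE)
print(f"share of publication cost: {pub_share:.1%}")  # 2.0%
print(f"share of grant cost: {grant_share:.1%}")      # 0.2%

# Crude comparator from GEO: third-party reuse articles per 2007 submission.
geo_submissions_2007 = 2_711
reuse_articles = 1_150
print(f"reuse articles per submission: {reuse_articles / geo_submissions_2007:.2f}")  # 0.42
```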
  • 35. DataONE network: three major components for flexibility, scalability and sustainability. Member Nodes: diverse institutions; serve local community; provide resources for managing their data; retain copies of data. Coordinating Nodes: retain complete metadata catalog; indexing for search; network-wide services; ensure content availability (preservation); replication services. Investigator Toolkit.
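The division of labour between member and coordinating nodes can be illustrated with a toy Python model. This is a sketch under stated assumptions, not DataONE's actual API: the class names, the title-only search and the replication policy (copy each object to the first listed member nodes that lack it until a minimum number of copies exists) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MemberNode:
    """A participating institution that stores data objects for its community."""
    name: str
    objects: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class CoordinatingNode:
    """Keeps the network-wide metadata catalog and tracks where copies live."""
    catalog: Dict[str, dict] = field(default_factory=dict)
    locations: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, node: MemberNode, pid: str, data: bytes, metadata: dict) -> None:
        """Store an object on a member node and record it in the catalog."""
        node.objects[pid] = data
        self.catalog[pid] = metadata
        self.locations.setdefault(pid, set()).add(node.name)

    def search(self, keyword: str) -> List[str]:
        """Return identifiers whose title contains the keyword (case-insensitive)."""
        kw = keyword.lower()
        return sorted(pid for pid, md in self.catalog.items()
                      if kw in md.get("title", "").lower())

    def replicate(self, nodes: List[MemberNode], min_copies: int) -> None:
        """Copy each object to further member nodes until it has min_copies replicas."""
        by_name = {n.name: n for n in nodes}
        for pid, holders in self.locations.items():
            source = by_name[next(iter(holders))]  # any current holder has the bytes
            for node in nodes:
                if len(holders) >= min_copies:
                    break
                if node.name not in holders:
                    node.objects[pid] = source.objects[pid]
                    holders.add(node.name)
```

Keeping the catalog on the coordinating side lets search and replication work across the whole network, while the data itself stays with, and is copied between, member institutions.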
  • 36. Concluding thoughts •  Archiving is essential •  Journals and learned societies will be at least as important as institutions •  Funders cannot be shy about policy, and must drive the marketplace •  We can leverage for data lots of things that work well for traditional publications •  International cooperation is a must
  • 37. •  http://datadryad.org •  http://blog.datadryad.org •  http://datadryad.org/wiki •  http://code.google.com/p/dryad •  dryad-users@nescent.org •  @datadryad •  Dryad
  • 38. Images and sources 1. © Yael Fitzpatrick and AAAS, http://www.sciencemag.org/site/special/data/ScienceData-hi.pdf 2. Beethoven mit der Missa solemnis, by Joseph Stieler; photo CC BY-NC-SA 2.0 Taran Rampersad. Letter from Beethoven to Franz Anton Hoffmeister, © Beethoven-Haus Bonn 3. The Wikipedia Lesson of Dr Nicolaes Tulp, by Alasdair Forrest, http://alasdairforrest.posterous.com/the-curious-case-of-the-changing-citation 4. © National Evolutionary Synthesis Center (http://nescent.org) 5. © Publishing Research Consortium, source: http://publishingresearch.net 6. After Michener et al. (1997) Ecological Applications 7(1):330–342. 7. Title page of Philosophical Transactions of the Royal Society, Vol. 1, 1665, public domain; portrait of Henry Oldenburg, source: http://en.wikipedia.org/wiki/File:Henry_Oldenburg.jpg, public domain 8. Source: Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226, public domain. 9. CC BY-NC-ND 2.0 by lebatihem, source: http://www.flickr.com/photos/lebatihem/2154686107/ 11. CC-BY Wicherts JM, Bakker M, Molenaar D, source: Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE 6(11): e26828, 2011. doi:10.1371/journal.pone.0026828
  • 39. 12. CC-BY, Lang GI, Botstein D, source: A Test of the Coordinated Expression Hypothesis for the Origin and Maintenance of the GAL Cluster in Yeast. PLoS ONE 6(9): e25290, 2011. doi:10.1371/journal.pone.0025290 13. CC BY-SA 2.0 avlxyz, source: http://www.flickr.com/photos/avlxyz/4589977933/ 16. Courtesy of Peggy Schaeffer 20. CC-BY Piwowar HA, Chapman WW, source: A review of journal policies for sharing research data, Nature Precedings, hdl:10101/npre.2008.1700.1 21. CC BY 2.0, sashafatcat, source: http://www.flickr.com/photos/sashafatcat/2381412445 23, 24. CC-BY H Piwowar, J Carlson, T Vision, unpublished 25. CC-BY H Piwowar, source: http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/ 26. CC BY-ND 2.0 Sivaprakash Kannan, source: http://www.flickr.com/photos/sivaprakash/294755142/ 28. After: Beagrie N, Lavoie B, Woollard M (2010) Keeping Research Data Safe 2, http://www.jisc.ac.uk/media/documents/publications/reports/2010/keepingresearchdatasafe2.pdf 29. Galilei, Galileo (1638) Discorsi e dimostrazioni matematiche, intorno à due nuove scienze Attenenti alla Mecanica I Movimenti Locali. Elsevier. Source: original unknown. 30. CC BY-NC-SA 2.0 Coralie Mercer, source: http://www.flickr.com/photos/koalie/394934841/ 33. CC BY-NC-ND 2.0 by www.english.school.nz, source: http://www.flickr.com/photos/iei/2904115612/ 34. Liberty ship under construction, source: http://en.wikipedia.org/wiki/File:Liberty_ship_construction_04_bottom.jpg, public domain