Data publication in the data deluge




Scott Edmunds, GigaScience/BGI Hong Kong
COASP 2012, Budapest, 20th September 2012


          www.gigasciencejournal.com
The Data Challenge:
•1.2 zettabytes (10^21 bytes) of electronic data generated globally each year

•>Exponential growth of genomics data (& growth in imaging and
MS data following)




                                        Source: http://www.genome.gov/sequencingcosts/ (with apologies)



•Issues with reproducibility, hosting, curation, interoperability

•Need for better incentives to overcome these

 Source: 1. Mervis J. U.S. science policy. Agencies rally to tackle big data. Science. 2012 Apr 6;336(6077):22.
Large-Scale Data
      Journal/Database
    In conjunction with:

Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Commissioning Editor: Nicole Nogoy, PhD
Lead Curator: Tam Sneddon D.Phil
Data Platform: Peter Li, PhD
  www.gigasciencejournal.com
GigaDB is a new database integrated with the GigaScience journal to meet the needs of a new generation of biological
and biomedical research as it enters the era of “big-data”…
GDSAP: Genomic Data Submission
       and Analytical platform
Anatomy of a Publication
 Idea




Study




           Metadata


           Data
Analysis




Answer
Anatomy of a Data Publication
 Idea




Study




           Metadata


           Data
Analysis




Answer
Issues for Data Publication
 Idea
                      Cultural issues:

Study


                      Technical issues:
           Metadata


           Data
Analysis




Answer
Issues for Data Publication
 Idea
                                                                         Cultural issues:

Study




                     Metadata


                    Data
Analysis


                                                                          Adoption held back by:
Answer                                                                            journal policies, citation, tracking…

* T-Shirts available from Graham Steel / http://www.zazzle.co.uk/steelgraham
Issues for Data Publication
 Idea




Study


                        Technical issues:
           Metadata

                      What do we do with the data?
           Data
                      Lightweight:
Analysis                 •Metadata only journals
                         •Get someone else to host
                      Heavyweight:
Answer                   •Become a repository
To host or not to host?
 Against: supplementary files argument
                      The Journal of Neuroscience        Average size of a Journal of Neuroscience article
                                                         and supplemental material in megabytes.
Announcement Regarding Supplemental Material:
Beginning November 1, 2010, The Journal of
Neuroscience will no longer allow authors to include
supplemental material when they submit new manuscripts
and will no longer host supplemental material on its web site
for those articles.



“While the size of articles has grown gradually over the
past decade, the supplemental material associated with a
typical Journal article appears to be growing exponentially
and is rapidly approaching the size of an article. The
sheer volume of supplemental material is adversely
affecting peer review.”

                                                              Maunsell J J. Neurosci. 2010;30:10599-10600
$1000 genome = million $ peer-review?
   To review:                                                    (>6TBp, >1500 datasets)


                             S3 (storage) =                                                         $15,000
                             EC2 (analysis w/ BLASTx) = $500,000
Source: Folker Meyer/Wilkening et al. 2009, CLUSTER'09. IEEE International Conference on Cluster Computing and Workshops


ENCODE analysis Virtual Machine:

                               Containing: input data, code
                               bundles with scripts and
                               processing steps, outputs

                               AWS = ~$5,000
 Source: James Taylor / http://encodeproject.org/ENCODE/integrativeAnalysis/VM
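
The cost figures quoted on these slides can be compared with a few lines of arithmetic. This is only a rough sketch: it takes the slide's numbers at face value, and the two estimates come from different projects (a metagenomics re-analysis and the ENCODE virtual machine), so the ratio is indicative rather than a like-for-like comparison. The function and variable names are illustrative.

```python
# Figures as quoted on the slides, in US dollars (rough estimates).
S3_STORAGE_USD = 15_000      # storing the data to be reviewed
EC2_ANALYSIS_USD = 500_000   # re-running the BLASTx analysis
ENCODE_VM_USD = 5_000        # running the packaged ENCODE virtual machine


def review_cost_summary(storage, analysis, packaged_vm):
    """Total cost of recomputing from scratch, and its ratio to a packaged VM."""
    total = storage + analysis
    return {"recompute_total": total, "ratio_to_vm": total / packaged_vm}


summary = review_cost_summary(S3_STORAGE_USD, EC2_ANALYSIS_USD, ENCODE_VM_USD)
print(summary)
```

On these numbers, recomputing from scratch costs roughly two orders of magnitude more than running a packaged virtual machine.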
To host or not to host?
For: reproducibility
The Guardian, 14th September 2012:                   Replication is the only solution to scientific fraud.
http://www.guardian.co.uk/commentisfree/2012/sep/14/solution-scientific-fraud-replication




 For: “data is the new oil”
 William Gibson: “Information is the currency of the future world”

 Sir Tim Berners-Lee: “Data is a precious thing and will last longer than
 the systems themselves”



                           Move compute to the data: think EC2 rather than S3
                                  DNA Nexus + 0.5PB SRA data = $15 million given by Google


Source:DNA Nexus/SRA http://techcrunch.com/2011/10/12/dnanexus-raises-15-million-teams-with-google-to-host-massive-dna-database/
Overcoming cultural hurdles…




             ?
Overcoming cultural hurdles…
  Adventures in Data Citation




   doi:10.5524/100001
For data citation to work, needs:

1. Proven utility/potential user base.

2. Acceptance/inclusion by journals.

3. Data+Citation: inclusion in the references.

4. Tracking by citation indexes.

5. Usage of the metrics by the community…
Data Citation 1: utility/user base.
Establishment of data DOIs and use by databases:
                  Shackleton NJ, Hall MA, Vincent E (2001): Mean stable carbon isotope ratios
                  of Cibicidoides wuellerstorfi from sediment core MD95-2042 on the Iberian
                  margin, North Atlantic. PANGAEA - Data Publisher for Earth & Environmental
                  Science. http://doi.pangaea.de/10.1594/PANGAEA.58229
 Cited in:
 Pahnke K, Zahn R: Southern Hemisphere Water Mass Conversion Linked with North Atlantic
 Climate Variability. Science 2005, 307:1741-1746.


                              Nocek B, Xu X, Savchenko A, Edwards A, Joachimiak A. 2007. PDB
                             ID: 2P06 Crystal structure of a predicted coding region AF_0060
                             from Archaeoglobus fulgidus DSM 4304. 10.2210/pdb2p06/pdb.

 Cited in:
 Andreeva A, Howorth D, Chandonia J-M, Brenner SE, Hubbard TJP, Chothia C, Murzin AG: Data
 growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008,
 36:D419-425.
BGI Datasets Get DOI®s
Key: Released pre-publication / Paper Published in GigaScience

Invertebrate                    Vertebrates               Microbe
Ant                             Darwin’s Finch            E. coli O104:H4 TY-2482
- Florida carpenter ant         Giant panda               T2D gut metagenome
- Jerdon’s jumping ant          Macaque
- Leaf-cutter ant               - Chinese rhesus          Cell-Lines
Roundworm                       - Crab-eating             Chinese Hamster Ovary
Schistosoma                     Mini-Pig                  Mouse methylomes
Silkworm                        Naked mole rat
                                Parrot, Puerto Rican      Plants
Human                           Penguin                   Chinese cabbage
Asian individual (YH)           - Emperor penguin         Cucumber
- DNA Methylome                 - Adelie penguin          Foxtail millet
- Genome Assembly               Pigeon, domestic          Pigeonpea
- Transcriptome                 Polar bear                Potato
Cancer (14TB)                   Sheep                     Sorghum
Single cell bladder cancer      Tibetan antelope
HBV infected exomes
Ancient DNA
- Saqqaq Eskimo
- Aboriginal Australian
Our first DOI:


To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:

Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang,
Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun,
Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y;
Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482
isolate genome sequencing consortium (2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen.
doi:10.5524/100001
http://dx.doi.org/10.5524/100001

            To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
            Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
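
As a minimal sketch of how a dataset citation like the one above could be assembled from its parts, the snippet below turns a `doi:` identifier into a resolver URL and builds a citation string. The function names and the abbreviated author list are illustrative assumptions, not part of GigaDB or DataCite tooling. The default resolver is `https://doi.org/`; the older `http://dx.doi.org/` form shown on the slide can be passed instead.

```python
def doi_to_url(doi, resolver="https://doi.org/"):
    """Turn 'doi:10.5524/100001' (or a bare DOI) into a resolver URL."""
    bare = doi.strip()
    if bare.lower().startswith("doi:"):
        bare = bare[4:]
    return resolver + bare


def format_data_citation(authors, year, title, publisher, doi):
    """Authors (Year) Title. Publisher. Resolver URL."""
    return f"{authors} ({year}) {title}. {publisher}. {doi_to_url(doi)}"


# Author list abbreviated here purely for illustration.
citation = format_data_citation(
    authors="Li, D; Xi, F; Zhao, M; et al.",
    year=2011,
    title="Genomic data from Escherichia coli O104:H4 isolate TY-2482",
    publisher="BGI Shenzhen",
    doi="doi:10.5524/100001",
)
print(citation)
```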
1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully
illustrated by events following an outbreak of a severe gastro-
intestinal infection in Hamburg in Germany in May 2011. This
spread through several European countries and the US,
affecting about 4000 people and resulting in over 50 deaths. All
tested positive for an unusual and little-known Shiga-toxin–
producing E. coli bacterium. The strain was initially analysed by
scientists at BGI-Shenzhen in China, working together with
those in Hamburg, and three days later a draft genome was
released under an open data licence. This generated interest
from bioinformaticians on four continents. 24 hours after the
release of the genome it had been assembled. Within a week
two dozen reports had been filed on an open-source site
dedicated to the analysis of the strain. These analyses
provided crucial information about the strain’s virulence and
resistance genes – how it spreads and which antibiotics are
effective against it. They produced results in time to help
contain the outbreak. By July 2011, scientists published papers
based on this work. By opening up their early sequencing
results to international collaboration, researchers in Hamburg
produced results that were quickly tested by a wide range of
experts, used to produce new knowledge and ultimately to
control a public health emergency.
Data Citation 2: acceptance by journals
Data+Citation 3: inclusion in the references
In the references…
Is the DOI…




* Certain types of genomics data must also be deposited in INSDC databases (SRA & GenBank).
And in more journals…

               Hodkinson BP, Uehling JK, Smith ME (2012) Data from: Lepidostroma
               vilgalysii, a new basidiolichen from the New World. Dryad Digital
               Repository. doi:10.5061/dryad.j1g5dh23
Cited in:
Hodkinson BP, Uehling JK, Smith ME: Lepidostroma vilgalysii, a new basidiolichen
from the New World. Mycological Progress 2012. Advance Online Publication.



                        Roberts SB (2012) Herring Hepatic Transcriptome 34300 contigs.fa. Figshare.
                        Available: hdl.handle.net/10779/084d34370fbda29bbc67b3c5ecb02575.
                        Accessed 2012 Jan 20.
 Cited in:
 Roberts SB, Hauser L, Seeb LW, Seeb JE (2012) Development of Genomic Resources
 for Pacific Herring through Targeted Transcriptome Pyrosequencing. PLoS ONE 7(2):
 e30908. doi:10.1371/journal.pone.0030908
For data citation to work, needs:

1. Proven utility/potential user base.   ✔
2. Acceptance/inclusion by journals.     ✔
3. Data+Citation: inclusion in the references.   ✔
4. Tracking by citation indexes.

5. Usage of the metrics by the community…
Data Citation 4: tracking?
                        ✗FAIL
       DataCite metadata in harvestable form (OAI-PMH)

               - Google Scholar lists some DataCite DOIs, but says:

Datasets listed are the “result of approximations in the indexing
algorithms.”
“Google Scholar's intended coverage is for scholarly articles. At
this point, we don't include datasets.”
Data Citation 4: tracking?
             ✗FAIL
DataCite metadata in harvestable form (OAI-PMH)




✗      Working on it.          Coming soon…
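
To make "harvestable form (OAI-PMH)" concrete, here is a hedged sketch of how an indexer might build a harvesting request. `ListRecords`, `metadataPrefix`, `set` and `from` are standard OAI-PMH request arguments, and `oai_dc` is the metadata format every OAI-PMH repository must support. The endpoint URL is an assumption that should be checked against current DataCite documentation, and the helper name is illustrative. The code only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against current DataCite documentation.
DATACITE_OAI_BASE = "https://oai.datacite.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a given repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict to one set of records
    if from_date:
        params["from"] = from_date    # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(DATACITE_OAI_BASE, from_date="2012-01-01")
print(url)
```

A citation index could page through the XML returned by such requests to pick up dataset DOIs and their metadata.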
Data Citation 5: metrics?
“As a result of diverse practices and tool
limitations, data citations are currently very
difficult to track.”
Data Citation 5: metrics?
                          ✗FAIL
    Research Remix, 29th May 2012: http://researchremix.wordpress.com/2012/05/29/dear-research-
    data-advocate-please-sign-the-petition-oamonday/

“I’m afraid we are making promises to data
creators about attribution and reward that we
can’t keep. “Make your data citeable!” is the cry.
OK. So citeable is step one. Cited is step two. But
for the citation to be useful, it has to be indexed
so that citation metrics can be tracked and
admired and used.
Who is indexing data citations right now? As far
as I can tell: absolutely no one.”
Where data citation is in 2012:
1. Proven utility/potential user base.   ✔
2. Acceptance/inclusion by journals.     ✔
3. Data+Citation: inclusion in the references.   ✔
4. Tracking by citation indexes.          ✔/✗
5. Usage of the metrics by the community…        ✗
Overcoming technical hurdles…




             ?
Addressing the reproducibility gap:
Computable methods/workflow systems
Bioinformatics
Development      Biomedical and bioinformatics research   Publishing
Redefining what is a paper in the era of big-data?

                goal: Executable Research Objects




                                        Citable DOI
Publication




• Background

• Methods

• Results (Data)

• Conclusions/Discussion

                           doi:10.1186/2047-217X-1-3
Data
                                     Publication




• Background

• Methods

• Results (Data)
                                   doi:10.5524/100035
• Conclusions/Discussion

                           doi:10.1186/2047-217X-1-3
Methods +
                                     Data +
                                     Publication




• Background

• Methods                          Doi for workflows?


• Results (Data)
                                   doi:10.5524/100035
• Conclusions/Discussion

                           doi:10.1186/2047-217X-1-3
Data                  Methods             Analysis


doi:10.5524/100035   +    DOI: x    =   doi:10.1186/2047-217X-1-3


   DOI: A            +    DOI: X    =         DOI: 1

   DOI: B            +    DOI: X    =         DOI: 2

  DOI: A             +    DOI: Y    =         DOI: 3

  A, B, C…               X, Y, Z…   =         4, 5, 6…
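
The "Data + Methods = Analysis" idea above can be sketched as a small registry in which every analysis records the DOI of the data and of the method it used. This is an illustration only: the class, the function and the placeholder identifiers (`DOI:A`, `DOI:X`, `DOI:1` and so on, taken from the slide) are hypothetical and do not describe an existing GigaScience system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    data_doi: str     # the dataset that was analysed
    method_doi: str   # the workflow or method that was applied
    result_doi: str   # the resulting analysis publication


# The three combinations shown on the slide.
registry = [
    Analysis("DOI:A", "DOI:X", "DOI:1"),
    Analysis("DOI:B", "DOI:X", "DOI:2"),
    Analysis("DOI:A", "DOI:Y", "DOI:3"),
]


def analyses_using(records, *, data_doi=None, method_doi=None):
    """Return result DOIs of analyses that used the given data and/or method."""
    return [
        r.result_doi
        for r in records
        if (data_doi is None or r.data_doi == data_doi)
        and (method_doi is None or r.method_doi == method_doi)
    ]


print(analyses_using(registry, data_doi="DOI:A"))
print(analyses_using(registry, method_doi="DOI:X"))
```

Because each component has its own identifier, reuse of a dataset or a workflow can be counted separately from citations of the final paper.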
Different shaped publishable objects
  Data
 Papers



Executable
(Methods)
  Papers


 Analysis
  Papers
Different shaped publishable objects
         Different levels of granularity


   Experiment                                 e.g. doi:10.5524/100001        Papers
(e.g. ACRG project)


                                              e.g. doi:10.5524/100001-2     Data/
    Datasets                                                              Micropubs
 (e.g. cancer type)

                                              e.g. doi:10.5524/100001-2000
    Sample                                    or doi:10.5524/100001_xyz
(e.g. specimen xyz)



 Smaller still?       Facts/Assertions (~10^13 in literature)              Nanopubs
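
One way to read the granularity examples above is that a finer-grained identifier extends the experiment-level DOI with a `-` or `_` suffix. The sketch below splits such an identifier and recovers its parent. It assumes the slide's illustrative patterns (`100001`, `100001-2`, `100001_xyz`), which are examples rather than a registered naming convention, and the function names are hypothetical.

```python
import re

# Matches e.g. doi:10.5524/100001, doi:10.5524/100001-2000, doi:10.5524/100001_xyz
DOI_RE = re.compile(r"^(?:doi:)?(?P<prefix>10\.\d+)/(?P<base>\d+)(?:[-_](?P<part>\w+))?$")


def split_granular_doi(doi):
    """Return (prefix, base, part); part is None for an experiment-level DOI."""
    match = DOI_RE.match(doi.strip())
    if match is None:
        raise ValueError(f"unrecognised identifier: {doi!r}")
    return match.group("prefix"), match.group("base"), match.group("part")


def parent_doi(doi):
    """Map a dataset- or sample-level identifier back to its experiment-level DOI."""
    prefix, base, _ = split_granular_doi(doi)
    return f"doi:{prefix}/{base}"


print(split_granular_doi("doi:10.5524/100001_xyz"))
print(parent_doi("doi:10.5524/100001-2000"))
```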
Adding “value” publishing data

• Scope for different shaped publishable objects
• Scope for publishing methods/executable papers
• Peer review of data problematic
     – Post publication peer review
     – Change criteria (assess on transparency/access only)
     – Better use of workflows/cloud/VMs



DOIs are cheap*, data is precious: maximise its use
 * ish
Adding “value” publishing data




DOIs are cheap*, data is precious: maximise its use
 * ish   Source: Ross Mounce CC-BY http://rossmounce.co.uk/2012/09/04/the-gold-oa-plot-v0-2/
Thanks to:               Shaoguang Liang (BGI-SZ)
Laurie Goodman           Tin-Lap Lee (CUHK)
Tam Sneddon              Huayen Gao (CUHK)
Nicole Nogoy             Qiong Luo (HKUST)
Alexandra Basford        Senghong Wang (HKUST)
Peter Li                 Yan Zhou (HKUST)
Jesse Si Zhe             Cogini
Contact us:   editorial@gigasciencejournal.com
              database@gigasciencejournal.com

Follow us:    @gigascience
              facebook.com/GigaScience
              blogs.openaccesscentral.com/blogs/gigablog/

              www.gigadb.org
              www.gigasciencejournal.com

Más contenido relacionado

La actualidad más candente

GigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDBGigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDBGigaScience, BGI Hong Kong
 
Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnTodd Vision
 
SemanticCampLondon, 16th February 2008
SemanticCampLondon, 16th February 2008SemanticCampLondon, 16th February 2008
SemanticCampLondon, 16th February 2008Andrew Walkingshaw
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...Todd Vision
 
Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Todd Vision
 
Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Todd Vision
 
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...GigaScience, BGI Hong Kong
 
ContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarJenny Molloy
 
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...GigaScience, BGI Hong Kong
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the partsCarole Goble
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika! ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika! TheContentMine
 
Computing on the shoulders of giants
Computing on the shoulders of giantsComputing on the shoulders of giants
Computing on the shoulders of giantsBenjamin Good
 
Content Mining of Science and Medicine
Content Mining of Science and MedicineContent Mining of Science and Medicine
Content Mining of Science and MedicineTheContentMine
 
The Role of Libraries in Data Management and Curation
The Role of Libraries in Data Management and CurationThe Role of Libraries in Data Management and Curation
The Role of Libraries in Data Management and CurationNicole Vasilevsky
 

La actualidad más candente (20)

GigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDBGigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDB
 
Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, Bonn
 
SemanticCampLondon, 16th February 2008
SemanticCampLondon, 16th February 2008SemanticCampLondon, 16th February 2008
SemanticCampLondon, 16th February 2008
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...
 
Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...
 
Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck
 
NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...
NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...
NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...
 
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
 
NISO Forum, Denver, Sept. 24, 2012: Data Equivalence
NISO Forum, Denver, Sept. 24, 2012: Data EquivalenceNISO Forum, Denver, Sept. 24, 2012: Data Equivalence
NISO Forum, Denver, Sept. 24, 2012: Data Equivalence
 
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
 
ContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data Seminar
 
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 
FAIRy Stories
FAIRy StoriesFAIRy Stories
FAIRy Stories
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika! ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!
 
Computing on the shoulders of giants
Computing on the shoulders of giantsComputing on the shoulders of giants
Computing on the shoulders of giants
 
Content Mining of Science and Medicine
Content Mining of Science and MedicineContent Mining of Science and Medicine
Content Mining of Science and Medicine
 
The Role of Libraries in Data Management and Curation
The Role of Libraries in Data Management and CurationThe Role of Libraries in Data Management and Curation
The Role of Libraries in Data Management and Curation
 
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
 

Destacado

ScienceSoft: Open Software for Open Science
ScienceSoft: Open Software for Open ScienceScienceSoft: Open Software for Open Science
ScienceSoft: Open Software for Open ScienceSoftwarePractice
 
Radiation Monitoring Data from Fukushima Area March 22, 2011
Radiation Monitoring Data from Fukushima Area March 22, 2011Radiation Monitoring Data from Fukushima Area March 22, 2011
Radiation Monitoring Data from Fukushima Area March 22, 2011US Department of Energy
 
Maintainable Software Practices for e-Science - Introduction to Workshop
Maintainable Software Practices for e-Science - Introduction to WorkshopMaintainable Software Practices for e-Science - Introduction to Workshop
Maintainable Software Practices for e-Science - Introduction to WorkshopSoftwarePractice
 
Overview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle ModelOverview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle ModelSoftwarePractice
 
libHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservationlibHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservationSoftwarePractice
 
Software Practice 12 breakout - Tracking usage and impact of software
Software Practice 12 breakout - Tracking usage and impact of softwareSoftware Practice 12 breakout - Tracking usage and impact of software
Software Practice 12 breakout - Tracking usage and impact of softwareSoftwarePractice
 

Destacado (7)

Scott Edmunds: Hong Kong Open Access Update
Scott Edmunds: Hong Kong Open Access UpdateScott Edmunds: Hong Kong Open Access Update
Scott Edmunds: Hong Kong Open Access Update
 
ScienceSoft: Open Software for Open Science
ScienceSoft: Open Software for Open ScienceScienceSoft: Open Software for Open Science
ScienceSoft: Open Software for Open Science
 
Radiation Monitoring Data from Fukushima Area March 22, 2011
Radiation Monitoring Data from Fukushima Area March 22, 2011Radiation Monitoring Data from Fukushima Area March 22, 2011
Radiation Monitoring Data from Fukushima Area March 22, 2011
 
Maintainable Software Practices for e-Science - Introduction to Workshop
Maintainable Software Practices for e-Science - Introduction to WorkshopMaintainable Software Practices for e-Science - Introduction to Workshop
Maintainable Software Practices for e-Science - Introduction to Workshop
 
Overview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle ModelOverview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle Model
 
libHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservationlibHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservation
 
Software Practice 12 breakout - Tracking usage and impact of software
Software Practice 12 breakout - Tracking usage and impact of softwareSoftware Practice 12 breakout - Tracking usage and impact of software
Software Practice 12 breakout - Tracking usage and impact of software
 

Similar a Scott Edmunds: Data publication in the data deluge

Scott Edmunds at DataCite 2012: Adventures in Data Citation
Scott Edmunds at DataCite 2012: Adventures in Data CitationScott Edmunds at DataCite 2012: Adventures in Data Citation
Scott Edmunds at DataCite 2012: Adventures in Data CitationGigaScience, BGI Hong Kong
 
CLIR Fellows - Science Data - 14_0730
CLIR Fellows - Science Data - 14_0730CLIR Fellows - Science Data - 14_0730
CLIR Fellows - Science Data - 14_0730jeffreylancaster
 
Publishing of Scientific Data - Science Foundation Ireland Summit 2010
Publishing of Scientific Data  - Science Foundation Ireland Summit 2010Publishing of Scientific Data  - Science Foundation Ireland Summit 2010
Publishing of Scientific Data - Science Foundation Ireland Summit 2010jodischneider
 
GigaScience: a new resource for the big-data community.
Scott Edmunds: Data publication in the data deluge

  • 1. Data publication in the data deluge Scott Edmunds, GigaScience/BGI Hong Kong COASP 2012, Budapest, 20th September 2012 www.gigasciencejournal.com
  • 2. The Data Challenge:
      • 1.2 zettabytes (1 zettabyte = 10^21 bytes) of electronic data generated globally each year
      • >Exponential growth of genomics data (& growth in imaging and MS data following). Source: http://www.genome.gov/sequencingcosts/ (with apologies)
      • Issues with reproducibility, hosting, curation, interoperability
      • Need for better incentives to overcome these
      Source: 1. Mervis J. U.S. science policy. Agencies rally to tackle big data. Science. 2012 Apr 6;336(6077):22.
  • 3. Large-Scale Data Journal/Database In conjunction with: Editor-in-Chief: Laurie Goodman, PhD Editor: Scott Edmunds, PhD Commissioning Editor: Nicole Nogoy, PhD Lead Curator: Tam Sneddon, D.Phil Data Platform: Peter Li, PhD www.gigasciencejournal.com
  • 4. GigaDB is a new database integrated with the GigaScience journal to meet the needs of a new generation of biological and biomedical research as it enters the era of “big-data”… (see more)
  • 5. GDSAP: Genomic Data Submission and Analytical platform
  • 6. Anatomy of a Publication Idea Study Metadata Data Analysis Answer
  • 7. Anatomy of a Data Publication Idea Study Metadata Data Analysis Answer
  • 8. Issues for Data Publication (diagram: Idea, Study, Metadata, Data, Analysis, Answer). Cultural issues: Technical issues:
  • 9. Issues for Data Publication (diagram: Idea, Study, Metadata, Data, Analysis, Answer). Cultural issues: Adoption held back by: journal policies, citation, tracking… * T-Shirts available from Graham Steel / http://www.zazzle.co.uk/steelgraham
  • 10. Issues for Data Publication (diagram: Idea, Study, Metadata, Data, Analysis, Answer). Technical issues: What do we do with the data?
  • 11. Issues for Data Publication (diagram: Idea, Study, Metadata, Data, Analysis, Answer). Technical issues: What do we do with the data?
      • Lightweight: metadata-only journals; get someone else to host
      • Heavyweight: become a repository
  • 12. To host or not to host? Against: supplementary files argument. The Journal of Neuroscience, Announcement Regarding Supplemental Material: Beginning November 1, 2010, The Journal of Neuroscience will no longer allow authors to include supplemental material when they submit new manuscripts and will no longer host supplemental material on its web site for those articles. “While the size of articles has grown gradually over the past decade, the supplemental material associated with a typical Journal article appears to be growing exponentially and is rapidly approaching the size of an article. The sheer volume of supplemental material is adversely affecting peer review.” [Figure: Average size of a Journal of Neuroscience article and supplemental material in megabytes.] Source: Maunsell J. J. Neurosci. 2010;30:10599-10600
  • 13. $1000 genome = million $ peer-review? To review: (>6TBp, >1500 datasets) S3 (storage) = $15,000 EC2 (analysis w/ BLASTx) = $500,000 Source: Folker Meyer/Wilkening et al. 2009, CLUSTER'09. IEEE International Conference on Cluster Computing and Workshops
  • 14. $1000 genome = million $ peer-review? To review: (>6TBp, >1500 datasets) S3 (storage) = $15,000 EC2 (analysis w/ BLASTx) = $500,000 Source: Folker Meyer/Wilkening et al. 2009, CLUSTER'09. IEEE International Conference on Cluster Computing and Workshops ENCODE analysis Virtual Machine: Containing: input data, code bundles with scripts and processing steps, outputs AWS = ~$5,000 Source: James Taylor / http://encodeproject.org/ENCODE/integrativeAnalysis/VM
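The figures on slides 13 and 14 can be put side by side with a little arithmetic. The sketch below only adds up the numbers quoted on the slides; the variable and function names are illustrative, and the two estimates come from different projects, so the ratio is indicative rather than a like-for-like comparison.

```python
# Figures quoted on the slides (Wilkening et al. 2009; ENCODE VM estimate).
S3_STORAGE_USD = 15_000    # storage for >6 TBp, >1500 datasets
EC2_BLASTX_USD = 500_000   # analysis with BLASTx on EC2
ENCODE_VM_USD = 5_000      # the slide gives "~$5,000" on AWS


def rerun_cost(storage_usd: int, compute_usd: int) -> int:
    """Total cost of storing the data and re-running the analysis."""
    return storage_usd + compute_usd


full_rerun = rerun_cost(S3_STORAGE_USD, EC2_BLASTX_USD)
ratio = full_rerun / ENCODE_VM_USD

print(f"Full re-analysis: ${full_rerun:,}")
print(f"Packaged VM:      ~${ENCODE_VM_USD:,} (roughly {ratio:.0f}x less)")
```

On these numbers compute, not storage, dominates the cost of review, which is the point the later "move compute to the data" slide picks up.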
  • 15. To host or not to host? For: reproducibility. The Guardian, 14th September 2012: Replication is the only solution to scientific fraud. http://www.guardian.co.uk/commentisfree/2012/sep/14/solution-scientific-fraud-replication For: “data is the new oil”. William Gibson: “Information is the currency of the future world.” Sir Tim Berners-Lee: “Data is a precious thing and will last longer than the systems themselves.” Move compute to the data: think EC2 rather than S3. DNA Nexus + 0.5PB SRA data = $15 million given by Google. Source: DNA Nexus/SRA http://techcrunch.com/2011/10/12/dnanexus-raises-15-million-teams-with-google-to-host-massive-dna-database/
  • 17. Overcoming cultural hurdles… Adventures in Data Citation doi:10.5524/100001
  • 18. For data citation to work, needs: 1. Proven utility/potential user base. 2. Acceptance/inclusion by journals. 3. Data+Citation: inclusion in the references. 4. Tracking by citation indexes. 5. Usage of the metrics by the community…
  • 19. Data citation 1: utility/user base. Establishment of data DOIs and use by databases: Shackleton NJ, Hall MA, Vincent E (2001): Mean stable carbon isotope ratios of Cibicidoides wuellerstorfi from sediment core MD95-2042 on the Iberian margin, North Atlantic. PANGAEA - Data Publisher for Earth & Environmental Science. http://doi.pangaea.de/10.1594/PANGAEA.58229 Cited in: Pahnke K, Zahn R: Southern Hemisphere Water Mass Conversion Linked with North Atlantic Climate Variability. Science 2005, 307:1741-1746. Nocek B, Xu X, Savchenko A, Edwards A, Joachimiak A. 2007. PDB ID: 2P06 Crystal structure of a predicted coding region AF_0060 from Archaeoglobus fulgidus DSM 4304. 10.2210/pdb2p06/pdb. Cited in: Andreeva A, Howorth D, Chandonia J-M, Brenner SE, Hubbard TJP, Chothia C, Murzin AG: Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008, 36:D419-425.
  • 20. BGI Datasets Get DOI®s (legend: released pre-publication; paper published in GigaScience). Invertebrate: Ant (Florida carpenter ant, Jerdon’s jumping ant, Leaf-cutter ant), Roundworm, Schistosoma, Silkworm. Microbe: E. coli O104:H4 TY-2482, T2D gut metagenome. Cell-Lines: Chinese Hamster Ovary, Mouse methylomes. Human: Asian individual (YH) (DNA Methylome, Genome Assembly, Transcriptome); Cancer (14TB) (Single cell bladder cancer, HBV infected exomes); Ancient DNA (Saqqaq Eskimo, Aboriginal Australian). Vertebrates: Darwin’s Finch, Giant panda, Macaque (Chinese rhesus, Crab-eating), Mini-Pig, Naked mole rat, Parrot (Puerto Rican), Penguin (Emperor penguin, Adelie penguin), Pigeon (domestic), Polar bear, Sheep, Tibetan antelope. Plants: Chinese cabbage, Cucumber, Foxtail millet, Pigeonpea, Potato, Sorghum.
  • 21. Our first DOI: To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as: Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001 To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
  • 25. 1.3 The power of intelligently open data. The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastro-intestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.
  • 26. Data Citation 2: acceptance by journals
  • 27. Data Citation 2: acceptance by journals
  • 28. Data+Citation 3: inclusion in the references
  • 31. Is the DOI… * Certain types of genomics data must also be deposited in INSDC databases (SRA & Genbank).
  • 32. And in more journals… Hodkinson BP, Uehling JK, Smith ME (2012) Data from: Lepidostroma vilgalysii, a new basidiolichen from the New World. Dryad Digital Repository. doi:10.5061/dryad.j1g5dh23 Cited in: Hodkinson BP, Uehling JK, Smith ME: Lepidostroma vilgalysii, a new basidiolichen from the New World. Mycological Progress 2012. Advance Online Publication. Roberts SB (2012) Herring Hepatic Transcriptome 34300 contigs.fa. Figshare. Available: hdl.handle.net/10779/084d34370fbda29bbc67b3c5ecb02575. Accessed 2012 Jan 20. Cited in: Roberts SB, Hauser L, Seeb LW, Seeb JE (2012) Development of Genomic Resources for Pacific Herring through Targeted Transcriptome Pyrosequencing. PLoS ONE 7(2): e30908. doi:10.1371/journal.pone.0030908
  • 33. For data citation to work, needs: 1. Proven utility/potential user base. ✔ 2. Acceptance/inclusion by journals. ✔ 3. Data+Citation: inclusion in the references. ✔ 4. Tracking by citation indexes. 5. Usage of the metrics by the community…
  • 35. Data citation 4: tracking? ✗FAIL DataCite metadata in harvestable form (OAI-PMH) - lists some DataCite DOIs, but says: Datasets listed are the “result of approximations in the indexing algorithms.” “Google Scholar's intended coverage is for scholarly articles. At this point, we don't include datasets.”
  • 36. Datacitation 4: tracking? ✗FAIL DataCite metadata in harvestable form (OAI-PMH) ✗ Working on it. Coming soon…
  • 38. Datacitation 5: metrics? “As a result of diverse practices and tool limitations, data citations are currently very difficult to track.”
  • 39. Data citation 5: metrics? ✗FAIL Research Remix, 29th May 2012: http://researchremix.wordpress.com/2012/05/29/dear-research-data-advocate-please-sign-the-petition-oamonday/ “I’m afraid we are making promises to data creators about attribution and reward that we can’t keep. “Make your data citeable!” is the cry. OK. So citeable is step one. Cited is step two. But for the citation to be useful, it has to be indexed so that citation metrics can be tracked and admired and used. Who is indexing data citations right now? As far as I can tell: absolutely no one.”
  • 40. Where data citation is in 2012: 1. Proven utility/potential user base. ✔ 2. Acceptance/inclusion by journals. ✔ 3. Data+Citation: inclusion in the references. ✔ 4. Tracking by citation indexes. ✔/✗ 5. Usage of the metrics by the community… ✗
  • 42. Addressing the reproducibility gap: Computable methods/workflow systems Bioinformatics Development Biomedical and bioinformatics research Publishing
  • 43. Redefining what is a paper in the era of big-data? goal: Executable Research Objects Citable DOI
  • 44. Publication • Background • Methods • Results (Data) • Conclusions/Discussion doi:10.1186/2047-217X-1-3
  • 45. Data Publication • Background • Methods • Results (Data) doi:10.5524/100035 • Conclusions/Discussion doi:10.1186/2047-217X-1-3
  • 46. Methods + Data + Publication • Background • Methods Doi for workflows? • Results (Data) doi:10.5524/100035 • Conclusions/Discussion doi:10.1186/2047-217X-1-3
  • 47. Data Methods Analysis doi:10.5524/100035 + DOI: x = doi:10.1186/2047-217X-1-3 DOI: A + DOI: X = DOI: 1
  • 48. Data Methods Analysis doi:10.5524/100035 + DOI: x = doi:10.1186/2047-217X-1-3 DOI: A + DOI: X = DOI: 1 DOI: B + DOI: X = DOI: 2
  • 49. Data Methods Analysis doi:10.5524/100035 + DOI: x = doi:10.1186/2047-217X-1-3 DOI: A + DOI: X = DOI: 1 DOI: B + DOI: X = DOI: 2 DOI: A + DOI: Y = DOI: 3
  • 50. Data Methods Analysis doi:10.5524/100035 + DOI: x = doi:10.1186/2047-217X-1-3 DOI: A + DOI: X = DOI: 1 DOI: B + DOI: X = DOI: 2 DOI: A + DOI: Y = DOI: 3 A, B, C… X, Y, Z… = 4, 5, 6…
  • 51. Different shaped publishable objects Data Papers Executable (Methods) Papers Analysis Papers
  • 52. Different shaped publishable objects Different levels of granularity Experiment e.g. doi:10.5524/100001 Papers (e.g. ACRG project) e.g. doi:10.5524/100001-2 Data/ Datasets Micropubs (e.g. cancer type) e.g. doi:10.5524/100001-2000 Sample or doi:10.5524/100001_xyz (e.g. specimen xyz) Smaller still? Facts/Assertions (~10¹³ in literature) Nanopubs
  • 53. Adding “value” publishing data • Scope for different shaped publishable objects • Scope for publishing methods/executable papers • Peer review of data problematic – Post publication peer review – Change criteria (assess on transparency/access only) – Better use of workflows/cloud/VMs DOIs are cheap*, data is precious: maximise its use * ish
  • 54. Adding “value” publishing data DOIs are cheap*, data is precious: maximise its use * ish Source: Ross Mounce CC-BY http://rossmounce.co.uk/2012/09/04/the-gold-oa-plot-v0-2/
  • 55. Thanks to: Laurie Goodman, Tam Sneddon, Nicole Nogoy, Alexandra Basford, Peter Li, Jesse Si Zhe; Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Huayen Gao (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST), Cogini. Contact us: editorial@gigasciencejournal.com, database@gigasciencejournal.com. Follow us: @gigascience, facebook.com/GigaScience, blogs.openaccesscentral.com/blogs/gigablog/ www.gigadb.org www.gigasciencejournal.com
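
As an illustration of the data citation shown on slide 21, the following minimal Python sketch assembles a reference-list entry and a resolvable link from a bare DOI. The `DatasetCitation` class and its field names are hypothetical and introduced only for this example, and the author list is abbreviated to "et al." for brevity; the DOI, title, year, publisher and the `http://dx.doi.org/` resolver form are taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCitation:
    """Hypothetical container for the parts of a dataset citation."""
    authors: str    # pre-formatted author list
    year: int
    title: str
    publisher: str
    doi: str        # bare DOI, e.g. "10.5524/100001"

    def resolver_url(self) -> str:
        # Slide 21 gives http://dx.doi.org/<doi> as the resolvable form.
        return f"http://dx.doi.org/{self.doi}"

    def reference(self) -> str:
        # Same ordering as the slide: authors (year) title. publisher. doi link
        return (f"{self.authors} ({self.year}) {self.title}. "
                f"{self.publisher}. doi:{self.doi} {self.resolver_url()}")


ecoli = DatasetCitation(
    authors="Li, D; Xi, F; Zhao, M; et al.",  # abbreviated for this sketch
    year=2011,
    title="Genomic data from Escherichia coli O104:H4 isolate TY-2482",
    publisher="BGI Shenzhen",
    doi="10.5524/100001",
)
print(ecoli.reference())
```

Keeping the DOI as a bare string and deriving the link from it means the same record can be printed in a reference list or handed to an indexing service without re-parsing.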

Editor's notes

  1. and an advanced search option…
  2. Raw data has been submitted to the SRA, the assembly submitted to GenBank (no number), SV data to dbVar (it’s the first plant data they’ve received). Complements the traditional public databases by having all these “extra” data types, it’s all in one place, and it’s citable.
  5. Leading on from that, current and future plans include collaborating with Tin-Lap Lee at the Chinese University of Hong Kong to integrate an instance of the Galaxy bioinformatics platform with GigaDB, so users can make full use of the data in GigaDB by linking it to other resources and we can incorporate fully executable papers. One such submission is a new SOAPdenovo pipeline. The SOAP tools have been wrapped in Galaxy, the workflow defined in MyExperiment, and the data will be issued with a DOI and accessible via GigaDB. Utilizing the BGI cloud if necessary, users will then be able to reproduce all the steps described in the GigaScience paper to test, reanalyze, compare results etc. Since we would like GigaDB to be a host for data types that have no other home, such as imaging data, we are investigating adding other tools such as an image viewer and the like to support accessibility to and usability of the data. So, if you have a large-scale biological or biomedical dataset and/or a pipeline or software that you would like to submit to GigaScience, we would love to hear from you, so please come and talk to Scott or myself.
  6. That just leaves me to thank the GigaScience team: Laurie, Scott, Alexandra, Peter and Jesse; BGI for their support, specifically Shaoguang for IT and bioinformatics support; our collaborators on the database, website and tools: Tin-Lap, Qiong, Senhong, Yan and the Cogini web design team; DataCite for providing the DOI service; and the ISA Commons team for their support and advocacy for best-practice use of metadata reporting and sharing. Thank you for listening.
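
The "Data + Methods = Analysis" idea on slides 47 to 50, and the executable-paper plan in note 5, can be sketched as a small enumeration: every pairing of a citable dataset with a citable method yields a distinct citable analysis. The function name `enumerate_analyses` is hypothetical, and the identifiers are the slide's own placeholders ("DOI: A", "DOI: X", "DOI: 1"), not real DOIs; this is a sketch of the bookkeeping only, not of how any registry mints identifiers.

```python
def enumerate_analyses(datasets, methods):
    """Pair every dataset with every method and number the results.

    Returns a dict mapping (dataset, method) -> placeholder analysis id.
    Methods vary in the outer loop so the numbering follows the slides:
    A + X = 1, B + X = 2, A + Y = 3, ...
    """
    analyses = {}
    counter = 1
    for method in methods:
        for data in datasets:
            analyses[(data, method)] = f"DOI: {counter}"
            counter += 1
    return analyses


# Placeholder identifiers copied from the slides (not resolvable DOIs).
datasets = ["DOI: A", "DOI: B"]
methods = ["DOI: X", "DOI: Y"]

analyses = enumerate_analyses(datasets, methods)
for (data, method), result in analyses.items():
    print(f"{data} + {method} = {result}")
```

With two datasets and two methods this gives four analysis objects; the count grows as the product of the two lists, which is the point of the "A, B, C… X, Y, Z… = 4, 5, 6…" build on slide 50.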