GigaByte Chief Editor Scott Edmunds presents on how to prepare a data paper for the TDR/WHO-sponsored call for data papers describing datasets on vectors of human diseases, launched in November 2021. Presented at the GBIF webinar on 25th January 2022, and aimed at authors interested in submitting a manuscript to the series.
3. Why we need to fill biodiversity data gaps
Expert predictions of species richness (https://www.nature.com/articles/ncomms9221) compared with the completeness of biodiversity records
4. Why we need to fill biodiversity data gaps (cont.)
Past and current malaria prevalence (https://en.wikipedia.org/wiki/Malaria#/media/File:World-map-of-past-and-current-malaria-prevalence-world-development-report-2009.png) compared with the completeness of biodiversity records
5. Why we need to go beyond open: FAIR
https://doi.org/10.1038/sdata.2016.18
https://www.ands.org.au/working-with-data/fairdata/training
6. Research data (plus software and underlying methods) need to be shared for scrutiny and re-use
Buckheit & Donoho: scholarly articles are merely advertisements of scholarship; the actual scholarly artifacts, i.e. the data and computational methods that support the scholarship, remain largely inaccessible.
Researcher incentive systems have not been aligned to this.
Data and software also need to be credited/tracked and treated as first-class research objects.
From 1665… …to 2009
7. How ‘data papers’ enhance FAIRness, visibility and accessibility, and provide credit
Source: Dimitrova et al. https://doi.org/10.1093/gigascience/giab034
8. Journal policy & practice slowly catching up
https://f1000research.com/data-policies (snapshot from 2013)
9. Rewarding open data: GigaScience
http://gigasciencejournal.com/
Launched July 2012, now partnering with OUP. Publishes “Data Notes” for CC0 data.
Published by: (publisher logos, 2011-2016 and 2016-date)
10. APC covers curation and 1TB of storage in our GigaDB repository
http://gigadb.org/
Since 2011, and working with (partner logos)
11. Rewarding open data, pt 2: GigaByte
Launched September 2020 to break down the barriers of speed, interactivity and cost
Published by: (publisher logo)
https://gigabytejournal.com/
12. Data Publishing: nothing new…
Area of Interest/Question → Data & Metadata Collection/Experiments → Hypothesis/Analysis → Conclusions
1839 → 1859: 20 yrs.
13. Technical features of GigaByte
https://gigabytejournal.com/
The main advantage of the workflow is XML from start to end:
• Several modules acting as one platform: no import/export of files, so fast and accurate
• Cutting out production allows huge time & cost savings (currently 4-8 hours per paper)
• Any number of versions can be published instantly, including typographic-quality PDF
• Allows instantaneous switching of views
• Leverages embeddable dynamic content/widgets
• Initial focus on forkable products: data + software + updates
14. Advantages of GigaByte: interactive features
What does focusing on Data + software + XML allow us to do?
https://youtu.be/TVdKLtRGSYs
15. Thinking about users: authors, reviewers, readers
https://gigabytejournal.com/
• Streamlined questionnaire-based review
• Reconfigured for short, easy-to-write and easy-to-review data & software papers
• Export as PDF, XML, HTML… “on the fly”
• Links between preprints and papers (inc. open review)
16. Thinking about users: authors, reviewers, readers
What does a GigaByte data paper look like?
https://gigabytejournal.com/data-release-description
Data Release: a short, updatable description of a research dataset
• Discoverability & credit: highlights and helps to contextualize openly available datasets to encourage reuse.
• Sharing: all data can be linked to the Data Release via GBIF, GigaDB or other data DOIs or accessions.
• Data, not analysis: incentivizes and allows more rapid release of data before subsequent detailed analysis has been carried out, or in coordination with publication of an analysis paper.
• Simple: structure = Context, Methods, Data Validation and QC, Reuse Potential, Data Availability
Submit via: https://gigabytejournal.com/
17. Key to a data paper: Data Availability
• Summary of where to find/access all the supporting data
• Follows the Data Citation Principles (#CiteTheDOI)
• Also collects together other accession numbers and reporting checklists
https://www.force11.org/datacitationprinciples
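The Data Citation Principles boil down to citing the dataset itself, with a resolvable DOI, in the reference list. As a minimal sketch (the element order here is just one common style, and `format_data_citation` plus all example values are hypothetical, not part of any journal tooling):

```python
def format_data_citation(authors, year, title, repository, doi):
    """Assemble a dataset reference ending in a resolvable DOI link.

    Element order (authors, year, title, repository, DOI) follows one
    common data-citation style; journals may vary the details.
    """
    return f"{authors} ({year}): {title}. {repository}. https://doi.org/{doi}"


# Hypothetical example values, for illustration only.
citation = format_data_citation(
    authors="Doe J; Roe R",
    year=2022,
    title="Occurrence records of mosquito vectors",
    repository="GigaScience Database (GigaDB)",
    doi="10.5524/100000",
)
print(citation)
```

Whatever the exact style, the key point of #CiteTheDOI is that the DOI is machine-resolvable, so the citation can be tracked and credited like any article citation.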
18. The data papers submitted should describe datasets meeting the following criteria:
• Data has clear relevance for research on vectors of human diseases
• Dataset contains more than 5,000 records that are new to GBIF.org in 2021/22, with high-quality data and metadata
• Data is dedicated to the public domain under an open CC0 designation
Notes on the series
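One way to sanity-check the record-count criterion before submitting is GBIF's public occurrence API, which returns a total `count` for a dataset when queried with `limit=0`. A minimal sketch, assuming the standard `https://api.gbif.org/v1/occurrence/search` endpoint; the helper names are hypothetical, and the actual HTTP call is left to the reader:

```python
GBIF_API = "https://api.gbif.org/v1/occurrence/search"


def occurrence_count_url(dataset_key: str) -> str:
    """URL that asks GBIF only for the total record count of a dataset.

    With limit=0 the JSON response carries a 'count' field and no records.
    """
    return f"{GBIF_API}?datasetKey={dataset_key}&limit=0"


def meets_series_threshold(record_count: int, minimum: int = 5000) -> bool:
    """Series criterion: the dataset must hold more than 5,000 records."""
    return record_count > minimum


# In practice you would fetch occurrence_count_url(...) with e.g.
# urllib.request and read the 'count' field; here we only check the logic.
print(meets_series_threshold(5001))
```

Note that a raw count alone cannot verify the "new to GBIF.org in 2021/22" part of the criterion; for that, check with the health@gbif.org helpdesk.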
19. Data deposition is key, and is supported by the GBIF helpdesk and GigaDB curators
• Authors should start by preparing the dataset and publishing it through GBIF.org before writing
• Support from health@gbif.org for questions on publishing data through GBIF, data standards, etc.
• The GigaDB team (database@gigasciencejournal.com) is on hand to help with additional supporting data
• GigaDB curators will also help the review process by providing a data audit for each submission
Notes on the series
20. Thanks to TDR/WHO for supporting this series on datasets on vectors of human diseases
Thanks to this very generous sponsorship, the article-processing fee (normally US$350) will be waived for the first 15 papers that are accepted and meet the series criteria.
21. All authors will be part of a collaborative follow-up commentary in GigaScience
https://doi.org/10.1186/s13742-016-0121-x
22. Many thanks to our partners
For further questions contact: editorial@gigabytejournal.com
Submit now: https://gigabytejournal.com/
Questions?