GigaByte Chief Editor Scott Edmunds presents on how to prepare a data paper for the TDR/WHO-sponsored call for data papers describing datasets on vectors of human diseases, launched in November 2021. Presented at the GBIF webinar on 25th January 2022, and aimed at authors interested in submitting a manuscript to the series.
3. Why we need to fill biodiversity data gaps
Expert predictions of species richness (https://www.nature.com/articles/ncomms9221) compared with the completeness of biodiversity records
4. Why we need to fill biodiversity data gaps (cont.)
Past and current malaria prevalence (https://en.wikipedia.org/wiki/Malaria#/media/File:World-map-of-past-and-current-malaria-prevalence-world-development-report-2009.png) compared with the completeness of biodiversity records
5. Why we need to go beyond open: FAIR
https://doi.org/10.1038/sdata.2016.18
https://www.ands.org.au/working-with-data/fairdata/training
6. Research data (plus software and underlying methods) need to be shared for scrutiny and re-use
Buckheit & Donoho: scholarly articles are merely advertisements of scholarship; the actual scholarly artifacts, i.e. the data and computational methods that support the scholarship, remain largely inaccessible.
Researcher incentive systems have not been aligned to this.
Data and software also need to be credited/tracked and treated as first-class research objects.
From 1665… …to 2009
7. How ‘data papers’ enhance FAIRness, visibility and accessibility, and provide credit
Source: Dimitrova et al. https://doi.org/10.1093/gigascience/giab034
8. Journal policy & practice slowly catching up
https://f1000research.com/data-policies (snapshot from 2013)
9. Rewarding open data: GigaScience
http://gigasciencejournal.com/
Launched July 2012, now partnering with OUP. Publishes “Data Notes” for CC0 data.
Published by: (publisher logos, 2011-2016 and 2016-date)
10. APC covers curation and 1TB of storage in our GigaDB repository
http://gigadb.org/
Since 2011, and working with (partner logos)
11. Rewarding open data, pt 2: GigaByte
Launched September 2020 to break down the barriers of speed, interactivity and cost
Published by: (publisher logo)
https://gigabytejournal.com/
12. Data Publishing: nothing new…
Area of Interest/Question → Data & Metadata Collection/Experiments → Hypothesis/Analysis → Conclusions
1839 → 1859: 20 yrs.
13. Technical features of GigaByte
https://gigabytejournal.com/
The main advantage of the workflow is XML from start to end:
• Several modules acting as one platform: no import/export of files, so fast and accurate
• Cutting out production allows huge time & cost savings (currently 4-8 hours per paper)
• Any number of versions can be published instantly, including typographic-quality PDF
• Allows instantaneous switching of views
• Leverages embeddable dynamic content/widgets
• Initial focus on forkable products: data + software + updates
14. Advantages of GigaByte: interactive features
What does focusing on Data + software + XML allow us to do?
https://youtu.be/TVdKLtRGSYs
15. Thinking about users: authors, reviewers, readers
https://gigabytejournal.com/
• Streamlined questionnaire-based review
• Reconfigured for short, easy-to-write and easy-to-review data & software papers
• Export as PDF, XML, HTML… “on the fly”
• Links between preprints and papers (inc. open review)
16. Thinking about users: authors, reviewers, readers
What does a GigaByte data paper look like?
https://gigabytejournal.com/data-release-description
Data Release: a short, updatable description of a research dataset
• Discoverability & credit: highlights and helps to contextualize openly available datasets to encourage reuse.
• Sharing: all data can be linked to the Data Release via GBIF, GigaDB or other data DOIs or accessions.
• Data, not analysis: incentivizes and allows more rapid release of data before subsequent detailed analysis has been carried out, or in coordination with publication of an analysis paper.
• Simple: structure = Context, Methods, Data Validation and QC, Reuse Potential, Data Availability
Submit via: https://gigabytejournal.com/
17. Key to a data paper: Data Availability
• Summary of where to find/access all the supporting data
• Follows the Data Citation Principles (#CiteTheDOI)
• Also collects together other accession numbers and reporting checklists
https://www.force11.org/datacitationprinciples
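The Data Citation Principles boil down to citing the dataset itself, with a resolvable DOI, in the reference list. As a minimal sketch (the element order here is just one common style, and `format_data_citation` plus all example values are hypothetical, not part of any journal tooling):

```python
def format_data_citation(authors, year, title, repository, doi):
    """Assemble a dataset reference ending in a resolvable DOI link.

    Element order (authors, year, title, repository, DOI) follows one
    common data-citation style; journals may vary the details.
    """
    return f"{authors} ({year}): {title}. {repository}. https://doi.org/{doi}"


# Hypothetical example values, for illustration only.
citation = format_data_citation(
    authors="Doe J; Roe R",
    year=2022,
    title="Occurrence records of mosquito vectors",
    repository="GigaScience Database (GigaDB)",
    doi="10.5524/100000",
)
print(citation)
```

Whatever the exact style, the key point of #CiteTheDOI is that the DOI is machine-resolvable, so the citation can be tracked and credited like any article citation.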
18. The data papers submitted should describe datasets meeting the following criteria:
• Data has clear relevance for research on vectors of human diseases
• Dataset contains more than 5,000 records that are new to GBIF.org in 2021/22, with high-quality data and metadata
• Data is dedicated to the public domain under an open CC0 designation
Notes on the series
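One way to sanity-check the record-count criterion before submitting is GBIF's public occurrence API, which returns a total `count` for a dataset when queried with `limit=0`. A minimal sketch, assuming the standard `https://api.gbif.org/v1/occurrence/search` endpoint; the helper names are hypothetical, and the actual HTTP call is left to the reader:

```python
GBIF_API = "https://api.gbif.org/v1/occurrence/search"


def occurrence_count_url(dataset_key: str) -> str:
    """URL that asks GBIF only for the total record count of a dataset.

    With limit=0 the JSON response carries a 'count' field and no records.
    """
    return f"{GBIF_API}?datasetKey={dataset_key}&limit=0"


def meets_series_threshold(record_count: int, minimum: int = 5000) -> bool:
    """Series criterion: the dataset must hold more than 5,000 records."""
    return record_count > minimum


# In practice you would fetch occurrence_count_url(...) with e.g.
# urllib.request and read the 'count' field; here we only check the logic.
print(meets_series_threshold(5001))
```

Note that a raw count alone cannot verify the "new to GBIF.org in 2021/22" part of the criterion; for that, check with the health@gbif.org helpdesk.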
19. Data deposition is key, and is supported by the GBIF helpdesk and GigaDB curators
• Authors should start by preparing the dataset and publishing it through GBIF.org before writing
• Support from health@gbif.org for questions on publishing data through GBIF, data standards, etc.
• The GigaDB team (database@gigasciencejournal.com) is on hand to help with additional supporting data
• GigaDB curators will also help the review process by providing a data audit for each submission
Notes on the series
20. Thanks to TDR/WHO for supporting this series on datasets on vectors of human diseases
Thanks to this very generous sponsorship, the article-processing fee (normally US$350) will be waived for the first 15 papers that are accepted and meet the series criteria.
21. All authors will be part of a collaborative follow-up commentary in GigaScience
https://doi.org/10.1186/s13742-016-0121-x
22. Many thanks to our partners
For further questions contact: editorial@gigabytejournal.com
Submit now: https://gigabytejournal.com/
Questions?