A decade's experience in transparent and interactive
publication of FAIR data and software via an end-to-end XML
publishing platform
Scott Edmunds 0000-0001-6444-1436
https://www.telegraph.co.uk/technology/2020/05/16/neil-fergusons-imperial-model-could-devastating-software-mistake/
Scientists: need to convince public + politicians
The “Infodemic” Era
Imperial College: Report 9
GigaSolution: rewarding open data & code
http://gigasciencejournal.com/
Publishes “Data Notes” for CC0 data, “Tech Notes” for OSI software.
Transparent: Open Peer Review and linked to preprints. Mandates code in repo.
Integrated GigaDB repository: DataCite DOIs, no size limits, code snapshots, APC covers curation
http://gigadb.org/
GigaSolution: rewarding open data & code
[Chart: GigaScience software/workflow papers (Technical Notes) per year, 2012-2021; series: Galaxy, Snakemake, Nextflow, CWL, Other]
Changes in how research is shared: workflows
gigagalaxy.net
Experience publishing Galaxy workflows: 2013
https://doi.org/10.1186/2047-217X-3-23
• Downloadable as virtual hard-disk/available as Amazon Machine Image
• Unclear how to describe licensing & security issues?
Experience publishing VMs: 2014
https://doi.org/10.1186/s13742-015-0087-0
https://doi.org/10.1186/s13742-015-0073-6
• From 2015 increasing submissions leveraging containers
• Promoted experiments in standardization such as bioboxes
• Integrated with CodeOcean & tested with Gigantum
• Carried out reproducibility case-studies (can be expensive)
Experience publishing containers: 2015
Independent execution of computations underlying research articles.
Experience publishing CODECHECK: 2020
CODECHECK tackles one of the main challenges of computational research by supporting
codecheckers with a workflow, guidelines and tools to evaluate computer programs
underlying scientific papers. The independent time-stamped runs conducted by
codecheckers will award a “certificate of executable computation” and increase availability,
discovery and reproducibility of crucial artefacts for computational sciences.
https://codecheck.org.uk/
Experience publishing CODECHECK: 2020
http://gigasciencejournal.com/blog/codecheck-certificate/
https://doi.org/10.1093/gigascience/giaa026
Experience publishing CODECHECK: 2020
https://www.nature.com/articles/d41586-020-01685-y
http://doi.org/10.5281/zenodo.3865491
Lessons learned in a decade of data & software publishing:
• Tech really the bottleneck
• Process much too slow & expensive
• Still too focused on narrative and static “version of record”
• Still not very FAIR
DATA CODE ENTITIES FACTS STABILITY
A new approach
Follow the Software Paradigm?
CODE RELEASE FORK UPDATE REPEAT
Deconstruct the “Version of Record”?
Move to new XML end-to-end pipeline
Custom end-to-end workflow makes integrations simpler with one integration point
Features of new journal:
https://gigabytejournal.com/
• Main advantage of workflow is XML from start to end
• Several modules acting as one platform: no import/export of files, so fast and accurate
• Cutting out production allows huge time & cost saving (currently as little as 3.5 hrs per paper)
• Any number of versions can be published instantly, including typographic quality PDF or updates/forks
• Allows instantaneous switch of views
• Leverage embeddable dynamic content/widgets
• Initial focus on forkable open source products: data + software + update papers
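As a rough illustration of why a single XML source makes switching views cheap, here is a minimal Python sketch that renders one article into two views. The element names (`article-title`, `body`, `p`) follow a JATS-like layout, which is an assumption: the slides do not name the schema the platform uses, and `ARTICLE_XML`, `render_text` and `render_html` are hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article in a JATS-like layout (assumed, not taken from the slides).
ARTICLE_XML = """<article>
  <front>
    <article-title>Example data paper</article-title>
  </front>
  <body>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</article>"""


def render_text(root):
    """Plain-text view: upper-cased title, then paragraphs separated by blank lines."""
    title = root.findtext("front/article-title")
    paragraphs = [p.text for p in root.iterfind("body/p")]
    return "\n\n".join([title.upper()] + paragraphs)


def render_html(root):
    """HTML view built from the same parsed tree, with text escaped."""
    title = root.findtext("front/article-title")
    paragraphs = "".join(f"<p>{escape(p.text)}</p>" for p in root.iterfind("body/p"))
    return f"<h1>{escape(title)}</h1>{paragraphs}"


# Parse once, render as many views as needed.
root = ET.fromstring(ARTICLE_XML)
print(render_text(root))
print(render_html(root))
```

Both views come from the same parsed tree, so an update to the XML reaches every view without a separate export step.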
Focusing beyond VoR allows different views…
What does focusing on Data + software + XML allow us to do?
https://doi.org/10.46471/gigabyte.1
https://doi.org/10.46471/gigabyte.6
High quality rich XML
CC-BY open licensed, open citations, open corpus
Structured schema.org metadata
No hiding of material in supplemental files
Maximise use of persistent identifiers (PIDs)
Who: ORCID IDs, CASRAI contributorship, Funder (Fundref), Institution (ROR)
What: Species (NCBI, FishBase), Cell/strain (RRID)
How: Equipment (RRID), Software (RRID, bio.tools)
Output: Data (accessions, DOIs), Results (DOIs)
Helping to make research “AI-ready”
Thinking about users: machines
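To make the "structured schema.org metadata" and PID points concrete, here is a minimal Python sketch that assembles a JSON-LD record for an article. It is an illustration only, not the journal's actual metadata output: `build_article_jsonld` is a hypothetical helper, and every identifier in the example (DOI, ORCID, ROR, funder ID) is a placeholder.

```python
import json


def build_article_jsonld(doi, title, authors, funder_id, license_url):
    """Return a schema.org ScholarlyArticle description as a plain dict.

    Each entity is referenced by a resolvable persistent identifier
    (DOI for the article, ORCID for people, ROR for institutions).
    """
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "@id": f"https://doi.org/{doi}",
        "identifier": f"https://doi.org/{doi}",
        "name": title,
        "license": license_url,
        "author": [
            {
                "@type": "Person",
                "@id": f"https://orcid.org/{a['orcid']}",
                "name": a["name"],
                "affiliation": {"@type": "Organization", "@id": a["ror"]},
            }
            for a in authors
        ],
        "funder": {"@type": "Organization", "@id": funder_id},
    }


# All values below are placeholders, not real identifiers.
record = build_article_jsonld(
    doi="10.0000/example.1",
    title="Example data paper",
    authors=[{
        "name": "A. Researcher",
        "orcid": "0000-0000-0000-0000",
        "ror": "https://ror.org/00example00",
    }],
    funder_id="https://doi.org/10.13039/000000000000",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(record, indent=2))
```

Because each entity carries an identifier rather than only a free-text name, a machine reader can resolve who, what and how without parsing the narrative.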
Interaction: increasing understanding & trust
https://doi.org/10.46471/gigabyte.13
Do you trust an immunoinformatics tool to predict whether memory T cells generated from
previous exposure to common cold coronaviruses are cross-reactive against SARS-CoV-2?
Interaction: software and code via Stencila and CodeOcean
http://gigasciencejournal.com/blog/gigabyte-executable-research-articles/
Code Ocean “Compute Capsule”: readers can directly interact with software via an embedded version in the article, or deploy and run it in their own cloud computing environment.
Pop-out Stencila “Executable Research Article”: figures are accompanied by code blocks that can be edited and re-executed to immediately see the changes.
Interact with Stenci.la “code chunks” & Code Ocean “compute
capsules” of COVID-19 immunoinformatics papers
https://doi.org/10.46471/gigabyte.13
A new way of publishing FAIR research with new tech
• Share & get credit for updatable data & software papers
• Follow the software paradigm, bring your research to life
• XML makes it much easier to embed interactive content
• Use automation & interaction to increase scrutiny & trust
• XML-only workflow cuts time and cost to publish
• Rethink “Version of Record”: focus on facts/data/code & discard the packaging
Help us change scientific publishing, contact: editorial@gigabytejournal.com
https://gigabytejournal.com/
Follow us:
@GigaByteJournal
facebook.com/GigaScience
http://gigasciencejournal.com/blog/
+ Weibo & WeChat
Thanks to:
Laurie Goodman, Publisher
Nicole Nogoy, Editor
Hans Zauner, Assistant Editor
Hongling Zhao, Assistant Editor
Peter Li, Head of IT
Chris Hunter, Lead BioCurator
Chris Armit, Data Scientist
Mary Ann Tuli, Data Editor
Rija Ménagé, Senior Software Engineer
Ken Cho, Systems Programmer Analyst
Chen Qi, Shenzhen Office.
https://gigabytejournal.com/
editorial@gigabytejournal.com
Questions?

IDW2022: A decade's experience in transparent and interactive publication of FAIR data and software via an end-to-end XML publishing platform
