Democratising Data Publishing: A Global Perspective
Dr Chris Armit
Data Scientist, GigaScience
BGI-Hong Kong
Need for FAIR (high quality) Open Data
Enables
• Using networking power of the internet to tackle problems
• Can ask new questions & find hidden patterns & connections
• Build on each other's efforts more quickly & more efficiently
• More collaborations across more disciplines
• Harness wisdom of the crowds: crowdsourcing, citizen science, crowdfunding
Global Challenges
• Quick response to climate change, food security & disease outbreaks
• Cultural & technical hurdles need to be overcome
Global Challenges
https://www.nature.com/articles/d41586-018-07244-w
• How will Open Access (APC) models work here?
• Authors are unlikely to be able to afford article processing charges
http://www.nature.com/news/data-sharing-make-outbreak-research-open-access-1.16966
Cultural Hurdles in Publishing Research Data
Example: Disease outbreaks
• Genome sequences from the West Africa Ebola outbreak were first made publicly available in April 2014
• Datasets were released sporadically as this became a hot research topic
• This led to gaps in the data
Democratising Data at GigaScience
• GigaScience integrates and publishes all research objects to maximise reproducibility, transparency and reuse
• GigaDB enables rapid publication of data associated with a GigaScience manuscript
• GigaDB DOIs incentivise early release of data/code/etc.
• Data
• Software
• Models
• Pipelines
• Reviews
• E. coli O104:H4 isolate TY-2482 in Germany: >50 died, June 2011
• Crisis, mass panic, data needed
• BGI, working with Hamburg University, let us share the data under CC0 with our first data DOI from GigaDB
• Released via Twitter
• We did not know the consequences of releasing data early
• These data were considered so important that we did not wish to wait for publication
Example: Disease outbreaks
http://dx.doi.org/10.5524/100001
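The dataset above is cited through the legacy `dx.doi.org` host; the DOI Foundation now recommends `https://doi.org/` as the resolver. A DOI is a registrant prefix (starting `10.`) and a suffix separated by the first `/`. The minimal sketch below, with hypothetical helper names `parse_doi` and `resolver_url`, splits a DOI (bare or as a resolver link) and rebuilds a current HTTPS link; it does not contact the network or check that the DOI is registered.

```python
from typing import NamedTuple


class DOI(NamedTuple):
    prefix: str  # registrant prefix, e.g. "10.5524"
    suffix: str  # item identifier assigned by the registrant (may contain "/")


# Resolver hosts to strip, so bare DOIs and links are handled alike.
_RESOLVERS = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
)


def parse_doi(text: str) -> DOI:
    """Split a DOI string or resolver URL into prefix and suffix."""
    text = text.strip()
    for host in _RESOLVERS:
        if text.lower().startswith(host):
            text = text[len(host):]
            break
    # Only the first "/" separates prefix from suffix.
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {text!r}")
    return DOI(prefix, suffix)


def resolver_url(doi: DOI) -> str:
    """Build an HTTPS link through the doi.org proxy."""
    return f"https://doi.org/{doi.prefix}/{doi.suffix}"


if __name__ == "__main__":
    d = parse_doi("http://dx.doi.org/10.5524/100001")
    print(d.prefix, d.suffix)
    print(resolver_url(d))
```

Splitting on the first `/` only keeps suffixes that themselves contain slashes intact.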
Democratising Data at GigaScience
• From Big Data to usable Data
• Example: WebTools for easy browsing and visualisation
• Pan-and-zoom map browser as a visual aid to help end users find datasets
Democratising Data at GigaScience
• From Big Data to usable Data
• Example: WebTools for easy browsing and visualisation
• 3D viewer allows users to interact with and explore image data before downloading it
• 3D models are CC0, can be downloaded, and are printable
Democratising Data at GigaScience
• From Big Data to usable Data
• Example: WebTools for easy browsing and visualisation
Democratising Data at GigaScience
• Widening the target audience
• Bioinformaticians and ‘Big Data’ scientists are a primary target audience
• Plugins and visualisations make access easier for the less technically inclined
• Democratises access through educational potential and ease of use
Democratising Data at GigaScience
Difficulties we have encountered…
• Internet: limited bandwidth, unstable connections, US institutions occasionally blocking Chinese IP addresses, and China blocking Google/Dropbox links
• Copying 10GB of data from South Africa took >1 month because of power cuts
• Email communication difficulties due to spam filters
• Data access agreements (clinical data)
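To put the South Africa transfer in perspective, the average throughput it implies can be worked out directly. This is a back-of-the-envelope sketch, assuming decimal units (1 GB = 10^9 bytes) and a 30-day month; because the copy took more than a month, the result is an upper bound on the effective rate. The function name is illustrative only.

```python
SECONDS_PER_DAY = 86_400


def effective_rate_bytes_per_s(size_bytes: float, days: float) -> float:
    """Average throughput achieved when size_bytes is moved in `days` days."""
    return size_bytes / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Assumptions: 10 GB = 10 * 10**9 bytes; "1 month" taken as 30 days.
    rate = effective_rate_bytes_per_s(10 * 10**9, 30)
    print(f"{rate / 1e3:.1f} kB/s, {rate * 8 / 1e3:.1f} kbit/s")
```

An effective rate of only a few kilobytes per second is far below what the underlying link could carry, which illustrates how interruptions such as power cuts, rather than raw bandwidth, can dominate transfer time.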
Democratising Data at GigaScience
• Example: Food security
• Rice, Oryza sativa L., is the staple food for half the world’s population
• By 2030, rice production must increase by at least 25% to keep pace with population growth
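A total increase of at least 25% by 2030 can be restated as a minimum constant annual growth rate, r = (1 + 0.25)^(1/n) − 1 over n years. The slide does not state a baseline year, so the sketch below assumes 2018 (the year of this talk), giving n = 12; the function name is hypothetical.

```python
def required_annual_growth(total_increase: float, years: int) -> float:
    """Constant annual growth rate that compounds to `total_increase` over `years`."""
    return (1 + total_increase) ** (1 / years) - 1


if __name__ == "__main__":
    # Assumption: 2018 baseline (not given on the slide), i.e. 12 years to 2030.
    rate = required_annual_growth(0.25, 2030 - 2018)
    print(f"{rate:.2%} per year")
```

Under that assumption, production would need to grow by roughly 1.9% every year; a later baseline would require a higher annual rate.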
Democratising Data at GigaScience
Rice 3K project
• 3,000 rice genomes
• 13.4TB public data
• 6 months to copy data to the Sequence Read Archive (SRA)
• Data published 4 years before the analysis was published
From Big Data to usable(ish) Data
• Although the 13TB of data in GigaDB was open (CC0), after analysis on the Tianhe supercomputer the processed Rice 3K data amounted to 100TB
• AWS hosts the data for free, but it is expensive to process
https://aws.amazon.com/public-data-sets/3000-rice-genome/
Processed data finally published 1 May 2018, Nature 557: 43–49
https://www.nature.com/articles/s41586-018-0063-9
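The Rice 3K figures above can be turned into two rough numbers: the sustained rate implied by copying 13.4TB to the SRA in 6 months, and how much processing expanded the data. This sketch assumes decimal terabytes and about 30.4 days per month; it uses the 13.4TB figure from the Rice 3K slide (rounded to 13TB elsewhere).

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # assumption: decimal terabytes

raw_bytes = 13.4 * TB        # public Rice 3K data in GigaDB
processed_bytes = 100 * TB   # processed data after analysis on Tianhe
copy_days = 6 * 30.4         # "6 months", assuming ~30.4 days per month

# Average rate needed to move the raw data in the stated time.
rate = raw_bytes / (copy_days * SECONDS_PER_DAY)
# Ratio of processed to raw volume.
expansion = processed_bytes / raw_bytes

if __name__ == "__main__":
    print(f"sustained copy rate ~ {rate / 1e6:.2f} MB/s ({rate * 8 / 1e6:.1f} Mbit/s)")
    print(f"processing expanded the data ~ {expansion:.1f}x")
```

Under these assumptions the copy averaged under 1 MB/s, and processing grew the dataset roughly 7.5-fold, which is why hosting alone does not make such data usable.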
Democratising Data at GigaScience
• Example: Food security
• The African Orphan Crops Consortium (AOCC) is developing genomic resources for 101 crops that represent a significant part of African/Asian diets.
• To date, the AOCC is working on 69 genomes, 5 of which are published in GigaDB.
Hyacinth bean
• Stunting: Physical, Neurological, Economic
Growing Africa Out of Stunting, Hunger & Malnutrition:
The African Orphan Crops Consortium
• Provide genomic tools to accelerate breeding in 101 crops important to African diets
• Define genetic diversity in 100 lines/species
• Train 150 top African plant breeders to use the latest strategies
and technologies in plant breeding
African Orphan Crops Consortium (AOCC)
Courtesy: AOCC
Democratising Data at GigaScience
• Each AOCC genome is a single GigaDB dataset (with DOI)
Democratising Data at GigaScience
• From Big Data to usable Data
• Example: Easy-to-use, plug-and-play RiceGalaxy
• Processed data and software tools made freely available
• GUI means plant breeders can use genetic data without coding skills
• Funded to run at low cost (<100 USD/month) via AWS Singapore & local servers (2 vCPUs, 8GB RAM, 2 mounted volumes, 200GB total storage)
• The CGIAR Excellence in Plant Breeding Platform/model will be rolled out to other crops
Courtesy: IRRI
Courtesy: AOCC
Acknowledgements
Laurie Goodman, Editor in Chief
Scott Edmunds, Executive Editor
Chris Hunter, GigaDB Lead BioCurator
Mary Ann Tuli, GigaDB Data Editor
Xiao (Jesse) Si Zhe, Database Developer
Nicole Nogoy, Editor
Hans Zauner, Assistant Editor
Hongling Zhao, Assistant Editor
Peter Li, Lead Data Manager
Chen Qi, Shenzhen Office.
@GigaScience
facebook.com/GigaScience
http://gigasciencejournal.com/blog/
www.gigasciencejournal.com
www.gigadb.org
+ Weibo & WeChat

Presented by Chris Armit at IDW2018.

EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
 
Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...
Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...
Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...
 
Susanna Sansone at the Knowledge Dialogues/ODHK "Beyond Open"event
Susanna Sansone at the Knowledge Dialogues/ODHK "Beyond Open"eventSusanna Sansone at the Knowledge Dialogues/ODHK "Beyond Open"event
Susanna Sansone at the Knowledge Dialogues/ODHK "Beyond Open"event
 
Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a mult...
Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a mult...Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a mult...
Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a mult...
 

Recently uploaded

LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.Silpa
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIADr. TATHAGAT KHOBRAGADE
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLkantirani197
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Silpa
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Silpa
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxSilpa
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry Areesha Ahmad
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Silpa
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxDiariAli
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxANSARKHAN96
 

Recently uploaded (20)

LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 

Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective

  • 1. Democratising Data Publishing: A Global Perspective Dr Chris Armit Data Scientist, GigaScience BGI-Hong Kong
  • 2. Need for FAIR (high-quality) Open Data Enables • Using the networking power of the internet to tackle problems • Asking new questions & finding hidden patterns & connections • Building on each other's efforts more quickly & efficiently • More collaborations across more disciplines • Harnessing the wisdom of crowds: crowdsourcing, citizen science, crowdfunding Global Challenges • Quick response to climate change, food security & disease outbreaks • Cultural & technical hurdles need to be overcome
  • 3. Global Challenges https://www.nature.com/articles/d41586-018-07244-w • How will Open Access (APC) models work here? • Authors are unlikely to be able to afford article processing charges
  • 4. http://www.nature.com/news/data-sharing-make-outbreak-research-open-access-1.16966 Cultural Hurdles in Publishing Research Data Example: Disease outbreaks • Genome sequences from the West Africa outbreak of Ebola were first made publicly available in April 2014 • Datasets were released sporadically when this became a hot research topic • This led to gaps in the data
  • 5. Democratising Data at GigaScience • GigaScience integrates and publishes all research objects to maximise reproducibility, transparency and reuse • GigaDB enables rapid publication of data associated with a GigaScience manuscript • GigaDB DOIs incentivise early release of data/code/etc. • Data • Software • Models • Pipelines • Reviews
  • 6. • E. coli O104:H4 isolate TY-2482 in Germany, >50 died, June 2011 • Crisis, mass panic, data needed • BGI, working with Hamburg University, shared the data under CC0 with our first data DOI from GigaDB • Released via Twitter • We did not know the consequences of releasing the data early • These data were considered so important that we did not wish to wait for publication Example: Disease outbreaks http://dx.doi.org/10.5524/100001 Democratising Data at GigaScience
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. Democratising Data at GigaScience • From Big Data to usable Data • Example: WebTools for easy browsing and visualisation • Pan-and-zoom map browser as a visual aid to allow the end user to find datasets
  • 12. • Pan-and-zoom map browser as a visual aid to allow the end user to find datasets Democratising Data at GigaScience • From Big Data to usable Data • Example: WebTools for easy browsing and visualisation
  • 13. • 3D viewer allows users to interact and explore image data prior to data download • 3D models are CC0, can be downloaded, and are printable Democratising Data at GigaScience • From Big Data to usable Data • Example: WebTools for easy browsing and visualisation
  • 14. Democratising Data at GigaScience • Widening the target audience • Bioinformaticians and ‘Big Data’ scientists are a primary target audience • Plugins and visualisations make access easier for the less technically inclined • Democratises access through educational potential and ease of use
  • 15. Democratising Data at GigaScience Difficulties we have encountered… • Internet: bandwidth, unstable connections, US institutions occasionally blocking Chinese IP addresses, China blocking Google/Dropbox links • Copying 10 GB of data from South Africa took more than 1 month because of power cuts • Email communication difficulties due to spam filters • Data access agreements (clinical data)
  • 16. Democratising Data at GigaScience • Example: Food security • Rice, Oryza sativa L., is the staple food for half the world’s population • By 2030, rice production must increase by at least 25% to keep pace with population growth
  • 17. Democratising Data at GigaScience Rice 3K project • 3,000 rice genomes • 13.4 TB public data • 6 months to copy data to the Sequence Read Archive (SRA) • Data published 4 years before analysis published
  • 18. From Big Data to usable(ish) Data • Although the 13 TB of data in GigaDB was open (CC0), analysis on the Tianhe supercomputer produced 100 TB of processed Rice 3K data • Hosted by AWS for free, but expensive to process https://aws.amazon.com/public-data-sets/3000-rice-genome/
  • 19. Processed data finally published 1st May 2018, Nature v557, p43–49 https://www.nature.com/articles/s41586-018-0063-9
  • 20. Democratising Data at GigaScience • Example: Food security • The African Orphan Crop Consortium (AOCC) is developing genomic resources for 101 crops that represent a significant part of African/Asian diets. • To date, the AOCC is working on 69 genomes, 5 of which are published in GigaDB. Hyacinth bean
  • 21. • Stunting: Physical, Neurological, Economic Growing Africa Out of Stunting, Hunger & Malnutrition: The African Orphan Crops Consortium
  • 22. • Provide genomic tools to accelerate breeding in 101 crops important to African Diets • Define genetic diversity in 100 lines/species • Train 150 top African plant breeders to use the latest strategies and technologies in plant breeding African Orphan Crops Consortium (AOCC) Courtesy: AOCC
  • 23. African Orphan Crops Consortium (AOCC)
  • 24. Democratising Data at GigaScience • Each AOCC genome is a single GigaDB dataset (with DOI)
  • 25. Democratising Data at GigaScience • From Big Data to usable Data • Example: Easy-to-use plug and play RiceGalaxy • Processed data and software tools made freely available • GUI means plant breeders can utilise genetic data without coding skills • Funded to run at low cost (<100 USD/month) via AWS Singapore & local servers (2 vCPUs, 8GB RAM, 2 mounted volumes, 200GB total storage) • CGIAR Excellence in Plant Breeding Platform/model will roll out to other crops
  • 26. Democratising Data at GigaScience • From Big Data to usable Data • Example: Easy-to-use plug and play RiceGalaxy • GUI means plant breeders can utilise genetic data without coding skills • Funded to run at low cost (<100 USD/month) via AWS Singapore & local servers (2 vCPUs, 8GB RAM, 2 mounted volumes, 200GB total storage) • CGIAR Excellence in Plant Breeding Platform/model will roll out to other crops
  • 29. Acknowledgements Laurie Goodman, Editor in Chief Scott Edmunds, Executive Editor Chris Hunter, GigaDB Lead BioCurator Mary Ann Tuli, GigaDB Data Editor Xiao (Jesse) Si Zhe, Database Developer Nicole Nogoy, Editor Hans Zauner, Assistant Editor Hongling Zhao, Assistant Editor Peter Li, Lead Data Manager Chen Qi, Shenzhen Office. @GigaScience facebook.com/GigaScience http://gigasciencejournal.com/blog/ www.gigasciencejournal.com www.gigadb.org + Weibo & WeChat
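The connectivity problems on slides 15 and 17 are easier to appreciate as average transfer rates. The sketch below is a back-of-the-envelope calculation, not something taken from the talk: it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), a 30-day month and 182 days for "6 months", and the function names are purely illustrative. The 6-month Rice 3K figure may also reflect delays other than raw bandwidth, so treat both numbers as rough.

```python
# Back-of-the-envelope average transfer rates for two data moves in the talk.
# Assumptions (not from the slides): decimal units, a 30-day month,
# and "6 months" taken as 182 days. Function names are hypothetical.

SECONDS_PER_DAY = 86_400


def effective_rate_bytes_per_s(size_bytes: float, days: float) -> float:
    """Average throughput implied by moving size_bytes over the given number of days."""
    return size_bytes / (days * SECONDS_PER_DAY)


def to_mbit_per_s(bytes_per_s: float) -> float:
    """Convert bytes per second to megabits per second (decimal megabits)."""
    return bytes_per_s * 8 / 1e6


transfers = {
    "South Africa copy (10 GB, ~1 month)": (10e9, 30),
    "Rice 3K copy to SRA (13.4 TB, ~6 months)": (13.4e12, 182),
}

for label, (size, days) in transfers.items():
    rate = effective_rate_bytes_per_s(size, days)
    print(f"{label}: {rate / 1e3:,.1f} kB/s, about {to_mbit_per_s(rate):.3f} Mbit/s")
```

Under these assumptions the South Africa copy averaged roughly 3.9 kB/s (about 0.03 Mbit/s) and the Rice 3K copy roughly 852 kB/s (about 6.8 Mbit/s). Because the first copy took more than a month, its true average was lower still.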

Editor's Notes

  1. Up to US$5,000–6,000
  2. Quadrupled the data in the public domain. The data were published 4 years before the analysis was published in Nature.