Big Data publishing
Beyond dead trees, and a case study
Scott Edmunds
ISMB, 22nd July 2013
@gigascience
The problems with publishing
• Scholarly articles are merely advertisement of scholarship.
The actual scholarly artefacts, i.e. the data and computational
methods, which support the scholarship, remain largely
inaccessible --- Jon B. Buckheit and David L. Donoho, WaveLab
and reproducible research, 1995
• Core scientific statements or assertions are intertwined and
hidden in the conventional scholarly narratives
• Lack of transparency, lack of credit for anything other than
“regular” dead tree publication
Time to move beyond:
1812, 1665, 1869
Problem: growing replication gap
1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
Out of 18 microarray papers, results
from 10 could not be reproduced
More retractions:
>15X increase in last decade
At the current rate of increase, by 2045 as many papers
will be retracted as published
Motivation
• Scholarly artefacts must be
– Treated as first-class objects, in scientific investigations
and in scholarly communications
– Made machine-readable for the convenience of reasoning
– Represented in an interoperable manner
• Truly “add value” to publishing
– Provide infrastructure to aid & reward replication
• Trial of ISA+Nanopublication+RO
– Three similar approaches that should complement each
other for the representation of scholarly artifacts
Motivation
• Scholarly artefacts must be
– Treated as first-class objects, in scientific investigations
and in scholarly communications
“data* generated in the course of research are just as valuable
to the ongoing academic discourse as papers and monographs”.
“increase acceptance of research data* as legitimate, citable
contributions to the scholarly record”.
Data* Citation (*and more)
• Data
• Software
• Re-use…
= Credit
GigaSolution: deconstructing the paper
Need to credit and reward:
•Data/software availability
•Metadata/curation
•Interoperability
•Availability of workflows
•Transparent analyses
Data
Metadata
Methods
Analyses
GigaSolution: deconstructing the paper
www.gigadb.org
www.gigasciencejournal.com
20PB storage, 20.5K cores, 212TFlops,
>1000 bioinformaticians
Utilizes big-data infrastructure and expertise from:
Combines and integrates:
Open-access journal
Data Publishing Platform
Data Analysis Platform
What should we reward?
Different levels of granularity:
Experiment
(e.g. Rice 10K project)
Datasets
(e.g. species, variety)
Sample
(e.g. specimen xyz)
e.g. doi:10.5524/100001
e.g. doi:10.5524/100001-2
e.g. doi:10.5524/100001-2000
or doi:10.5524/100001_xyz
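The example identifiers above suggest that finer-grained objects are addressed by appending a suffix to the parent DOI. A minimal Python sketch of splitting such an identifier into its parts follows. The names `parse_doi` and `parent_doi`, and the assumption that a suffix always follows a `-` or `_` separator, are illustrative guesses read off the examples on this slide, not a documented GigaDB rule.

```python
import re
from typing import NamedTuple, Optional


class ParsedDoi(NamedTuple):
    prefix: str            # registrant prefix, e.g. "10.5524"
    parent: str            # top-level object number, e.g. "100001"
    suffix: Optional[str]  # finer-grained part, e.g. "2", "2000" or "xyz"


# Assumed pattern: optional "doi:" scheme, prefix, numeric parent id,
# then an optional "-" or "_" separator followed by the suffix.
_PATTERN = re.compile(r"^(?:doi:)?(10\.\d+)/(\d+)(?:[-_](\w+))?$")


def parse_doi(doi: str) -> ParsedDoi:
    """Split a DOI string into prefix, parent id and optional suffix."""
    match = _PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"unrecognised DOI: {doi!r}")
    return ParsedDoi(match.group(1), match.group(2), match.group(3))


def parent_doi(doi: str) -> str:
    """Return the DOI of the top-level object a finer-grained DOI belongs to."""
    parsed = parse_doi(doi)
    return f"{parsed.prefix}/{parsed.parent}"
```

Note that the pattern alone cannot tell a dataset-level suffix (`-2`) from a sample-level one (`-2000`); that distinction would have to come from the repository's own records.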
Smaller still?
Papers
Data/
Micropubs
Nanopubs
Facts/Assertions (~10^14 in literature)
Reward different shaped publishable objects
Rewarding open data
Validation checks
Fail – submitter is
provided error report
Pass – dataset is
uploaded to
GigaDB.
Submission Workflow
Curator makes dataset public
(can be set as future date if
required)
DataCite
XML file
Excel
submission file
Submitter logs in to
GigaDB website and
uploads Excel
submission
GigaDB
DOI
assigned
Files
Submitter provides
files by ftp or
Aspera
XML is generated and
registered with DataCite
Curator
Review
Curator contacts submitter with
DOI citation and to arrange file
transfer (and resolve any other
questions/issues).
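The pass/fail validation and XML generation steps of this workflow can be sketched roughly as below. This is a hedged illustration only: the field names, the `validate`, `to_datacite_like_xml` and `submit` helpers, and the element layout are assumptions loosely modelled on DataCite's mandatory properties (identifier, creator, title, publisher, publication year). It is not GigaDB's actual code, and the output is not validated against the DataCite schema.

```python
import xml.etree.ElementTree as ET

# Assumed set of mandatory submission fields (hypothetical names).
REQUIRED = ("doi", "title", "creators", "publisher", "year")


def validate(record: dict) -> list:
    """Return a list of error messages; an empty list means the checks pass."""
    errors = [f"missing field: {name}" for name in REQUIRED if not record.get(name)]
    year = str(record.get("year", ""))
    if year and not (year.isdigit() and len(year) == 4):
        errors.append("year must be four digits")
    return errors


def to_datacite_like_xml(record: dict) -> str:
    """Build a simplified, DataCite-style XML string for a validated record."""
    root = ET.Element("resource")
    ident = ET.SubElement(root, "identifier", identifierType="DOI")
    ident.text = record["doi"]
    creators = ET.SubElement(root, "creators")
    for name in record["creators"]:
        creator = ET.SubElement(creators, "creator")
        ET.SubElement(creator, "creatorName").text = name
    titles = ET.SubElement(root, "titles")
    ET.SubElement(titles, "title").text = record["title"]
    ET.SubElement(root, "publisher").text = record["publisher"]
    ET.SubElement(root, "publicationYear").text = str(record["year"])
    return ET.tostring(root, encoding="unicode")


def submit(record: dict) -> dict:
    """Fail with an error report, or pass and return the generated XML."""
    errors = validate(record)
    if errors:
        return {"status": "fail", "report": errors}
    return {"status": "pass", "xml": to_datacite_like_xml(record)}
```

A call such as `submit({"doi": "10.5524/100003", "title": "...", "creators": ["Example Submitter"], "publisher": "GigaDB", "year": 2011})` would take the pass branch; the creator and publisher values here are placeholders, not taken from the dataset record.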
DOI 10.5524/100003
Genomic data from the
crab-eating
macaque/cynomolgus
monkey (Macaca
fascicularis) (2011)
Public GigaDB dataset
Reward open & transparent review
End reviewer 3 Download parody videos, now!
Real-time open-review = paper in arXiv + blogged reviews
Reward open & transparent review
http://tmblr.co/ZzXdssfOMJfy
www.gigasciencejournal.com/content/2/1/10
Cloud
solutions?
Reward better handling of metadata…
Novel tools/formats for data interoperability/handling.
BMC Research Awards 2013
Winner of open data award
Rewarding transparent methods/results
Software availability
Open code
Accessible pipelines
Sharable workflows
Research Objects
Ease of replication
Implement workflows in a community-accepted format
http://galaxyproject.org
Over 36,000 main
Galaxy server users
Over 500 papers
citing Galaxy use
Over 55 Galaxy
servers deployed
Open source
galaxy.cbiit.cuhk.edu.hk
Research Objects
An aggregation of scholarly artefacts:
• Data used or results produced in an
experimental study
• Methods employed to produce and
analyse that data
• Provenance and setting information
about the experiments
• People involved in the investigation
• Annotations about these resources, that
are essential to the understanding and
interpretation of the scientific outcomes
captured by a research object.
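The aggregation described above can be pictured as a small container type. The following Python sketch is an illustration under stated assumptions: the class names, fields and example URIs are hypothetical, and it does not implement the Research Object model or any of its ontologies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    uri: str
    kind: str  # e.g. "data", "method", "provenance"


@dataclass
class Annotation:
    target: str  # URI of the aggregated resource being described
    body: str    # free-text note aiding interpretation


@dataclass
class ResearchObject:
    title: str
    people: List[str] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def aggregate(self, uri: str, kind: str) -> Resource:
        """Add a resource (data, method, provenance record) to the aggregation."""
        resource = Resource(uri, kind)
        self.resources.append(resource)
        return resource

    def annotate(self, target: str, body: str) -> Annotation:
        """Attach an annotation to a resource that is already aggregated."""
        if target not in {r.uri for r in self.resources}:
            raise KeyError(f"{target} is not part of this research object")
        annotation = Annotation(target, body)
        self.annotations.append(annotation)
        return annotation

    def by_kind(self, kind: str) -> List[str]:
        """List the URIs of all aggregated resources of one kind."""
        return [r.uri for r in self.resources if r.kind == kind]
```

Requiring that annotations point at an aggregated resource keeps the object self-describing: nothing is annotated that a reader of the object cannot also retrieve from it.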
Example
How are we supporting data
reproducibility?
Data sets
Analyses
Linked to
Linked to
DOI
DOI
Open-Paper
Open-Review
DOI:10.1186/2047-217X-1-18
>11000 accesses
Open-Code
8 reviewers tested data in ftp server & named reports published
DOI:10.5524/100044
Open-Pipelines
Open-Workflows
DOI:10.5524/100038
Open-Data
78GB CC0 data
Code in sourceforge under GPLv3:
http://soapdenovo2.sourceforge.net/
>5000 downloads
Enabled code to be picked apart by bloggers in a wiki
http://homolog.us/wiki/index.php?title=SOAPdenovo2
8 referees downloaded & tested data, then signed reports
Reward open & transparent review
Post publication: bloggers pull apart code/reviews in blogs + wiki:
SOAPdenovo2 wiki: http://homolog.us/wiki1/index.php?title=SOAPdenovo2
Homologus blogs: http://www.homolog.us/blogs/category/soapdenovo/
Reward open & transparent review
SOAPdenovo2 workflows implemented in
galaxy.cbiit.cuhk.edu.hk
Implemented entire workflow in our Galaxy server, inc.:
• 3 pre-processing steps
• 4 SOAPdenovo modules
• 1 post-processing step
• Evaluation and visualization tools
Also will be available to download by >36K Galaxy users in
How much further can we take this?
ISA + RO + Nanopub case study
Understand how each of the three models can support representation of
the actual scholarly artefacts, which are essential first-class objects in
scholarly communication
Demonstrate added value to life science, publishing and scholarly
communication communities on how these models should be used
together to describe scholarly artefacts from life sciences domains
Data models
(instructions for authors for digital publishing)
• Research Object
– An encapsulation of essential information related to experiments
and investigations
• The ISA (Investigation + Study + Assay) framework
– includes a format and a set of software tools that enable its
international user community to provide rich description of
the experimental workflows in life science, environmental
and biomedical domains.
• Nanopublication
– Dissemination of individual data (assertions) with or without an
accompanying scholarly article
– Enables attribution to the scientists for sharing their data
Types of resources in an RO:
Data
Method/Experimental protocol
Findings
Models to describe each resource type:
Wfdesc/ISA-TAB/ISA2OWL
The Big Picture
The SOAPdenovo2 Case study
The Data
The method, as
a Galaxy
workflow
The findings, Table2 in the paper
(doi:10.1186/2047-217X-1-18)
Investigation/Study/Assay infrastructure
The investigation file is a high-level aggregator for related studies. It contains all the
information needed to understand the overall goals of an experiment, including the investigators
involved, associated publications, the experimental design, experimental factors,
protocols, funding agencies and so on.
An investigation can have one or more studies. A study is the central unit of the
experimental description and it contains information on the subject(s) under study,
their characteristics, and any treatments applied.
Each study has one or more associated assays. The assay is the test performed
either on the subject or on material taken from the subject, which produces qualitative
and/or quantitative measurements.
[Diagram: one investigation; two studies; four assays; data files and an analysis method script]
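The investigation, study and assay nesting can be pictured as a simple tree. The Python sketch below is an assumption-laden illustration: the class and field names and the example file names are hypothetical, and it is neither the ISA-TAB file format nor the API of the ISA software tools.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    name: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    name: str
    subjects: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    investigators: List[str] = field(default_factory=list)
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        """Walk investigation -> study -> assay and collect every data file."""
        return [
            path
            for study in self.studies
            for assay in study.assays
            for path in assay.data_files
        ]


# Hypothetical example: one investigation, one study, two assays.
inv = Investigation(
    title="SOAPdenovo2 case study",
    studies=[
        Study(
            name="assembly study",
            subjects=["S. aureus"],
            assays=[
                Assay("sequencing", ["reads_1.fq", "reads_2.fq"]),
                Assay("assembly", ["contigs.fa"]),
            ],
        )
    ],
)
```

Calling `inv.all_data_files()` walks the tree top down, which mirrors how the investigation acts as the aggregator for everything beneath it.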
ISA
investigation
Study Design
ISA framework
SOAPdenovo2
investigation
study
Study Samples: A table rendering the sample collection
workflow
ISA framework
SOAPdenovo2
investigation
study
assay
ftp://public.genomics.org.cn/BGI/SOAPdenovo2
http://galaxy.cbiit.cuhk.edu.hk/
FTP
ISA framework
SOAPdenovo2
investigation
study
assay
Representation available in:
•Tabular format
• Spreadsheet-like format
• For biologists/experimentalists
•RDF/OWL format for Semantic Web/Linked Data users
• For bioinformaticians/software developers
• Facilitating data integration, querying, reasoning
•Support for submission to public repositories and data
publication platforms
•Tools support for curation, creation, storage, analysis…
•Large and diverse life science user/collaborator communities
ISA framework
RO + ISA
Scientific Workflow-specific ROs:
scientific, computational experiments, non-wet lab protocols
ISA experiment and data description:
focus on wet-lab or non-computational experimental protocols
An RO for the Case Study
A Galaxy workflow
Some nanopub statements
Input sequence data
Descriptions about the workflow
A Research Object
The Research Object contains the
following artefacts:
• The input sequence data,
represented in ISA-TAB format
• The Galaxy workflow that reflects
the computational steps taken for
generating the results used to
produce Table 2
• Machine-readable descriptions
about the workflow
• The nanopublication statements
that represent claims based on
the content of Table 2
Nanopublication URL
Unique, persistent and resolvable identifier
Assertion
An individual association between concepts:
•statement or declaration
•measurement
•hypothetical inference
•quantitative or qualitative
association a sio:statisticalAssociation
sio:has-measurementValue Association_1_p_value
Association_1_p_value a sio:probability-value
sio:has-value 6.56e-5^^xsd:float
sio:refers-to
dcterms:DOI
…
Provenance
How this assertion came to be, methods, evidence, context, etc.
assertion opm:wasDerivedFrom
assertion opm:wasGeneratedBy
PublicationInfo
• Detailed attribution for authors, institutions, lab technicians, curators
• License info
• Publication date
this nanopub dcterms:created
this nanopub pav:authoredBy
Integrity Key
Guarantee immutability after publication
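The three-part anatomy above (assertion, provenance, publication info) can be sketched as a set of named groups of triples. The Python below is a minimal illustration under stated assumptions: the `ex:` identifiers, the helper names, and the author and date values are hypothetical; the `sio:`, `opm:`, `dcterms:` and `pav:` terms are copied from the slide and not checked against their vocabularies; and the rendered text is TriG-like but not a validated serialization.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_nanopub(p_value: float, source_doi: str,
                  author: str, created: str) -> Dict[str, List[Triple]]:
    """Group triples into assertion, provenance and publication-info graphs."""
    return {
        "ex:assertion": [
            ("ex:association_1", "a", "sio:statisticalAssociation"),
            ("ex:association_1", "sio:has-measurementValue", "ex:association_1_p_value"),
            ("ex:association_1_p_value", "a", "sio:probability-value"),
            ("ex:association_1_p_value", "sio:has-value", f'"{p_value}"^^xsd:float'),
        ],
        "ex:provenance": [
            # The assertion as a whole is the subject of the provenance graph.
            ("ex:assertion", "opm:wasDerivedFrom", f"<https://doi.org/{source_doi}>"),
        ],
        "ex:pubinfo": [
            # The nanopublication itself is the subject of the pubinfo graph.
            ("ex:nanopub", "dcterms:created", f'"{created}"'),
            ("ex:nanopub", "pav:authoredBy", f'"{author}"'),
        ],
    }


def render(graphs: Dict[str, List[Triple]]) -> str:
    """Render the named graphs as simple TriG-like text."""
    lines: List[str] = []
    for name, triples in graphs.items():
        lines.append(f"{name} {{")
        for subject, predicate, obj in triples:
            lines.append(f"  {subject} {predicate} {obj} .")
        lines.append("}")
    return "\n".join(lines)
```

For example, `render(build_nanopub(6.56e-5, "10.1186/2047-217X-1-18", "Example Author", "2013-07-22"))` would produce three blocks, one per graph, with the p-value from the slide in the assertion and the paper's DOI in the provenance. Keeping the parts in separate graphs is what lets attribution and provenance be stated about the assertion rather than mixed into it.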
A Nanopublication-Centric View
• Improvements of SOAPdenovo2 have also been observed in assembling GAGE [8]
dataset (see Additional file 1: Supplementary Method 6 and Tables 2 and 3). As 
shown in Tables 2 and 3, the correct assembly length of SOAPdenovo2  increased
by approximately 3 to 80-fold comparing with that of SOAPdenovo1.
SOAPdenovo2 S. aureus pipeline
How do we generate a nanopub from this?
…stay tuned for Tech Track talk #34 by Marco Roos
ICC Lounge 81, Tuesday 23rd: 3.40pm-4.05pm
Final step: visualization
NC_010079.pdf
gi_161510924_ref_NC_010063.1_.pdf
CONTIGuator 2 (thanks Marco Galardini)
https://github.com/combogenomics/CONTIGuator
Lessons learned:
• It is possible to push button(s) & recreate a result from a paper
• Reproducibility is COSTLY. How much are you
willing to spend?
• You learn a huge amount about the study, and it provides lots of
information not present in the paper
• Much easier to do this before rather than after publication
Next steps
• Complete the case study on the release of ISA-OWL,
nanopubs & ROs
• Extend the case study by including more than one dataset or
RO, in order to show how related or conflicting information
can be more easily interlinked
• Create community guidelines on how these three models
should be used together, e.g. recommended patterns or
vocabulary terms
www.gigasciencejournal.com
Give us your data &
pipelines!*
Want to go beyond
dead trees & the PDF?
scott@gigasciencejournal.com
editorial@gigasciencejournal.com
database@gigasciencejournal.com
Contact us:
* APCs generously covered by
BGI in 2013
Ruibang Luo (BGI/HKU)
Shaoguang Liang (BGI-SZ)
Tin-Lap Lee (CUHK)
Qiong Luo (HKUST)
Senghong Wang (HKUST)
Yan Zhou (HKUST)
Thanks to:
@gigascience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog/
Peter Li
Huayan Gao
Chris Hunter
Jesse Si Zhe
Nicole Nogoy
Laurie Goodman
Marco Roos (LUMC)
Mark Thompson (LUMC)
Jun Zhao (Oxford)
Susanna Sansone (Oxford)
Philippe Rocca-Serra (Oxford)
Alejandra Gonzalez-Beltran (Oxford)
www.gigadb.org
galaxy.cbiit.cuhk.edu.hk
www.gigasciencejournal.com
CBIIT
Funding from:
Our collaborators:
Team:
Case study:
 

Scott Edmunds ISMB talk on Big Data Publishing

  • 1. Big Data publishing Beyond dead trees, and a case study Scott Edmunds ISMB, 22nd July 2013 @gigascience
  • 2. The problems with publishing • Scholarly articles are merely advertisement of scholarship. The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible --- Jon B. Buckheit and David L. Donoho, WaveLab and reproducible research, 1995 • Core scientific statements or assertions are intertwined and hidden in the conventional scholarly narratives • Lack of transparency, lack of credit for anything other than “regular” dead tree publication
  • 3. Time to move beyond: 1812, 1665, 1869
  • 4. Problem: growing replication gap • Out of 18 microarray papers, results from 10 could not be reproduced • More retractions: >15X increase in last decade • At the current rate, by 2045 as many papers retracted as published • 1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14 2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
  • 5. Motivation • Scholarly artefacts must be – Treated as first-class objects, in scientific investigations and in scholarly communications – Made machine-readable for the convenience of reasoning – Represented in an interoperable manner • Truly “add value” to publishing – Provide infrastructure to aid & reward replication • Trial of ISA+Nanopublication+RO – Three similar approaches that should complement each other for the representation of scholarly artifacts
  • 6. Motivation • Scholarly artefacts must be – Treated as first-class objects, in scientific investigations and in scholarly communications • “data* generated in the course of research are just as valuable to the ongoing academic discourse as papers and monographs”. “increase acceptance of research data* as legitimate, citable contributions to the scholarly record”. Data* Citation (*and more)
  • 7. • Data • Software • Re-use… = Credit
  • 8. GigaSolution: deconstructing the paper • Data • Metadata • Methods • Analyses. Need to credit and reward: • Data/software availability • Metadata/curation • Interoperability • Availability of workflows • Transparent analyses
  • 9. GigaSolution: deconstructing the paper. Combines and integrates: • Open-access journal (www.gigasciencejournal.com) • Data Publishing Platform (www.gigadb.org) • Data Analysis Platform. Utilizes big-data infrastructure and expertise: 20PB storage, 20.5K cores, 212TFlops, >1000 bioinformaticians
  • 10. What should we reward?
  • 11. Different levels of granularity: Experiment (e.g. Rice 10K project), e.g. doi:10.5524/100001; Datasets (e.g. species, variety), e.g. doi:10.5524/100001-2; Sample (e.g. specimen xyz), e.g. doi:10.5524/100001-2000 or doi:10.5524/100001_xyz; Smaller still? Papers, Data/Micropubs, Nanopubs, Facts/Assertions (~10^14 in literature). Reward different shaped publishable objects
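The granularity levels above can be read straight off the identifier. The following is a minimal sketch, assuming the sub-identifier is always appended to the dataset number with "-" or "_" as in the slide's examples; the names `split_doi` and `parent_doi` are hypothetical helpers, not part of GigaDB.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, modelled only on the examples shown on the slide:
# 10.5524/<dataset number>, optionally followed by "-" or "_" and a sub-identifier.
DOI_PATTERN = re.compile(r"^(?:doi:)?(10\.5524)/(\d+)(?:[-_](\w+))?$")


class DoiParts(NamedTuple):
    prefix: str            # registrant prefix, e.g. "10.5524"
    dataset: str           # experiment/dataset number, e.g. "100001"
    sub_id: Optional[str]  # finer-grained part (dataset or sample), if any


def split_doi(doi: str) -> DoiParts:
    """Split a GigaDB-style DOI into prefix, dataset number and sub-identifier."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a recognised DOI: {doi!r}")
    return DoiParts(*match.groups())


def parent_doi(doi: str) -> str:
    """Return the DOI of the enclosing experiment-level record."""
    parts = split_doi(doi)
    return f"doi:{parts.prefix}/{parts.dataset}"
```

For instance, `parent_doi("doi:10.5524/100001_xyz")` resolves a sample-level identifier back to the experiment-level record, which is one way credit for a small object could be rolled up to its parent.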
  • 13. Submission Workflow: Submitter logs in to GigaDB website and uploads Excel submission file. Validation checks: Fail – submitter is provided error report; Pass – dataset is uploaded to GigaDB. Curator Review: curator contacts submitter with DOI citation and to arrange file transfer (and resolve any other questions/issues). DOI assigned. Files: submitter provides files by ftp or Aspera. DataCite XML file: XML is generated and registered with DataCite. Curator makes dataset public (can be set as future date if required). Public GigaDB dataset, e.g. DOI 10.5524/100003, Genomic data from the crab-eating macaque/cynomolgus monkey (Macaca fascicularis) (2011)
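The pass/fail branch and the later curation stages of this workflow can be sketched as a small state progression. This is only an illustration under stated assumptions: the `Submission` class, the stage names and the two validation rules are hypothetical, the ordering of stages is inferred from the slide, and nothing here reflects GigaDB's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Submission:
    """A dataset submission moving through the workflow (hypothetical model)."""
    title: str
    file_names: List[str]
    status: str = "submitted"
    doi: Optional[str] = None
    error_report: List[str] = field(default_factory=list)


def validation_checks(sub: Submission) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not sub.title.strip():
        problems.append("title is empty")
    if not sub.file_names:
        problems.append("no files listed")
    return problems


def run_workflow(sub: Submission, doi: str) -> Submission:
    """Walk a submission through validation, curation and publication."""
    sub.error_report = validation_checks(sub)
    if sub.error_report:
        # Fail: the submitter receives the error report; nothing is uploaded.
        sub.status = "failed_validation"
        return sub
    # Pass: upload, curator review, DOI assignment, DataCite registration, release.
    for stage in ("uploaded", "curator_review", "doi_assigned",
                  "registered_with_datacite", "public"):
        if stage == "doi_assigned":
            sub.doi = doi
        sub.status = stage
    return sub
```

A failed submission never receives a DOI in this sketch, which mirrors the slide's point that the DOI is only assigned after validation and curator review.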
  • 14. Reward open & transparent review End reviewer 3 Download parody videos, now!
  • 15. Real-time open-review = paper in arXiv + blogged reviews Reward open & transparent review http://tmblr.co/ZzXdssfOMJfy www.gigasciencejournal.com/content/2/1/10
  • 16. Cloud solutions? Reward better handling of metadata… Novel tools/formats for data interoperability/handling. BMC Research Awards 2013 Winner of open data award
  • 17. Rewarding transparent methods/results Software availability Open code Accessible pipelines Sharable workflows Research Objects Ease of replication
  • 18. Implement workflows in a community-accepted format http://galaxyproject.org Over 36,000 main Galaxy server users Over 500 papers citing Galaxy use Over 55 Galaxy servers deployed Open source
  • 20. Research Objects An aggregation of scholarly artefacts: • Data used or results produced in an experiment study • Methods employed to produce and analyse that data • Provenance and setting information about the experiments • People involved in the investigation • Annotations about these resources, that are essential to the understanding and interpretation of the scientific outcomes captured by a research object.
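As a rough illustration of the aggregation described above, a Research Object can be modelled as a titled bundle of references grouped by kind. The class, the kind names and the `missing_kinds` check are assumptions made for this sketch; they do not reproduce the actual Research Object model or its vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The five kinds of artefact listed on the slide (names chosen for this sketch).
RESOURCE_KINDS = ("data", "method", "provenance", "people", "annotation")


@dataclass
class ResearchObject:
    """A minimal, hypothetical aggregation of scholarly artefacts."""
    title: str
    resources: Dict[str, List[str]] = field(
        default_factory=lambda: {kind: [] for kind in RESOURCE_KINDS}
    )

    def add(self, kind: str, reference: str) -> None:
        """Attach a reference (URL, DOI, name) under one of the known kinds."""
        if kind not in RESOURCE_KINDS:
            raise ValueError(f"unknown resource kind: {kind!r}")
        self.resources[kind].append(reference)

    def missing_kinds(self) -> List[str]:
        """List the kinds of artefact that have not been supplied yet."""
        return [kind for kind in RESOURCE_KINDS if not self.resources[kind]]


ro = ResearchObject("SOAPdenovo2 case study")
ro.add("method", "galaxy.cbiit.cuhk.edu.hk")         # Galaxy server named in the talk
ro.add("annotation", "doi:10.1186/2047-217X-1-18")   # the paper's DOI
print(ro.missing_kinds())
```

A completeness check like `missing_kinds` is one simple way a publisher could flag what still needs to be deposited before an object is considered replicable.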
  • 22.
  • 23. How are we supporting data reproducibility? Data sets Analyses Linked to Linked to DOI DOI Open-Paper Open-Review DOI:10.1186/2047-217X-1-18 >11000 accesses Open-Code 8 reviewers tested data in ftp server & named reports published DOI:10.5524/100044 Open-Pipelines Open-Workflows DOI:10.5524/100038 Open-Data 78GB CC0 data Code in sourceforge under GPLv3: http://soapdenovo2.sourceforge.net/ >5000 downloads Enabled code to be picked apart by bloggers in wiki http://homolog.us/wiki/index.php?title=SOAPdenovo2
  • 24. 8 referees downloaded & tested data, then signed reports Reward open & transparent review
  • 25. Post publication: bloggers pull apart code/reviews in blogs + wiki: SOAPdenovo2 wiki: http://homolog.us/wiki1/index.php?title=SOAPdenovo2 Homologus blogs: http://www.homolog.us/blogs/category/soapdenovo/ Reward open & transparent review
  • 26. SOAPdenovo2 workflows implemented in galaxy.cbiit.cuhk.edu.hk
  • 27. SOAPdenovo2 workflows implemented in galaxy.cbiit.cuhk.edu.hk Implemented entire workflow in our Galaxy server, inc.: • 3 pre-processing steps • 4 SOAPdenovo modules • 1 post-processing step • Evaluation and visualization tools Also will be available to download by >36K Galaxy users in
  • 28. How much further can we take this?
  • 29. How much further can we take this? ISA + RO + Nanopub case study Understand how each of the three models can support representation of the actual scholarly artefacts, which are essential first-class objects in scholarly communication Demonstrate added value to life science, publishing and scholarly communication communities on how these models should be used together to describe scholarly artefacts from life sciences domains
  • 30. Data models (instructions for authors for digital publishing) • Research Object – An encapsulation of essential information related to experiments and investigations • The ISA (Investigation + Study + Assay) framework – includes a format and a set of software tools that enable its international user community to provide rich descriptions of the experimental workflows in life science, environmental and biomedical domains. • Nanopublication – Dissemination of individual data (assertions) with/without an accompanying scholarly article – Enables attribution to the scientists for sharing their data
  • 31. The Big Picture Types of resources in an RO: Data, Method/Experimental protocol, Findings. Models to describe each resource type: Wfdesc/ISA-TAB/ISA2OWL
  • 32. The SOAPdenovo2 Case study The Data The method, as a Galaxy workflow The findings, Table2 in the paper (doi:10.1186/2047-217X-1-18)
  • 33. investigation Investigation/Study/Assay infrastructure The investigation file is a high-level aggregator for related studies; it contains all the information needed to understand the overall goals of an experiment, including investigators involved, associated publications, the experimental design, experimental factors, protocols, funding agencies and so on…
  • 34. investigation Investigation/Study/Assay infrastructure An investigation can have one or more studies. A study is the central unit of the experimental description and it contains information on the subject(s) under study, their characteristics, and any treatments applied. study study
  • 35. investigation Investigation/Study/Assay infrastructure Each study has one or more associated assays. The assay is the test performed either on the subject or on material taken from the subject, which produces qualitative and/or quantitative measurements. study study assay assay assay assay
  • 36. investigation Investigation/Study/Assay infrastructure study study assay assay assay assay data analysis method scriptdata data ISA
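The nesting built up over the last four slides (one investigation, one or more studies, one or more assays per study, each assay pointing at data, analysis methods or scripts) can be sketched with plain data classes. The class and field names below are hypothetical and much simpler than the real ISA model; the example values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """A test on the subject or on material taken from it (simplified)."""
    name: str
    resource_refs: List[str] = field(default_factory=list)  # data, methods, scripts


@dataclass
class Study:
    """Central unit of the experimental description (simplified)."""
    name: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level aggregator for related studies (simplified)."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_resource_refs(self) -> List[str]:
        """Collect every resource reference reachable from this investigation."""
        return [ref
                for study in self.studies
                for assay in study.assays
                for ref in assay.resource_refs]


# Placeholder example: one study with two assays, names invented for illustration.
inv = Investigation(
    "example investigation",
    [Study("study 1", [Assay("assay A", ["raw_reads"]),
                       Assay("assay B", ["analysis_script", "results_table"])])],
)
print(inv.all_resource_refs())
```

Walking the tree from the top is what lets a single investigation file act as an entry point to every piece of data and method beneath it.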
  • 38. investigation study Study Samples: A table rendering the sample collection workflow ISA framework SOAPdenovo2
  • 40. investigation study assay Representation available in: •Tabular format • Spreadsheet-like format • For biologists/experimentalists •RDF/OWL format for Semantic Web/Linked Data users • For bioinformaticians/software developers • Facilitating data integration, querying, reasoning •Support for submission to public repositories and data publication platforms •Tools support for curation, creation, storage, analysis… •Large and diverse life science user/collaborator communities ISA framework
  • 41. RO + ISA Scientific Workflow-specific ROs: scientific, computational experiments, non-wet lab protocols. ISA experiment and data description: focus on wet-lab or non-computational experimental protocols
  • 42. An RO for the Case Study A Research Object: input sequence data, a Galaxy workflow, descriptions about the workflow, some nanopub statements. The Research Object contains the following artefacts: • The input sequence data, represented in ISA-TAB format • The Galaxy workflow that reflects the computational steps taken for generating the results used to produce Table 2 • Machine-readable descriptions about the workflow • The nanopublication statements that represent claims based on the content of Table 2
  • 43. An RO for the Case Study
  • 44. Nanopublication: Assertion + Provenance + PublicationInfo, with a URL and an Integrity Key. Assertion: an individual association between concepts (statement or declaration; measurement; hypothetical inference; quantitative or qualitative), e.g. association a sio:statisticalAssociation, sio:has-measurementValue Association_1_p_value; Association_1_p_value a sio:probability-value, sio:has-value 6.56e-5^^xsd:float; sio:refers-to, dcterms:DOI … Provenance: how this assertion came to be, methods, evidence, context, etc. (assertion opm:wasDerivedFrom, opm:wasGeneratedBy). PublicationInfo: detailed attribution for authors, institutions, lab technicians, curators; license info; publication date (this nanopub dcterms:created, pav:authoredBy). URL: unique, persistent and resolvable identifier. Integrity Key: guarantees immutability after publication.
  • 45. A Nanopublication-Centric View • Improvements of SOAPdenovo2 have also been observed in assembling GAGE [8] dataset (see Additional file 1: Supplementary Method 6 and Tables 2 and 3). As shown in Tables 2 and 3, the correct assembly length of SOAPdenovo2 increased by approximately 3 to 80-fold comparing with that of SOAPdenovo1.
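The three-part structure and the idea of an integrity key can be sketched without any RDF tooling. This is an assumption-laden illustration: triples are plain string tuples, the `ex:` subject names and the author placeholder are invented, the predicate labels are copied from the slide rather than checked against the SIO, OPM or PAV vocabularies, and the SHA-256 digest stands in for whatever integrity scheme nanopublications actually use.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def make_nanopub(assertion: Iterable[Triple],
                 provenance: Iterable[Triple],
                 pubinfo: Iterable[Triple]) -> Dict[str, List[Triple]]:
    """Bundle the three named parts; triples are sorted so order does not matter."""
    return {
        "assertion": sorted(assertion),
        "provenance": sorted(provenance),
        "publicationInfo": sorted(pubinfo),
    }


def integrity_key(nanopub: Dict[str, List[Triple]]) -> str:
    """Illustrative digest over all triples; any later edit changes the key."""
    digest = hashlib.sha256()
    for part in sorted(nanopub):
        for triple in nanopub[part]:
            digest.update("\t".join((part,) + triple).encode("utf-8"))
            digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical identifiers; the p-value is the one shown on the slide.
nanopub = make_nanopub(
    assertion=[
        ("ex:association_1", "rdf:type", "sio:statisticalAssociation"),
        ("ex:association_1", "sio:has-measurementValue", "ex:association_1_p_value"),
        ("ex:association_1_p_value", "sio:has-value", "6.56e-5"),
    ],
    provenance=[("ex:assertion", "opm:wasDerivedFrom", "doi:10.1186/2047-217X-1-18")],
    pubinfo=[("ex:this_nanopub", "pav:authoredBy", "ex:some_author")],
)
print(integrity_key(nanopub))
```

Because the key is computed over the content itself, a reader holding the key can detect whether the assertion, its provenance or its attribution has been altered after publication.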
  • 47. How do we generate a nanopub from this? …stay tuned for Tech Track talk #34 by Marco Roos ICC Lounge 81, Tuesday 23rd : 3.40pm-4.05pm
  • 48. Final step: visualization NC_010079.pdf gi_161510924_ref_NC_010063.1_.pdf CONTIGuator 2 (thanks Marco Galardini) https://github.com/combogenomics/CONTIGuator
  • 49. Lessons learned: • It is possible to push button(s) & recreate a result from a paper • Reproducibility is COSTLY. How much are you willing to spend? • You learn a huge amount about the study, and it provides lots of information not present in the paper • Much easier to do this before rather than after publication
  • 50. Next steps • Complete the case study on the release of ISA-OWL, nanopubs & ROs • Extend the case study by including more than one dataset or RO, in order to show how related or conflicting information can be more easily interlinked • Create community guidelines on how these three models should be used together, e.g. recommended patterns or vocabulary terms
  • 51. www.gigasciencejournal.com Give us your data & pipelines!* Want to go beyond dead trees & the PDF? scott@gigasciencejournal.com editorial@gigasciencejournal.com database@gigasciencejournal.com Contact us: * APC’s currently generously covered by BGI in 2013
  • 52. Ruibang Luo (BGI/HKU) Shaoguang Liang (BGI-SZ) Tin-Lap Lee (CUHK) Qiong Luo (HKUST) Senghong Wang (HKUST) Yan Zhou (HKUST) Thanks to: @gigascience facebook.com/GigaScience blogs.openaccesscentral.com/blogs/gigablog/ Peter Li Huayan Gao Chris Hunter Jesse Si Zhe Nicole Nogoy Laurie Goodman Marco Roos (LUMC) Mark Thompson (LUMC) Jun Zhao (Oxford) Susanna Sansone (Oxford) Philippe Rocca-Serra (Oxford) Alejandra Gonzalez-Beltran (Oxford) www.gigadb.org galaxy.cbiit.cuhk.edu.hk www.gigasciencejournal.com CBIITFunding from: Our collaborators:team: Case study:
  • 53. The Big Picture Types of resources in an RO: Data, Method/Experimental protocol, Findings. Models to describe each resource type: Wfdesc/ISA-TAB/ISA2OWL

Editor's notes

  1. Over 20,000 users on the main server Over 500 papers citing the use of Galaxy Over 55 servers deployed on the Web
  2. The investigation file is a high-level aggregator for related studies; it contains all the information needed to understand the overall goals of an experiment, including investigators involved, associated publications, the experimental design, experimental factors, protocols, funding agencies and so on… Here is an example of some of the elements in the Investigation file for the SOAPdenovo2 investigation.
  3. An investigation can have one or more studies. A study is the central unit of the experimental description and it contains information on the subject(s) under study, their characteristics, and any treatments applied. In the SOAPdenovo2 case, there is a single study file, which describes the sample collection workflow. The elements can be associated with ontology terms. In the table shown, the source names are associated with a term from the NCBI Taxonomy to indicate their organism.
  4. Each study has one or more associated assays. The assay is the test performed either on the subject or on material taken from the subject, which produces qualitative and/or quantitative measurements. The assay file in the SOAPdenovo2 case describes the different protocols applied, the raw data and how it is processed. The assay file aggregates this information and points to the specific data/analysis methods/scripts, i.e. resources of different types. In the example, the assay file points to an FTP site with the data, to a table in the paper, and to the workflow available in the Galaxy-CBIIT instance.
  5. The ISA representation is available as 1) a tabular format (ISA-TAB), which is a spreadsheet-like format targeted at biologists/experimentalists, and 2) an RDF representation (produced by the ISA2OWL project), following the semantic web/linked data approach. The RDF representation is targeted at bioinformaticians/software developers; it facilitates the integration of data, rich querying and reasoning over the data. The ISA framework also has support for submission to public repositories, for example by direct submission to databases that support the format (e.g. MetaboLights, GigaDB).
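Because ISA-TAB is tab-delimited, the ontology annotations mentioned in these notes can be read with a standard CSV reader. The sketch below assumes a study table with the usual ISA-TAB column headers; the two rows are illustrative values, not taken from the actual SOAPdenovo2 ISA-TAB files, and `organisms_by_source` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Tuple

# Illustrative study table (tab-separated); the rows are made up for this sketch.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tTerm Source REF\t"
    "Term Accession Number\tSample Name\n"
    "source1\tStaphylococcus aureus\tNCBITAXON\t1280\tsample1\n"
    "source2\tHomo sapiens\tNCBITAXON\t9606\tsample2\n"
)


def organisms_by_source(table_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each source name to (organism label, ontology term identifier)."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return {
        row["Source Name"]: (
            row["Characteristics[organism]"],
            f'{row["Term Source REF"]}:{row["Term Accession Number"]}',
        )
        for row in reader
    }


print(organisms_by_source(STUDY_TABLE))
```

Keeping the ontology source and accession next to the free-text label is what allows the same table to serve both experimentalists reading a spreadsheet and software converting it to RDF.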
  6. That just leaves me to thank the GigaScience team: Laurie, Scott, Alexandra, Peter and Jesse; BGI for their support, specifically Shaoguang for IT and bioinformatics support; our collaborators on the database, website and tools: Tin-Lap, Qiong, Senhong, Yan; the Cogini web design team; DataCite for providing the DOI service; and the isacommons team for their support and advocacy for best practice use of metadata reporting and sharing. Thank you for listening.