Metagenomic Data Provenance and Management using the ISA infrastructure: overview, implementation patterns & software tools
Alejandra Gonzalez-Beltran, PhD (alejandra.gonzalezbeltran@oerc.ox.ac.uk)
Eamonn Maguire (eamonn.maguire@oerc.ox.ac.uk)
University of Oxford e-Research Centre, UK
Metagenomics Bioinformatics, EMBL-EBI, Hinxton, UK
September 2014
Roadmap
• experimental metadata
• link to analysis platforms
• submission to public repositories
• data publication
Experimental Metadata
Notes in lab notebooks (information for humans)
Spreadsheets & tables
RDF statements (information for machines)
It is all about structuring experimental information to make it available to computers and software agents to enable:
• provenance tracking
• assessment and evaluation
• accountability, reliability, trust, evidence
• conservation, preservation, storage, archiving and mining
http://www.ama-rochester.org/WP/wp-content/uploads/2013/01/three-pillars.png
The community
A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework (ISA-Tab and/or tools) to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
The framework is also used by communities working to build a library of cellular signatures.
The format
Why ISA format and Tools?
• investigation: high level concept to link related studies
• study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays
• assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
• data: external files in native or other formats; assays hold pointers to data file names/locations
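The investigation / study / assay hierarchy described above can be sketched as a small data model. This is only an illustration, not the ISA tools' own API: the class names, the `all_data_files` helper and the "transcription profiling" label are assumptions made for the example, while the `.cel` file names are the ones shown on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """A test producing measurements; holds pointers to external data files."""
    measurement: str  # hypothetical label for what the assay measures
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """The central unit: the subject under study plus its associated assays."""
    subject: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High level concept linking related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        # Walk investigation -> studies -> assays -> data file pointers.
        return [f for s in self.studies for a in s.assays for f in a.data_files]


inv = Investigation(
    title="example investigation",
    studies=[
        Study(
            subject="H. sapiens",
            assays=[Assay("transcription profiling",
                          ["h1-s1.cel", "h1-s2.cel", "h2-s1.cel"])],
        )
    ],
)
print(inv.all_data_files())
```

Keeping only file names in the assay mirrors the slide's point that the data stay in external files and the metadata merely point to them.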
[Figure: example source table listing sources H1 and H2 (organism H. sapiens) with ages of 33 and 35 years]
ISA metadata specifications:
• workflow and process oriented
• compatible with checklist enforcement
• compatible with external vocabulary resources
• compatible by design with existing schemas
[Figure: example workflow from sources to data files. Sources H1 and H2 (H. sapiens, 35 years); samples H1.sample1, H1.sample2, H2.sample1; labeling steps producing H1.sample1.labeled and H2.sample1.labeled; data files h1-s1.cel, h1-s2.cel, h2-s1.cel. Formats shown alongside: MAGE-Tab, PRIDE-XML, SRA-XML]
Essentials about ISA syntax
• 3 types of files
• Investigation file: at most 1 (think executive summary)
–Why? general study description
–How? methods / protocol declaration
–How? variable declarations (factors and response variable)
–Who? contact and affiliation information
• Study File: true table (think sorting, filtering)
–What? Listing all biological materials collected over the study course.
• Assay File: true table (think sorting, filtering)
–Results! Listing all data files collected by a given assay
–n files, as many as there are assay types declared
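As a rough illustration of the three file types, the sketch below sorts the files of an ISA-Tab folder by the conventional `i_`, `s_` and `a_` filename prefixes and checks the "at most one investigation file" rule. The function names and example file names are hypothetical, and the check is far simpler than what the ISA validator does.

```python
from collections import defaultdict

# Conventional ISA-Tab filename prefixes (assumed here as the only criterion).
PREFIXES = {"i_": "investigation", "s_": "study", "a_": "assay"}


def classify_isatab_files(filenames):
    """Group file names into investigation / study / assay / other."""
    groups = defaultdict(list)
    for name in filenames:
        groups[PREFIXES.get(name[:2], "other")].append(name)
    return dict(groups)


def check_layout(filenames):
    """Return a list of problems; an empty list means the layout looks plausible."""
    groups = classify_isatab_files(filenames)
    problems = []
    if len(groups.get("investigation", [])) > 1:
        problems.append("more than one investigation file")
    if groups.get("assay") and not groups.get("study"):
        problems.append("assay files present without a study file")
    return problems


files = ["i_investigation.txt", "s_study.txt",
         "a_metagenome.txt", "a_metatranscriptome.txt", "reads.sff"]
print(classify_isatab_files(files))
print(check_layout(files))
```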
Essentials about ISA syntax
• Material Transformations:
– Inputs and Outputs of Protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name).
– Material Node attributes: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
– Protocol (process) attributes: Parameter Value[…], Performer (operator effect), Date (day effect)
– Data File Nodes: Raw Data File, Derived Data File
[Figure: Material → Protocol (Process) → Material → Data]
Basic coding patterns
Essentials about ISA syntax
–Branching events: Tabular Representation
[Figure: one Source Material (human volunteer 1) branching into three Sample Materials: peripheral blood, muscle biopsy, liver biopsy]

Source Name | Characteristics[organism] | Protocol REF | Parameter Value[storage condition] | Sample Name | Characteristics[organ]
volunteer 1 | Homo sapiens | sample collection | heparinized tube, room temperature | volunteer 1 - sample1 | peripheral blood
volunteer 1 | Homo sapiens | sample collection | liquid nitrogen | volunteer 1 - sample2 | muscle
volunteer 1 | Homo sapiens | sample collection | liquid nitrogen | volunteer 1 - sample3 | liver
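A branching event shows up in the study table as one Source Name repeated across rows with different Sample Names. The sketch below reads a tab-delimited copy of the table above with Python's `csv` module and reports such sources; the helper name `samples_per_source` is hypothetical and this is not how the ISA tools themselves parse files.

```python
import csv
import io
from collections import defaultdict

# Tab-delimited rendering of the branching example from the slide.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\t"
    "Parameter Value[storage condition]\tSample Name\tCharacteristics[organ]\n"
    "volunteer 1\tHomo sapiens\tsample collection\t"
    "heparinized tube, room temperature\tvolunteer 1 - sample1\tperipheral blood\n"
    "volunteer 1\tHomo sapiens\tsample collection\t"
    "liquid nitrogen\tvolunteer 1 - sample2\tmuscle\n"
    "volunteer 1\tHomo sapiens\tsample collection\t"
    "liquid nitrogen\tvolunteer 1 - sample3\tliver\n"
)


def samples_per_source(table_text):
    """Map each Source Name to the Sample Names derived from it, in row order."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    derived = defaultdict(list)
    for row in reader:
        derived[row["Source Name"]].append(row["Sample Name"])
    return dict(derived)


# A source with more than one distinct sample is a branching point.
branching = {src: names for src, names in samples_per_source(STUDY_TABLE).items()
             if len(set(names)) > 1}
print(branching)
```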
Essentials about ISA syntax
–Pooling events: Tabular Representation
[Figure: three Source Materials (animal 1, animal 2, animal 3) pooled into one Sample Material: salivary glands]

Source Name | Characteristics[organism] | Protocol REF | Parameter Value[storage condition] | Sample Name | Characteristics[organ]
animal 1 | Mus musculus | sample collection | heparinized tube, room temperature | pool1 | salivary gland
animal 2 | Mus musculus | sample collection | heparinized tube, room temperature | pool1 | salivary gland
animal 3 | Mus musculus | sample collection | heparinized tube, room temperature | pool1 | salivary gland
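Pooling is the mirror image of branching: several Source Names share one Sample Name. A minimal sketch, assuming the table has already been reduced to (source, sample) pairs; `find_pools` is a hypothetical helper, not part of any ISA tool.

```python
from collections import defaultdict

# (Source Name, Sample Name) pairs taken from the pooling example.
ROWS = [
    ("animal 1", "pool1"),
    ("animal 2", "pool1"),
    ("animal 3", "pool1"),
]


def find_pools(rows):
    """Return samples that were derived from more than one source."""
    sources = defaultdict(set)
    for source, sample in rows:
        sources[sample].add(source)
    return {sample: sorted(srcs) for sample, srcs in sources.items()
            if len(srcs) > 1}


print(find_pools(ROWS))
```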
Essentials about ISA syntax
Tagging with Terminologies
• Implicit column order matters:

Columns, in order: Source Name | Characteristics[organism] | Factor Value[compound agent] | Factor Value[perturbation agent] | Factor Value[dose] | Factor Value[duration] | Factor Value[washout period] | Factor Value[duration] | Factor Value[perturbation agent] | Factor Value[dose] | Factor Value[duration]
Example row: individual1 | human

Source Name | Characteristics[organism] | Term Source REF | Term Accession Number | Characteristics[duration] | Unit | Term Source REF | Term Accession Number | Factor Value[compound (htppt://purl] | Term Source REF | Term Accession Number
individual1 | Homo sapiens | NCBITax | 9606 | 12 | week | UO | UO:wwerw ta | aspirin | CHEBI | 1231354

• ISA tools (ISAcreator - ISAconfigurator) provide ontology term selection and term tagging facilities to help users.
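Because the same qualifier headers (Unit, Term Source REF, Term Accession Number) repeat along a row, they can only be interpreted by position: each qualifier belongs to the nearest preceding value column. The sketch below shows one way to attach them. The function name and output layout are assumptions for the example, the header is a simplified version of the one on the slide, and the CHEBI accession is copied from the slide rather than verified.

```python
QUALIFIERS = ("Unit", "Term Source REF", "Term Accession Number")


def attach_qualifiers(header, row):
    """Attach each qualifier column to the closest value column on its left."""
    annotated = []
    current = None
    for name, value in zip(header, row):
        if name not in QUALIFIERS:
            # A new value column starts a new annotation record.
            current = {"column": name, "value": value}
            annotated.append(current)
        elif current is None:
            continue  # qualifier with nothing to qualify: ignored in this sketch
        elif name == "Unit":
            current["Unit"] = value
        elif "Unit" in current:
            # Term columns that follow a Unit describe the unit, not the value.
            current["Unit " + name] = value
        else:
            current[name] = value
    return annotated


HEADER = ["Source Name",
          "Characteristics[organism]", "Term Source REF", "Term Accession Number",
          "Characteristics[duration]", "Unit", "Term Source REF",
          "Factor Value[compound]", "Term Source REF", "Term Accession Number"]
ROW = ["individual1",
       "Homo sapiens", "NCBITax", "9606",
       "12", "week", "UO",
       "aspirin", "CHEBI", "1231354"]

annotated = attach_qualifiers(HEADER, ROW)
for record in annotated:
    print(record)
```

Reading the same row with a dictionary keyed on header names would silently overwrite the repeated qualifier columns, which is why the order has to be preserved.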
Experimental design and workflows
Parallel group design 
source: http://dx.doi.org/10.1016/S1569-9056(02)00115-X; figure 1
Essentials about ISA syntax
Representing interventions and treatments
• expressing treatments as sets of factor levels
• example: treatment is a tadalafil supplementation
• Factors will be ‘compound’, ‘dose’ and ‘duration’
• (what?, how much?, how long for?)

Source Name | Characteristics[organism] | Protocol REF | Factor Value[compound] | Factor Value[dose] | Factor Value[duration]
volunteer 1 | Homo sapiens | treatment | tadalafil | 250 mg/day | 12 weeks
volunteer 2 | Homo sapiens | treatment | tadalafil | 250 mg/day | 12 weeks
volunteer 3 | Homo sapiens | treatment | placebo | 20 mg/day | 12 weeks

• Implicit column order matters, but this is independent of the ISA syntax specification
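Treating a treatment as a set of factor levels means two subjects are in the same treatment group exactly when all their Factor Value cells agree. A minimal sketch using the three rows above; `treatment_groups` is a hypothetical helper written for this example.

```python
from collections import defaultdict

ROWS = [
    {"Source Name": "volunteer 1", "Factor Value[compound]": "tadalafil",
     "Factor Value[dose]": "250 mg/day", "Factor Value[duration]": "12 weeks"},
    {"Source Name": "volunteer 2", "Factor Value[compound]": "tadalafil",
     "Factor Value[dose]": "250 mg/day", "Factor Value[duration]": "12 weeks"},
    {"Source Name": "volunteer 3", "Factor Value[compound]": "placebo",
     "Factor Value[dose]": "20 mg/day", "Factor Value[duration]": "12 weeks"},
]


def treatment_groups(rows):
    """Group subjects by their full combination of factor levels."""
    groups = defaultdict(list)
    for row in rows:
        # The treatment key is the sorted tuple of (factor, level) pairs.
        levels = tuple(sorted((k, v) for k, v in row.items()
                              if k.startswith("Factor Value[")))
        groups[levels].append(row["Source Name"])
    return dict(groups)


for levels, subjects in treatment_groups(ROWS).items():
    print(dict(levels), "->", subjects)
```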
Cross-over design
source: Roberts et al. Journal of the International Society of Sports Nutrition 2007 4:25 doi:10.1186/1550-2783-4-25
source: doi:10.1371/journal.pone.0037479
Treatment declaration
Assays NMR
The software suite
ISA configurations 
Available from: 
http://isa-tools.org/configurations.html 
https://github.com/ISA-tools/Configuration-Files 
• Assembling workflow archetypes 
• Setting annotation requirements 
–for compliance with database schemas (SRA, MAGE, PRIDE) 
–for compliance with community based requirements (MIAME, 
MIAPE, MIMS, MIxS, …) 
• Guide users 
–Provide pre-assembled templates 
–Specify vocabulary support 
ISAconfigurator: Supporting tool 
https://github.com/ISA-tools/ISAconfigurator
ISA configurations 
• Minimum information about any (x) sequence (MIxS) guidelines, as issued by the Genomic Standards Consortium
• ENA-GSC-MIxS checklist XML document:
–based on the MIxS guidelines
–augmented with a number of regular expressions to further validate/regularize input
–fixes the units used to report a number of measurements
–issued July 2013 (version 3.0), July 2014 (version 4.0)
• SRA 1.5 schema requirements (mandatory information and required terminology, e.g. Library Selection or Library Strategy)
• All this information is used to derive ISA MIxS configurations, which embed all those annotation requirements in spreadsheet tables
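The idea of a checklist that carries regular expressions, fixed units and mandatory flags can be sketched as follows. The field names, patterns and units here are invented for illustration and are not taken from the actual ENA-GSC-MIxS checklist or the ISA configuration files.

```python
import re

# Hypothetical checklist: each field has a pattern, an optional fixed unit,
# and a flag saying whether a value is required.
CHECKLIST = {
    "collection date": {"pattern": r"\d{4}(-\d{2}(-\d{2})?)?", "mandatory": True},
    "depth": {"pattern": r"\d+(\.\d+)?", "unit": "m", "mandatory": False},
}


def validate(record):
    """Return a list of error messages for one sample annotation record."""
    errors = []
    for field_name, rule in CHECKLIST.items():
        value = record.get(field_name)
        if value is None or value == "":
            if rule["mandatory"]:
                errors.append(f"{field_name}: missing mandatory value")
            continue
        # fullmatch: the whole cell must conform, not just a prefix.
        if re.fullmatch(rule["pattern"], value) is None:
            errors.append(f"{field_name}: '{value}' does not match {rule['pattern']}")
    return errors


print(validate({"collection date": "2014-09-01", "depth": "10.5"}))
print(validate({"depth": "ten"}))
```

Embedding such rules in the spreadsheet configuration lets errors be caught at data entry rather than at submission time.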
ISAconfigurator Tables
Things to bear in mind with NGS data
Important considerations for managing data and submitting to public repositories
–be aware of supported file formats
• FastA, FastQ, SFF, ...
–be aware of the need to demultiplex reads
–the SRA schema evolves and updates are needed
• e.g. Study replaced by Project
• Updates to the ISAconverter
• Mapping from ISA is straightforward, as it relies on a number of elements that ISA already supports
Tools for creating ISA-Tab documents
ISAcreator
Java desktop application
Developed to be a user-friendly way to enter standards-compliant metadata. It has many features, of which only some are shown here; they also include a data entry wizard and an import utility.
ISAcreator features: automatic template generation
ISAcreator Wizard: automatic template generation
Prerequisites and Conditions of use:
-supports factorial design experiments, meaning sets of discrete factor levels combined together to define a treatment
2x2 factorial design as in 2 compounds and 2 time points
2x2x3 factorial design as in 2 compounds, 2 time points, 3 doses
-assumes one sample collection event (all samples collected at sacrifice time)
-supports some but not all currently available assay types
-supports fractional factorial design
-supports unbalanced factor group population sizes (ethical considerations for high dose toxic exposures)
-automatically generates sample identifiers, human-readable and meaningful labels and, if requested, barcodes
-does not support ‘crossover designs’, which have to be coded manually
-does not support sample collection timeline management (under development)
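A full factorial design is the Cartesian product of the factor levels, which is why a 2x2x3 design yields 12 treatments. The sketch below enumerates them with `itertools.product` and derives simple sample identifiers; the level names and the `T<n>.R<n>` identifier scheme are assumptions for the example and not the labels the ISAcreator wizard generates.

```python
from itertools import product


def factorial_design(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in product(*(factors[name] for name in names))]


def sample_ids(treatments, replicates):
    """One identifier per treatment and replicate, e.g. T1.R1."""
    return [f"T{t}.R{r}"
            for t in range(1, len(treatments) + 1)
            for r in range(1, replicates + 1)]


# Hypothetical 2x2x3 design: 2 compounds, 2 time points, 3 doses.
factors = {
    "compound": ["compound A", "compound B"],
    "time point": ["24 h", "48 h"],
    "dose": ["low", "medium", "high"],
}

treatments = factorial_design(factors)
print(len(treatments))
print(treatments[0])
print(sample_ids(treatments, replicates=3)[:4])
```

An unbalanced design would simply use a different replicate count per treatment instead of the single `replicates` value used here.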
Importing your own spreadsheet: mapping to a third-party table
ISAcreator features: visualizing experimental workflows 
This work was completed while investigating a new approach to glyph design that uses a taxonomy for guidance. See Maguire et al, Taxonomy-Based Glyph Design – with a Case Study on Visualizing Workflows of Biological Experiments, IEEE Transactions on Visualization and Computer Graphics, 2012
Tools for creating ISA-Tab documents
OntoMaton: a BioPortal powered ontology widget for Google Spreadsheets
Maguire et al, 2013, Bioinformatics
http://www.slideshare.net/proccaserra/ontomaton-icbo2013alternative-ordertwv3
http://isatools.wordpress.com/2012/07/13/introducing-ontomaton-ontology-search-tagging-for-google-spreadsheets/
Potential issues and known hurdles
• The problem of conflicting versions
–a risk that is especially high when working with big consortia
–distributed, decentralised groups of users
• Lack of version control and history
• Absence of collaborative features
–Looking for new solutions while retaining the features
[Figure: diagram combining tool logos, including LOV]
BioPortal meets Google Spreadsheets
Searching and Tagging 
Templates: 
https://drive.google.com/templates?type=spreadsheets&q=ontomaton
Risa - ISA-Tab manipulation for analysis in R
• Risa R package, available since Bioconductor 2.11
http://www.bioconductor.org/packages/release/bioc/html/Risa.html
• Functionality for parsing ISA-Tab datasets into R objects, saving and updating them.
• It bridges the ISA-Tab metadata to analysis pipelines of specific assay types, by building objects for use in other R packages downstream
– currently considering mass spectrometry (xcms package, xcmsSet) and DNA microarray (Biobase package, ExpressionSet)
[Figure: workflow in five steps, from Experiment Design through Collect Samples and Run Assays to Analysis; example with Arabidopsis thaliana samples (SAMPLE 1 to SAMPLE 11), treatment groups (70%, 90%, 100%) and the resulting data files]
http://isatools.wordpress.com/2013/065/158/isacreator-available-in-genomespace/
Submission Tool
https://github.com/ISA-tools/ISAcreator/wiki/ENASubmissionTool
Pre-requirements:
– registration to ENA/EBI Metagenomics
– data upload by one of the methods provided by ENA
http://www.ebi.ac.uk/ena/about/sra_data_upload
Submission steps: ISA-Tab creation → ISA-Tab validation → ISA-Tab to SRA conversion (SRA-xml schema) → Submission to ENA
http://gigasciencejournal.com 
http://gigadb.org/dataset/100035
• New open-access, online-only publication for descriptions of scientifically valuable datasets 
• Only content type: Data Descriptor, narrative + structured parts 
• Initially focused on the life, environmental and biomedical sciences 
• Data Descriptor will be complementary to traditional research journals and data repositories 
• Designed to foster data sharing and reuse, and ultimately to accelerate scientific discovery 
www.nature.com/scientificdata
Data Descriptors served by Scientific Data
Narrative Section
A brief article-like document with:
•Title
•Abstract
•Background & Summary
•Methods
•Technical Validation
•Usage Notes
•Figures & Tables
•References
Structured Section
Detailed descriptions of the experimental procedures used to produce the data
•Following community-defined minimum information requirements
• for a level of detail sufficient to reproduce the experiments
•Using ontologies & controlled vocabularies
• to maximise consistency of the descriptions
www.nature.com/scientificdata
Training Material 
http://isa-tools.org/training.html
Hands-on Material 
• Software: 
–ISAcreator 1.7.8 (see pre-release) 
–ISAconfigurator 1.6 
• Configurations: 
–ISA-ENA-MIxS Configuration 
–default MultiAssay Configuration 
• ISA-Tab formatted datasets 
–BII-S-3: Western Channel Water Samples metagenome and metatranscriptome
–BII-S-7: Human gut microbiome targeted gene survey
• Google Templates and OntoMaton
• ISA mapping file
The Exemplar Datasets
• BII-S-3:
• Metagenome and Metatranscriptome on 454
• BII-S-7:
• Submitted to ENA via ISAcreator: ERP000133
• Targeted Gene Survey (16S rRNA) on 454
Roadmap
• experimental metadata
• link to analysis platforms
• submission to public repositories
• data publication
Thanks for your attention! 
Questions? 
You can email us... 
isatools@googlegroups.com 
View our websites 
View our Git repo & contribute 
http://github.com/ISA-tools 
View our blog 
http://isatools.wordpress.com 
Follow us on Twitter 
@isatools


COPO kick-off meeting
COPO kick-off meetingCOPO kick-off meeting
COPO kick-off meeting
 
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, JapanISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
 
Enhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataEnhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort Data
 
Oxford DTP - Sansone curation tools - Dec 2014
Oxford DTP - Sansone curation tools - Dec 2014Oxford DTP - Sansone curation tools - Dec 2014
Oxford DTP - Sansone curation tools - Dec 2014
 
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...
 
The Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologyThe Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to Terminology
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
 
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
 Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ... Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
 
Metadata for Research Objects
Metadata for Research ObjectsMetadata for Research Objects
Metadata for Research Objects
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
100505 koenig biological_databases
100505 koenig biological_databases100505 koenig biological_databases
100505 koenig biological_databases
 
eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platform
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
 
A Guide for Reproducible Research
A Guide for Reproducible ResearchA Guide for Reproducible Research
A Guide for Reproducible Research
 
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
 
BioSD Tutorial 2014 Editition
BioSD Tutorial 2014 EdititionBioSD Tutorial 2014 Editition
BioSD Tutorial 2014 Editition
 
RDA - Long Tail Data Interest Group - NPG Scientitic Data oveview
RDA - Long Tail Data Interest Group - NPG Scientitic Data oveviewRDA - Long Tail Data Interest Group - NPG Scientitic Data oveview
RDA - Long Tail Data Interest Group - NPG Scientitic Data oveview
 
Standarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata filesStandarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata files
 

Más de Alejandra Gonzalez-Beltran

Más de Alejandra Gonzalez-Beltran (11)

The Software Sustainability Institute Fellowship
The Software Sustainability Institute FellowshipThe Software Sustainability Institute Fellowship
The Software Sustainability Institute Fellowship
 
CMSO Minimal reporting requirements
CMSO Minimal reporting requirementsCMSO Minimal reporting requirements
CMSO Minimal reporting requirements
 
The DATS model: datasets descriptions for data discovery in DataMed
The DATS model: datasets descriptions for data discovery in DataMedThe DATS model: datasets descriptions for data discovery in DataMed
The DATS model: datasets descriptions for data discovery in DataMed
 
Datasets with bioschemas
Datasets with bioschemasDatasets with bioschemas
Datasets with bioschemas
 
Data publication: Discover, Explore, Visualise
Data publication: Discover, Explore, VisualiseData publication: Discover, Explore, Visualise
Data publication: Discover, Explore, Visualise
 
ISA commons - overview and latest developments
ISA commons - overview and latest developmentsISA commons - overview and latest developments
ISA commons - overview and latest developments
 
Metadata for Interoperable Bioscience
Metadata for Interoperable BioscienceMetadata for Interoperable Bioscience
Metadata for Interoperable Bioscience
 
Metadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATOMetadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATO
 
Brazil-UK Frontiers of Engineering - Big data in healthcare session
Brazil-UK Frontiers of Engineering - Big data in healthcare sessionBrazil-UK Frontiers of Engineering - Big data in healthcare session
Brazil-UK Frontiers of Engineering - Big data in healthcare session
 
BCU 2013
BCU 2013BCU 2013
BCU 2013
 
SELENfest 2012
SELENfest 2012SELENfest 2012
SELENfest 2012
 

Metagenomic Data Provenance and Management using the ISA infrastructure --- overview, implementation patterns & software tools

  • 1. Metagenomic Data Provenance and Management using the ISA infrastructure: overview, implementation patterns & software tools. Alejandra Gonzalez-Beltran, PhD (alejandra.gonzalezbeltran@oerc.ox.ac.uk) and Eamonn Maguire (eamonn.maguire@oerc.ox.ac.uk), University of Oxford e-Research Centre, UK. Metagenomics Bioinformatics, EMBL-EBI, Hinxton, UK, September 2014.
  • 4. Experimental Metadata Roadmap link to analysis platforms
  • 5. Experimental Metadata Roadmap link to analysis platforms submission to public repositories
  • 6. Experimental Metadata Roadmap link to analysis platforms submission to public repositories
  • 7. Experimental Metadata Roadmap link to analysis platforms submission to public repositories data publication
  • 8. Experimental Metadata: notes in lab notebooks (information for humans); spreadsheets & tables; RDF statements (information for machines). It is all about structuring experimental information to make it available to computers and software agents to enable: provenance tracking; assessment and evaluation; accountability, reliability, trust, evidence; conservation, preservation, storage, archiving and mining.
  • 9. 9
  • 12. A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework (ISA-Tab and/or tools) to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including: • stem cell discovery • systems biology • transcriptomics • toxicogenomics • environmental health • environmental genomics • metabolomics • metagenomics • nanotechnology • proteomics. Also used by communities working to build a library of cellular signatures.
  • 14. Why ISA format and Tools?
      – investigation: high-level concept to link related studies
      – study: the central unit, containing information on the subject under study, its characteristics and any treatments applied; a study has associated assays
      – assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
      – data: assay files hold pointers to data file names/locations; the data sit in external files in native or other formats (MAGE-Tab, Pride-xml, SRA-xml)
      ISA metadata specifications: • workflow and process orientated • compatible with checklist enforcement • compatible with external vocabulary resources • compatible by design with existing schemas
      [Figure: H. sapiens sources H1 and H2 (ages in years) give samples H1.sample1, H1.sample2 and H2.sample1; a Labeling step produces H1.sample1.labeled and H2.sample1.labeled; data files h1-s1.cel, h1-s2.cel, h2-s1.cel]
  • 15. Essentials about ISA syntax • 3 types of files • Investigation file: at most one (think executive summary) –Why? general study description –How? methods / protocol declaration –How? variable declarations (factors and response variable) –Who? contact and affiliation information • Study file: true table (think sorting, filtering) –What? Listing all biological materials collected over the study course. • Assay file: true table (think sorting, filtering) –Results! Listing all data files collected by a given assay –n files, as many as there are assay types declared
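The three file types can be told apart by the conventional ISA-Tab file-name prefixes (`i_`, `s_`, `a_`). The following is a minimal sketch under that assumption; the function name and the example file names are hypothetical and are not taken from the slides.

```python
from pathlib import Path

# Conventional ISA-Tab file-name prefixes mapped to their role.
PREFIXES = {"i_": "investigation", "s_": "study", "a_": "assay"}


def classify_isatab_files(filenames):
    """Group file names by ISA-Tab role, based on their two-character prefix."""
    groups = {"investigation": [], "study": [], "assay": []}
    for name in filenames:
        base = Path(name).name
        role = PREFIXES.get(base[:2])
        if role and base.endswith(".txt"):
            groups[role].append(base)
    # The slide states that there is at most one investigation file.
    if len(groups["investigation"]) > 1:
        raise ValueError("expected at most one investigation file")
    return groups


# Hypothetical archive listing, for illustration only.
example = classify_isatab_files(
    ["i_investigation.txt", "s_study.txt", "a_metagenome.txt",
     "a_transcriptome.txt", "README.md"]
)
```

Here the two assay files reflect two declared assay types, and unrelated files such as `README.md` are ignored.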
  • 16. Essentials about ISA syntax • Material Transformations: – Inputs and outputs of Protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name) – Material Node attributes: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…] – Protocol (process) attributes: Parameter Value[…], Performer (operator effect), Date (day effect) – Data File Nodes: Raw Data File, Derived Data File
      [Diagram: Material → Protocol (process) → Material → Data]
  • 18. Essentials about ISA syntax – Branching events: tabular representation. One source material (human volunteer 1) yields three sample materials (peripheral blood, muscle biopsy, liver biopsy):
      Source Name | Characteristics[organism] | Protocol REF | Parameter Value[storage condition] | Sample Name | Characteristics[organ]
      volunteer 1 | Homo sapiens | sample collection | heparinated tube, room temperature | volunteer 1 - sample1 | peripheral blood
      volunteer 1 | Homo sapiens | sample collection | liquid nitrogen | volunteer 1 - sample2 | muscle
      volunteer 1 | Homo sapiens | sample collection | liquid nitrogen | volunteer 1 - sample3 | liver
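A branching event shows up in the table as one Source Name repeated across several rows, each with a different Sample Name. The sketch below reads the slide's example as tab-delimited text with the standard `csv` module; the function name is hypothetical, and a real study file may carry further columns.

```python
import csv
import io
from collections import defaultdict

# The slide's example rows, written as tab-delimited text.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\t"
    "Parameter Value[storage condition]\tSample Name\tCharacteristics[organ]\n"
    "volunteer 1\tHomo sapiens\tsample collection\t"
    "heparinated tube, room temperature\tvolunteer 1 - sample1\tperipheral blood\n"
    "volunteer 1\tHomo sapiens\tsample collection\t"
    "liquid nitrogen\tvolunteer 1 - sample2\tmuscle\n"
    "volunteer 1\tHomo sapiens\tsample collection\t"
    "liquid nitrogen\tvolunteer 1 - sample3\tliver\n"
)


def samples_by_source(table_text):
    """Map each Source Name to the list of Sample Names derived from it."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["Source Name"]].append(row["Sample Name"])
    return dict(grouped)


branches = samples_by_source(STUDY_TABLE)
```

A source that maps to more than one sample is a branching point in the experimental graph.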
  • 19. Essentials about ISA syntax – Pooling events: tabular representation. Three source materials (animal 1, animal 2, animal 3) are pooled into one sample material (salivary glands):
      Source Name | Characteristics[organism] | Protocol REF | Parameter Value[storage condition] | Sample Name | Characteristics[organ]
      animal 1 | Mus musculus | sample collection | heparinated tube, room temperature | pool1 | salivary gland
      animal 2 | Mus musculus | sample collection | heparinated tube, room temperature | pool1 | salivary gland
      animal 3 | Mus musculus | sample collection | heparinated tube, room temperature | pool1 | salivary gland
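Pooling is the mirror image of branching: the same Sample Name appears on rows with different Source Names. As a small sketch, assuming the table has already been reduced to (source, sample) pairs, pooled samples can be found by inverting the mapping. The function name is hypothetical.

```python
from collections import defaultdict

# (Source Name, Sample Name) pairs taken from the slide's pooling example.
ROWS = [("animal 1", "pool1"), ("animal 2", "pool1"), ("animal 3", "pool1")]


def find_pools(rows):
    """Return samples that derive from more than one source."""
    sources = defaultdict(set)
    for source, sample in rows:
        sources[sample].add(source)
    return {
        sample: sorted(names)
        for sample, names in sources.items()
        if len(names) > 1
    }


pools = find_pools(ROWS)
```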
  • 20. Essentials about ISA syntax: Tagging with Terminologies • Implicit column order matters • ISA tools (ISAcreator - ISAconfigurator) provide ontology term selection and term tagging facilities to help users.
      Untagged example: Source Name = individual1, Characteristics[organism] = human, followed by Factor Value columns (compound agent, perturbation agent, dose, duration, washout period).
      Tagged example:
      Source Name | Characteristics[organism] | Term Source REF | Term Accession Number | Characteristics[duration] | Unit | Term Source REF | Term Accession Number | Factor Value[compound (htppt://purl] | Term Source REF | Term Accession Number
      individual1 | Homo sapiens | NCBITax | 9606 | 12 | week | UO | UO:wwerw ta | aspirin | CHEBI | 1231354
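Because "Term Source REF" and "Term Accession Number" headers repeat within one table, they can only be interpreted by position: each one qualifies the nearest preceding field column. A minimal sketch of that positional reading follows. The function name is hypothetical, and treating Unit as a flat qualifier of the preceding field is a simplification made here, not a statement of the ISA-Tab specification.

```python
# Headers treated here as qualifiers of the preceding field column.
QUALIFIERS = {"Unit", "Term Source REF", "Term Accession Number"}


def attach_qualifiers(header, row):
    """Walk the columns left to right, attaching qualifiers to the last field."""
    annotated = []
    for name, value in zip(header, row):
        if name in QUALIFIERS and annotated:
            annotated[-1]["qualifiers"].append((name, value))
        else:
            annotated.append({"column": name, "value": value, "qualifiers": []})
    return annotated


header = ["Source Name", "Characteristics[organism]",
          "Term Source REF", "Term Accession Number"]
row = ["individual1", "Homo sapiens", "NCBITax", "9606"]
fields = attach_qualifiers(header, row)
```

A dictionary keyed by header name would silently overwrite the repeated qualifier columns, which is why the sketch keeps the columns as an ordered list.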
  • 22. Parallel group design. Source: http://dx.doi.org/10.1016/S1569-9056(02)00115-X; figure 1
  • 23. Essentials about ISA syntax: Representing interventions and treatments • expressing treatments as sets of factor levels • example: the treatment is a tadalafil supplementation • Factors will be 'compound', 'dose' and 'duration' (what?, how much?, how long for?)
      Source Name | Characteristics[organism] | Protocol REF | Factor Value[compound] | Factor Value[dose] | Factor Value[duration]
      volunteer 1 | Homo sapiens | treatment | tadalafil | 250 mg/day | 12 weeks
      volunteer 2 | Homo sapiens | treatment | tadalafil | 250 mg/day | 12 weeks
      volunteer 3 | Homo sapiens | treatment | placebo | 20 mg/day | 12 weeks
      • Implicit column order matters but this is independent from the ISA syntax specification
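Since a treatment is a set of factor levels, subjects fall into treatment groups by grouping rows on their Factor Value columns. The sketch below uses the slide's three volunteers; the function name is hypothetical, and factor columns are ordered alphabetically by header purely to get a stable group key.

```python
from collections import defaultdict

# Rows from the slide's tadalafil example, reduced to the relevant columns.
ROWS = [
    {"Source Name": "volunteer 1", "Factor Value[compound]": "tadalafil",
     "Factor Value[dose]": "250 mg/day", "Factor Value[duration]": "12 weeks"},
    {"Source Name": "volunteer 2", "Factor Value[compound]": "tadalafil",
     "Factor Value[dose]": "250 mg/day", "Factor Value[duration]": "12 weeks"},
    {"Source Name": "volunteer 3", "Factor Value[compound]": "placebo",
     "Factor Value[dose]": "20 mg/day", "Factor Value[duration]": "12 weeks"},
]


def treatment_groups(rows):
    """Group Source Names by their combination of factor levels."""
    groups = defaultdict(list)
    for row in rows:
        levels = tuple(
            value for key, value in sorted(row.items())
            if key.startswith("Factor Value[")
        )
        groups[levels].append(row["Source Name"])
    return dict(groups)


groups = treatment_groups(ROWS)
```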
  • 24. Cross-over design. Source: Roberts et al. Journal of the International Society of Sports Nutrition 2007 4:25 doi:10.1186/1550-2783-4-25
  • 25. Cross-over design. Source: 10.1371/journal.pone.0037479
  • 26. Cross-over design: treatment declaration
  • 27. Cross-over design. Source: 10.1371/journal.pone.0037479
  • 32.
  • 33. 1
  • 34. ISA configurations Available from: http://isa-tools.org/configurations.html https://github.com/ISA-tools/Configuration-Files • Assembling workflow archetypes • Setting annotation requirements –for compliance with database schemas (SRA, MAGE, PRIDE) –for compliance with community based requirements (MIAME, MIAPE, MIMS, MIxS, …) • Guide users –Provide pre-assembled templates –Specify vocabulary support ISAconfigurator: Supporting tool https://github.com/ISA-tools/ISAconfigurator
  • 35. ISA configurations Available from: http://isa-tools.org/configurations.html https://github.com/ISA-tools/Configuration-Files • Minimum information about any (x) sequence (MIxS) Guidelines as issued by Genomic Standards Consortium • ENA-GSC-MIxS checklist XML document: –based on MIxS guidelines –augmented with a number of regular expressions to further validate/ regularize input –fixing a number of units used to report measurement –issued July 2013 (version 3.0), July 2014 (version 4.0) • SRA 1.5 schema requirements (mandatory information and required terminology, e.g. Library Selection or Library Strategy) • All this information is used to derive ISA MIxS configurations allowing all those annotation requirements to be embedded in spreadsheet tables
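Checklist-driven validation of this kind can be pictured as one regular expression per field, applied to each value entered. The sketch below illustrates only the mechanism: the field names and patterns are assumptions made for the example and are not copied from the ENA-GSC-MIxS checklist or from any ISA configuration.

```python
import re

# Illustrative rules only; NOT the patterns from the ENA-GSC-MIxS checklist.
RULES = {
    "collection date": re.compile(r"\d{4}(-\d{2}(-\d{2})?)?"),
    "geographic location (latitude)": re.compile(r"-?\d{1,2}(\.\d+)?"),
}


def validate(record):
    """Return a list of error messages for missing or malformed fields."""
    errors = []
    for field, pattern in RULES.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif not pattern.fullmatch(value):
            errors.append(f"{field}: invalid value {value!r}")
    return errors


ok = validate({"collection date": "2014-09-01",
               "geographic location (latitude)": "50.25"})
bad = validate({"collection date": "Sept 2014"})
```

`fullmatch` is used so that a value is accepted only if the whole string fits the pattern, not just a prefix of it.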
  • 38. Things to bear in mind with NGS data: important considerations for managing data and submitting to public repositories –be aware of supported file formats • FastA, FastQ, SFF, ... –be aware of the need to demultiplex reads –the SRA schema evolves and updates are needed • e.g. Study replaced by Project • updates to the ISAconverter • mapping from ISA is straightforward, as the changes bring in a number of elements that ISA already supports
  • 39. Tools for creating ISA-Tab documents isacreator
  • 40. isacreator Java desktop application Developed to be a user friendly way to enter standards-compliant metadata: it has lots of features... But these are just some of them… we also have a data entry wizard and an import utility...
  • 41. ISAcreator features: automatic template generation
  • 42. ISAcreator Wizard: automatic template generation. Prerequisites and conditions of use: –supports factorial design experiments, meaning sets of discrete factor levels combined to define a treatment (2x2 factorial design as in 2 compounds and 2 time points; 2x2x3 factorial design as in 2 compounds, 2 time points, 3 doses) –assumes one sample collection event (all samples collected at sacrifice time) –supports some but not all currently available assay types –supports fractional factorial design –supports unbalanced factor group population sizes (ethical considerations for high dose toxic exposures) –automatically generates sample identifiers, human-readable and meaningful labels and, if requested, barcodes –does not support 'crossover design', which has to be coded manually –does not support sample collection timeline management (under development)
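In a full factorial design, the treatment groups are the Cartesian product of the factor levels. The sketch below enumerates them for a 2x2x3 design, that is, three levels for the third factor. The factor names and levels are hypothetical, and this is not how ISAcreator itself is implemented.

```python
from itertools import product


def factorial_groups(factors):
    """Enumerate every combination of factor levels as one treatment group."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]


# Hypothetical factors and levels for a 2x2x3 design.
design = factorial_groups({
    "compound": ["compound A", "compound B"],
    "time point": ["24 h", "48 h"],
    "dose": ["low", "medium", "high"],
})
```

A fractional factorial design would keep only a chosen subset of these combinations.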
  • 43. 43 Importing your own spreadsheet: Mapping to third party table
  • 44. ISAcreator features: visualizing experimental workflows Work completed during investigation of new approach for creation of glyphs with use of taxonomy for guidance. See Maguire et al, Taxonomy-Based Glyph Design – with a Case Study on Visualizing Workflows of Biological Experiments, IEEE Transactions on Visualization and Computer Graphics, 2012 44
  • 45. Tools for creating ISA-Tab documents: OntoMaton, a BioPortal powered ontology widget for Google Spreadsheets (Maguire et al, 2013, Bioinformatics) http://www.slideshare.net/proccaserra/ontomaton-icbo2013alternative-ordertwv3 http://isatools.wordpress.com/2012/07/13/introducing-ontomaton-ontology-search-tagging-for-google-spreadsheets/
  • 46. Potential issues and known hurdles • The problem of conflicting versions –especially high when working with big consortia –distributed, decentralised groups of users • Lack of version control and history • Absence of collaborative features –Looking for new solutions while retaining the features [Figure: the solution shown as a combination of logos, including LOV]
  • 47. Bioportal meets Google Spreadsheet 47
  • 48. Searching and Tagging Templates: https://drive.google.com/templates?type=spreadsheets&q=ontomaton
  • 49. Searching and Tagging Templates: https://drive.google.com/templates?type=spreadsheets&q=ontomaton
  • 50. 50
  • 51. 2
  • 52. 3
  • 53. Risa - ISA-Tab manipulation for analysis in R • RISA R-package 53
  • 54. • R package available since BioConductor 2.11: http://www.bioconductor.org/packages/release/bioc/html/Risa.html • Functionality for parsing ISA-Tab datasets into R objects, saving and updating them. • It bridges the ISA-Tab metadata to analysis pipelines of specific assay types, by building objects for use in other R packages downstream – currently considering mass spectrometry (xcms package, xcmsSet) and DNA microarray (Biobase package, ExpressionSet)
      [Figure: workflow from experiment design through sample collection and assay runs to analysis, with Arabidopsis thaliana samples in treatment groups (70%, 90%, 100%) mapped to data files]
  • 58. 4
  • 60. Pre-requirements: – registration to ENA/EBI Metagenomics – data upload by one of the methods provided by ENA http://www.ebi.ac.uk/ena/about/sra_data_upload 60
  • 61. http://www.ebi.ac.uk/ena/about/sra_data_upload Pre-requirements: – registration to ENA/EBI Metagenomics – data upload by one of the methods provided by ENA 61
  • 64. 64
  • 65. 65
  • 66. 66
  • 67. 67 ISA-Tab validation ISA-Tab to SRA conversion Submission to ENA ISA-Tab creation (SRA-xml schema)
  • 68. 68
  • 69. 69
  • 70. 5
  • 73. • New open-access, online-only publication for descriptions of scientifically valuable datasets • Only content type: Data Descriptor, narrative + structured parts • Initially focused on the life, environmental and biomedical sciences • Data Descriptor will be complementary to traditional research journals and data repositories • Designed to foster data sharing and reuse, and ultimately to accelerate scientific discovery www.nature.com/scientificdata
  • 74. Data Descriptors served by Scientific Data. Narrative section: a brief article-like document with: •Title •Abstract •Background & Summary •Methods •Technical Validation •Usage Notes •Figures & Tables •References. Structured section: detailed descriptions of the experimental procedures used to produce the data •following community-defined minimum information requirements, for a level of detail sufficient to reproduce the experiments •using ontologies & controlled vocabularies, to maximise consistency of the descriptions. www.nature.com/scientificdata
  • 75. Data Descriptors served by Scientific Data. Narrative section: a brief article-like document with: •Title •Abstract •Background & Summary •Methods •Technical Validation •Usage Notes •Figures & Tables •References. Structured section: detailed descriptions of the experimental procedures used to produce the data •following community-defined minimum information requirements, for a level of detail sufficient to reproduce the experiments •using ontologies & controlled vocabularies, to maximise consistency of the descriptions. www.nature.com/scientificdata
  • 76. Training Material 76 http://isa-tools.org/training.html
  • 77. Hands-on Material (http://isa-tools.org/training.html) • Software: –ISAcreator 1.7.8 (see pre-release) –ISAconfigurator 1.6 • Configurations: –ISA-ENA-MIxS Configuration –default MultiAssay Configuration • ISA-Tab formatted datasets –BII-S-3: Western Channel Water Samples metagenome and metatranscriptome –BII-S-7: Human gut microbiome targeted gene survey • Google Templates and OntoMaton • ISA mapping file
  • 78. The Exemplar Datasets • BII-S-3: • Metagenome and Metatranscriptome on 454
  • 79. The Exemplar Datasets • BII-S-7: • Submitted to ENA via ISAcreator: ERP000133 • Targeted Gene Survey (16S rRNA) on 454
  • 80. Experimental Metadata Roadmap link to analysis platforms submission to public repositories data publication
  • 82. Thanks for your attention! Questions? You can email us... isatools@googlegroups.com View our websites View our Git repo & contribute http://github.com/ISA-tools View our blog http://isatools.wordpress.com Follow us on Twitter @isatools