1. Streamlining Dataset Deposition to
Public Repositories:
Biocrates/EMBL Metabolights case study
Philippe Rocca-Serra, Ph.D
University of Oxford e-Research Centre
philippe.rocca-serra@oerc.ox.ac.uk
2. Background
• Need to disclose the data supporting
published work
• MSI CIMR annotation guidelines
• Data deposition mandated by Funders
• Creating fast, efficient deposition pipeline
– need to engage with various stakeholders
3. Targeted Metabolomics
• Monitoring known sets of molecules (the targets)
– either belonging to the same network or
representative of sets of pathways, and used as
beacons for biological processes
• Application to high-throughput phenotyping
– plasma
– urine
• Large cohorts eligible for the approach
– n>10000
4. The players (acknowledgement)
• Contact with ISA team in 2014, via Dr
Marta Cascante and Dr Silvia Marin.
• Biocrates: Introduction to Dr Bernd Haas
and Martin Buratti
• EMBL-EBI Metabolights: Reza Salek, Ken
Haug
• ISA Team at University of Oxford:
Alfie Abdul-Rahman
5. The components
• Biocrates MetIDQ software, Boron release
– XML schema (biocrates xsd)
– declaration and description of key entities
• metabolite
• samples
• plate/well/injection runs
– quantitative measurements of metabolites
– NVT-based description of sample attributes
• XSL Transformation / Java processing
– conversion of study metadata to ISA-Tab format
– conversion of metabolite quantitation to MAF format
– ontology and CV mark-up
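A minimal sketch of the quantitation-to-table conversion listed above, written in Python as a stand-in for the XSLT/Java processing named on the slide. The element and attribute names (`export`, `metabolite`, `sample`, `measurement`, `ref`, `concentration`) and all values are hypothetical placeholders, not taken from the Biocrates XSD, and the output uses only a simplified subset of MAF-style columns.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export document: the structure, names and values below are
# illustrative assumptions only, not the real Biocrates schema or real data.
EXAMPLE_XML = """\
<export>
  <metabolite id="M1" name="metabolite A" ref="ID:0001"/>
  <metabolite id="M2" name="metabolite B" ref="ID:0002"/>
  <sample id="S1">
    <measurement metabolite="M1" concentration="312.5"/>
    <measurement metabolite="M2" concentration="198.0"/>
  </sample>
  <sample id="S2">
    <measurement metabolite="M1" concentration="287.1"/>
  </sample>
</export>
"""


def xml_to_maf_like(xml_text: str) -> str:
    """Pivot per-sample measurements into one tab-delimited row per metabolite."""
    root = ET.fromstring(xml_text)

    # Sample identifiers become the quantitation columns, in document order.
    samples = [s.get("id") for s in root.findall("sample")]

    # Index every measurement by (metabolite id, sample id).
    values = {
        (m.get("metabolite"), s.get("id")): m.get("concentration")
        for s in root.findall("sample")
        for m in s.findall("measurement")
    }

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["database_identifier", "metabolite_identification", *samples])
    for met in root.findall("metabolite"):
        row = [met.get("ref"), met.get("name")]
        # Leave the cell empty when a metabolite was not measured in a sample.
        row += [values.get((met.get("id"), s), "") for s in samples]
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(xml_to_maf_like(EXAMPLE_XML), end="")
```

The pivot (one row per metabolite, one column per sample) is the only design point illustrated here; ontology and CV mark-up, and the ISA-Tab study metadata, are left out of the sketch.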
6. A streamlined deposition process
1. Export XML document from MetIDQ
2. Upload to EMBL-EBI over the Aspera protocol
(XML document and raw MS data)
3. Conversion and curation at EMBL-EBI
4. Minting of an EMBL Metabolights Accession Number
7. Pipeline Validation
• 3 Datasets currently being handled at EBI
– 147 samples (mouse, plasma)
– 7000 samples (human, plasma)
– 20000 samples (human, plasma & urine)
• Huge potential for meta-analysis and
dataset integration
• Lost opportunities of ‘invisible’ datasets
10. Take Home Message
• Data Custodians and Suppliers can work together
efficiently to avoid data loss.
• Depositing data sets, large or small, does not
have to be taxing.
• Engaging with platform vendors is highly beneficial
to the community.
– We need more of these interactions
– Vendors, Help your customer publish and share!
• ISA-Tab ensures visibility of your datasets
– get an EMBL Metabolights Accession Number
– get a DOI by submitting your data article to NPG Scientific
Data
12. A new category of publication that provides detailed
descriptors of scientifically valuable datasets,
whether or not associated with traditional article(s)