MS (and NMR) data standards in Metabolomics
why, how and some caveats
Steffen Neumann
Leibniz Institute of Plant Biochemistry
ScienceCampus Halle (WCH)
June 23, 2014
S. Neumann (IPB-Halle.DE) (Raw) data standards in metabolomics June 23, 2014
Metabolomics – The Pipeline
IPB machine Park
Data processing from
LC-QqTOF-MS:
QStar Pulsar i, microTOF Q
Bruker Apex (FTICR)
HCT Ultra (IT-MS, CID+ETD)
Reflex III (Maldi-TOF)
Thermo Finnigan
Quantum Ultra AM, LCQ Deca XP
netCDF: Grandfather is still alive
netCDF as file format, ANDI-MS as content specification
fine for GC/MS and simple LC/MS
widely supported in software and programming languages
no mix of MS and MS/MS
very poor metadata
Defined in Standard: “ASTM E1947 – 98(2009)
Standard Specification for Analytical Data Inter-
change Protocol for Chromatographic Data”
available for only $42
netCDF in action: I
dataset_completeness = "C1+C2" ;
ms_template_revision = "1.0.1" ;
dataset_origin = "PE-SCIEX" ;
experiment_date_time_stamp = "20050928190327+0100" ;
operator_name = "SYSTEM" ;
source_file_reference = "d:tt4_4_1.wiff " ;
source_file_format = "PE-SCIEX Wiff version 1" ;
experiment_type = "Continuum Mass Spectrum" ;
test_separation_type = "Normal Phase Liquid Chromatography" ;
test_ms_inlet = "Electrospray Inlet " ;
test_ms_inlet_temperature = 20.f ;
test_ionization_mode = "Electrospray Ionization" ;
test_ionization_polarity = "Positive Polarity " ;
test_detector_type = "Electron Multiplier " ;
test_resolution_type = "Constant Resolution" ;
test_scan_function = "Mass Scan" ;
test_scan_direction = "Up" ;
test_scan_law = "Linear" ;
actual_run_time_length = 3480.54 ;
netCDF in action: II
scan_acquisition_time = 2.00100016593933, 4.00200033187866,
    6.00100040435791, 8.00100040435791, 10.0040006637573, 12.0020008087158,
    14.0020008087158, 16.0020008087158, 18.0040016174316, 20.0020008087158,

total_intensity = 10541, 10640, 10697, 10455, 10707, 10554, 10612, 10434,
    10738, 10504, 10567, 10646, 10675, 10660, 10676, 10638, 10498, 10581,
    10655, 10843, 10650, 10703, 10792, 10667, 10564, 10732, 10613, 10766,

mass_values = 106.0288, 106.038, 106.0564, 106.061, 106.0656, 106.0702,
    106.0748, 106.0794, 106.0931, 106.9725, 106.9771, 106.9817, 106.9863,
    106.9909, 106.9955, 107.0001, 107.0047, 107.0094, 107.014, 107.0324,
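The listing above shows that an ANDI-MS file keeps the m/z values of all scans in one flat array. A minimal sketch of cutting such flat arrays back into per-scan spectra could look like the following. It assumes the file also provides a per-scan start offset (called scan_index here) and a parallel intensity array (called intensity_values here); neither is shown on the slide, and the toy numbers are made up for illustration.

```python
def split_scans(mass_values, intensity_values, scan_index):
    """Cut flat ANDI-MS style arrays into one list of (m/z, intensity) pairs per scan."""
    if len(mass_values) != len(intensity_values):
        raise ValueError("mass_values and intensity_values differ in length")
    # scan_index[i] is the offset of the first data point of scan i;
    # a scan ends where the next one starts (or at the end of the array).
    bounds = list(scan_index) + [len(mass_values)]
    scans = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        scans.append(list(zip(mass_values[start:end], intensity_values[start:end])))
    return scans


# Toy data (hypothetical): two scans with three and two data points.
mass_values = [106.0288, 106.038, 106.0564, 106.9725, 106.9771]
intensity_values = [12.0, 40.0, 9.0, 7.0, 15.0]
scan_index = [0, 3]

scans = split_scans(mass_values, intensity_values, scan_index)
# Summing the intensities of each scan gives one total ion current value per scan.
total_intensity = [sum(intensity for _, intensity in scan) for scan in scans]
print(total_intensity)
```

In a real file the arrays would come from a netCDF reader rather than from literals; the slicing logic stays the same.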
More metadata in XML: mzData
Proteome Standards Initiative
Raw / Measurement Data:
Mass Spec Equipment
Software
(Raw) Peaks
Isolation windows,
collision energies, . . .
Vendor Support: Bruker, Applied
Biosystems, Kratos Analytical,
Matrix Science, . . .
“Competitor”: mzXML
mzData + mzXML = mzML
[Figure: mzML development timeline, split into early and final development. Starting points: mzData 1.05 and mzXML 3.0. Versions shown: dataXML 0.6, mzML 0.90, mzML 0.91, mzML 0.99 RC, mzML 1.0.0 (release, 2008-06), mzML 1.1.0RC5, mzML 1.1.0 (release, 2009-06). Meetings shown: SFO 2006-05, DC 2006-09, Lyon 2007-04, EBI 2007-06, PSI Doc Proc 2007-11, Toledo 2008-04, Turku 2009-04.]
HUPO-PSI
More stable than mzXML
Better defined than mzData
Reference implementations
Early vendor involvement
[Figure: structure of an mzML document. Top-level elements: cvList, referenceableParamGroupList, sampleList, instrumentConfigurationList, softwareList, dataProcessingList, acquisitionSettingsList and run. The run holds a spectrumList (spectrum entries with spectrumDescription, precursorList, scan and binaryDataArray elements) and a chromatogramList (chromatogram entries with binaryDataArray elements).]
Martens, Chambers, Sturm, Kessner, Levander, Shofstahl, Tang, Römpp, Neumann, Pizarro, Montecchi-Palazzi, Tasman, Coleman, Reisinger, Souda, Hermjakob, Binz, Deutsch. mzML: a community standard for mass spectrometry data. Mol Cell Proteomics. (2011)
mzML in action: I
<mzML>
  <cv id="MS" fullName="PSI MS Vocabularies" />
  <cv id="UO" fullName="unit" />

  <fileContent>
    <cvParam cv="MS" name="MS1 spectrum"/>
    <cvParam cv="MS" name="MSn spectrum"/>
    <cvParam cv="MS" name="centroid spectrum"/>
  </fileContent>

  <sourceFile id="sourceFile" location="C:/MSMSpos15_MM48_1_2-18485.d/analysis.baf">
    <cvParam cv="MS" name="Bruker BAF file"/>
    <cvParam cv="MS" name="SHA-1" value="4ef...7c0"/>
  </sourceFile>
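The sourceFile entry above carries a SHA-1 checksum, so a converted file can be traced back to the exact raw file it came from. A minimal sketch of computing such a checksum with the Python standard library follows; the function name and the temporary demo file are illustrative assumptions, not part of any converter.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use flat for multi-gigabyte raw files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a small temporary file (stands in for a vendor raw file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_path = tmp.name
demo_digest = sha1_of_file(demo_path)
os.remove(demo_path)
print(demo_digest)
```

Comparing this digest with the value stored in the mzML file is one way to check that the raw file has not changed since conversion.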
mzML in action: II
  <software id="exportSoftware" version="3.0.5">
    <cvParam cv="MS" name="CompassXport"/>
  </software>
  <software id="recalibrationSoftware" version="4.0.234.0">
  </software>

  <instrumentConfiguration id="instrument">
    <cvParam cv="MS" name="micrOTOF-Q"/>
  </instrumentConfiguration>

  <dataProcessing id="export">
    <processingMethod order="1" softwareRef="instrumentSoftware">
      <cvParam cv="MS" accession="MS:1000035" name="peak picking"/>
    </processingMethod>
    <processingMethod order="2" softwareRef="recalibrationSoftware">
      <cvParam cv="MS" name="m/z calibration"/>
    </processingMethod>
    <processingMethod order="3" softwareRef="exportSoftware">
      <cvParam cv="MS" name="Conversion to mzML"/>
    </processingMethod>
  </dataProcessing>
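The dataProcessing block above records which software did what, and in which order. As a rough sketch, that history can be read back with the Python standard library as shown below. The excerpt is a simplified copy of the slide wrapped in a hypothetical excerpt root element, and the function name is an assumption. The sketch also flags references to software that is not declared, which applies to instrumentSoftware in this abridged excerpt.

```python
import xml.etree.ElementTree as ET

# Simplified excerpt; methods are deliberately listed out of order.
EXCERPT = """<excerpt>
  <software id="exportSoftware" version="3.0.5"/>
  <software id="recalibrationSoftware" version="4.0.234.0"/>
  <dataProcessing id="export">
    <processingMethod order="2" softwareRef="recalibrationSoftware">
      <cvParam cv="MS" name="m/z calibration"/>
    </processingMethod>
    <processingMethod order="1" softwareRef="instrumentSoftware">
      <cvParam cv="MS" name="peak picking"/>
    </processingMethod>
    <processingMethod order="3" softwareRef="exportSoftware">
      <cvParam cv="MS" name="Conversion to mzML"/>
    </processingMethod>
  </dataProcessing>
</excerpt>"""


def processing_history(xml_text):
    """Return (order, step names, softwareRef, is_declared) tuples sorted by order."""
    root = ET.fromstring(xml_text)
    declared = {software.get("id") for software in root.iter("software")}
    steps = []
    for method in root.iter("processingMethod"):
        names = [param.get("name") for param in method.findall("cvParam")]
        ref = method.get("softwareRef")
        steps.append((int(method.get("order")), names, ref, ref in declared))
    return sorted(steps, key=lambda step: step[0])


history = processing_history(EXCERPT)
for order, names, ref, is_declared in history:
    note = "" if is_declared else " (software not declared in this excerpt)"
    print(order, ", ".join(names), "by", ref + note)
```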
mzML in action: your data
  <spectrum id="scan=16">
    <cvParam cv="MS" name="positive scan"/>
    <cvParam cv="MS" name="MS2 spectrum"/>
    <cvParam cv="MS" name="centroid spectrum"/>
    <precursor>
      <cvParam cv="MS" name="selected ion m/z" value="542.1" unitName="m/z"/>
      <activation>
        <cvParam cv="MS" name="collision energy" value="15.0" unitName="electronvolt"/>
        <cvParam cv="MS" name="low-energy collision-induced dissociation"/>
      </activation>
    </precursor>
    <binaryDataArray>
      <cvParam cv="MS" name="zip compression"/>
      <cvParam cv="MS" name="m/z array" unitName="m/z"/>
      <binary>eNrj/luT+KC02sEswyJj5...doaB42HsdAItdCw4=</binary>
    </binaryDataArray>
    <binaryDataArray>
      <cvParam cv="MS" name="zip compression"/>
      <cvParam cv="MS" name="intensity array" unitName="counts"/>
      <binary>eNpjYACCBkcHBjCwhdKWD...gAXvgH4</binary>
    </binaryDataArray>
  </spectrum>
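The binary elements above hold the peak lists as base64 text wrapped around zlib-compressed floating point numbers. A minimal sketch of encoding and decoding such an array with the Python standard library is given below. It assumes little-endian values and takes the precision and compression as arguments, because in a real file these are declared by the cvParam entries next to the binary element. The function names are illustrative.

```python
import base64
import struct
import zlib


def encode_array(values, compress=True, double_precision=True):
    """Pack numbers as little-endian floats, optionally zlib-compress, then base64."""
    code = "d" if double_precision else "f"
    raw = struct.pack("<%d%s" % (len(values), code), *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, compressed=True, double_precision=True):
    """Reverse of encode_array: base64 decode, decompress, unpack to a list."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    code, width = ("d", 8) if double_precision else ("f", 4)
    count = len(raw) // width
    return list(struct.unpack("<%d%s" % (count, code), raw))


mz_values = [106.0288, 106.038, 542.1]
encoded = encode_array(mz_values)
decoded = decode_array(encoded)
print(encoded)
print(decoded)
```

The m/z and intensity arrays are decoded separately and then paired by position.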
www.openms.de
Originally for MS-based Proteomics
Reads mzData, mzXML, mzML
NetCDF (Not on 64bit!)
FileInfo, FileConverter, FileFilter, ...
plus Calibration, Merge, NoiseFilter, . . .
TOPPView Viewer and GUI
⇒ Very useful for preprocessing
M. Sturm, A. Bertsch, C. Gröpl, A. Hildebrandt, R. Hussong, E. Lange, N. Pfeifer, O. Schulz-Trieglaff, A. Zerck,
K. Reinert, O. Kohlbacher, 2008. OpenMS – an Open-Source Software Framework for Mass Spectrometry
BMC Bioinformatics doi:10.1186/1471-2105-9-163.
http://proteowizard.sourceforge.net/
Originally for MS-based Proteomics
cross-platform (MSVC on Windows, gcc on Linux, XCode on OSX)
open source (Apache v2)
Formats supported on all platforms: mzML, mzXML, MGF
Formats supported on Windows with vendor libraries installed:
Thermo RAW, Waters RAW, Bruker FID/YEP/BAF
msconvert: conversion tool.
msdiff: validation of conversion/preprocessing
msaccess: command line access to binary data and metadata,
EICs & pseudo-2D gel image creation
SeeMS: interactive viewer for mass spec data files (Windows only)
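As a rough illustration of how msconvert from the list above is typically driven from a script, the sketch below only assembles the command line and does not run it. The file and directory names are hypothetical, the helper function is an assumption, and the peak picking filter string should be checked against the help text of the installed ProteoWizard version.

```python
import shlex


def build_msconvert_command(raw_file, out_dir, peak_picking=True):
    """Assemble an msconvert call that writes zlib-compressed mzML into out_dir."""
    command = ["msconvert", raw_file, "--mzML", "--zlib", "-o", out_dir]
    if peak_picking:
        # Ask for centroided spectra on all MS levels.
        command += ["--filter", "peakPicking true 1-"]
    return command


command = build_msconvert_command("sample.d", "converted")
# shlex.join quotes the filter argument so the line can be pasted into a shell.
print(shlex.join(command))
```

Passing the resulting list to subprocess.run would start the conversion on a machine where msconvert and the vendor libraries are installed.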
Chambers, Maclean, Burke, Amodei, Ruderman, Neumann, Gatto, Fischer, Pratt, Egertson, Hoff, Kessner,
Tasman, Shulman, Frewen, Baker, Brusniak, Paulse, Creasy, Flashner, Kani, Moulding, Seymour, Nuwaysir,
Lefebvre, Kuhlmann, Roark, Rainer, Gerd, Hemenway, Huhmer, Langridge, Eckels, Connolly, Stearns,
Deutsch, Katz, Agus, MacCoss, Tabb, Mallick. A cross-platform toolkit for mass spectrometry and proteomics. Nat Biotechnol. (2012)
Converters: Notes
https://xcmsonline.scripps.edu/docs/fileformats.html
Bruker:
Calibration requires setting a specific Registry Key:
HKEY_CURRENT_USER\Software\Bruker Daltonik\CompassXport
UseRecalibratedSpectra=1
Waters:
No support for calibration in Waters DLL used by msconvert
DataBridge writes netCDF only, and writes calibrated data
Ancient massWolf requires full MassLynx installed, will use
calibrated data, but intermingle LockMass Scans
Plumbing: libraries for mzML
pymzML (Python) http://pymzml.github.io/
jmzML (Java) https://code.google.com/p/jmzml/
OpenMS (C++) https://www.openms.de/
Proteowizard (C++) http://proteowizard.sourceforge.net/
mzR (R/Bioconductor) http://www.bioconductor.org/packages/release/bioc/html/mzR.html
. . . and many more!
MS and Metabolomics in BioC
Collection of biology-related R packages
Started back in 2002
Current release: >500 packages!
Package          Maintainer                Title
mzR              Gatto, Neumann, Fischer   parser for netCDF, mzXML, mzData and mzML
xcms             Ralf Tautenhahn           LC/MS and GC/MS Data Analysis
MassSpecWavelet  Pan Du                    Mass spectrum processing by wavelet-based algorithms
CAMERA           Carsten Kuhl              Collection of Annotation related MEthods for mass spectRometry dAta
Rdisop           Steffen Neumann           Decomposition of Isotopic Patterns
MSnbase          Laurent Gatto             Base Functions and Classes for MS-based Proteomics
iontree          Mingshu Cao               Data management and analysis of ion trees from ion-trap MS
rpubchem         Rajarshi Guha             Interface to the PubChem Collection
KEGGSOAP         R. Gentleman              client interface to the KEGG SOAP server
apComplex        D. Scholtens              Estimate protein complex membership using AP-MS protein data
PROcess          X. Li                     Ciphergen SELDI-TOF Processing
simulatorAPMS    Tony Chiang               Computationally simulates the AP-MS technology.
TargetSearch     Cuadros-Inostroza et al.  analysis of GC-MS metabolite profiling data.
flagme           Mark Robinson             Analysis of Metabolomics GC/MS Data
LC-MS Data preprocessing with XCMS
www.bioconductor.org
Import: netCDF, mzXML,
mzData, mzML
Peak detection
Peak alignment
Peak integration
“Differential” metabolites
Compatible with all
MS instruments at the IPB
Lange, Tautenhahn, Neumann, Gröpl. Critical assessment of alignment procedures for LC-MS proteomics and
metabolomics measurements. BMC Bioinformatics (2008)
FTICR Peak Picking
Bioconductor Package
“MassSpecWavelet”
Integration into XCMS:
Same Annotation
and Identification
Same statistics
(Same database schema)
[Figure: FTICR peak picking over m/z 380 to 384. a) MS raw spectrum (intensity vs. m/z value); b) CWT coefficients (CWT coefficient scale vs. m/z value); c) Identified peaks with SNR > 3 (intensity vs. m/z value)]
Student project by Sebastian Wolf & Michael Gerlich. Du, Kibbe, Lin: Peak Detection of Mass Spectrometry Spectrum by Continuous Wavelet Transform based Pattern Matching, Bioinformatics (2008)
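The idea behind the figure above (wavelet coefficients, then peaks with SNR > 3) can be sketched in a much simplified form. The code below is not the MassSpecWavelet algorithm, which works across many scales with ridge lines and its own local noise estimate. It is a single-scale illustration with a Mexican hat wavelet, and it assumes that the noise level, expressed in wavelet coefficient units, is supplied by the caller. All names and the synthetic spectrum are hypothetical.

```python
import math


def ricker(points, width):
    """Mexican hat (Ricker) wavelet sampled at `points` positions around its center."""
    center = (points - 1) / 2.0
    norm = 2.0 / (math.sqrt(3.0 * width) * math.pi ** 0.25)
    values = []
    for i in range(points):
        x = (i - center) / width
        values.append(norm * (1.0 - x * x) * math.exp(-x * x / 2.0))
    return values


def cwt_single_scale(signal, width):
    """Convolve the signal with one wavelet scale, treating values outside as zero."""
    kernel = ricker(10 * width + 1, width)
    half = len(kernel) // 2
    coeffs = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        coeffs.append(acc)
    return coeffs


def pick_peaks(signal, noise_level, width=3, snr_threshold=3.0):
    """Return indices of local coefficient maxima whose coefficient/noise exceeds the threshold."""
    coeffs = cwt_single_scale(signal, width)
    peaks = []
    for i in range(1, len(coeffs) - 1):
        is_local_max = coeffs[i] > coeffs[i - 1] and coeffs[i] >= coeffs[i + 1]
        if is_local_max and coeffs[i] / noise_level > snr_threshold:
            peaks.append(i)
    return peaks


def gaussian(index, center, height, sigma=2.0):
    return height * math.exp(-((index - center) ** 2) / (2.0 * sigma ** 2))


# Synthetic spectrum: two real peaks and one tiny bump that should be rejected.
spectrum = [gaussian(i, 60, 100.0) + gaussian(i, 140, 50.0) + gaussian(i, 100, 1.0)
            for i in range(200)]
peaks = pick_peaks(spectrum, noise_level=1.0)
print(peaks)
```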
Plumbing: mzR for MS raw data
New in BioC 2.10 (Oct 2011)
Joint work Fischer/Gatto/Neumann
Conglomerate of former XCMS code, ISB Ramp,
Proteowizard via Rcpp
Read netCDF, mzXML, mzData, mzML (mz5 soon ?)
Read mzIdentML mzQuantML one day ?
To become the affyIO of MS data ?!
GSoC project 2014 to improve mzR
[Figure: mzR and its backends mzRramp, mzRpwiz and mzRnetCDF]
Chambers, Maclean, Burke, Amodei, Ruderman, Neumann, Gatto, Fischer, Pratt, Egertson, Hoff, Kessner,
Tasman, Shulman, Frewen, Baker, Brusniak, Paulse, Creasy, Flashner, Kani, Moulding, Seymour, Nuwaysir,
Lefebvre, Kuhlmann, Roark, Rainer, Gerd, Hemenway, Huhmer, Langridge, Eckels, Connolly, Stearns,
Deutsch, Katz, Agus, MacCoss, Tabb, Mallick. A cross-platform toolkit for mass spectrometry and proteomics. Nat Biotechnol. (2012)
imzML: imaging mass spectrometry in mzML
Huge data files,
complex access patterns
imzML: same ol' mzML,
but the base64 payload moves to a 2nd (binary) data file
Some new CV terms
faster access
7/8 space reduction
lossless conversion mzML <-> imzML
http://www.imzml.org
⇒ Open MS imaging software!
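The split described above, metadata in XML and numbers in a second binary file, means that a reader only needs an offset and a length to fetch one array. A minimal sketch of that access pattern follows. It assumes 32-bit little-endian floats and a 16-byte identifier at the start of the binary file. The function names and demo values are hypothetical, and a real reader would take the offsets, lengths and data types from the CV terms in the XML part.

```python
import os
import struct
import tempfile
import uuid


def write_ibd(path, arrays):
    """Write arrays as 32-bit floats; return the (offset, length) pair of each array."""
    index = []
    with open(path, "wb") as handle:
        handle.write(uuid.uuid4().bytes)  # 16-byte identifier at the file start
        for values in arrays:
            index.append((handle.tell(), len(values)))
            handle.write(struct.pack("<%df" % len(values), *values))
    return index


def read_array(path, offset, length):
    """Jump straight to one array instead of parsing everything before it."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return list(struct.unpack("<%df" % length, handle.read(4 * length)))


demo_path = os.path.join(tempfile.mkdtemp(), "demo.ibd")
demo_index = write_ibd(demo_path, [[380.5, 381.25, 382.0], [10.0, 250.0, 40.0]])
second_array = read_array(demo_path, *demo_index[1])
print(demo_index, second_array)
```

Random access of this kind is what makes per-pixel reads cheap in imaging data sets, where parsing the whole file for one spectrum would be too slow.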
Schramm T, Hester A, Klinkert I, Both J-P, Heeren RMA, Brunelle A, Laprévote O, Desbenoit N, Robbe M-F, Stoeckli M, Spengler B, Römpp A (2012) imzML - A common data format for the flexible exchange and processing of mass spectrometry imaging data. J. of Proteomics, doi:10.1016/j.jprot.2012.07.026
mz5: netCDF meets mzML
Convert from XML to HDF5
HDF5: big cousin of netCDF
Pros:
size reduction 54%
read/write speed 3–4-fold
Fully implemented in pwiz
HDF5 API for most
languages
Cons:
Not human-readable
Kills emacs and wordpad
mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets
M. Wilhelm, M. Kirchner, J. Steen, H Steen, MCP 10.1074/mcp.O111.011379
Focus of standards in NMR
D2.6 Metadata: ISAtab
D2.4 Raw data: nmrML
Metabolite identification: mzTab
Metabolite quantification: mzTab
New format: nmrML (D2.4 Raw data)
Capture NMR raw data (equivalent to mzML)
Ingredients for nmrML standard:
● XML Schema and controlled vocabulary (CV)
● Examples, converters and validation suite
● COSMOS partners involved: IPB, EMBL-EBI, UB2, UBHAM, UOXF, IMPERIAL, MRC, Mike Wilson (Canada), Matthias Klein (D), Ian Lewis (US)
nmrML infrastructure (D2.4 Raw data)
GitHub (github.com) as development platform
● Web site with content management: http://nmrml.org/
● Version control system, issue tracker, activity statistics
● Free for open source projects
nmrML Ontology (D2.4 Raw data)
● Controlled vocabulary developed as OWL ontology
● Based on earlier work by MSI, D. Rubtsov and J. Cruz
● ISAtab can leverage ontologies
● With semantic web / RDF / SPARQL in mind for later deliverables
nmrML: an XML-based open standard for NMR data storage and exchange
The need for an open NMR standard
NMR data currently accumulates in local data silos, which hinders distribution and secondary use. Cross-platform NMR data access, integration and comparison are hampered by incompatible vendor formats and the lack of a robust vendor-agnostic NMR data standard. Data in proprietary formats ages fast, so results from older studies risk becoming irreproducible. An open, vendor-neutral standard is needed as a long-term archival format if emerging metabolomics repositories are to capture data from all vendor formats persistently while keeping up with the dynamics of the field.
Parsers
To ease format conversions we deliver parsers for Bruker and Varian data formats, which can be incorporated into open NMR processing and analysis software.
Outlook
Although coverage of raw data capture is good, the XSD and CV will be expanded to better cover processed data and quantification data. Our standard is accepted by major open source NMR data processing tools and will serve the MetaboLights repository as a stable storage format.
Daniel Schober1, Michael Wilson2, Daniel Jacob3, Annick Moing3, Catherine Deborde3, Luis de Figueiredo4, Kenneth Haug4,
Philippe Rocca-Serra5, John Easton6, Christian Ludwig7, Antonio Rosato8, David Wishart2, Christoph Steinbeck4, Reza Salek4, Steffen Neumann1
1Leibniz Institute of Plant Biochemistry, Dept. of Stress and Developmental Biology, Weinberg 3, 06120 Halle, Germany
2Department of Computing/Biological Sciences, University of Alberta, Edmonton, Canada
3INRA, Univ. Bordeaux, Metabolome Facility of Bordeaux Functional Genomics Center, 71 av Edouard Bourlaux, F-33140 Villenave d’Ornon, France
4European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
5University of Oxford, e-Research Centre, 7 Keble Road, Oxford, OX1 3QG, UK
6School of Electronic, Electrical and Computer Engineering, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
7School of Cancer Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
8Magnetic Resonance Center (CERM), University of Florence, 50019 Sesto Fiorentino (FI), Italy
[Figure panels: nmrML XML schema excerpt; nmrML data example; nmrML use cases]
The COordination of Standards in MetabOlomicS (COSMOS) EU consortium has teamed up with the Metabolomics Standards Initiative (MSI) to create an open exchange and storage format for NMR data. We largely follow design principles already established by the Proteomics Standards Initiative (PSI) for the mzML data standard for mass spectrometry. The standard consists of an XML schema (nmrML.xsd) and an accompanying controlled vocabulary (nmrCV.owl). Moving the more variable and dynamic descriptors out of the schema and into the vocabulary, which is referenced from within an nmrML file, keeps the schema robust while allowing flexible updates.
•Website: http://www.nmrML.org
•Github: https://github.com/nmrML/nmrML
•nmrML validator: http://msbi.ipb-halle.de/nmrML/index.php
•Cosmos: http://www.cosmos-fp7.eu/
•Email: info@nmrml.org
•Google Group: https://groups.google.com/forum/?hl=en#!forum/nmrml/join
Data from a paper: Farag, M., Porzel, A., Schmidt, J. & Wessjohann, L. Metabolite profiling and
fingerprinting of commercial cultivars of Humulus lupulus L. (hop) - a comparison of MS and
NMR methods in metabolomics, Metabolomics 8, 492-507, (2012)
<nmrML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://nmrml.org/schema ../../../xml-schemata/nmrML.xsd"
       xmlns="http://nmrml.org/schema" version="1.0.0">
  <cvList count="2">
    <cv fullName="nmrML Controlled Vocabulary" version="0.0.1" id="NMRCV"
        URI="http://www.nmrml.org/nmrml-cv.0.0.1.owl"/>
    <cv fullName="Unit Ontology" version="3.2.0" id="UO"
        URI="http://unit-ontology.googlecode.com/svn/trunk/uo.owl/"/>
  </cvList>
  <contactList>
    <contact id="ID004" fullname="Ludger A. Wessjohann" email="Ludger.Wessjohann [a] ipb-halle.de"/>
    <contact id="ID044" fullname="Mohamed A. Farag" email="mfarag73 [a] yahoo.com"/>
  </contactList>
  <sourceFileList count="2">
    <sourceFile sha1="fd99c095046e2356c7d31154d45353fa79cbc844"
                location="file:///Users/mike/Projects/nmrML/nmrML/examples/IPB_HopExample/FIDs/FAM013_AHTM.PROTON_04.fid/procpar"
                id="SOURCE_FILE_0" name="procpar">
      <cvTerm cvRef="NMRCV" accession="NMR:1400297" name="Varian VNMR Format"/>
      <cvTerm cvRef="NMRCV" accession="NMR:1002006" name="acquisition parameter file"/>
    </sourceFile>
    <sourceFile sha1="e4ffeb41da28b1e9017e72819252ec6d78f8179f"
                location="file:///Users/mike/Projects/nmrML/nmrML/examples/IPB_HopExample/FIDs/FAM013_AHTM.PROTON_04.fid/fid"
                id="SOURCE_FILE_1" name="fid">
      <cvTerm cvRef="NMRCV" accession="NMR:1400297" name="Varian VNMR Format"/>
      <cvTerm cvRef="NMRCV" accession="NMR:1400119" name="FID file"/>
    </sourceFile>
  </sourceFileList>
  <softwareList count="1">
    <software cvRef="NMRCV" accession="NMR:1000277" name="VnmrJ software" version="2.2C"
              id="SOFTWARE_1"/>
  </softwareList>
  <instrumentConfigurationList count="4">
    <instrumentConfiguration id="INST_CONFIG_1">
      <cvTerm cvRef="NMRCV" accession="NMR:1400234" name="Varian NMR instrument"/>
      <cvTerm cvRef="NMRCV" accession="NMR:1000235" name="Varian probe"/>
      <cvTerm cvRef="NMRCV" accession="NMR:1400234" name="Varian NMR instrument"/>
      <cvTerm cvRef="NMRCV" accession="NMR:1000236" name="5mm HCN probe"/>
    </instrumentConfiguration>
  </instrumentConfigurationList>
  <acquisition>
    <acquisition1D>
      <acquisitionParameterSet numberOfScans="160" numberOfSteadyStateScans="0">
        <sampleAcquisitionTemperature unitName="kelvin" unitCvRef="UO" value="299.15"
                                      unitAccession="UO:0000012"/>
        <spinningRate unitName="hertz" unitCvRef="UO" value="0" unitAccession="UO:0000106"/>
        <relaxationDelay unitName="second" unitCvRef="UO" value="22.2737024"
                         unitAccession="UO:0000010"/>
        <pulseSequence/>
        <DirectDimensionParameterSet numberOfDataPoints="65536" decoupled="false">
          <acquisitionNucleus cvRef="NMRCV" accession="NMR:1400151" name="1H"/>
          <gammaB1PulseFieldStrength unitName="hertz" unitCvRef="UO" value="34482.7586207"
                                     unitAccession="UO:0000106"/>
          <irradiationFrequency unitName="hertz" unitCvRef="UO" value="599.8311617"
                                unitAccession="UO:0000106"/>
        </DirectDimensionParameterSet>
      </acquisitionParameterSet>
      <fidData byteFormat="Complex128" encodedLength="324160"
               compressed="true">eJwMl4dfzl8Ux7U3lYZKy0qiomQ […]</fidData>
    </acquisition1D>
  </acquisition>
</nmrML>
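Because nmrML declares a default XML namespace, element lookups need that namespace spelled out. The following sketch pulls a few acquisition parameters out of a shortened copy of the example above, using only the Python standard library. The excerpt, the function name and the kelvin-to-Celsius conversion are illustrative assumptions and not part of the nmrML tooling.

```python
import xml.etree.ElementTree as ET

# Prefix "n" is arbitrary; the URI is the default namespace of the example file.
NMRML_NS = {"n": "http://nmrml.org/schema"}

EXCERPT = """<nmrML xmlns="http://nmrml.org/schema" version="1.0.0">
  <acquisition>
    <acquisition1D>
      <acquisitionParameterSet numberOfScans="160" numberOfSteadyStateScans="0">
        <sampleAcquisitionTemperature unitName="kelvin" value="299.15"/>
        <relaxationDelay unitName="second" value="22.2737024"/>
        <DirectDimensionParameterSet numberOfDataPoints="65536" decoupled="false"/>
      </acquisitionParameterSet>
    </acquisition1D>
  </acquisition>
</nmrML>"""


def acquisition_summary(xml_text):
    """Collect scan count, sample temperature and FID size from an nmrML document."""
    root = ET.fromstring(xml_text)
    params = root.find(".//n:acquisitionParameterSet", NMRML_NS)
    temperature = params.find("n:sampleAcquisitionTemperature", NMRML_NS)
    direct = params.find("n:DirectDimensionParameterSet", NMRML_NS)
    if temperature.get("unitName") != "kelvin":
        raise ValueError("this sketch only handles temperatures given in kelvin")
    return {
        "scans": int(params.get("numberOfScans")),
        "temperature_celsius": float(temperature.get("value")) - 273.15,
        "data_points": int(direct.get("numberOfDataPoints")),
    }


summary = acquisition_summary(EXCERPT)
print(summary)
```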
[Figure: The nmrML setup, including MetaboLights]
Validators
We also deliver a content validator that checks whether a data file is syntactically well-formed and sufficiently complete, and whether minimal information requirements such as the Core Information for Metabolomics Reporting (CIMR) are met.
Project resources
•MetaboLights: http://www.ebi.ac.uk/metabolights/
•MSI: http://msi-workgroups.sourceforge.net/
•CIMR-MI: http://mibbi.sourceforge.net/projects/CIMR.shtml
[Figures: validation layer onion; validation web service and result; validation rules (HTML)]
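The layered validation idea shown above can be sketched as a sequence of checks that stops at the first layer that fails. The code below covers only two layers, well-formedness and presence of required sections. The list of required sections is an assumption made for illustration, not the actual nmrML or CIMR rule set.

```python
import xml.etree.ElementTree as ET

# Hypothetical completeness rule: these top-level sections must be present.
REQUIRED_SECTIONS = ["cvList", "sourceFileList", "acquisition"]


def validate(xml_text):
    """Return a list of problems; an empty list means both layers passed."""
    # Layer 1: the document must be well-formed XML at all.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return ["not well-formed: %s" % error]
    # Layer 2: required sections, compared by local name without the namespace.
    present = {child.tag.split("}")[-1] for child in root}
    return ["missing section: %s" % name
            for name in REQUIRED_SECTIONS if name not in present]


incomplete = validate("<nmrML><cvList/></nmrML>")
broken = validate("<nmrML><cvList>")
complete = validate("<nmrML><cvList/><sourceFileList/><acquisition/></nmrML>")
print(incomplete, broken, complete)
```

Further layers, such as schema validation against nmrML.xsd or checks of CV terms, would slot in after the second one in the same style.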
My pleas for the future
. . . to the vendors:
Please start (or continue!) to support Open Data formats
. . . to the computational mass spec community:
Please use (and improve!) joint data I/O libraries
. . . to YOU (the users):
Please start (or continue!) to REQUEST
open formats when inviting bids for a new instrument
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
FS P2 COMBO MSTA LAST PUSH past exam papers.
FS P2 COMBO MSTA LAST PUSH past exam papers.FS P2 COMBO MSTA LAST PUSH past exam papers.
FS P2 COMBO MSTA LAST PUSH past exam papers.
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
PODOCARPUS...........................pptx
PODOCARPUS...........................pptxPODOCARPUS...........................pptx
PODOCARPUS...........................pptx
 
Plasmid: types, structure and functions.
Plasmid: types, structure and functions.Plasmid: types, structure and functions.
Plasmid: types, structure and functions.
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 

MS (and NMR) data standards in Metabolomics why, how and some caveats

  • 1. MS (and NMR) data standards in Metabolomics why, how and some caveats Steffen Neumann Leibniz Institute of Plant Biochemistry ScienceCampus Halle (WCH) June 23, 2014 S. Neumann (IPB-Halle.DE) (Raw) data standards in metabolomics June 23, 2014
  • 2. Metabolomics – The Pipeline
  • 3. IPB machine park
      Data processing from:
      ● LC-QqTOF-MS: QStar Pulsar i, microTOF Q
      ● Bruker Apex (FTICR)
      ● HCT Ultra (IT-MS, CID+ETD)
      ● Reflex III (MALDI-TOF)
      ● Thermo Finnigan: Quantum Ultra AM, LCQ Deca XP
  • 4. netCDF: Grandfather is still alive
      ● netCDF as file format, ANDI-MS as content specification
      ● fine for GC/MS and simple LC/MS
      ● widely supported in software and programming languages
      ● no mix of MS and MS/MS
      ● very poor metadata
      Defined in the standard “ASTM E1947 – 98(2009) Standard Specification for Analytical Data Interchange Protocol for Chromatographic Data”, available for only $42.
  • 5. netCDF in action: I
        dataset_completeness = "C1+C2" ;
        ms_template_revision = "1.0.1" ;
        dataset_origin = "PE-SCIEX" ;
        experiment_date_time_stamp = "20050928190327+0100" ;
        operator_name = "SYSTEM" ;
        source_file_reference = "d:tt4_4_1.wiff " ;
        source_file_format = "PE-SCIEX Wiff version 1" ;
        experiment_type = "Continuum Mass Spectrum" ;
        test_separation_type = "Normal Phase Liquid Chromatography" ;
        test_ms_inlet = "Electrospray Inlet " ;
        test_ms_inlet_temperature = 20.f ;
        test_ionization_mode = "Electrospray Ionization" ;
        test_ionization_polarity = "Positive Polarity " ;
        test_detector_type = "Electron Multiplier " ;
        test_resolution_type = "Constant Resolution" ;
        test_scan_function = "Mass Scan" ;
        test_scan_direction = "Up" ;
        test_scan_law = "Linear" ;
        actual_run_time_length = 3480.54 ;
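The `experiment_date_time_stamp` attribute in the header above packs date, time and UTC offset into a single string. A minimal sketch of turning it into a timezone-aware value with Python's standard library, assuming the layout is always `YYYYMMDDhhmmss±hhmm` as in this one example (the function name is illustrative and not part of any ANDI-MS tool):

```python
from datetime import datetime


def parse_andi_timestamp(stamp: str) -> datetime:
    """Parse a stamp such as '20050928190327+0100' (layout assumed from the slide)."""
    # %z accepts a +hhmm / -hhmm UTC offset and yields a timezone-aware datetime.
    return datetime.strptime(stamp.strip(), "%Y%m%d%H%M%S%z")


stamp = parse_andi_timestamp("20050928190327+0100")
print(stamp.isoformat())
```

Files written by other converters may pad or format this field differently, so a real reader would probably need a fallback for stamps that do not match.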
  • 6. netCDF in action: II
        scan_acquisition_time = 2.00100016593933, 4.00200033187866,
        6.00100040435791, 8.00100040435791, 10.0040006637573, 12.0020008087158,
        14.0020008087158, 16.0020008087158, 18.0040016174316, 20.0020008087158,

        total_intensity = 10541, 10640, 10697, 10455, 10707, 10554, 10612, 10434,
        10738, 10504, 10567, 10646, 10675, 10660, 10676, 10638, 10498, 10581,
        10655, 10843, 10650, 10703, 10792, 10667, 10564, 10732, 10613, 10766,

        mass_values = 106.0288, 106.038, 106.0564, 106.061, 106.0656, 106.0702,
        106.0748, 106.0794, 106.0931, 106.9725, 106.9771, 106.9817, 106.9863,
        106.9909, 106.9955, 107.0001, 107.0047, 107.0094, 107.014, 107.0324,
  • 7. More metadata in XML: mzData
      ● Proteome Standards Initiative
      ● Raw / measurement data: mass spec equipment, software, (raw) peaks, isolation windows, collision energies, . . .
      ● Vendor support: Bruker, Applied Biosystems, Kratos Analytical, Matrix Science, . . .
      ● “Competitor”: mzXML
  • 8. mzData + mzXML = mzML
      [Figure: HUPO-PSI development timeline from mzData 1.05 and mzXML 3.0 to mzML, split into “Early Development” and “Final Development”. Versions shown: dataXML 0.6, mzML 0.90, mzML 0.91, mzML 0.99 RC, mzML 1.0.0 (release 2008-06), mzML 1.1.0RC5 (Turku 2009-04), mzML 1.1.0 (release 2009-06). Milestones shown: SFO 2006-05, DC 2006-09, Lyon 2007-04, EBI 2007-06, PSI Doc Proc 2007-11, Toledo 2008-04.]
      ● More stable than mzXML
      ● Better defined than mzData
      ● Reference implementations
      ● Early vendor involvement
      [Figure: mzML schema tree. Elements shown: mzML, cvList, referenceableParamGroupList, sampleList, instrumentConfigurationList, softwareList, dataProcessingList, acquisitionSettingsList, run, spectrumList, spectrum, spectrumDescription, precursorList, scan, binaryDataArray, chromatogramList, chromatogram.]
      Martens, Chambers, Sturm, Kessner, Levander, Shofstahl, Tang, Römpp, Neumann, Pizarro, Montecchi-Palazzi, Tasman, Coleman, Reisinger, Souda, Hermjakob, Binz, Deutsch. mzML–a community standard for mass spectrometry data. Mol Cell Proteomics. (2011)
  • 9. mzML in action: I
        <mzML >
          <cv id="MS" fullName="PSI MS Vocabularies" />
          <cv id="UO" fullName="unit" />

          <fileContent>
            <cvParam cv="MS" name="MS1 spectrum"/>
            <cvParam cv="MS" name="MSn spectrum"/>
            <cvParam cv="MS" name="centroid spectrum"/>
          </fileContent>

          <sourceFile id="sourceFile" location="C:/MSMSpos15_MM48_1_2-18485.d/analysis.baf">
            <cvParam cv="MS" name="Bruker BAF file"/>
            <cvParam cv="MS" name="SHA-1" value="4ef...7c0"/>
          </sourceFile>
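The `SHA-1` parameter in the `sourceFile` element above lets a reader check that a vendor file is still the one the mzML was converted from. A minimal sketch of computing such a checksum with Python's standard library; the helper name `sha1_of_file` and the temporary demo file are illustrative assumptions, and the value on the slide is elided, so nothing here reproduces it:

```python
import hashlib
import os
import tempfile


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file standing in for a raw data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_path = tmp.name
demo_digest = sha1_of_file(demo_path)
os.remove(demo_path)
print(demo_digest)
```

Comparing this digest with the value stored in the mzML header is enough to detect a raw file that was modified or swapped after conversion.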
  • 10. mzML in action: II
        <software id="exportSoftware" version="3.0.5">
          <cvParam cv="MS" name="CompassXport"/>
        </software>
        <software id="recalibrationSoftware" version="4.0.234.0">
        </software>

        <instrumentConfiguration id="instrument">
          <cvParam cv="MS" name="micrOTOF-Q"/>
        </instrumentConfiguration>

        <dataProcessing id="export">
          <processingMethod order="1" softwareRef="instrumentSoftware">
            <cvParam cv="MS" accession="MS:1000035" name="peak picking"/>
          </processingMethod>
          <processingMethod order="2" softwareRef="recalibrationSoftware">
            <cvParam cv="MS" name="m/z calibration"/>
          </processingMethod>
          <processingMethod order="3" softwareRef="exportSoftware">
            <cvParam cv="MS" name="Conversion to mzML"/>
          </processingMethod>
        </dataProcessing>
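The `dataProcessing` block above records an ordered processing history in which every step points at a `software` entry by id. In the excerpt, step 1 refers to `instrumentSoftware`, while only `exportSoftware` and `recalibrationSoftware` are declared. A small sketch of how such dangling references could be detected, assuming a namespace-free snippet like the one on the slide (real mzML files carry an XML namespace, so tag names would need to be qualified; the function name is hypothetical):

```python
import xml.etree.ElementTree as ET

# Reduced copy of the slide excerpt, wrapped in a root element for parsing.
SNIPPET = """
<mzML>
  <software id="exportSoftware" version="3.0.5"/>
  <software id="recalibrationSoftware" version="4.0.234.0"/>
  <dataProcessing id="export">
    <processingMethod order="1" softwareRef="instrumentSoftware"/>
    <processingMethod order="2" softwareRef="recalibrationSoftware"/>
    <processingMethod order="3" softwareRef="exportSoftware"/>
  </dataProcessing>
</mzML>
"""


def unresolved_software_refs(xml_text: str) -> list:
    """Return (order, softwareRef) pairs whose reference matches no <software> id."""
    root = ET.fromstring(xml_text)
    known_ids = {software.get("id") for software in root.iter("software")}
    # Walk the steps in their declared processing order.
    methods = sorted(root.iter("processingMethod"), key=lambda m: int(m.get("order")))
    return [
        (int(method.get("order")), method.get("softwareRef"))
        for method in methods
        if method.get("softwareRef") not in known_ids
    ]


missing = unresolved_software_refs(SNIPPET)
print(missing)
```

A schema-aware validator would catch the same problem; the point here is only that the reference structure is simple enough to check by hand.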
  • 11. mzML in action: your data
        <spectrum id="scan=16" >
          <cvParam cv="MS" name="positive scan"/>
          <cvParam cv="MS" name="MS2 spectrum"/>
          <cvParam cv="MS" name="centroid spectrum"/>
          <precursor>
            <cvParam cv="MS" name="selected ion m/z" value="542.1" unitName="m/z"/>
            <activation>
              <cvParam cv="MS" name="collision energy" value="15.0" unitName="electronvolt"/>
              <cvParam cv="MS" name="low-energy collision-induced dissociation"/>
            </activation>
          </precursor>
          <binaryDataArray>
            <cvParam cv="MS" name="zlib compression"/>
            <cvParam cv="MS" name="m/z array" unitName="m/z"/>
            <binary>eNrj/luT+KC02sEswyJj5...doaB42HsdAItdCw4=</binary>
          </binaryDataArray>
          <binaryDataArray>
            <cvParam cv="MS" name="zlib compression"/>
            <cvParam cv="MS" name="intensity array" unitName="counts"/>
            <binary>eNpjYACCBkcHBjCwhdKWD...gAXvgH4</binary>
          </binaryDataArray>
        </spectrum>
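The `<binary>` payloads in the spectrum above are numeric arrays that were compressed and then base64-encoded. A minimal round-trip sketch with Python's standard library, assuming 64-bit little-endian floats and zlib compression (real files declare the precision in a separate `cvParam`, which this sketch does not read; `encode_array` and `decode_array` are illustrative names, and the payloads on the slide are truncated, so the demo uses its own values):

```python
import base64
import struct
import zlib


def encode_array(values, compress=True) -> str:
    """Pack floats as little-endian doubles, optionally zlib-compress, then base64-encode."""
    raw = struct.pack(f"<{len(values)}d", *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text: str, compressed=True) -> list:
    """Reverse encode_array: base64-decode, optionally decompress, unpack doubles."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    count = len(raw) // 8  # 8 bytes per 64-bit float
    return list(struct.unpack(f"<{count}d", raw))


mz = [106.0288, 106.038, 106.0564]
encoded = encode_array(mz)
decoded = decode_array(encoded)
print(encoded)
print(decoded)
```

Because the values are stored as raw IEEE doubles, the round trip is lossless; the price is that the XML is no longer readable without a decoder.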
  • 12. www.openms.de
      ● Originally for MS-based proteomics
      ● Reads mzData, mzXML, mzML, NetCDF (not on 64-bit!)
      ● FileInfo, FileConverter, FileFilter, . . .
      ● plus Calibration, Merge, NoiseFilter, . . .
      ● TOPPView: viewer and GUI
      ⇒ Very useful for preprocessing
      M. Sturm, A. Bertsch, C. Gröpl, A. Hildebrandt, R. Hussong, E. Lange, N. Pfeifer, O. Schulz-Trieglaff, A. Zerck, K. Reinert, O. Kohlbacher, 2008. OpenMS – an Open-Source Software Framework for Mass Spectrometry. BMC Bioinformatics, doi:10.1186/1471-2105-9-163.
  • 13. http://proteowizard.sourceforge.net/
      ● Originally for MS-based proteomics
      ● Cross-platform (MSVC on Windows, gcc on Linux, XCode on OSX)
      ● Open source (Apache v2)
      ● Formats supported on all platforms: mzML, mzXML, MGF
      ● Formats supported on Windows with vendor libraries installed: Thermo RAW, Waters RAW, Bruker FID/YEP/BAF
      ● msconvert: conversion tool
      ● msdiff: validation of conversion/preprocessing
      ● msaccess: command-line access to binary data and metadata, EICs and pseudo-2D gel image creation
      ● SeeMS: interactive viewer for mass spec data files (Windows only)
      Chambers, Maclean, Burke, Amodei, Ruderman, Neumann, Gatto, Fischer, Pratt, Egertson, Hoff, Kessner, Tasman, Shulman, Frewen, Baker, Brusniak, Paulse, Creasy, Flashner, Kani, Moulding, Seymour, Nuwaysir, Lefebvre, Kuhlmann, Roark, Rainer, Gerd, Hemenway, Huhmer, Langridge, Eckels, Connolly, Stearns, Deutsch, Katz, Agus, MacCoss, Tabb, Mallick. A cross-platform toolkit for mass spectrometry and proteomics.
  • 14. Converters: Notes
      https://xcmsonline.scripps.edu/docs/fileformats.html
      Bruker:
      ● Calibration requires setting a specific registry key: HKEY_CURRENT_USER\Software\Bruker Daltonik\CompassXport, UseRecalibratedSpectra=1
      Waters:
      ● No support for calibration in the Waters DLL used by msconvert
      ● DataBridge writes netCDF only, and writes calibrated data
      ● The ancient massWolf requires a full MassLynx installation; it will use calibrated data, but intermingles LockMass scans
  • 15. Plumbing: libraries for mzML
      ● pymzML (Python): http://pymzml.github.io/
      ● jmzML (Java): https://code.google.com/p/jmzml/
      ● OpenMS (C++): https://www.openms.de/
      ● Proteowizard (C++): http://proteowizard.sourceforge.net/
      ● mzR (R/Bioconductor): http://www.bioconductor.org/packages/release/bioc/html/mzR.html
      ● . . . and many more!
  • 16. MS and Metabolomics in BioC
      ● Collection of biology-related R packages
      ● Started back in 2002
      ● Current release: >500 packages!
      Package | Maintainer | Title
      mzR | Gatto, me, Fischer | parser for netCDF, mzXML, mzData and mzML
      xcms | Ralf Tautenhahn | LC/MS and GC/MS Data Analysis
      MassSpecWavelet | Pan Du | Mass spectrum processing by wavelet-based algorithms
      CAMERA | Carsten Kuhl | Collection of Annotation related MEthods for mass spectRometry dAta
      Rdisop | Steffen Neumann | Decomposition of Isotopic Patterns
      MSnbase | Laurent Gatto | Base Functions and Classes for MS-based Proteomics
      iontree | Mingshu Cao | Data management and analysis of ion trees from ion-trap MS
      rpubchem | Rajarshi Guha | Interface to the PubChem Collection
      KEGGSOAP | R. Gentleman | client interface to the KEGG SOAP server
      apComplex | D. Scholtens | Estimate protein complex membership using AP-MS protein data
      PROcess | X. Li | Ciphergen SELDI-TOF Processing
      simulatorAPMS | Tony Chiang | Computationally simulates the AP-MS technology
      TargetSearch | Cuadros-Inostroza et al. | analysis of GC-MS metabolite profiling data
      flagme | Mark Robinson | Analysis of Metabolomics GC/MS Data
  • 17. LC-MS Data preprocessing with XCMS
      www.bioconductor.org
      ● Import: netCDF, mzXML, mzData, mzML
      ● Peak detection
      ● Peak alignment
      ● Peak integration
      ● “Differential” metabolites
      ● Compatible with all MS instruments at the IPB
      Lange, Tautenhahn, Neumann, Gröpl. Critical assessment of alignment procedures for LC-MS proteomics and metabolomics measurements. BMC Bioinformatics (2008)
  • 18. FTICR Peak Picking
      ● Bioconductor package “MassSpecWavelet”
      ● Integration into XCMS: same annotation and identification, same statistics (same database schema)
      [Figure: three panels over m/z 380 to 384: a) MS raw spectrum (intensity vs. m/z value), b) CWT coefficients (CWT coefficient scale vs. m/z value), c) identified peaks with SNR > 3 (intensity vs. m/z value).]
      Project work by Sebastian Wolf & Michael Gerlich.
      Du, Kibbe, Lin: Peak Detection of Mass Spectrometry Spectrum by Continuous Wavelet Transform based Pattern Matching, Bioinformatics (2008)
  • 19. Plumbing: mzR for MS raw data
      ● New in BioC 2.10 (Oct 2011)
      ● Joint work Fischer/Gatto/Neumann
      ● Conglomerate of former XCMS code, ISB Ramp, and Proteowizard via Rcpp
      ● Read netCDF, mzXML, mzData, mzML (mz5 soon?)
      ● Read mzIdentML mzQuantML one day?
      ● To become the affyIO of MS data?!
      ● GSoC project 2014 to improve mzR
      [Figure: mzR with its backends mzRramp, mzRpwiz and mzRnetCDF.]
      Chambers, Maclean, Burke, Amodei, Ruderman, Neumann, Gatto, Fischer, Pratt, Egertson, Hoff, Kessner, Tasman, Shulman, Frewen, Baker, Brusniak, Paulse, Creasy, Flashner, Kani, Moulding, Seymour, Nuwaysir, Lefebvre, Kuhlmann, Roark, Rainer, Gerd, Hemenway, Huhmer, Langridge, Eckels, Connolly, Stearns, Deutsch, Katz, Agus, MacCoss, Tabb, Mallick. A cross-platform toolkit for mass spectrometry and proteomics.
  • 20. imzML: imaging mass spectrometry in mzML
      ● Huge data files, complex access patterns
      ● imzML: same ’ol mzML, but base64 in 2nd data file
      ● Some new CV terms
      ● Faster access
      ● 7/8 space reduction
      ● Lossless
      [Figure: mzML and imzML side by side.]
      http://www.imzml.org
      ⇒ Open MS imaging software!
      Schramm T, Hester A, Klinkert I, Both J-P, Heeren RMA, Brunelle A, Laprévote O, Desbenoit N, Robbe M-F, Stoeckli M, Spengler B, Römpp A (2012) imzML – A common data format for the flexible exchange and processing of mass spectrometry imaging data. J. of Proteomics 10.1016/j.jprot.2012.07.026
  • 21. mz5: netCDF meets mzML
      ● Convert from XML to HDF5
      ● HDF5: big cousin of netCDF
      Pros:
      ● Size reduction 54%
      ● Read/write speed 3–4-fold
      ● Fully implemented in pwiz
      ● HDF5 API for most languages
      Cons:
      ● Not human-readable
      ● Kills emacs and wordpad
      M. Wilhelm, M. Kirchner, J. Steen, H. Steen. mz5: Space- and Time-efficient Storage of Mass Spectrometry Data Sets. MCP 10.1074/mcp.O111.011379
  • 22. Focus of standards in NMR
      ● D2.6 Metadata: ISAtab
      ● D2.4 Raw data: nmrML
      ● Metabolite identification: mzTab
      ● Metabolite quantification: mzTab
  • 23. New format: nmrML (D2.4 Raw data: nmrML)
      Capture NMR raw data (equivalent to mzML). Ingredients for the nmrML standard:
      ● XML Schema and controlled vocabulary (CV)
      ● Examples, converters and validation suite
      ● COSMOS partners involved: IPB, EMBL-EBI, UB2, UBHAM, UOXF, IMPERIAL, MRC, Mike Wilson (Canada), Matthias Klein (D), Ian Lewis (US)
  • 24. nmrML infrastructure (D2.4 Raw data: nmrML)
      github.com as development platform:
      ● Web site with content management: http://nmrml.org/
      ● Version control system, issue tracker, activity statistics
      ● Free for open source projects
  • 25. nmrML Ontology (D2.4 Raw data: nmrML)
      ● Controlled vocabulary developed as OWL ontology
      ● Based on earlier work by MSI, D. Rubtsov and J. Cruz
      ● ISAtab can leverage ontologies
      ● With semantic web / RDF / SPARQL in mind for later deliverables
  • 26. nmrML: an XML-based open standard for NMR data storage and exchange (poster)
      Daniel Schober (1), Michael Wilson (2), Daniel Jacob (3), Annick Moing (3), Catherine Deborde (3), Luis de Figueiredo (4), Kenneth Haug (4), Philippe Rocca-Serra (5), John Easton (6), Christian Ludwig (7), Antonio Rosato (8), David Wishart (2), Christoph Steinbeck (4), Reza Salek (4), Steffen Neumann (1)
      (1) Leibniz Institute of Plant Biochemistry, Dept. of Stress and Developmental Biology, Weinberg 3, 06120 Halle, Germany
      (2) Department of Computing/Biological Sciences, University of Alberta, Edmonton, Canada
      (3) INRA, Univ. Bordeaux, Metabolome Facility of Bordeaux Functional Genomics Center, 71 av Edouard Bourlaux, F-33140 Villenave d’Ornon, France
      (4) European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
      (5) University of Oxford, e-Research Centre, 7 Keble Road, Oxford, OX1 3QG, UK
      (6) School of Electronic, Electrical and Computer Engineering, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
      (7) School of Cancer Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
      (8) Magnetic Resonance Center (CERM), University of Florence, 50019 Sesto Fiorentino (FI), Italy

      The need for an open NMR standard
      NMR data is currently accumulating in local data silos, hindering distribution and secondary data usage. Cross-platform NMR data access, integration and comparison are hindered by incompatible vendor formats and the lack of a robust vendor-agnostic NMR data standard. Data in proprietary formats ages fast, posing the danger of irreproducible data from older studies. An open, vendor-neutral storage standard is needed as a long-term archival format if emerging metabolomics repositories are to capture data from all vendor formats in a persistent way while still supporting the dynamics in this field.

      The nmrML setup
      The COordination of Standards in MetabOlomicS (COSMOS) EU consortium has teamed up with the Metabolomics Standards Initiative to create an open exchange and storage format for NMR data. We largely follow design principles already established in the Proteomics Standards Initiative (PSI) for the mzML data standard for mass spectrometry. The standard is composed of an XML schema (nmrML.xsd) and an accompanying controlled vocabulary (nmrCV.owl). Moving the more variable and dynamic descriptors into the vocabulary, which is referenced from within an nmrML file, keeps the schema robust and easy to update.
      [Figures: nmrML XML schema excerpt; nmrML use cases (MetaboLights); nmrML setup.]

      Parsers
      To ease format conversions we deliver parsers for Bruker and Varian data formats, which can be incorporated into open NMR processing and analysis software.

      Validators
      We also deliver a content validator which checks that a data file is syntactically well-formed, sufficiently complete, and that minimal information requirements such as the Core Information for Metabolomics Reporting (CIMR) are met.
      [Figure: validation layer onion; validation webservice and result; validation rules (html).]

      Outlook
      Although coverage is good for raw data capture, the XSD and CV will be expanded to better cover processed data and quantification data. Our standard is accepted by major open source NMR data processing tools and will serve the MetaboLights repository with a stable storage format.

      Project resources
      ● Website: http://www.nmrML.org
      ● Github: https://github.com/nmrML/nmrML
      ● nmrML validator: http://msbi.ipb-halle.de/nmrML/index.php
      ● Cosmos: http://www.cosmos-fp7.eu/
      ● Email: info@nmrml.org
      ● Google Group: https://groups.google.com/forum/?hl=en#!forum/nmrml/join
      ● MetaboLights: http://www.ebi.ac.uk/metabolights/
      ● MSI: http://msi-workgroups.sourceforge.net/
      ● CIMR-MI: http://mibbi.sourceforge.net/projects/CIMR.shtml

      nmrML data example
      Data from a paper: Farag, M., Porzel, A., Schmidt, J. & Wessjohann, L. Metabolite profiling and fingerprinting of commercial cultivars of Humulus lupulus L. (hop) - a comparison of MS and NMR methods in metabolomics, Metabolomics 8, 492-507, (2012)
        <nmrML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://nmrml.org/schema ../../../xml-schemata/nmrML.xsd" xmlns="http://nmrml.org/schema" version="1.0.0">
          <cvList count="2">
            <cv fullName="nmrML Controlled Vocabulary" version="0.0.1" id="NMRCV" URI="http://www.nmrml.org/nmrml-cv.0.0.1.owl"/>
            <cv fullName="Unit Ontology" version="3.2.0" id="UO" URI="http://unit-ontology.googlecode.com/svn/trunk/uo.owl/"/>
          </cvList>
          <contactList>
            <contact id="ID004" fullname="Lutger A. Wessjohann" email="Ludger.Wessjohann [a] ipb-halle.de"/>
            <contact id="ID044" fullname="Mohamed A. Farag" email="mfarag73 [a] yahoo.com"/>
          </contactList>
          <sourceFileList count="2">
            <sourceFile sha1="fd99c095046e2356c7d31154d45353fa79cbc844" location="file:///Users/mike/Projects/nmrML/nmrML/examples/IPB_HopExample/FIDs/FAM013_AHTM.PROTON_04.fid/procpar" id="SOURCE_FILE_0" name="procpar">
              <cvTerm cvRef="NMRCV" accession="NMR:1400297" name="Varian VNMR Format"/>
              <cvTerm cvRef="NMRCV" accession="NMR:1002006" name="acquisition parameter file"/>
            </sourceFile>
            <sourceFile sha1="e4ffeb41da28b1e9017e72819252ec6d78f8179f" location="file:///Users/mike/Projects/nmrML/nmrML/examples/IPB_HopExample/FIDs/FAM013_AHTM.PROTON_04.fid/fid" id="SOURCE_FILE_1" name="fid">
              <cvTerm cvRef="NMRCV" accession="NMR:1400297" name="Varian VNMR Format"/>
              <cvTerm cvRef="NMRCV" accession="NMR:1400119" name="FID file"/>
            </sourceFile>
          </sourceFileList>
          <softwareList count="1">
            <software cvRef="NMRCV" accession="NMR:1000277" name="VnmrJ software" version="2.2C" id="SOFTWARE_1"/>
          </softwareList>
          <instrumentConfigurationList count="4">
            <instrumentConfiguration id="INST_CONFIG_1">
              <cvTerm cvRef="NMRCV" accession="NMR:1400234" name="Varian NMR instrument"/>
              <cvTerm cvRef="NMRCV" accession="NMR:1000235" name="Varian probe"/>
              <cvTerm cvRef="NMRCV" accession="NMR:1400234" name="Varian NMR instrument"/>
              <cvTerm cvRef="NMRCV" accession="NMR:1000236" name="5mm HCN probe"/>
            </instrumentConfiguration>
          </instrumentConfigurationList>
          <acquisition>
            <acquisition1D>
              <acquisitionParameterSet numberOfScans="160" numberOfSteadyStateScans="0">
                <sampleAcquisitionTemperature unitName="kelvin" unitCvRef="UO" value="299.15" unitAccession="UO:0000012"/>
                <spinningRate unitName="hertz" unitCvRef="UO" value="0" unitAccession="UO:0000106"/>
                <relaxationDelay unitName="second" unitCvRef="UO" value="22.2737024" unitAccession="UO:0000010"/>
                <pulseSequence/>
                <DirectDimensionParameterSet numberOfDataPoints="65536" decoupled="false">
                  <acquisitionNucleus cvRef="NMRCV" accession="NMR:1400151" name="1H"/>
                  <gammaB1PulseFieldStrength unitName="hertz" unitCvRef="UO" value="34482.7586207" unitAccession="UO:0000106"/>
                  <irradiationFrequency unitName="hertz" unitCvRef="UO" value="599.8311617" unitAccession="UO:0000106"/>
                </DirectDimensionParameterSet>
              </acquisitionParameterSet>
              <fidData byteFormat="Complex128" encodedLength="324160" compressed="true">eJwMl4dfzl8Ux7U3lYZKy0qiomQ […]</fidData>
            </acquisition1D>
          </acquisition>
        </nmrML>
  • 27. My pleas for the future
      ● . . . to the vendors: Please start (or continue!) to support open data formats
      ● . . . to the computational mass spec community: Please use (and improve!) joint data I/O libraries
      ● . . . to YOU (the users): Please start (or continue!) to REQUEST open formats when inviting bids for a new instrument