Spectral processing is crucial in metabolomics, especially for proton NMR metabolomic profiling, since each processing step may affect the following ones. Among the processing steps, data reduction (binning or bucketing) strongly affects subsequent statistical analysis and potential biomarker discovery. Building on recently published work, we propose an improved data reduction method called ERVA, which stands for Extraction of Relevant Variables for Analysis. By providing buckets centred on resonance peaks and free of any non-significant signal, this new method helps to recover the chemical fingerprints of metabolites. Moreover, we take advantage of the concentration variability of each compound across a series of samples of a complex mixture to highlight chemical information. This is done by linking the buckets into clusters based on significant correlations, which provides helpful support for compound identification. As a proof of concept, the method has been applied to a tomato 1H-NMR dataset to test its ability to recover the composition of fruit extracts.
ERVA-NMR
1. ERVA: a novel binning method allowing chemical information to be highlighted from 1H-NMR metabolomics data
Daniel Jacob (1), Catherine Deborde (1), Annick Moing (1)
(1) PMFB – UMR 1332, INRA, F-33140 Villenave d'Ornon
2. Metabolic fingerprinting
Aims: classification of samples and highlighting of the metabolic biomarkers
[Figure: workflow Experiment → NMR spectra → spectra processing → data matrix (samples × features) → statistical analyses]
D.Jacob – 7 RFMF - Amiens, 10 June 2013
3. Metabolic fingerprinting
Aims: classification of samples and highlighting of the metabolic biomarkers
[Figure: same workflow, with spectra processing turning the raw data into the relevant information stored in the data matrix (samples × features) used for statistical analyses]
5. Data Reduction: Bucketing
Comparison of the buckets produced by the equidistant and AIBIN(1) binning methods
(1) AIBIN: Adaptive, Intelligent Binning Algorithm, de Meyer T et al. (2008) Anal. Chem. 80:3783–3790
Drawbacks of the AIBIN binning method:
• It takes the full data into account, including noise areas.
• It generates asymmetric buckets that are not centred on the peaks.
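For comparison, the equidistant (uniform) binning mentioned on this slide can be sketched in a few lines. This is a minimal sketch under our own assumptions: spectra are stored as a NumPy array (one row per sample) on a shared ppm axis, bucket values are plain sums of intensities, and the function name and the 0.04 ppm default width are illustrative, not taken from the source.

```python
import numpy as np

def equidistant_buckets(ppm, spectra, width=0.04):
    """Fixed-width bucketing of a set of spectra.

    ppm     : 1-D array of chemical shifts, any order
    spectra : 2-D array (n_samples, n_points) aligned with ppm
    width   : bucket width in ppm (illustrative default)
    Returns (centres, X): bucket centres and the (n_samples, n_buckets) data matrix.
    """
    ppm = np.asarray(ppm, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    order = np.argsort(ppm)                       # work on an increasing ppm axis
    ppm, spectra = ppm[order], spectra[:, order]
    n = max(1, int(np.ceil((ppm[-1] - ppm[0]) / width)))
    # Bucket index of every point; the last point is clamped into the last bucket
    idx = np.minimum(((ppm - ppm[0]) / width).astype(int), n - 1)
    centres, columns = [], []
    for k in range(n):
        mask = idx == k
        if not mask.any():
            continue
        centres.append(ppm[0] + width * (k + 0.5))
        # Sum of intensities: proportional to the area on a uniform ppm grid
        columns.append(spectra[:, mask].sum(axis=1))
    return np.array(centres), np.column_stack(columns)
```

Every point, noise included, ends up in some bucket, and the bucket edges are fixed regardless of where the peaks lie; these are the two limitations that peak-centred methods try to avoid.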
6. Data Reduction: Bucketing
New approach called ERVA, for Extraction of Relevant Variables for Analysis:
• Convolution product between a spectrum (S) and the second-order derivative of the Lorentzian function (SDL)
Jacob D. et al. (March 2013) Analytical and Bioanalytical Chemistry, 405, 5049-5061
• The convolution product gives a signal (shown in blue on the slide).
• The zero crossings of the resulting signal, extended on each side by σ (the full width at half maximum of the Lorentzian function), give the bounds of the buckets.
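The two bullets above can be sketched in Python. This is a minimal illustration under stated assumptions, not the published ERVA implementation: it convolves with the sign-flipped kernel −SDL, so that each resonance gives a positive central lobe; it truncates the kernel at ±10σ and re-centres it to zero sum; it treats σ as the FWHM in ppm; and it discards lobes smaller than a hypothetical fraction `min_rel` of the largest one as non-significant. The function names are ours.

```python
import numpy as np

def neg_sdl_kernel(step, sigma, half_span=10.0):
    """Negated second derivative of a Lorentzian of FWHM `sigma` (ppm),
    sampled every `step` ppm on [-half_span*sigma, +half_span*sigma]."""
    g = sigma / 2.0                                  # half width at half maximum
    n = int(round(half_span * sigma / step))
    x = step * np.arange(-n, n + 1)                  # odd length, centred on 0
    sdl = (2 * g / np.pi) * (3 * x**2 - g**2) / (x**2 + g**2) ** 3
    k = -sdl                                         # positive lobe under each peak
    return k - k.mean()                              # truncated kernel re-centred to zero sum

def erva_like_buckets(ppm, spectrum, sigma, min_rel=0.01):
    """Return (low, high) ppm bounds of buckets centred on resonance peaks.

    Assumes a uniform ppm grid longer than the kernel. Positive lobes whose
    maximum is below `min_rel` times the largest lobe are treated as noise."""
    ppm = np.asarray(ppm, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    order = np.argsort(ppm)                          # work on an increasing ppm axis
    ppm, spectrum = ppm[order], spectrum[order]
    step = ppm[1] - ppm[0]
    conv = np.convolve(spectrum, neg_sdl_kernel(step, sigma), mode="same") * step
    # Pad with False so that every positive lobe has a rising and a falling edge
    positive = np.concatenate(([False], conv > 0, [False]))
    change = np.flatnonzero(np.diff(positive.astype(int)))
    starts, ends = change[0::2], change[1::2] - 1    # inclusive indices into conv
    if starts.size == 0:
        return []
    lobe_max = np.array([conv[s:e + 1].max() for s, e in zip(starts, ends)])
    kept = lobe_max > min_rel * lobe_max.max()
    # Bounds: zero crossings around each lobe (to grid precision), extended by sigma
    return [(ppm[s] - sigma, ppm[e] + sigma)
            for s, e, keep in zip(starts, ends, kept) if keep]
```

On a synthetic spectrum with two Lorentzians at 0.3 and 0.7 ppm (FWHM 0.01 ppm), this returns two buckets centred on the two peaks. Small spurious lobes appear near the spectrum edges, where the kernel is only partly covered, and the relative threshold removes them.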
7. ERVA: Extraction of Relevant Variables for Analysis
Why SDL?
• An NMR spectrum is a sum of Lorentzians, plus noise and distortions;
• the second derivative of a Lorentzian is symmetric, and its integral is zero.
Mathematically, applying such a convolution product to a spectrum is similar to a partial wavelet decomposition.
For a full experimental design, the convolution product is applied to the average spectrum obtained by summing all spectra.
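Written out with γ = σ/2, the half width at half maximum (our notation), the kernel and the two properties listed above follow from standard calculus. The last two lines are consequences of linearity that we add for illustration: a constant or linear baseline B(u) = a + b u is cancelled by the convolution, and convolving the average spectrum of N samples gives the same result as averaging the N convolved spectra.

$$
\begin{aligned}
L(x) &= \frac{1}{\pi}\,\frac{\gamma}{x^{2}+\gamma^{2}}, \qquad
\mathrm{SDL}(x) = L''(x) = \frac{2\gamma}{\pi}\,\frac{3x^{2}-\gamma^{2}}{(x^{2}+\gamma^{2})^{3}} = \mathrm{SDL}(-x),\\
\int_{-\infty}^{+\infty}\mathrm{SDL}(x)\,dx &= \Bigl[L'(x)\Bigr]_{-\infty}^{+\infty} = 0,
\qquad \int_{-\infty}^{+\infty} x\,\mathrm{SDL}(x)\,dx = 0,\\
(B * \mathrm{SDL})(\delta) &= \int \bigl(a + b(\delta - v)\bigr)\,\mathrm{SDL}(v)\,dv = (a+b\delta)\int \mathrm{SDL}(v)\,dv - b\int v\,\mathrm{SDL}(v)\,dv = 0,\\
\bar S * \mathrm{SDL} &= \Bigl(\frac{1}{N}\sum_{i=1}^{N} S_i\Bigr) * \mathrm{SDL} = \frac{1}{N}\sum_{i=1}^{N} \bigl(S_i * \mathrm{SDL}\bigr).
\end{aligned}
$$

The second integral on the second line vanishes because x·SDL(x) is odd and decays like 1/x³.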
8. Comparison of the buckets produced by the ERVA and AIBIN(1) binning methods
[Figure: simulated signal with AIBIN bins A1, A2, A3 and ERVA bins E1, E2, E3]
- Sum of three identical Lorentzians, shifted from one another by a fixed ppm interval
- A1, A2, A3: the bins produced by the AIBIN method, delimited by the dotted lines
- E1, E2, E3: the bins produced by the ERVA method, shown as superposed grey boxes
(1) AIBIN: Adaptive, Intelligent Binning Algorithm, de Meyer T et al. (2008) Anal. Chem 80:3783–3790
1/ Integrating the ERVA buckets gives values closer to one another than those obtained with the AIBIN method, as expected for three identical peaks.
2/ With the ERVA method, the centres of the buckets coincide with the centres of the resonance peaks, unlike with the AIBIN method.
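The integration step that turns buckets into the data matrix can be sketched on the same kind of toy signal. This is a minimal sketch under our own assumptions: buckets are given as (low, high) ppm pairs, values are trapezoidal integrals, and the peak positions, widths and ±0.02 ppm bucket half-width are illustrative. It does not reproduce AIBIN; it only shows what peak-centred buckets give on three identical Lorentzians.

```python
import numpy as np

def integrate_buckets(ppm, spectra, buckets):
    """Build the (n_samples, n_buckets) data matrix by trapezoidal
    integration of each spectrum over each (low, high) ppm bucket."""
    ppm = np.asarray(ppm, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    order = np.argsort(ppm)
    ppm, spectra = ppm[order], spectra[:, order]
    X = np.empty((spectra.shape[0], len(buckets)))
    for j, (low, high) in enumerate(buckets):
        m = (ppm >= low) & (ppm <= high)
        x, y = ppm[m], spectra[:, m]
        X[:, j] = np.sum((y[:, 1:] + y[:, :-1]) * np.diff(x) / 2, axis=1)
    return X

def lorentzian(x, x0, height, fwhm):
    g = fwhm / 2.0
    return height * g**2 / ((x - x0) ** 2 + g**2)

# Toy signal: three identical Lorentzians shifted by 0.05 ppm (illustrative values)
ppm = np.linspace(0.3, 0.6, 601)
peaks = [0.40, 0.45, 0.50]
spectrum = sum(lorentzian(ppm, p, 1.0, 0.01) for p in peaks)

# Buckets centred on each resonance peak, +/- 0.02 ppm
centred = [(p - 0.02, p + 0.02) for p in peaks]
X = integrate_buckets(ppm, spectrum, centred)
print(X)
```

The three values differ by only a few percent. The middle bucket is slightly larger because it collects the tails of both neighbouring peaks. With bounds that are not centred on the peaks, each value would also depend on where the bounds cut into the peak.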
9. Illustration of the effect of the alignment process.
[Figure: aligned regions A1, A2 and A3]
Example of the "citrate-malate" zone from a set of tomato NMR spectra
- When peaks in a misaligned region must be aligned and the alignment alters the lower part of the peaks, the impact remains relatively minor with the ERVA data reduction method,
- because the buckets produced by ERVA are mainly based on the central part of the peaks.
In the figure, region A1 was aligned first, then regions A2 and A3 in turn.
10. Clustering of buckets
Buckets now have a strong chemical meaning, thanks to their exact match with the resonance peaks, which are the fingerprints of chemical compounds.
Realistic assumption:
• compounds involved in the same biochemical pathway may show high correlations between their resonances,
• but usually not as high as resonances belonging to the same molecule.
To generate relevant clusters (i.e. chemical compounds), an appropriate correlation threshold has to be applied to the correlation matrix before its decomposition into clusters.
We apply an approach similar to the clustering of latent variables (CLV)(*), which involves two steps:
• a hierarchical clustering analysis based on the correlations between buckets,
• a partitioning algorithm (R igraph package).
(*) Vigneau E et al. (2005) Clustering of variables to analyze spectral data. J Chemom 19:122-128
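A simplified version of this idea can be sketched without R: compute the bucket-by-bucket correlation matrix, join every pair of buckets whose correlation reaches the threshold, and keep the connected components that contain at least two buckets as clusters. This replaces the hierarchical clustering and igraph partitioning of the slide with plain connected components, so it is an approximation under our own assumptions. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def correlation_clusters(X, threshold):
    """Group the buckets (columns of X, one row per sample) whose pairwise
    Pearson correlation is >= threshold; return clusters of >= 2 buckets."""
    R = np.corrcoef(X, rowvar=False)                 # bucket-by-bucket correlations
    adjacency = R >= threshold
    np.fill_diagonal(adjacency, False)
    seen = np.zeros(R.shape[0], dtype=bool)
    clusters = []
    for root in range(R.shape[0]):
        if seen[root]:
            continue
        # Breadth-first search collects the connected component containing `root`
        seen[root] = True
        component, frontier = [root], [root]
        while frontier:
            nxt = np.flatnonzero(adjacency[frontier].any(axis=0) & ~seen)
            seen[nxt] = True
            component.extend(nxt.tolist())
            frontier = nxt.tolist()
        if len(component) > 1:                       # singletons are not clusters
            clusters.append(sorted(component))
    return clusters

# Synthetic example: 30 samples, two compounds with varying concentrations
rng = np.random.default_rng(1)
n = 30
a, b = rng.uniform(1, 5, n), rng.uniform(1, 5, n)

def noisy(v):
    return v + 0.02 * rng.standard_normal(n)

X = np.column_stack([noisy(a), noisy(2 * a), noisy(0.5 * a),          # compound a
                     noisy(b), noisy(3 * b),                           # compound b
                     rng.standard_normal(n), rng.standard_normal(n)])  # noise buckets
print(correlation_clusters(X, 0.98))
```

Buckets that come from the same compound scale together with its concentration, so they end up in one cluster. Noise buckets remain singletons and are not reported.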
11. Effect of the correlation threshold on the size and number of bucket clusters
The correlation threshold allowing maximal discrimination of compounds (3) is the one giving the maximum number of clusters within the optimum range (grey area), defined by:
(1) the upper limit on the size of the biggest cluster (40),
(2) the highest value of the criterion ratio:
$$\text{Criterion} = \frac{\text{Total number of clusters}}{\text{Size of the biggest cluster}}$$
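One possible reading of this rule, sketched in Python under our own assumptions: scan candidate thresholds, compute the clusters at each one (connected components of the thresholded correlation graph, singletons excluded), discard thresholds whose biggest cluster exceeds 40 buckets, and keep the threshold with the highest criterion, preferring more clusters on ties. The helper names, the candidate grid, the tie-break and the synthetic data are hypothetical.

```python
import numpy as np

def clusters_at(R, threshold):
    """Connected components (>= 2 buckets) of the graph whose edges join
    buckets with correlation >= threshold, found with union-find."""
    parent = list(range(R.shape[0]))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]            # path halving
            i = parent[i]
        return i

    for i, j in zip(*np.nonzero(np.triu(R >= threshold, k=1))):
        parent[find(i)] = find(j)                    # merge the two components
    groups = {}
    for i in range(R.shape[0]):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]

def choose_threshold(X, candidates, max_size=40):
    """Among thresholds whose biggest cluster has <= max_size buckets, return
    (threshold, n_clusters, biggest, criterion) with the highest criterion
    n_clusters / biggest; ties go to the threshold giving more clusters."""
    R = np.corrcoef(X, rowvar=False)
    best = None
    for t in candidates:
        clusters = clusters_at(R, t)
        if not clusters:
            continue
        biggest = max(len(c) for c in clusters)
        if biggest > max_size:
            continue
        key = (len(clusters) / biggest, len(clusters))
        if best is None or key > best[0]:
            best = (key, float(t), len(clusters), biggest)
    if best is None:
        return None
    (criterion, _), t, n_clusters, biggest = best
    return t, n_clusters, biggest, criterion

# Synthetic example: compound b partly co-varies with compound a (e.g. same pathway)
rng = np.random.default_rng(2)
n = 30
a = rng.uniform(1, 5, n)
b = a + 0.5 * rng.standard_normal(n)

def noisy(v):
    return v + 0.02 * rng.standard_normal(n)

X = np.column_stack([noisy(a), noisy(2 * a), noisy(0.5 * a), noisy(b), noisy(3 * b)])
print(choose_threshold(X, np.round(np.arange(0.80, 0.995, 0.01), 2)))
```

At low thresholds the two co-varying compounds merge into a single cluster of five buckets (criterion 1/5). Above their cross-correlation they separate into clusters of three and two buckets (criterion 2/3), so a threshold in that upper range is selected.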
PhenoTom – UR 1052 Unité Génétique et Amélioration des Fruits et Légumes - INRA - Montfavet (France)
Characterization of tomato fruits at two stages (expansion and red-orange fruit) from 12 contrasting genotypes (8 lines and 4 derived F1 hybrids).
12. Bucket clustering greatly helps the interpretation of multivariate analyses such as PCA, PLS, ...
13. Bucket clustering greatly helps the interpretation of multivariate analyses such as PCA, PLS, ...
Correlation threshold = 0.98; 623 buckets in total
• Number of clusters = 58, covering 254 buckets
• Biggest cluster: 18 buckets
Clusters located mainly at the periphery of a circle => biomarkers are highlighted
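Plugging this slide's figures into the criterion of slide 11 gives the following worked numbers (our own arithmetic, assuming the criterion is computed from exactly these counts):

$$
\text{Criterion} = \frac{58}{18} \approx 3.22, \qquad \frac{254}{623} \approx 0.41, \qquad \frac{254}{58} \approx 4.4
$$

That is, about 41 % of the 623 buckets fall into a cluster, with about 4.4 buckets per cluster on average.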
14. Highlighting biomarkers
[Figure: schematic of resonances R1 and R2 across two groups of samples, 1 and 2]
• By choosing a good correlation threshold, clusters mainly link the buckets that show "between-groups" variance,
• hoping that these "groups" correspond to factor levels.
15. Matching the bucket clusters with compounds
Reference compound library: HMDB, MMCD, BMRB, … or a home-made library
The scoring function is based on the concept of "valid cluster" introduced in Chenomx NMR Suite 6.0.
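To make the matching step concrete, here is a deliberately simple scoring sketch. It is not the Chenomx "valid cluster" scoring, whose details are not given here. We assume that each library compound is described by a list of reference peak positions in ppm, score a compound by the fraction of its peaks that fall inside the buckets of a cluster, and rank compounds by that score. The compound names, peak positions, tolerance parameter and function names are all hypothetical.

```python
def match_score(cluster_buckets, compound_peaks, tol=0.0):
    """Fraction of a compound's reference peaks (ppm) lying inside one of the
    cluster's (low, high) buckets, widened by `tol` ppm on each side."""
    hits = sum(any(low - tol <= p <= high + tol for low, high in cluster_buckets)
               for p in compound_peaks)
    return hits / len(compound_peaks)

def rank_compounds(cluster_buckets, library, tol=0.0):
    """Return (name, score) pairs sorted by decreasing score."""
    scores = [(name, match_score(cluster_buckets, peaks, tol))
              for name, peaks in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

library = {                                   # hypothetical reference peak lists (ppm)
    "compound_A": [1.32, 4.11],
    "compound_B": [2.52, 2.66],
    "compound_C": [2.35, 2.66, 4.29],
}
cluster = [(2.49, 2.55), (2.63, 2.69)]        # bucket bounds of one cluster
print(rank_compounds(cluster, library))
```

A plain hit fraction favours compounds with few peaks, and it ignores cluster buckets that the compound does not explain. A more realistic score would penalise both cases.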
17. Tomato
Mounet et al. (2006) Quantitative metabolic profiles of tomato flesh and seeds during fruit development: complementary analysis with ANN and PCA. Metabolomics, 2007, 3:273-288
Global approach to characterizing changes in the metabolic profiles of two interdependent tissues, seed and flesh, from the same tomato fruits during fruit development.
• 25 true positive compounds (more than 80 % of the 31 compounds identified by the expert user),
• including 21 compounds at rank 1 (nearly 70 %).
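The percentages follow from the counts as shown below (our own arithmetic). The "nearly 70 %" figure matches 21 counted against the 31 expert-identified compounds. Counted against the 25 true positives, the same 21 would correspond to 84 %.

$$
\frac{25}{31} \approx 80.6\,\%, \qquad \frac{21}{31} \approx 67.7\,\%, \qquad \frac{21}{25} = 84\,\%
$$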
19. Conclusions - Perspectives
• The "Bucketing" and "Clustering" steps are very efficient for:
• extracting relevant information from raw data,
• highlighting metabolic biomarkers from 1H-NMR metabolomics data.
• The "Matching clusters" step is very efficient, provided that the relevant reference NMR spectral libraries are available.
To address this need, MetaboHub aims to provide a bioinformatics framework with centralized databases for managing the spectral libraries of the metabolites most commonly observed in metabolomics experiments:
i) across the various domains (nutrition, medicine, environment, plants),
ii) for several analytical techniques.