ERVA-NMR

ERVA: a novel method of binning, allowing chemical information to be highlighted, from 1H-NMR metabolomics data
Daniel Jacob (1), Catherine Deborde (1), Annick Moing (1)
(1) PMFB – UMR 1332, INRA, F-33140 Villenave d’Ornon

Metabolic fingerprinting
Aims: classification of samples and highlighting of the metabolic biomarkers
Workflow: Experiment → NMR spectra → Spectra processing → Data matrix (samples × features) → Statistical analyses
D. Jacob – 7th RFMF – Amiens, 10 June 2013

Aims: classification of samples and highlighting of the metabolic biomarkers
Workflow: Experiment → NMR spectra (raw data) → Spectra processing → Data matrix (samples × features; relevant information) → Statistical analyses

Spectra processing

Data Reduction: Bucketing
Comparison of the buckets produced by the Equidistant and AIBIN(1) binning methods
Drawbacks of the AIBIN binning method:
• It takes the full data into account, including noise areas.
• It generates asymmetric buckets that are not centred on the peaks.
(1) AIBIN: Adaptive, Intelligent Binning Algorithm, de Meyer T et al. (2008) Anal. Chem 80:3783–3790

Data Reduction: Bucketing
New approach called ERVA, for Extraction of Relevant Variables for Analysis:
• Compute the convolution product between a spectrum (S) and the second-order derivative of the Lorentzian function (SDL).
• The convolution product gives a signal (in blue).
• The zero crossings of the resulting signal, extended on each side by σ (the full width at half maximum of the Lorentzian function), give the bounds of the buckets.
Jacob D. et al (March 2013) Analytical and Bioanalytical Chemistry, 405, 5049-5061
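The three steps above can be sketched in a few lines of Python. This is a minimal illustration and not the published implementation: the function names (`sdl_kernel`, `erva_buckets`), the kernel span, and the `min_depth` filter that discards shallow lobes are all assumptions introduced here. The sketch also assumes that peak cores are the runs where the convolved signal is negative, and that the ppm axis is uniformly sampled.

```python
import numpy as np

def sdl_kernel(sigma, step, span=10):
    """Second derivative of a Lorentzian of FWHM `sigma`, sampled every `step` ppm."""
    gamma = sigma / 2.0                      # half width at half maximum
    n = int(round(span * sigma / step))      # kernel covers +/- span * sigma (assumed)
    x = np.arange(-n, n + 1) * step
    k = 2 * gamma**2 * (3 * x**2 - gamma**2) / (x**2 + gamma**2) ** 3
    return k - k.mean()                      # truncated kernel: re-impose a zero sum

def erva_buckets(ppm, spectrum, sigma, min_depth=0.05):
    """Return (low, high) ppm bounds of the buckets found in one spectrum."""
    ppm = np.asarray(ppm, dtype=float)
    step = abs(ppm[1] - ppm[0])
    conv = np.convolve(spectrum, sdl_kernel(sigma, step), mode="same")
    negative = conv < 0                      # runs lying between two zero crossings
    padded = np.concatenate(([False], negative, [False])).astype(int)
    d = np.diff(padded)
    starts = np.flatnonzero(d == 1)
    ends = np.flatnonzero(d == -1) - 1
    floor = -min_depth * np.abs(conv).max()  # hypothetical filter against noise lobes
    buckets = []
    for s, e in zip(starts, ends):
        if conv[s:e + 1].min() < floor:
            lo, hi = sorted((ppm[s], ppm[e]))
            buckets.append((lo - sigma, hi + sigma))  # extend each side by sigma
    return buckets

# Synthetic spectrum: two Lorentzian peaks of FWHM 0.01 ppm at 3.0 and 3.2 ppm.
ppm = np.arange(2.5, 3.7, 0.001)
def lorentz(x0, fwhm=0.01):
    g = fwhm / 2.0
    return g**2 / ((ppm - x0) ** 2 + g**2)

spectrum = lorentz(3.0) + lorentz(3.2)
buckets = erva_buckets(ppm, spectrum, sigma=0.01)
```

On this synthetic input the sketch returns one bucket per peak, each centred on its peak. Real spectra would need the `min_depth` value tuned against the noise level.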

ERVA: Extraction of Relevant Variables for Analysis
Why SDL?
• An NMR spectrum is a sum of Lorentzians, plus noise and distortion.
• The second derivative of a Lorentzian is symmetric, and its integral is zero.
Mathematically, applying such a convolution product to a spectrum is similar to a partial wavelet decomposition.
For a full experimental design, the convolution product is applied to the average spectrum obtained by summing all spectra.
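The two properties of the SDL can be checked directly. The derivation below assumes a unit-height Lorentzian written with the half width γ = σ/2; this parameterisation, and the symbols L, γ and S̄, are choices made for this illustration and not taken from the slides.

```latex
L(x) = \frac{\gamma^{2}}{x^{2}+\gamma^{2}}, \qquad \gamma = \frac{\sigma}{2}

L''(x) = \frac{2\gamma^{2}\,\bigl(3x^{2}-\gamma^{2}\bigr)}{\bigl(x^{2}+\gamma^{2}\bigr)^{3}}

L''(-x) = L''(x), \qquad
\int_{-\infty}^{+\infty} L''(x)\,dx = \Bigl[L'(x)\Bigr]_{-\infty}^{+\infty} = 0

L''(x) = 0 \iff x = \pm\frac{\gamma}{\sqrt{3}}

\bar{S}(x) = \frac{1}{N}\sum_{n=1}^{N} S_{n}(x), \qquad
C(x) = \bigl(\bar{S} * L''\bigr)(x)
```

Because the kernel is even and integrates to zero, a constant offset or a linear baseline in the spectrum contributes nothing to C(x), which is one way to read the slide's argument for choosing the SDL.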

Comparison of the buckets produced by the ERVA and AIBIN(1) binning methods
[Figure: sum of three identical Lorentzians, with AIBIN bins A1, A2, A3 and ERVA bins E1, E2, E3]
- Test signal: the sum of three identical Lorentzians, shifted from one another by a ppm interval.
- A1, A2, A3: the bins produced by the AIBIN method, delimited by the dotted lines.
- E1, E2, E3: the bins produced by the ERVA method, shown as superposed grey boxes.
1/ Integrating the ERVA buckets gives values that are closer together than those obtained with the AIBIN method.
2/ With the ERVA method, unlike the AIBIN method, the centres of the buckets correspond to the centres of the resonance peaks.
(1) AIBIN: Adaptive, Intelligent Binning Algorithm, de Meyer T et al. (2008) Anal. Chem 80:3783–3790
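To make point 1/ concrete, here is a small sketch of bucket integration on a signal made of three identical Lorentzians. The helper `integrate_buckets`, the peak positions, the peak width and the bucket bounds are all hypothetical values chosen for illustration. The buckets are simply taken as symmetric windows centred on the peaks, which mimics the behaviour described for ERVA without running either ERVA or AIBIN.

```python
import numpy as np

def integrate_buckets(ppm, spectra, buckets):
    """Integrate each spectrum over each (low, high) bucket.

    Returns a samples x buckets matrix (rectangle rule on a uniform ppm grid).
    """
    ppm = np.asarray(ppm, dtype=float)
    spectra = np.atleast_2d(spectra)
    step = abs(ppm[1] - ppm[0])
    out = np.empty((spectra.shape[0], len(buckets)))
    for j, (lo, hi) in enumerate(buckets):
        mask = (ppm >= lo) & (ppm <= hi)
        out[:, j] = spectra[:, mask].sum(axis=1) * step
    return out

# Three identical Lorentzians (half width 0.005 ppm), 0.05 ppm apart.
ppm = np.arange(2.9, 3.2, 0.001)
centres = [3.00, 3.05, 3.10]
spectrum = sum(0.005**2 / ((ppm - c) ** 2 + 0.005**2) for c in centres)

# Buckets centred on the peaks, as the ERVA method is described to produce.
buckets = [(c - 0.015, c + 0.015) for c in centres]
X = integrate_buckets(ppm, spectrum, buckets)
```

With centred buckets the three integrals differ only through the tails of the neighbouring peaks, so they stay within a few percent of each other. Applied to a set of spectra, the same function yields the samples × buckets data matrix used for the statistical analyses.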

Illustration of the effect of the alignment process
Example of the “citrate-malate” zone from a set of NMR spectra of tomato
[Figure: regions A1, A2 and A3 of the citrate-malate zone]
The A1 region was aligned first; the A2 and A3 regions were then aligned in turn.
- When spectral peak alignment is required in a misaligned region and alters the lower part of the peaks, the impact remains relatively minor with the ERVA data reduction method.
- Indeed, the buckets produced by the ERVA method are mainly based on the central part of the peaks.

Clustering of buckets
Buckets now have a strong chemical meaning, thanks to their exact matching with the resonance peaks, since the resonance peaks are the fingerprints of chemical compounds.
Realistic assumption:
• Compounds involved in the same biochemical pathway may show high correlations between their resonances,
• but usually not as high as the correlations between resonances of the same molecule.
To generate relevant clusters (i.e. chemical compounds), an appropriate correlation threshold has to be applied to the correlation matrix before it is decomposed into clusters.
An approach similar to the clustering of latent variables (*) (CLV) is applied, in two steps:
• a hierarchical clustering analysis based on the correlations between buckets,
• a partitioning algorithm (R igraph package).
(*) Vigneau E et al. (2005) Clustering of variables to analyze spectral data. J Chemom 19:122-128
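The idea of thresholding the correlation matrix before decomposing it into clusters can be illustrated as follows. This sketch deliberately swaps in a simpler technique: it takes the connected components of the graph whose edges are the bucket pairs with a correlation at or above the threshold. It is not the CLV procedure and does not use the R igraph package. The function name `correlation_clusters` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def correlation_clusters(X, threshold):
    """Group buckets (columns of X) linked by a correlation >= threshold.

    X is a samples x buckets matrix. Returns clusters of at least two
    buckets, each as a sorted list of column indices.
    """
    corr = np.corrcoef(X, rowvar=False)
    n = corr.shape[0]
    adjacent = corr >= threshold
    seen = np.zeros(n, dtype=bool)
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:                       # depth-first walk of one component
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(adjacent[i] & ~seen):
                seen[j] = True
                stack.append(int(j))
        if len(members) > 1:               # singletons are not reported
            clusters.append(sorted(members))
    return clusters

# Synthetic example: buckets 0-1 follow one compound, 2-3 another, 4 is unrelated.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 30))
noise = 0.01 * rng.normal(size=(2, 30))
X = np.column_stack([a, 2 * a + noise[0], b, 3 * b + noise[1], c])
clusters = correlation_clusters(X, threshold=0.98)
```

Here the two pairs of nearly proportional buckets are grouped and the unrelated bucket is left out. Lowering the threshold merges clusters, raising it dissolves them, which is why the choice of threshold matters.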

Effect of the correlation threshold on the size and number of bucket clusters
The correlation threshold allowing maximal discrimination of compounds (3) is the one that gives the maximum number of clusters within the optimum range (grey area), which is defined by:
(1) the upper limit on the size of the biggest cluster (40),
(2) the highest value of the criterion ratio.
Criterion = (total number of clusters) / (size of the biggest cluster)
PhenoTom – UR 1052 Unité Génétique et Amélioration des Fruits et Légumes - INRA - Montfavet (France)
Characterization of tomato fruits at two stages (expansion and red-orange fruit) from 12 contrasting genotypes (8 lines and 4 derived F1 hybrids).
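One possible reading of this selection rule is sketched below. The interpretation is an assumption: the optimum range is taken as the thresholds whose biggest cluster holds at most 40 buckets, up to and including the threshold where the criterion peaks. The function name `pick_threshold` and the cluster sizes in the example are made up for illustration and are not results from the PhenoTom data set.

```python
def pick_threshold(stats, max_biggest=40):
    """Pick a correlation threshold from per-threshold cluster sizes.

    stats: iterable of (threshold, list of cluster sizes).
    Returns the chosen threshold, or None if no threshold qualifies.
    """
    rows = []
    for threshold, sizes in stats:
        if sizes:
            n_clusters, biggest = len(sizes), max(sizes)
            rows.append((threshold, n_clusters, biggest, n_clusters / biggest))
    # (1) keep thresholds whose biggest cluster is not above the size limit
    allowed = [r for r in rows if r[2] <= max_biggest]
    if not allowed:
        return None
    # (2) the range ends at the threshold where the criterion is highest
    t_peak = max(allowed, key=lambda r: r[3])[0]
    in_range = [r for r in allowed if r[0] <= t_peak]
    # (3) inside the range, take the threshold with the most clusters
    return max(in_range, key=lambda r: r[1])[0]

# Toy cluster sizes, invented for this example only.
toy_stats = [
    (0.90, [60, 3]),
    (0.95, [30, 4, 4, 3]),
    (0.97, [12, 5, 4, 3, 3, 2]),
    (0.99, [3, 2, 2]),
]
chosen = pick_threshold(toy_stats)
```

In the toy input the threshold 0.90 is rejected because its biggest cluster exceeds 40 buckets, and 0.97 is chosen because it gives the most clusters within the remaining range.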

Bucket clustering greatly helps the interpretation of discriminant analyses such as PCA, PLS, ...
Correlation threshold = 0.98, 623 buckets
• Number of clusters = 58 => 254 buckets
• Biggest cluster => 18 buckets
Clusters are mainly located at the periphery of a circle => biomarkers are highlighted

Highlighting biomarkers
• By choosing a suitable correlation threshold, clusters mainly link the buckets that carry a "between-groups" variance,
• in the hope that these "groups" correspond to factor levels.
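The notion of "between-groups" variance can be quantified per bucket as the share of the total sum of squares explained by the group means. The sketch below is only an illustration of that quantity; the function name `between_group_fraction` and the tiny data set are assumptions, and the slides do not state that this ratio is computed explicitly.

```python
import numpy as np

def between_group_fraction(X, groups):
    """Fraction of each bucket's variance that lies between groups.

    X is a samples x buckets matrix, groups holds one label per sample.
    Buckets with zero total variance would give a division by zero.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    total = ((X - grand_mean) ** 2).sum(axis=0)
    between = np.zeros(X.shape[1])
    for g in np.unique(groups):
        rows = X[groups == g]
        between += len(rows) * (rows.mean(axis=0) - grand_mean) ** 2
    return between / total

# Bucket 0 separates the two groups, bucket 1 varies only within groups.
X = [[1.0, 5.0],
     [1.1, 3.0],
     [3.0, 5.0],
     [3.1, 3.0]]
groups = ["a", "a", "b", "b"]
fraction = between_group_fraction(X, groups)
```

A bucket with a fraction close to 1 varies mostly between groups, which is the kind of bucket the clusters are expected to link when the groups match factor levels.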

Matching the bucket clusters with compounds
Clusters are matched against a reference compound library: HMDB, MMCD, BMRB, … or a home-made library.
The scoring function is based on the concept of "valid cluster" introduced in Chenomx NMR Suite 6.0.

Bucketing + Clustering + Matching: focus on a small example
Mounet et al. (2007) Metabolomics 3:273-288
[Figure: spectrum regions labelled d1, d2, d3 and d4]
CLUSTER
PPM: 3.235, 3.252, 3.269, 3.387, 3.398, 3.406, 3.417, 3.425, 3.436, 3.456, 3.461, 3.468, 3.472, 3.481, 3.487, 3.491, 3.499, 3.735,
3.745, 4.646, 4.662, 5.238, 5.245
# DBREF0014 (Glucose): Score=0.878068 : CLUSTER: 23/23 matches
Matching ppm: 3.235, 3.252, 3.269, 3.387, 3.398, 3.406, 3.417, 3.425, 3.436, 3.456, 3.461, 3.468, 3.472, 3.481, 3.487, 3.491,
3.499, 3.735, 3.745, 4.646, 4.662, 5.238, 5.245
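The matching step can be illustrated by counting how many ppm values of a cluster fall within a tolerance of a reference peak. This is a simplified stand-in: it does not reproduce the "valid cluster" scoring, nor the Score value shown above. The function name `match_cluster`, the tolerance and both library entries are hypothetical. Only the cluster ppm values are taken from the example above (a subset of the 23).

```python
def match_cluster(cluster_ppm, library, tol=0.005):
    """Rank library entries by the fraction of cluster peaks they match.

    library maps a compound name to its list of reference ppm values.
    Returns (name, number of matched peaks, matched fraction) tuples,
    best match first.
    """
    results = []
    for name, ref_ppm in library.items():
        matched = [p for p in cluster_ppm
                   if any(abs(p - r) <= tol for r in ref_ppm)]
        results.append((name, len(matched), len(matched) / len(cluster_ppm)))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Subset of the cluster listed above.
cluster = [3.235, 3.252, 3.269, 4.646, 4.662, 5.238, 5.245]

# Hypothetical library entries, invented for this example.
library = {
    "ref_1": [3.234, 3.253, 3.270, 4.645, 4.663, 5.237, 5.246],
    "ref_2": [1.320, 4.100],
}
ranking = match_cluster(cluster, library)
```

The entry whose reference peaks cover the whole cluster is ranked first, in the same spirit as the "23/23 matches" line of the example output.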

Tomato
Mounet et al. (2007) Quantitative metabolic profiles of tomato flesh and seeds during fruit development: complementary analysis with ANN and PCA. Metabolomics 3:273-288
Global approach to characterize changes in metabolic profiles in two interdependent tissues, seed and flesh, from the same tomato fruits during fruit development.
• 25 true positive compounds (more than 80 % of the 31 compounds identified by the expert user),
• including 21 compounds at rank 1 (nearly 70 %).

To summarize

Conclusions - Perspectives
• The « Bucketing » and « Clustering » steps are very efficient to
• extract relevant information from raw data,
• allow the metabolic biomarkers to be highlighted from 1H-NMR metabolomics data.
• The « Matching clusters » step is very efficient provided that
• the relevant reference NMR spectra libraries are available.
To address this need, MetaboHub aims to provide a bioinformatics framework with centralized databases for managing spectral libraries of metabolites, i.e. those most commonly observed in metabolomics experiments:
i) in the various domains (nutrition, medicine, environment, plant),
ii) for several analytical techniques.

Acknowledgements:
UMR1332 BFP / PMFB
Stéphane Bernillon
Catherine Deborde
Yves Gibon
Mickaël Maucourt
Annick Moing
Dominique Rolin
http://bit.ly/meryb
http://bit.ly/biostatflow
https://code.google.com/p/nmr-viewer/

Effect of the correlation threshold on the number of bucket clusters (PCA loadings)
Correlation threshold = 0.969
