3. Measure what is measurable, and
make measurable what is not so.
Galileo Galilei
1564-1642
Metabolomics deals with measuring a large number of chemicals.
Every practice of measurement contributes greatly to the welfare of human society.
4. [Figure: Top 10 causes of deaths, crude death rate (per 100,000 population), in high-income countries (axis 0 to 160) and low-income countries (axis 0 to 80); legend: infectious diseases and malnutrition, chronic diseases, injury]
High-income countries: Pancreas cancer; Breast cancer; Kidney diseases; Diabetes mellitus; Colon and rectum cancers; Lower respiratory infections; Chronic obstructive pulmonary…; Trachea, bronchus, lung cancers; Alzheimer disease and other…; Stroke; Ischaemic heart disease
Low-income countries: Protein-energy malnutrition; Road injury; Birth asphyxia and birth trauma; Preterm birth complications; Tuberculosis; Malaria; Stroke; HIV/AIDS; Ischaemic heart disease; Diarrhoeal diseases; Lower respiratory infections
Why measure chemicals?
http://www.who.int/healthinfo/global_burden_disease/en/
Disturbed chemistry plays a major role in the biology of chronic diseases.
5. Alzheimer’s is the most expensive disease in America
https://www.alz.org/facts/
6. Most exposures are chemicals
Exposome: the sum of all internal and external exposures
Rappaport SM and Smith MT, Science, 22 Oct 2010
The exposome: an emerging key concept in public health
7. Disturbed metabolism is involved in the progression of chronic diseases.
It can be a risk factor as well as a characteristic of a disease state.
Disease & metabolism | PubMed articles
Cancer | 525,141
Diabetes | 207,579
Heart diseases | 541,959
Brain diseases | 323,206
Asthma | 41,023
Source: wikipedia.org
Up to 10% of the human genome regulates or operates metabolism.
Chemicals define metabolic pathways
8. Blood supplies chemicals to every cell in the body.
Which chemicals should be measured, and where?
The blood exposome
*in epidemiology and clinical research
9. Two main factors in measuring chemicals: chemical diversity and concentration range
Rappaport, Stephen M., Dinesh K. Barupal, David Wishart, Paolo Vineis, and Augustin Scalbert.
"The Blood Exposome and Its Role in Discovering Causes of Disease." Environ Health Perspect (2014).
15. Metabolomics cores can provide cost-effective, reliable and useful services
Attribute | In metabolomics
Specific | Peak annotation
Measurable | Peak quality
Attainable | Robust assays
Relevant | Effect sizes/hypothesis
Timely | Turn-around time
Raw data quality is paramount for the success of a client project.
17. LC/MS is a robust technique for large batches
18 compounds are monitored across the chromatograms of the QC samples to check the reproducibility of analyses.
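The slide does not say which properties of the 18 monitored compounds were tracked. As one hedged illustration, assuming per-injection retention times (RT) are available for each monitored compound, the sketch below reports the RT spread across QC injections and flags compounds outside a tolerance. The function name, the compound names, the RT values and the 0.1 min tolerance are all hypothetical, not WCMC settings:

```python
from statistics import mean


def rt_reproducibility(rt_by_compound, tolerance_min=0.1):
    """Summarize the retention-time (RT) spread of QC-monitored compounds.

    rt_by_compound: compound name -> RTs in minutes, one per QC injection.
    tolerance_min: hypothetical acceptance window for max(RT) - min(RT).
    """
    report = {}
    for compound, rts in rt_by_compound.items():
        spread = max(rts) - min(rts)
        report[compound] = {
            "mean_rt": round(mean(rts), 3),
            "spread_min": round(spread, 3),
            "within_tolerance": spread <= tolerance_min,
        }
    return report


# Hypothetical RTs for two monitored compounds across four QC injections.
qc_rts = {
    "compound_A": [2.51, 2.52, 2.50, 2.53],
    "compound_B": [7.80, 7.95, 8.02, 7.85],
}
for name, summary in rt_reproducibility(qc_rts).items():
    print(name, summary)
```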
19. Advancements in blood metabolomics - 1
[Figure: number of identified compounds in a blood metabolomics dataset by Metabolon, 2008 to 2018]
Year | Identified compounds
2008 | 150
2009 | 180
2010 | 200
2011 | 364
2012 | 380
2013 | 400
2014 | 450
2015 | 575
2016 | 722
2017 | 947
2018 | 1250
~900 uniquely identified compounds at WCMC
Barupal, Dinesh K., et al. "A comprehensive plasma metabolomics dataset for a cohort of mouse knockouts within the international mouse phenotyping consortium." Metabolites 9.5 (2019): 101.
20. Advancements in blood metabolomics - 2
Standardized and harmonized
[Figure 2: principal component analysis of QC and study samples (PC1 - variance explained: 95 %; PC2 - variance explained: 1.2 %) and histogram of counts by RSD (%)]
Serum specimens → LCMS analysis (Agilent UHPLC-qTOF: liquid chromatography, mass spectrometry)
• Lipid extraction
• CSH RP separation
• QTOF mass detection
Raw LCMS spectra (ADNI1_LIPIDOMICS_RAW_DATA.zip)
• ESI (+): 915 (832 samples + 83 QCs)
• ESI (-): 918 (833 samples + 85 QCs)
Dataset generation (software: Agilent Qual 7.0; curated database of compounds, wcmc-lipidomics-database.zip)
• RT correction
• Extraction of peak heights
• Merge files
Extracted ion chromatograms (adni1-lipidomics-eics.zip) → raw peak heights
Normalization
• LOESS normalization
• Batch effect removal
Normalized peak heights; analytes: 914 (ESI (+/-))
Data filtering
• RSD in QC > 25%
• Duplicate peaks
• Median peak heights in QC < 1000 counts
Filtered dataset (ADMC_ADNI1_LIPIDOMICS.csv): N = 915 (832 samples + 83 QCs); analytes: 521 (ESI (+/-))
Barupal, Dinesh Kumar, et al. "Generation and quality control of lipidomics data for the alzheimer’s disease neuroimaging initiative cohort." Scientific Data 5 (2018): 180263.
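The two numeric filtering rules on this slide (RSD in QC > 25% and median QC peak height < 1000 counts) can be expressed in a few lines. A minimal sketch, assuming normalized peak heights of the QC injections are available per feature; the function names, feature names and values are hypothetical, and duplicate-peak removal is left out because its criteria are not described here:

```python
from statistics import mean, median, stdev


def rsd_percent(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def filter_features(qc_heights, max_rsd=25.0, min_median=1000.0):
    """Apply two QC-based filters to normalized peak heights.

    qc_heights: feature id -> normalized peak heights in the QC injections.
    A feature is removed when its QC RSD exceeds max_rsd (%) or when its
    median QC peak height is below min_median (counts).
    Duplicate-peak removal is not shown.
    """
    kept = {}
    for feature, heights in qc_heights.items():
        if rsd_percent(heights) > max_rsd:
            continue
        if median(heights) < min_median:
            continue
        kept[feature] = heights
    return kept


# Hypothetical QC peak heights for three features.
qc = {
    "lipid_1": [1000, 1100, 1050],  # stable and intense: kept
    "lipid_2": [100, 2000, 3000],   # RSD far above 25 %: removed
    "lipid_3": [200, 210, 205],     # median below 1000 counts: removed
}
print(sorted(filter_features(qc)))  # → ['lipid_1']
```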
22. 1) Complementary LC separations are needed
Column | ESI (+) run time (min) | ESI (-) run time (min)
CSH C18 | 15 | 15
BEH C18 | 15 | 15
HSS T3 RP | 15 | 15
HILIC – BEH amide | 15 | 15
PoroShell HILIC | 15 | 15
PFP | 15 | 15
Analyzing one sample on all six columns is not feasible.
Possible solution: four untargeted 15-minute assays in ESI (+) & (-) mode covering highly polar, polar, semi-polar and non-polar compounds: PoroShell HILIC, HILIC – BEH amide, BEH C18, CSH C18
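To see why six columns per sample are impractical, the instrument time can be worked out directly. A sketch, assuming one instrument, a separate 15-minute injection per column and polarity as listed on this slide, and no QC injections, blanks or downtime; the function name and the 1,000-sample study size are hypothetical:

```python
# Instrument time needed to run every sample on all six columns in both
# ESI polarities. Assumes one 15-minute injection per column and polarity,
# one instrument, and no QC injections, blanks or downtime.
COLUMNS = ["CSH C18", "BEH C18", "HSS T3 RP", "HILIC - BEH amide",
           "PoroShell HILIC", "PFP"]
POLARITIES = ["ESI (+)", "ESI (-)"]
RUN_MINUTES = 15


def instrument_hours(n_samples, columns=COLUMNS, polarities=POLARITIES,
                     run_minutes=RUN_MINUTES):
    """Return total acquisition time in hours for n_samples."""
    injections = n_samples * len(columns) * len(polarities)
    return injections * run_minutes / 60


print(instrument_hours(1))     # → 3.0 hours per sample
print(instrument_hours(1000))  # → 3000.0 hours for a hypothetical 1,000-sample study
```

At 3,000 hours, the hypothetical study would occupy one instrument for about 125 days of continuous acquisition, before any QC or re-injections.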
23. 2) Automated liquid handling can lower technical variance
Received samples → Thawing → Aliquoting and extraction (Janus robot, PerkinElmer) → Samples ready for analysis (Agilent 1290 UPLC - 6550 QTOF)
24. 3) Mass spectral libraries can be rapidly developed
Workflow:
• ~600 pure compounds in dried form → MS1 data → targeted ion search for each file → compound-RT info
• Targeted MS/MS acquisition (.d files) → targeted MS/MS peak search (.CEF files) → imported into PCDL manager → database in .cdb format
• AutoMSMS data → MS/MS peak search with RT (.CEF files)
Results:
• 300 spectra in ESI positive mode, of which ~160 had RT > 1.0 min
• 100 spectra in ESI negative mode
• Data acquisition and processing took two weeks
MoNA database
A week to deliver a library for 500 compounds.
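The targeted ion search step looks for each library compound's expected ion in the MS1 data to obtain its retention time. The slide's workflow uses Agilent software (PCDL manager, .CEF and .cdb files), which is not reproduced here. A generic sketch, assuming only [M+H]+ ions, a hypothetical 10 ppm tolerance and feature lists already exported as (m/z, RT) pairs; the function names and features are hypothetical:

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def targeted_ion_search(peaks, compounds, ppm_tol=10.0):
    """Find retention times of [M+H]+ ions for known compounds.

    peaks: list of (mz, rt_min) features from one MS1 data file.
    compounds: compound name -> monoisotopic neutral mass (Da).
    Returns compound name -> list of RTs whose m/z is within ppm_tol.
    """
    hits = {}
    for name, neutral_mass in compounds.items():
        target_mz = neutral_mass + PROTON_MASS  # [M+H]+ only
        hits[name] = [rt for mz, rt in peaks
                      if abs(ppm_error(mz, target_mz)) <= ppm_tol]
    return hits


# Hypothetical MS1 features and one library compound (caffeine, C8H10N4O2).
features = [(195.0877, 3.21), (195.1000, 5.02), (138.0550, 1.10)]
print(targeted_ion_search(features, {"caffeine": 194.080376}))
# → {'caffeine': [3.21]}
```

Other adducts, such as [M+Na]+ in positive mode or [M-H]- in negative mode, would add more target m/z values per compound.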
25. 4) MS/MS spectra are needed for the majority of LC/MS peaks
Annotation categories:
• Known compound + MS/MS spectra, interpreted
• Known compound + MS/MS spectra, not interpreted yet
• Compound with MS/MS spectra but unknown
• Compound without any MS/MS spectra (1000s)
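Once MS/MS spectra exist, annotation typically compares an experimental spectrum with library spectra such as those in MoNA. One widely used score is the cosine similarity of the two peak lists. A minimal sketch, assuming centroided spectra, greedy peak pairing and a hypothetical 0.01 Da tolerance; library search tools use their own scoring variants (for example, intensity or m/z weighting), so this is not any specific tool's algorithm, and the spectra are made up:

```python
import math


def cosine_score(spectrum_a, spectrum_b, mz_tol=0.01):
    """Cosine similarity between two centroided MS/MS spectra.

    Spectra are lists of (mz, intensity). Each peak in spectrum_a is paired
    with the closest unused peak in spectrum_b within mz_tol (greedy).
    """
    used = set()
    dot = 0.0
    for mz_a, int_a in spectrum_a:
        best_j, best_diff = None, mz_tol
        for j, (mz_b, _) in enumerate(spectrum_b):
            diff = abs(mz_a - mz_b)
            if j not in used and diff <= best_diff:
                best_j, best_diff = j, diff
        if best_j is not None:
            used.add(best_j)
            dot += int_a * spectrum_b[best_j][1]
    norm_a = math.sqrt(sum(i * i for _, i in spectrum_a))
    norm_b = math.sqrt(sum(i * i for _, i in spectrum_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical query spectrum and two library spectra.
query = [(138.055, 100.0), (110.060, 20.0)]
library = {
    "candidate_1": [(138.056, 90.0), (110.061, 25.0)],
    "candidate_2": [(85.028, 100.0)],
}
scores = {name: round(cosine_score(query, spec), 3)
          for name, spec in library.items()}
print(scores)
```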
27. 6) New MS instruments are emerging
This will put pressure on grant budgets.
28. 7) Comprehensive assays are needed
Targeted: measure one or more selected metabolites; instrument: triple quad/QTRAP LC-MS/MS
Untargeted: measure as many as possible; instrument: Q-TOF/Q-Exactive LC-MS/MS
[Diagram for both approaches: blood sample → signals → analyzer → data collection]
31. Data analysis workflow and software
Client/samples → instrument data → raw data matrix
• Normalization (R-scripts)
• QC reports
• PCA visualization
• Statistics: effect size/significance, classification/regression models, machine learning
• Enrichment (ChemRICH)
• Network visualization (MetaMapp)
• Pathway visualization
• Literature data
32. How to process MS data?
Scripting (flexible) vs. clicking or GUI (fixed)
• MetaboAnalyst: local installation, online
• XCMS Online: local installation, online
33. How to analyze data?
Scripting (flexible) vs. clicking or GUI (fixed)
• MetaboAnalyst (R-based): local installation, online
• Microsoft
• MetaBox (R-based): local installation, online
34. MetaboAnalyst is a popular tool among
non-coders
http://www.metaboanalyst.ca/
Pros
• Easy to navigate
• Provides commonly used
statistical methods
35. MetDA @ WCMC
http://metda.fiehnlab.ucdavis.edu/
Study design
• Power analysis
Data processing
• Missing value imputation
• Outlier detection
• Normalization & batch effect removal
• Transformation
• Scaling
• Descriptive statistics
Hypothesis testing
• Student's t-test
• Mann-Whitney test
• Wilcoxon signed-rank test
• Kruskal-Wallis test
• One-way ANOVA
• Two-way ANOVA
• Two-way mixed ANOVA
• Two-way repeated measures ANOVA
• Normality test
Association modeling
• Linear regression
• Logistic regression
• Survival models
Multivariate analysis
• Principal component analysis
• Hierarchical cluster analysis
• PLS-DA
Classification prediction
• Random forest
• Support vector machine
• LightGBM
By Sili Fan
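MetDA lists power analysis under study design. As a rough illustration of what such a calculation involves, not MetDA's implementation, the normal approximation for a two-sided comparison of two group means with standardized effect size d gives n = 2(z_{1-α/2} + z_{1-β})²/d² samples per group. The function name is hypothetical, and the defaults α = 0.05 and power = 0.8 are conventional choices assumed here:

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate samples per group for a two-sided, two-sample test of means.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (difference in means / SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(n_per_group(0.5))  # → 63 (medium effect)
print(n_per_group(1.0))  # → 16 (large effect)
```

An exact t-test calculation gives a slightly larger n (64 per group for d = 0.5), and testing hundreds of metabolites at once usually calls for a stricter α.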
39. KS test is a better statistical method for metabolomics enrichment
Limitations of the hypergeometric test
Requirement | Fisher exact | Hypergeometric | Binomial | K-S
Background database | Yes | Yes | No | No
p-value cutoff | Yes | Yes | Yes | No
K-S: the Kolmogorov–Smirnov test is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test).
Barupal, Dinesh Kumar, and Oliver Fiehn. "Chemical Similarity Enrichment Analysis (ChemRICH) as alternative to biochemical pathway mapping for metabolomic datasets." Scientific Reports 7.1 (2017): 14567.
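The table's point is about inputs: over-representation tests need a background database and a p-value cutoff, whereas the K-S test works on the whole distribution of p-values within a compound set. A stdlib-only sketch contrasting the two, not the ChemRICH implementation; the function names and example numbers are hypothetical, and the K-S p-value uses the asymptotic Kolmogorov series with Stephens' small-sample correction, which is an approximation for very small sets:

```python
import math


def hypergeometric_p(n_background, n_in_set, n_significant, k_overlap):
    """Over-representation p-value P(X >= k_overlap), X ~ Hypergeometric.

    n_background:  compounds in the background database
    n_in_set:      background compounds that belong to the set
    n_significant: compounds passing the p-value cutoff
    k_overlap:     significant compounds that belong to the set
    """
    upper = min(n_in_set, n_significant)
    hits = sum(
        math.comb(n_in_set, i)
        * math.comb(n_background - n_in_set, n_significant - i)
        for i in range(k_overlap, upper + 1)
    )
    return hits / math.comb(n_background, n_significant)


def ks_statistic_uniform(p_values):
    """One-sample K-S statistic D of p-values against Uniform(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    # Largest gap between the empirical CDF and F(x) = x, checked just
    # after and just before each step of the empirical CDF.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def ks_p_value(d, n):
    """Asymptotic K-S p-value with Stephens' small-sample correction."""
    sqrt_n = math.sqrt(n)
    lam = (sqrt_n + 0.12 + 0.11 / sqrt_n) * d
    if lam < 0.3:  # the series converges slowly here and Q(lam) is ~1
        return 1.0
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return min(max(q, 0.0), 1.0)


# Hypothetical set of 4 compounds: 2 pass p < 0.05, against a background
# of 500 compounds with 40 significant overall.
print(hypergeometric_p(500, 4, 40, 2))
# The K-S test only needs the set's own p-values.
set_p_values = [0.001, 0.004, 0.03, 0.2]
d = ks_statistic_uniform(set_p_values)
print(d, ks_p_value(d, len(set_p_values)))
```

Because the statistic here is two-sided, a set whose p-values are unusually large would also score high; an enrichment analysis would normally ask only whether a set's p-values are smaller than expected.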
41. Production of PUFA-containing lipids is disturbed in AD subjects
[Figure: significance, -log(p-value) (axis 0 to 20), of serum lipids plotted against double bond count (axis 0.0 to 7.5); labels: EPA (20:5), DHA (22:6), PUFAs; legend: lower in AD, higher in AD]
Lipids shown: FA (13:0), FA (15:1), FA (24:0), FA (26:0), FA (28:0), PC (o-38:3), PC (p-42:3), PC (p-38:3), PC (p-40:3), DG (36:4), TG (51:4), TG (52:4), CE (20:5), CE (22:6), LPC (20:5), PC (36:5), PE (p-36:5), FA (20:5), FA (22:6), LPC (22:6), PC (36:6), PC (37:6), PC (38:6), PC (38:7), PC (39:6), LPE (22:6), PC (38:6), PC (40:6), PC (40:7), PI (40:6), PE (p-38:6), PE (p-40:6), PE (p-40:7), TG (56:8), TG (56:9), TG (58:10), TG (58:8), TG (58:9), TG (60:11)
New bioinformatics approach: logistic regression and ChemRICH set enrichment analysis.
• Next question: what role can genetics, diet and drugs play in this disturbance?
• First, we checked the effect of fish oil supplements (DHA) on phospholipid production.
42. Meta-analysis for untargeted lipidomics data
Lipophilicity
Sex differences in the blood lipidome
Four large cohorts (n = 800-2,500) analyzed over a period of 6 years.
43. MetaMapp integrates chemical and biochemical relationships for a network visualization
Barupal, Dinesh K., et al. "MetaMapp: mapping and visualizing metabolomic data by integrating information from biochemical pathways and chemical and mass spectral similarity." BMC Bioinformatics 13.1 (2012): 99.
44. Metabolic dysregulation in aggressive ER(-) breast tumors
Red nodes = increased in ER-
Blue nodes = decreased in ER-
Orange nodes = no change
Red edges = KEGG reactant pairs
Blue edges = Tanimoto chemical similarity
Nucleotides
Sugar and sugar phosphates
Amino acids
Fatty acids
Organic acids
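The blue edges link chemically similar compounds. A minimal sketch of how such edges can be derived, assuming chemical fingerprints have already been computed as sets of on-bit indices (for example with a cheminformatics toolkit, not shown) and a hypothetical Tanimoto cutoff of 0.7; this is not the MetaMapp code, and the function names, compound names and bits are made up:

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def chemical_similarity_edges(fingerprints, cutoff=0.7):
    """Return (compound_a, compound_b, score) for pairs with Tanimoto >= cutoff."""
    edges = []
    for (name_a, fp_a), (name_b, fp_b) in combinations(
            sorted(fingerprints.items()), 2):
        score = tanimoto(fp_a, fp_b)
        if score >= cutoff:
            edges.append((name_a, name_b, round(score, 3)))
    return edges


# Hypothetical fingerprints (sets of on-bits) for three compounds.
fps = {
    "fatty_acid_1": {1, 2, 3, 4, 5, 6, 7},
    "fatty_acid_2": {1, 2, 3, 4, 5, 6, 8},
    "sugar_1": {10, 11, 12},
}
print(chemical_similarity_edges(fps))
# → [('fatty_acid_1', 'fatty_acid_2', 0.75)]
```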
45. Integrative pathway network of proteins and metabolites
Upregulated pathways in ER(-) tumors:
• TCA anaplerosis
• Substrate recycling: nucleotide salvage
• One-carbon metabolism
• Cholesterol biosynthesis
• Proline biosynthesis
Barupal, Dinesh K., et al. "Prioritization of metabolic genes as novel therapeutic targets in estrogen-receptor negative breast tumors using multi-omics data and text mining." bioRxiv (2019): 515403.
46. Generating databases using manual curation is inefficient
• Bias
• Efficiency
HMDB (manual curation) cites only 2,156 papers for blood compounds (status: June 2018); all papers on blood chemicals: ~1.0 million.
47. Text mining can be used for building
context-specific chemical databases
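A minimal dictionary-matching sketch of the idea, assuming abstracts have already been retrieved (for example from PubMed) and a synonym list exists for each compound. It counts abstracts in which a compound co-occurs with a context term such as a blood matrix. All names are hypothetical; this is not the author's pipeline, and real text mining needs proper chemical named-entity recognition, since whole-word regular expressions miss names ending in punctuation and many spelling variants:

```python
import re
from collections import Counter


def _name_pattern(names):
    """Case-insensitive whole-word pattern matching any of the given names."""
    return re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b",
                      re.IGNORECASE)


def count_context_mentions(abstracts, synonyms, context_terms):
    """Count abstracts that mention a compound together with a context term.

    abstracts: iterable of abstract texts.
    synonyms: compound -> list of names and synonyms.
    context_terms: words defining the context, e.g. blood-related terms.
    """
    context = _name_pattern(context_terms)
    patterns = {c: _name_pattern(names) for c, names in synonyms.items()}
    counts = Counter()
    for text in abstracts:
        if not context.search(text):
            continue  # skip abstracts outside the context
        for compound, pattern in patterns.items():
            if pattern.search(text):
                counts[compound] += 1
    return counts


abstracts = [
    "Serum glucose was elevated in cases.",
    "Glucose uptake in plant leaves.",
    "Plasma caffeine and paraxanthine levels were measured.",
]
synonyms = {
    "glucose": ["glucose", "dextrose"],
    "caffeine": ["caffeine", "1,3,7-trimethylxanthine"],
}
print(count_context_mentions(abstracts, synonyms, ["blood", "serum", "plasma"]))
```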
52. WCMC Core : A SWOT analysis - Strengths
• Strong leadership
• Strong research team
• Strong IT team
• Trained and skilled staff
• Battery of unique computational resources
• Reputation
• Established client base
• Functional billing and project management units
• Metabolomics courses
• Range of mass spectrometry instruments
• Large collaborative projects
• Handling of large studies
53. WCMC Core : A SWOT analysis - Weaknesses
• Poor translation of data into publications by clients
• Long turn-around time
• Few targeted assays
• Few advanced instruments
• Limited lab space for freezers and new instruments
54. WCMC Core : A SWOT analysis - Opportunities
• Trans-NIH projects
• Clinical collaborations
• Chemical screening in non-living objects
• Automation
• Cloud computing
• Global clients
• Teaching academy
• Open data initiatives
• Internal benchmarking
• Ring trials
55. WCMC Core : A SWOT analysis - Threats
• New disruptive MS instruments
• Assays offered by Metabolon Inc.
• New emerging cores with better instruments
• Staff turnover
• Prolonged instrument failures
56. FAIR-compliant data
Findable, accessible, interoperable and reusable
• All SOPs will be available at https://www.protocols.io/
• Data descriptor papers as a service
• Streamline the upload of studies to the Metabolomics Workbench
WCMC Core : client training and education
• WCMC training and courses for using WCMC data
• Online videos on our analytical assays and what to do with the resulting data
• Highlight the papers that have used WCMC services
• Advertising core services at larger scientific conferences
• Online tutorials and blogs to explain the services and data analysis workflows
• Organize online bootcamps and workshops
57. Key immediate technical challenges
• Expand the targeted assays
• Develop targeted + assays
• Automation of targeted data processing
• Automation of untargeted LC/MS data processing
• Automation of data merging from multiple assays
• Convert raw signals to quantitative values using internal
standards
• MS/MS annotation database
• Better sharing of raw data
• Streamline statistical analysis and bioinformatics
• Clean up the data dictionaries
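The internal-standard conversion listed above is often done as a single-point calculation with a relative response factor. A minimal sketch, assuming a linear detector response, an internal standard (IS) spiked at a known concentration into every sample, and one calibration standard to estimate the response factor; the function names and numbers are hypothetical, not WCMC's procedure:

```python
def response_factor(std_area, std_is_area, std_conc, std_is_conc):
    """Relative response factor from one calibration standard:
    (analyte area / IS area) divided by (analyte conc / IS conc)."""
    return (std_area / std_is_area) / (std_conc / std_is_conc)


def quantify(analyte_area, is_area, is_conc, rf=1.0):
    """Analyte concentration from its peak area relative to the IS.

    Returns a value in the same concentration units as is_conc.
    """
    if is_area <= 0 or rf <= 0:
        raise ValueError("IS area and response factor must be positive")
    return (analyte_area / is_area) * is_conc / rf


rf = response_factor(std_area=80000, std_is_area=100000,
                     std_conc=2.0, std_is_conc=2.0)
print(rf)  # → 0.8
print(quantify(analyte_area=40000, is_area=100000, is_conc=2.0, rf=rf))  # → 1.0
```

With rf = 1.0, the calculation assumes the analyte and internal standard respond identically, which holds best for stable-isotope-labeled standards of the same compound.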
58. My research history (timeline from 05/2007 to 2018)
Career stages: PhD, PostDoc; locations: UC Davis, India, IARC/WHO (France)
Topics: Algae biofuel; pyGCMS; Pathway/network DBs; MetaMapp; IBS disease; SHS; SH; MS similarity networks; TB breath biomarkers; SpectConnect automation; Machine learning; Integrated networks; Breast tumor biopsies; E. coli metabolic network; Ataxia model for depression; Cancer epidemiology; Blood metabolomics; Nested case/control HCC; Air pollution study; Agilent QTOF 6550 operation; Exposomics studies; IARC monographs; Blood exposome; LC/MS data processing; MS/MS annotations; Cancer risk estimation; Agilent QTOF MS library; ChemRICH; Text mining; MetaBox; ADNI; ccRCC; MetDA; SERRF; Blood exposome database; Chronic Fatigue; MS libraries; Chemical text mining; KOMP mouse knockout
Sample types: Tissue/Cells, Blood
59. Future direction
Diagram: exposome chemicals; genetics; altered metabolic networks; chronic diseases (cancer, diabetes, CVD, AD, CKD, NAFLD)
New analytical approaches and new bioinformatics approaches:
• Comprehensive metabolomics assays: 1500 identified compounds
• Multi-omics integration, specifically with whole-genome sequencing datasets
• Automated text mining and multi-omics integration to interpret the metabolomics results
• Chemical prioritization: text mining, omics databases, epidemiological studies, animal assays
60. Acknowledgement
• Oliver Fiehn, UC Davis
• Kent Pinkerton, UC Davis
• Carsten Denkert, Charité Hospital, Berlin
• Augustin Scalbert, Neela Guha, Kate Guyton, Dana Loomis, IARC
• Rima Kaddurah-Daouk, Duke University
• Steve Rappaport, UC Berkeley
• Ian Lipkin, Columbia University
Editor's notes
Metabolomics is a mature technology that delivers reliable quantitative data for metabolites. Chromatography and mass spectrometry have been available for many decades, but the availability of computational resources has been the major factor in the development of the technology. Tedious and time-consuming manual annotation of chromatographic peaks has now been replaced by fully automated peak annotation using the BinBase database.