Bioinformatics Strategies for Exposome 100416
1. Bioinformatics to enable robust biomedical
discovery with big data of the exposome
Chirag J Patel
International Society of Exposure Science
Utrecht 2016
10/11/16
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
2. Data streams in public health are getting large!
Capacity to measure and compute is
high-throughput.
projecthelix.eu
heals-eu.eu
exposomicsproject.eu
chearprogram.org
N>100K
1M genotypes
1000s of phenotypes
3. Does Bradford-Hill apply?
Challenges exist in the use of high-throughput data
for discovery and causal research
Stat Med, 2015
Criterion 1:
Significance and effect size
Criterion 2:
Consistency
Criterion 3:
Specificity
4. Many challenges exist in the use of the exposome for
robust discovery and causal research
Thousands of hypotheses are possible.
Multiplicity of hypotheses.
Big data are observational.
Multiplicity of biases:
confounding, selection, reverse causality
High-throughput exposome: ready for prime time?
What is the identity of my unidentified target?
JAMA, 2014
ARPH, in press
6. Exposome offers a multiplicity of possible hypotheses!
Example: cohort database of E exposures and P phenotypes
Hum Genet 2012
JAMA, 2014
JECH 2014
Curr Env Health Rep 2016
ARPH, in press
which ones to test?
all?
the ones in blue (prior)?
E times P possibilities!
how to detect signal from noise?
7. The exposome offers a multiplicity of possible hypotheses:
A few examples from cohort studies
JECH, 2014
8. Big Data = Big Bias:
Confounding, reverse causality, and what
causes what
9. Interdependencies of the variables:
Correlation globes paint a complex view of exposure and
behavior
Red: positive ρ
Blue: negative ρ
Thickness: |ρ|
For each pair of E: Spearman ρ (575 factors: 81,937 correlations)
Permuted data to produce “null ρ”
Sought replication in > 1 cohort
JAMA 2014
Pac Symp Biocomput. 2015
JECH. 2015
National Health and Nutrition Examination
Survey (NHANES)
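The pairwise procedure on this slide (Spearman ρ for each pair of exposures, with permuted data supplying a "null ρ") can be sketched as follows. This is a minimal illustration using only the Python standard library, not the analysis behind the cited papers. The function names, the number of permutations, and the input layout (a dict mapping each factor name to values for the same participants in the same order) are all assumptions made for the example.

```python
import random
from itertools import combinations


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm=1000, seed=0):
    """Two-sided p-value: how often a shuffled y gives |rho| >= the observed |rho|."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(spearman(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def correlation_globe(factors, n_perm=1000):
    """Return {(factor_a, factor_b): (rho, permutation p)} for every pair of factors."""
    edges = {}
    for a, b in combinations(sorted(factors), 2):
        rho = spearman(factors[a], factors[b])
        edges[(a, b)] = (rho, permutation_p(factors[a], factors[b], n_perm))
    return edges
```

In a globe drawn from this output, the sign of rho would set the edge color and |rho| the thickness. Replication in a second cohort would mean running the same function on that cohort and keeping only the pairs that agree.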
12. How can we query the large hypothesis space?
It is possible to find new exposure-phenotype
correlations with robust analytic support.
13. Bioinformatics-inspired guidelines to enhance the
opportunity for robust discovery in exposome research
1.) Test systematically, address multiplicity, and replicate.
2.) Develop databases to disseminate exposome-related findings (e.g., untargeted analyses)
3.) Practice reproducible research and increase data literacy
JAMA, 2014
ARPH, in press
14. Test systematically and replicate.
Examples: “environment-wide” or “exposome-wide”
association studies
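One way to picture "test systematically and replicate" is a two-stage scan: every exposure is tested in a discovery cohort with a multiplicity-corrected threshold, and only the survivors are re-tested in a second cohort. The sketch below is a simplified illustration, not the method of any cited study. It assumes a continuous phenotype, uses a Pearson correlation with a Fisher z approximation for the p-value (no covariate adjustment or survey weighting), and applies a Bonferroni threshold in discovery. All function and field names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def association(x, y):
    """Return (r, two-sided p) using the Fisher z normal approximation."""
    r = pearson(x, y)
    r_safe = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = atanh(r_safe) * sqrt(len(x) - 3)
    return r, 2 * (1 - NormalDist().cdf(abs(z)))


def exposome_wide_scan(disc_exposures, disc_pheno, rep_exposures, rep_pheno, alpha=0.05):
    """Test every exposure in discovery; re-test the survivors in replication.

    disc_exposures / rep_exposures: dict of exposure name -> list of values.
    disc_pheno / rep_pheno: phenotype values for the same participants.
    """
    m = len(disc_exposures)  # number of hypotheses tested, reported explicitly
    findings = []
    for name in sorted(disc_exposures):
        r_d, p_d = association(disc_exposures[name], disc_pheno)
        if p_d >= alpha / m:  # Bonferroni threshold in the discovery cohort
            continue
        r_r, p_r = association(rep_exposures[name], rep_pheno)
        findings.append({
            "exposure": name,
            "r_discovery": r_d,
            "p_discovery": p_d,
            "r_replication": r_r,
            "p_replication": p_r,
            # replicated = nominally significant and same direction of effect
            "replicated": p_r < alpha and r_d * r_r > 0,
        })
    return {"tests": m, "findings": findings}
```

Returning the number of tests alongside the findings keeps the size of the searched hypothesis space visible in whatever is reported.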
15. Searching >250 environmental and behavioral factors in
all-cause mortality
IJE, 2013
[Figure: factors associated with all-cause mortality at FDR < 5%; R² ~ 2%]
Labeled factors: age (10 years); male; black; income (quintiles 1, 2, 3); any one smoke in home?; past smoker?; current smoker?; serum and urine cadmium [1 SD]; serum lycopene [1 SD]; physical activity [low, moderate, high activity]*
*derived from METs per activity and categorized by Health.gov guidelines
16. Searching 461 environmental and behavioral factors in
telomere length
IJE, 2016
[Figure: factors associated with telomere length at FDR < 5%; axis: shorter telomeres to longer telomeres; R² ~ 1%]
Labeled factors: PCBs; trunk fat; Alk. Phos; CRP; cadmium; cadmium (urine); cigs per day; retinyl stearate; VO2 max; pulse rate
Adjusted by age, age², race, poverty, education, occupation
Median N = 3000; N range: 300-7000
17. Big Data offers a multiplicity of possible hypotheses!
Example: cohort database of E exposures and P phenotypes
Hum Genet 2012
JECH 2014
Curr Env Health Rep 2016
which ones to test?
all?
the ones in blue?
E times P possibilities!
how to detect signal from noise?
Scale it up to multiple phenotypes:
associate all E x P!
19. Testing all associations systematically:
Consideration of multiplicity of hypotheses and correlational web!
Be explicit about the number of hypotheses tested:
false discovery rate; systematically examine (E x P); report database size!
Does my correlation matter?
How does my new correlation compare to the family of correlations?
Example: ρ = 0.17 (e.g., carotene and diabetes). Is the average ρ much less than 0.17? Greater?
JAMA 2014
JECH 2015
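Two of the ideas on this slide lend themselves to a short sketch: controlling the false discovery rate across all E x P tests, and asking where a new correlation sits within the family of correlations. The first function below is the standard Benjamini-Hochberg step-up procedure, chosen here as one common way to control the FDR; the slide does not say which procedure was used. The second function and all names are assumptions made for illustration.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR q."""
    m = len(pvalues)  # the number of hypotheses tested, stated explicitly
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is under its threshold.
        if pvalues[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def fraction_at_least(family_rhos, observed_rho):
    """Share of the family whose |rho| is at least as large as the observed |rho|."""
    hits = sum(1 for r in family_rhos if abs(r) >= abs(observed_rho))
    return hits / len(family_rhos)
```

For example, `fraction_at_least(all_rhos, 0.17)` (with `all_rhos` a hypothetical list of every correlation in the database) answers the slide's question directly: a value near 1 would mean that 0.17 is unremarkable, and a value near 0 would mean that it is unusually large for this family.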
20. Systematic analyses of the exposome can address
the fragmented literature of associations.
21. Example of fragmentation:
Is everything we eat associated with cancer?
Schoenfeld and Ioannidis, AJCN 2012
50 random ingredients from
Boston Cooking School
Cookbook
Any associated with cancer?
Of 50, 40 studied in cancer risk
Weak statistical evidence:
non-replicated
inconsistent effects
non-standardized
22. A maze of associations is one way to a fragmented
literature and Vibration of Effects
Young, 2011
univariate
sex
sex & age
sex & race
sex & race & age
JCE, 2015
23. Distribution of associations and p-values due to model choice:
Estimating the Vibration of Effects (or Risk)
JCE, 2015
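The idea of a distribution of estimates driven by model choice can be sketched by refitting one exposure-outcome model under every subset of candidate adjustment variables and collecting the exposure coefficient each time. The example below is a deliberately simplified stand-in, not the cited analysis: it uses ordinary least squares on a continuous outcome, solved with a small hand-written linear solver so that it runs on the standard library alone. All names are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular (e.g. collinear columns).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns (normal equations)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    return solve(xtx, xty)


def vibration_of_effects(exposure, outcome, adjusters):
    """Fit one model per subset of adjusters.

    adjusters: dict of name -> list of values.
    Returns {adjustment set (tuple of names): exposure coefficient}.
    """
    names = sorted(adjusters)
    estimates = {}
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            cols = [exposure] + [adjusters[n] for n in subset]
            estimates[subset] = ols(cols, outcome)[1]  # index 0 is the intercept
    return estimates


def summarize(estimates):
    """Range of the estimates and whether their sign flips across models."""
    values = list(estimates.values())
    return {
        "min": min(values),
        "max": max(values),
        "sign_flip": min(values) < 0 < max(values),
    }
```

With k candidate adjusters this fits 2^k models, so the full enumeration is only practical for a modest k. A sign flip in the summary corresponds to an exposure that looks like a risk under some model choices and protective under others.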
24. The Vibration of Effects:
Vitamin D and Thyroxine and attenuated risk in mortality
JCE, 2015
25. The Vibration of Effects: shifts in the effect size distribution
due to select adjustments (e.g., adjusting cadmium levels with
smoking status)
JCE, 2015
27. The Vibration of Effects: beware of the Janus effect
(both risk and protection?!)
Janus (two-faced) risk profile
Risk and significance depend on modeling scenario!
“risk” “protection”
“significant”
JCE, 2015
Britannica.com
28. Bioinformatics-inspired guidelines to enhance the
opportunity for robust discovery in exposome research
1.) Test systematically, address multiplicity, and replicate.
2.) Develop databases to disseminate exposome-related findings (e.g., untargeted analyses)
3.) Practice reproducible research and increase data literacy
JAMA, 2014
ARPH, in press
30. What features of the untargeted serum
metabolome are associated with diurnal variation?
Francine Laden (Harvard Chan)
Jaime Hart (Harvard Chan)
Dean Jones (Emory)
Doug Walker (Emory)
Jake Chung
31. We queried male truckers for differences in their
untargeted metabolome before and after their shift
(2 days) [n=89]
9,950 metabolomic features (LC-MS untargeted)
Hispanic and White [Pennsylvania and Illinois]
Average age of 48 (42-54)
Wilcoxon signed-rank test (non-parametric analogue of the paired t-test)
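The paired pre-shift versus post-shift comparison, repeated for each metabolomic feature and then corrected for the number of features, can be sketched as below. This is an illustrative simplification. It assumes a two-sided Wilcoxon signed-rank test with a normal approximation (no tie or continuity correction, zero differences dropped) and a Bonferroni threshold. With 9,950 features and alpha = 0.05 that threshold would be about 5 x 10^-6. The function names and the input layout are hypothetical.

```python
from statistics import NormalDist


def signed_rank_p(pre, post):
    """Two-sided Wilcoxon signed-rank p-value via the normal approximation."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    rank = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            rank[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(rank, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
    z = (w_plus - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bonferroni_hits(features_pre, features_post, alpha=0.05):
    """Return (sorted significant feature ids, threshold).

    features_pre / features_post: dict of feature id -> list of intensities,
    one value per participant, in the same participant order.
    """
    threshold = alpha / len(features_pre)
    pvals = {f: signed_rank_p(features_pre[f], features_post[f]) for f in features_pre}
    return sorted(f for f, p in pvals.items() if p < threshold), threshold
```

Running `bonferroni_hits` separately on the day 1 and day 5 measurements and intersecting the two result lists would give the features that are significant on both days.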
32. Up to 1% of features with robust significance:
What are their identities?
24 features significant on both day 1 and day 5!
day 1: post - pre shift
m=107 (1%)
day 5: post - pre shift
m=98 (1%)
33. Up to 1% of the metabolome with robust significance
(Bonferroni-corrected):
What are their identities?
258.6233
266.60907
267.1111
267.609
473.1966
478.2935
479.297
481.3098
?
35. We found up to 1% of the features of the untargeted
metabolome to be associated with intra-day effects
What are they?
Dark matter of the exposome?
False positives?
36. To efficiently sift, prioritize, and integrate associations:
Catalog of MWAS and EWAS findings
mass-to-charges | putative identity (hmdb id) | study design | phenotype | effect size | p-value
59.035, 76.126 | Trimethylamine-N-oxide | matched case-control | myocardial infarction | OR: 2.5 | 1 x 10^-3
ARPH, in press
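A catalog with the columns on this slide could be represented as a small record type plus a query function. The sketch below is only one possible layout. The field names, the m/z matching tolerance, and the query interface are assumptions, and the HMDB identifier is left empty because the slide does not give one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    mass_to_charges: Tuple[float, ...]
    putative_identity: Optional[str]  # None for a still-unidentified feature
    hmdb_id: Optional[str]            # not given on the slide, so left as None below
    study_design: str
    phenotype: str
    effect_measure: str               # e.g. "OR"
    effect_size: float
    p_value: float


def query(catalog: List[CatalogEntry], phenotype: Optional[str] = None,
          max_p: float = 1.0, mz: Optional[float] = None,
          tol: float = 0.005) -> List[CatalogEntry]:
    """Filter by phenotype, p-value ceiling, and/or an m/z value within +/- tol."""
    results = []
    for entry in catalog:
        if phenotype is not None and entry.phenotype != phenotype:
            continue
        if entry.p_value > max_p:
            continue
        if mz is not None and not any(abs(m - mz) <= tol for m in entry.mass_to_charges):
            continue
        results.append(entry)
    return results


# The example row from the slide, entered as a single record.
catalog = [
    CatalogEntry(
        mass_to_charges=(59.035, 76.126),
        putative_identity="Trimethylamine-N-oxide",
        hmdb_id=None,
        study_design="matched case-control",
        phenotype="myocardial infarction",
        effect_measure="OR",
        effect_size=2.5,
        p_value=1e-3,
    )
]
```

Looking up by m/z rather than by name is what would let an unidentified feature from a new untargeted study be matched against earlier findings, for example `query(catalog, mz=59.036)`.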
37. Catalogs of GWAS findings have enabled integration and
critical evaluation of genotype-phenotype associations
https://www.ebi.ac.uk/gwas/
38. Bioinformatics-inspired guidelines to enhance the
opportunity for robust discovery in exposome research
1.) Test systematically, address multiplicity, and replicate.
2.) Develop databases to disseminate exposome-related findings (e.g., untargeted analyses)
3.) Practice reproducible research and increase data literacy
JAMA, 2014
ARPH, in press
40. Accessible analytics tools and computer
infrastructure exist to enable reproducible research
“Ability to recompute data analytic
results given an observed dataset and
knowledge of the pipeline…”
Leek and Peng, PNAS 2015
(1) Raw data available
(2) Analytics code and documentation are available
(3) Trained data analysts to execute research
44. Many challenges exist in the use of the exposome for
robust discovery and causal research
Thousands of hypotheses are possible.
Multiplicity of hypotheses.
Big data are observational.
Multiplicity of biases:
confounding, selection, reverse causality
High-throughput exposome: ready for prime time?
What is the identity of my unidentified target?
45. Bioinformatics-inspired guidelines to enhance the
opportunity for robust discovery in exposome research
1.) Test systematically, address multiplicity, and replicate.
2.) Develop databases to disseminate exposome-related findings (e.g., untargeted analyses)
3.) Practice reproducible research and increase data literacy
JAMA, 2014
ARPH, in press
46. Informatics and Data Analytics to Support
Exposome-Based Discovery for Public Health
ARPH, in press
Arjun Manrai
Yuxia Cui
Pierre Bushel
Molly Hall
Spyros Karakitsios
Carolyn Mattingly
Marylyn Ritchie
Charles Schmitt
Denis Sarigiannis
Duncan Thomas
David Wishart
David Balshaw
47. Acknowledgements
Harvard DBMI
Isaac Kohane
Susanne Churchill
Stan Shaw
Jenn Grandfield
Sunny Alvear
Michal Preminger
Harvard Chan
Hugues Aschard
Francesca Dominici
NIH Common Fund
Big Data to Knowledge
Chirag J Patel
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
Stanford
John Ioannidis
Atul Butte (UCSF)
RagGroup
Chirag Lakhani
Adam Brown
Danielle Rasooly
Arjun Manrai
Erik Corona
Nam Pho
Jake Chung
ISES Exposome Committee
Roel Vermeulen