Pathway talk for IGES 2009 Hawaii
1. Using pathways to discover complex disease models
Gary Chen, Duncan Thomas
Department of Preventive Medicine, USC
October 20, 2009
2. An outline
  1. Motivation
  2. A stochastic search variable selection algorithm
  3. Example using candidate genes
  4. Ideas for GWAS
4. Common diseases have complex etiology
GWAS have had great success in finding genetic variants for common diseases
  Recent successes: AMD, BMI/obesity, type 2 diabetes, breast cancer, prostate cancer
Marginal effects from single-SNP analyses do not explain all heritability. Can we move beyond the low-hanging fruit?
6. Use biological knowledge to help search for disease models
Hierarchical modeling
  Stabilizes effect estimates β from an association test by assuming they come from a prior distribution derived from biological data
Examples in genetic epidemiology
  Model selection: Conti et al. (Hum Her, 2003), Baurley et al. (Stat Med, in review)
  GWAS: Lewinger et al. (Gen Epi 2007), Chen and Witte (AJHG 2007)
  Review: Thomas et al. (Hum Genomics 2009)
9. Searching for independent main effects and their interactions
Ideally, fit all predictors in a single model if N > P
Model selection: e.g. stepwise regression
  P-values can be anti-conservative: they are not adjusted for the number of tests
  Can be computationally intractable
An alternative: Bayesian model averaging
  Probabilistically propose sub-models from a posterior distribution
  Summary statistics of parameters are averaged across all proposed models
  Appears to better control for multiple comparisons
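The averaging step can be illustrated with a minimal sketch. It assumes each sampled sub-model is stored as a dictionary from variable name to coefficient; that storage format, the function name `model_average`, and the variable names `G1`, `E1`, `G1xE1` are hypothetical and not taken from the talk. A variable absent from a sub-model is treated as having coefficient 0.

```python
from collections import defaultdict


def model_average(samples):
    """Summarize sampled sub-models.

    samples: list of dicts mapping variable name -> coefficient for the
    variables included in that sub-model (hypothetical storage format).
    Returns {variable: (inclusion_probability, averaged_coefficient)};
    a variable missing from a sub-model contributes a coefficient of 0.
    """
    n = len(samples)
    counts = defaultdict(int)
    totals = defaultdict(float)
    for model in samples:
        for name, beta in model.items():
            counts[name] += 1
            totals[name] += beta
    return {name: (counts[name] / n, totals[name] / n) for name in counts}


# Four made-up draws of sub-models, for illustration only.
draws = [
    {"G1": 0.5, "E1": -0.3},
    {"G1": 0.7},
    {"G1": 0.6, "G1xE1": 0.2},
    {"E1": -0.1},
]
summary = model_average(draws)
```

Here `summary["G1"][0]` is the fraction of sampled sub-models that include `G1`, which plays the role of a posterior inclusion probability.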
11. The model form: a two-level hierarchical model
First level: a linear model
  $\mathrm{logit}(P(Y = 1 \mid \beta, X)) = \beta_0 + \sum_{k=1}^{K} \beta_k X_k$
  X can be G, E, GxG, GxE, etc.
Second level: a mixture prior on each β_k of univariate Gaussians:
  $\beta_k \sim N(\phi \bar{\beta}_k + (1 - \phi) \pi^T Z_k,\ \phi \tau^2 / \mathrm{adj}_k + (1 - \phi) \sigma^2)$
  1st component: neighborhood of gene k
  2nd component: pathway info on gene k
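As a minimal sketch of the second level, the function below computes the prior mean and variance for one coefficient. It assumes that the neighborhood mean is the average of the neighbors' current coefficients and that adj_k is the number of neighbors of variable k; both readings, along with the function and argument names, are assumptions made for illustration.

```python
def prior_moments(k, beta, neighbors, z, pi, phi, tau2, sigma2):
    """Mean and variance of the mixture prior on beta[k].

    neighbors[k]: indices adjacent to k (assumed meaning of adj_k = count).
    z[k]: pathway covariates for k; pi: their coefficients.
    """
    pathway_mean = sum(p * v for p, v in zip(pi, z[k]))
    nbrs = neighbors[k]
    if phi == 0 or not nbrs:
        # No neighborhood contribution; only valid when phi is 0.
        if phi != 0:
            raise ValueError("phi > 0 requires at least one neighbor")
        return pathway_mean, sigma2
    nbr_mean = sum(beta[j] for j in nbrs) / len(nbrs)
    mean = phi * nbr_mean + (1 - phi) * pathway_mean
    var = phi * tau2 / len(nbrs) + (1 - phi) * sigma2
    return mean, var


# Made-up values, for illustration only.
beta = [0.2, 0.4, 0.6]
neighbors = {0: [1, 2], 1: [0], 2: [0]}
z = [[1.0], [0.5], [0.0]]
mean0, var0 = prior_moments(0, beta, neighbors, z, pi=[2.0],
                            phi=0.5, tau2=0.2, sigma2=0.4)
```

With φ = 0 the function returns only the pathway component, which matches the way the two components are weighted in the prior.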
12. How the parameters fit together
$\beta_k \sim N(\phi \bar{\beta}_k + (1 - \phi) \pi^T Z_k,\ \phi \tau^2 / \mathrm{adj}_k + (1 - \phi) \sigma^2)$
15. Stochastic Search Variable Selection
Propose a swap, addition or deletion of a variable
Perform a reversible jump Metropolis-Hastings step comparing posterior probabilities, where β' is the proposed coefficient vector and β the current one:
  $H = \frac{P(Y = 1 \mid \beta', X)\, P(\beta' \mid Z, A, \pi, \sigma, \tau, \phi)}{P(Y = 1 \mid \beta, X)\, P(\beta \mid Z, A, \pi, \sigma, \tau, \phi)}$
Accept the move with probability min(1, H)
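The accept/reject step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it works on the log scale, assumes a flat prior so that H reduces to a likelihood ratio, and ignores any proposal-ratio or dimension-matching terms that a full reversible jump sampler would need. The function names and toy data are hypothetical.

```python
import math
import random


def log_likelihood(beta, rows, y):
    """Bernoulli log-likelihood of the logistic first-level model.

    beta[0] is the intercept; beta[1:] pairs with the columns of each row.
    """
    total = 0.0
    for x, yi in zip(rows, y):
        eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
        # log(1 + exp(eta)) computed in a numerically stable way
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def accept_move(log_post_new, log_post_old, rng=random):
    """Accept with probability min(1, H), where log H = new - old."""
    log_h = log_post_new - log_post_old
    return log_h >= 0 or rng.random() < math.exp(log_h)


# Toy data: one binary predictor that perfectly tracks the outcome.
rows = [[0], [1], [1], [0]]
y = [0, 1, 1, 0]
current = [0.0, 0.0]    # predictor effectively excluded (coefficient 0)
proposed = [-1.0, 2.0]  # state after a hypothetical "addition" move

# Flat prior assumed, so the log posteriors are just log-likelihoods.
accepted = accept_move(log_likelihood(proposed, rows, y),
                       log_likelihood(current, rows, y))
```

To use an informative prior, add the log prior density of each coefficient vector to the corresponding log-likelihood before calling `accept_move`.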
17. Folate pathway
Reed et al., J Nutr. 2006 Oct;136(10):2653-61
20. Simulated data set
Simulated data for 4000 individuals
  14 genes, 2 environmental variables
  Pathway enzymes: genotype-specific rates
Simulating disease status
  Assign homocysteine as causal mechanism
  'Run' the pathway until steady state
  Probabilistically assign disease status conditional on metabolite concentration
Priors
  Deposit half the genotypes into prior database
  Z matrix, causal metabolite(s): correlation of prior genotypes to candidate metabolite
  A matrix, network information: correlation of correlation profiles between two effects
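One possible reading of the Z and A definitions is sketched below, under stated assumptions: Z for a gene is the Pearson correlation between its genotype and the candidate causal metabolite, a gene's "correlation profile" is its vector of correlations with every metabolite, and A for a pair of genes is the Pearson correlation between their two profiles. The data layout, function names, and toy values are hypothetical, and the inputs stand for the half of the data set aside as the prior database.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def build_priors(genotypes, metabolites, causal):
    """Return (z, a): z[gene] and a[(gene1, gene2)] as described above."""
    # Correlation profile: one correlation per metabolite, in dict order.
    profiles = {g: [pearson(vals, m) for m in metabolites.values()]
                for g, vals in genotypes.items()}
    z = {g: pearson(vals, metabolites[causal])
         for g, vals in genotypes.items()}
    genes = list(genotypes)
    a = {(g, h): pearson(profiles[g], profiles[h])
         for g in genes for h in genes if g != h}
    return z, a


# Made-up prior-database values for six individuals.
genotypes = {
    "g1": [0, 1, 2, 1, 0, 2],
    "g2": [2, 1, 0, 1, 2, 0],
    "g3": [0, 0, 1, 1, 2, 2],
}
metabolites = {
    "hcy": [1.0, 2.0, 3.0, 2.0, 1.0, 3.0],
    "m2": [0.5, 0.1, 0.9, 0.4, 0.7, 0.2],
    "m3": [3.0, 1.0, 2.0, 2.0, 1.0, 3.0],
}
z, a = build_priors(genotypes, metabolites, causal="hcy")
```

In this toy data `g1` tracks the causal metabolite exactly and `g2` is its mirror image, so their Z entries have opposite signs and their profiles are perfectly anti-correlated.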
21. Setting up the priors
22. Comparison
Same interactions detected. Z matrix provides support.
25. Sensitivity analysis
How does our prior on β affect posterior inference?
Compare four special cases of the prior density:
  $\beta_k^{\mathrm{prior}} \sim N(\phi \bar{\beta}_k + (1 - \phi) \pi^T Z_k,\ \phi \tau^2 / n_k + (1 - \phi) \sigma^2)$
  1. Non-informative: constrain φ = 0, π = 0
  2. Z matrix: constrain φ = 0
  3. Adjacency info: constrain π = 0
  4. Z matrix and adjacency info: no constraints
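Substituting each set of constraints into the prior density gives the four cases explicitly. This is a direct substitution written out for convenience; it was not shown on the slide.

```latex
\begin{align*}
\text{1. } \phi = 0,\ \pi = 0: \quad
  & \beta_k^{\mathrm{prior}} \sim N\left(0,\ \sigma^2\right) \\
\text{2. } \phi = 0: \quad
  & \beta_k^{\mathrm{prior}} \sim N\left(\pi^T Z_k,\ \sigma^2\right) \\
\text{3. } \pi = 0: \quad
  & \beta_k^{\mathrm{prior}} \sim N\left(\phi \bar{\beta}_k,\
    \phi \tau^2 / n_k + (1 - \phi) \sigma^2\right) \\
\text{4. no constraints}: \quad
  & \beta_k^{\mathrm{prior}} \sim N\left(\phi \bar{\beta}_k + (1 - \phi) \pi^T Z_k,\
    \phi \tau^2 / n_k + (1 - \phi) \sigma^2\right)
\end{align*}
```

In cases 1 and 2 the parameter τ² drops out of the density, so it cannot be estimated under those constraints.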
26. Model averaged estimates of hyperparameters
Results
  Prior solely incorporating information in Z matrix appeared to explain residual variation better than adjacency-only prior
  π estimated at 1.86, consistent with simulated effect size.
Scenario          $\hat{\sigma}^2$   $\hat{\tau}^2$   $\hat{\phi}$
Non-informative   0.48               N/A              0
Z matrix          0.00459            N/A              0
Adjacency         0.48               0.22             0.56
Z mat + Adj       0.00731            0.23             0.05
27. Comparison among several priors
28. Summary of simulated example
Biomarker data incorporated as priors
  Intermediate phenotypes believed to be causal in Z (mean) matrix
  Global level pathway information encoded in A (adjacency) matrix
Influence of prior estimated by observed data through π, τ, σ, φ
Informative priors provided additional support for causal genes
31. Can be applied in a genome-wide association study
Proof of concept: GWAS of breast cancer
  2000 cases, 2000 controls, ~1M SNPs
  Top SNP from each of 2755 genes, p < .05 from GWAS
Gene Ontology used to define adjacency matrix and proposal kernel
  Considered the 22 GO terms under Biological Process (Level 3)
  A pair of SNPs is considered neighbors if they share at least one GO term
  Define a proposal density for new variable $V_i$ as: $Q(V_i) = I(A_{ij,\, i \neq j} \neq 0)$
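The neighbor rule and an indicator-style proposal can be sketched as below. The sketch assumes the proposal indicator selects variables that are adjacent to at least one variable already in the model; that reading, the function names, and the placeholder SNP and term labels are assumptions for illustration, not details from the talk.

```python
def go_adjacency(annotations):
    """Map each SNP to the set of SNPs sharing at least one GO term.

    annotations: {snp: set of GO terms} (hypothetical input format).
    """
    snps = list(annotations)
    return {s: {t for t in snps
                if t != s and annotations[s] & annotations[t]}
            for s in snps}


def proposal_candidates(current_model, neighbors):
    """SNPs outside the model that neighbor at least one SNP inside it."""
    candidates = set()
    for s in current_model:
        candidates |= neighbors[s]
    return candidates - set(current_model)


# Placeholder annotations, for illustration only.
annotations = {
    "snp1": {"termA", "termB"},
    "snp2": {"termB"},
    "snp3": {"termC"},
    "snp4": {"termA", "termC"},
}
neighbors = go_adjacency(annotations)
candidates = proposal_candidates(["snp2"], neighbors)
```

Restricting additions to neighbors in this way would tend to produce sub-models whose variables share annotation.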
33. Analysis
Stepwise regression:
  Considered only first 100 SNPs
  Retained 83/100 SNPs
  Intractable for 2nd order interactions
Our proposed algorithm:
  Low posterior probability for interactions
  Most sub-models contained variables with shared annotation
34. Sensitivity analysis
Compare non-informative prior to one using GO terms in A
  1. Non-informative: constrain φ = 0
  2. Adjacency info: no constraint on φ
Scenario          $\hat{\sigma}^2$   $\hat{\tau}^2$   $\hat{\phi}$
Non-informative   0.01               N/A              0
Adjacency         0.01               0.0004           0.86
35. Posterior inference
36. Scaling up to larger sub-models
Need to test larger sub-models in GWAS settings
Partition models into submodels using ontology info
Parallel processing: nodes fit submodels
A parallelized MCMC algorithm - Poster 190
37. Logical topology of sub-models
38. Hierarchical model
40. Summary for GWAS example
External knowledge can be informative
  MLEs of β are smoothed towards pathway means
  Ontologies useful: WECARE study in breast cancer - Poster 189
  For GWAS: genome-wide expression potentially more biologically informative in Z matrix
  Priors can guide towards biologically relevant interactions
Computational efficiency essential:
  Defining proposal kernel: e.g. expit(π^T Z)
  More parsimonious sub-models desirable (e.g. fused LASSO)
  Fisher scoring can be improved using parallel code (e.g. GPUs)
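One way an expit-based proposal kernel might look is sketched below. It assumes each candidate variable k receives weight expit(π^T Z_k) and that the weights are normalized into sampling probabilities; the normalization step, the function names, and the toy values are assumptions rather than details given in the talk.

```python
import math
import random


def expit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def proposal_probabilities(z, pi):
    """Normalize expit(pi^T Z_k) over candidates into probabilities."""
    weights = {k: expit(sum(p * v for p, v in zip(pi, zk)))
               for k, zk in z.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Placeholder pathway covariates for three candidate variables.
z = {"v1": [0.0], "v2": [2.0], "v3": [-2.0]}
probs = proposal_probabilities(z, pi=[1.0])

# Draw one candidate to propose, using a seeded generator.
rng = random.Random(0)
names = list(probs)
proposed = rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
```

Candidates with stronger pathway support (larger π^T Z_k) are proposed more often, which is the sense in which such a kernel would steer the search.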
41. Acknowledgements
James Baurley
David Conti
Dataset: African American Breast Cancer GWAS Collaborators
Funding: R01 ES016813