Using Biological Knowledge To
Discover Higher Order Interactions
  In Genetic Association Studies
             Gary K. Chen
           Duncan C. Thomas
    Department of Preventive Medicine
                  USC


              May 19, 2010
Outline
  1. Motivation
  2. The algorithm: Incorporating biological priors
  into an MCMC sampler

  3. Simulation 1: Performance of the method
  4. Simulation 2: Detecting interactions in a known
  pathway

  5. Application to data from a GWAS

  6. Future Extensions
Common diseases have complex etiology

     GWAS have had great success in searching for
     genetic variants for common diseases
     Recent successes: AMD, BMI/obesity, Type 2
     diabetes, breast cancer, prostate cancer
     Marginal effects from single SNP analyses do
     not explain all heritability. Can we move
     beyond the low-hanging fruit? (e.g. CNVs, rare
     variants, epistatic interactions, etc.)
     Ideally we would fit a model for all SNPs (and
     interactions too)
Analyzing all SNPs simultaneously
     Difficult for GWAS: predictors far exceed
     observations
     Shrinkage methods: LASSO, ridge regression,
     elastic net,...
         LASSO method (Tibshirani, J Royal Stat. Soc. 96)
         penalizes likelihood based on tuning parameter λ
         produces sparse (interpretable) models
     In GWAS settings:
         Double exponential (Laplace) prior on β (Wu and Lange,
         Bioinf. 2009)
         Normal-exponential-gamma prior on β (Hoggart et al.,
         PLoS Genet. 2008)
         Fast! Provides the maximum a posteriori (MAP)
         estimates
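To make the sparsity property concrete, here is a minimal Python sketch of soft-thresholding. It is the closed-form LASSO solution only in the simplified case of orthonormal predictors and squared-error loss. That setting, the values in `ols_estimates`, and the choice of `lam` are assumptions for illustration; the penalized-likelihood GWAS methods cited above are not reproduced here.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; return exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized estimates for five predictors.
ols_estimates = [0.9, -0.05, 0.3, 0.02, -0.6]
lam = 0.1  # hypothetical tuning parameter (lambda)

lasso_estimates = [soft_threshold(b, lam) for b in ols_estimates]
# Indices of predictors that survive the penalty.
nonzero = [j for j, b in enumerate(lasso_estimates) if b != 0.0]
print(lasso_estimates, nonzero)
```

Small estimates are set exactly to zero and the rest are shrunk by `lam`, which is why the fitted model is sparse.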
Fully Bayesian methods for variable
selection
     Bayesian model averaging assesses uncertainty
         Probabilistically proposes sub-models from a
         posterior distribution
         Summarize statistics of parameters averaged across
         all proposed models
         Controls for multiple comparisons
     Disadvantage: Computationally expensive
     P(β) has normal distribution for conjugacy
     “Spike and slab” ensures parsimony
     Example: Stochastic Search Variable Selection
     via Gibbs sampling (George and McCulloch
     JASA 93)
         $\beta_j \mid \gamma_j \sim (1 - \gamma_j)\,N(0, \tau_j^2) + \gamma_j\,N(0, c_j^2 \tau_j^2)$
         e.g., $f(\gamma) = \prod_j p_j^{\gamma_j} (1 - p_j)^{(1 - \gamma_j)}$
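The spike-and-slab mixture can be sketched by drawing from the prior directly. The following Python is a minimal illustration, not the Gibbs sampler of George and McCulloch: it draws each γ_j as Bernoulli(p_j), then draws β_j from the narrow spike N(0, τ_j²) or the wide slab N(0, c_j² τ_j²). The function names and all numeric values are assumptions.

```python
import math
import random


def sample_ssvs_prior(p, tau, c, rng):
    """Draw (gamma_j, beta_j) for each variable j from the mixture prior."""
    gammas, betas = [], []
    for p_j, tau_j, c_j in zip(p, tau, c):
        gamma_j = 1 if rng.random() < p_j else 0
        # gamma_j = 0: spike N(0, tau_j^2); gamma_j = 1: slab N(0, c_j^2 tau_j^2)
        sd = c_j * tau_j if gamma_j else tau_j
        gammas.append(gamma_j)
        betas.append(rng.gauss(0.0, sd))
    return gammas, betas


def log_prior_gamma(gammas, p):
    """log f(gamma) = sum_j [gamma_j log p_j + (1 - gamma_j) log(1 - p_j)]."""
    return sum(math.log(p_j) if g else math.log(1.0 - p_j)
               for g, p_j in zip(gammas, p))


rng = random.Random(0)   # fixed seed, for reproducibility only
p = [0.1] * 5            # hypothetical prior inclusion probabilities
tau = [0.05] * 5         # hypothetical spike standard deviations
c = [20.0] * 5           # hypothetical slab multipliers
gammas, betas = sample_ssvs_prior(p, tau, c, rng)
print(gammas, [round(b, 3) for b in betas], round(log_prior_gamma(gammas, p), 3))
```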
Searching for interactions
      SSVS via Gibbs Sampling
          For 1000 SNPs, length of γ:
          $1000 + \frac{(1000)(999)}{2} = 500{,}500$
          Iterating through each parameter is slow
      Reversible jump MCMC
          In contrast to SSVS, the “model” is
          $M = \{j : \gamma_j \neq 0\}$
          Model size changes at each iteration (similar to
          stepwise regression)
      Informative priors
          Incorporating biological information at the level of
          each variable
          These priors can be used to build the proposal
          function in a Metropolis-Hastings algorithm
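The length of γ quoted for SSVS on this slide is the number of main effects plus the number of pairwise interactions. A short Python check of that arithmetic (the function name is hypothetical):

```python
from math import comb


def n_candidate_terms(n_snps: int) -> int:
    """Main effects plus all pairwise interaction terms."""
    return n_snps + comb(n_snps, 2)


print(n_candidate_terms(1000))  # 500500
```

The count grows quadratically in the number of SNPs, which is why visiting every element of γ at each iteration becomes slow.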
2. The algorithm: Incorporating biological priors into an MCMC sampler
Posterior density as a two-level
hierarchical model

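      Posterior density (up to a normalizing constant):
           $L(Y \mid \beta, X, M)\, P(\beta \mid \pi, \tau, \sigma, M, Z, A)$
      First level as likelihood: a GLM at the subject
      level
           $\operatorname{logit} P(Y = 1 \mid \beta, X) = \beta_0 + \sum_{k=1}^{K} \beta_k X_k$
           X can be G, E, GxG, GxE, etc.
      Second level as prior: $\beta_k$ as mixed model
           $\beta_k \sim \pi^T Z_k + \phi_k + \theta_k$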
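The first-level likelihood is an ordinary logistic regression, so its log-likelihood can be sketched in a few lines of Python. The data below are a toy example (two genotype columns and their product as a GxG term), and the function name is an assumption, not the authors' implementation.

```python
import math


def log_likelihood(y, X, beta0, beta):
    """Bernoulli log-likelihood with logit P(Y=1) = beta0 + sum_k beta_k X_k."""
    total = 0.0
    for y_i, x_i in zip(y, X):
        eta = beta0 + sum(b * x for b, x in zip(beta, x_i))
        # log P(Y = y_i) = y_i * eta - log(1 + exp(eta)), written to avoid overflow
        total += y_i * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


# Toy data: columns are G1, G2 and the interaction G1*G2 (hypothetical values).
genotypes = [(0, 1), (1, 1), (2, 0), (2, 2)]
X = [(g1, g2, g1 * g2) for g1, g2 in genotypes]
y = [0, 0, 1, 1]

print(log_likelihood(y, X, beta0=-1.0, beta=[0.2, 0.1, 0.3]))
```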
Prior mean on variable in Z

                  Table: The Z matrix
     Intercept Conservation Missense eQTL
         1         20          0        5
         1         10          1      0.01
         1          5          0        1
         1         10          1       4.1
         1          5          0      1.4

     $\beta_k \sim \pi^T Z_k + \phi_k + \theta_k$
     $\hat{\pi}$: regress $\hat{\beta}$ on Z, $\pi \sim N(\hat{\pi}, \Sigma_\pi)$
Variable connectivity in A matrix



        Table: Example A matrix for SNP variables
                  Variable   1   2   3
                     1       0   1   0
                     2       1   0   1
                     3       0   1   0
One approach for populating the A matrix

                  Table: The Z matrix
      Intercept Conservation Missense eQTL
    →     1         20          0        5
          1         10          1      0.01
    →     1          5          0        1
          1         10          1       4.1
          1          5          0      1.4

     Define entry $A_{1,3}$ as $\mathrm{corr}(Z_{1,\cdot}, Z_{3,\cdot})$, the correlation between rows 1 and 3 of Z,
     then dichotomize A
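The row-correlation approach can be sketched in Python using the Z table from this slide. The cutoff of 0.5 and the choice to threshold the signed (rather than absolute) correlation are assumptions, since the slide does not say how A is dichotomized.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


# Rows of Z: Intercept, Conservation, Missense, eQTL (values from the table).
Z = [
    [1, 20, 0, 5],
    [1, 10, 1, 0.01],
    [1, 5, 0, 1],
    [1, 10, 1, 4.1],
    [1, 5, 0, 1.4],
]
threshold = 0.5  # hypothetical cutoff for dichotomizing

m = len(Z)
A = [[0] * m for _ in range(m)]  # diagonal stays 0: no self-neighbors
for j in range(m):
    for k in range(j + 1, m):
        if pearson(Z[j], Z[k]) > threshold:
            A[j][k] = A[k][j] = 1

print(round(pearson(Z[0], Z[2]), 3), A[0][2])
```

Rows 1 and 3 (the arrowed rows) have similar annotation profiles, so under this cutoff they are marked as neighbors.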
φk as mean across k’s neighbors
        Table: Example A matrix for SNP variables
                   Variable          1   2   3
                      1              0   1   0
                      2              1   0   1
                      3              0   1   0

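     $\beta_k \sim \pi^T Z_k + \phi_k + \theta_k$
     $\phi_k \sim N(\bar{\phi}_{-k}, \tau^2 / \nu_k)$
         $\bar{\phi}_{-k} = \frac{\sum_{j=1}^{m} \phi_j A_{jk}}{\sum_{j=1}^{m} A_{jk}}$, where $\nu_k$ is the number of neighbors of variable k
         We set $\phi_j = \hat{\beta}_j$
         Example: If $\hat{\beta} = (0.2, 0.5, 0.4)$, then $\bar{\phi}_{-2} = (0.2 + 0.4)/2 = 0.3$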
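The worked example on this slide can be checked with a short Python sketch that averages the estimates of a variable's neighbors as recorded in A. The function name is hypothetical and indices are 0-based, so variable 2 on the slide is index 1 here.

```python
def neighbor_mean(k, phi, A):
    """Return (mean of phi over neighbors of variable k, number of neighbors)."""
    nu_k = sum(A[j][k] for j in range(len(phi)))
    if nu_k == 0:
        raise ValueError("variable has no neighbors in A")
    total = sum(phi[j] * A[j][k] for j in range(len(phi)))
    return total / nu_k, nu_k


# Example A matrix and estimates from the slide.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
beta_hat = [0.2, 0.5, 0.4]

phi_bar, nu = neighbor_mean(1, beta_hat, A)
print(round(phi_bar, 6), nu)
```

Variable 2 has neighbors 1 and 3, so the result is the mean of 0.2 and 0.4.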
How the parameters fit together
     L(Y |β, X , M)P(β|Z , π, A, τ, σ, M)
A reversible jump MCMC algorithm


     Propose a swap, addition or deletion of a
     variable
     Perform reversible jump Metropolis-Hastings
     step comparing posterior probabilities
         $r = \frac{L(Y \mid \beta', X, M')\, P(\beta' \mid Z, \pi, A, \tau, \sigma, M')\, P(M \to M')}{L(Y \mid \beta, X, M)\, P(\beta \mid Z, \pi, A, \tau, \sigma, M)\, P(M' \to M)}$
     Accept move with probability min(1, r)
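The accept/reject step can be sketched in Python by working on the log scale, which avoids underflow when likelihoods are small. This is a generic sketch, not the authors' sampler: the caller supplies the log-likelihood, log-prior and log transition terms, and `log_trans_num` and `log_trans_den` are hypothetical names for whichever transition terms sit in the numerator and denominator of r on the slide.

```python
import math
import random


def log_ratio(loglik_new, logprior_new, log_trans_num,
              loglik_old, logprior_old, log_trans_den):
    """log r: numerator terms minus denominator terms, all on the log scale."""
    return ((loglik_new + logprior_new + log_trans_num)
            - (loglik_old + logprior_old + log_trans_den))


def accept_move(log_r, rng):
    """Accept with probability min(1, r), where r = exp(log_r)."""
    if log_r >= 0.0:
        return True
    return rng.random() < math.exp(log_r)


rng = random.Random(42)  # fixed seed, for reproducibility only
# Hypothetical log-scale values for a proposed and a current model.
log_r = log_ratio(-100.0, -2.0, math.log(0.3), -101.5, -1.8, math.log(0.4))
print(round(log_r, 4), accept_move(log_r, rng))
```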
Model transition proposal density

     Suppose model $M'$ has 1 newly proposed
     variable:
         $P(M \to M') = \Phi^{-1}(z_k)$
         $z_k \sim N(\mu_k - \mu_{\mathrm{baseline}}, 1)$
     The variable-specific tuning parameter $\mu_k$
         A function of the components of β’s prior
         standardized by their residual variances
         $\mu_k = \frac{|\pi^T Z_k + \bar{\phi}_{-k}|}{\sigma^2 + \tau^2 / \nu_k}$
         Weak empirical support for priors leads to a small
         numerator and a large denominator
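As a rough illustration of the variable-specific tuning parameter, the Python sketch below reads it as the absolute prior mean, π'Z_k plus the neighbor mean, divided by the residual variance σ² + τ²/ν_k. That reading of the slide, the function name, and every numeric value are assumptions.

```python
def mu_k(pi, z_k, phi_bar, sigma2, tau2, nu_k):
    """|pi' Z_k + phi_bar| / (sigma^2 + tau^2 / nu_k), under the stated reading."""
    prior_mean = sum(p * z for p, z in zip(pi, z_k)) + phi_bar
    return abs(prior_mean) / (sigma2 + tau2 / nu_k)


pi = [0.1, 0.0]    # hypothetical second-level coefficients
z_k = [1.0, 5.0]   # hypothetical row of Z for variable k
phi_bar = 0.3      # hypothetical mean effect across the neighbors of k

well_supported = mu_k(pi, z_k, phi_bar, sigma2=1.0, tau2=1.0, nu_k=2)
weakly_supported = mu_k(pi, z_k, phi_bar, sigma2=4.0, tau2=4.0, nu_k=2)
print(round(well_supported, 4), round(weakly_supported, 4))
```

Larger residual variances inflate the denominator, so a weakly supported prior yields a smaller value.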
Model transition proposal density

     Suppose model $M'$ has 1 newly proposed
     variable:
         $P(M \to M') = \Phi^{-1}(z_k)$
         $z_k \sim N(\mu_k - \mu_{\mathrm{baseline}}, 1)$
     The global penalty tuning parameter
         Emulate the BIC
         $\mathrm{BIC}(M') - \mathrm{BIC}(M) = \chi_1(\ln(n))$
         Probability of accepting $M'$ is $F_\chi^{-1}(\ln(n))$
         $\mu_{\mathrm{baseline}} = \Phi(F_\chi^{-1}(\ln(n)))$
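One way to see the size of a BIC-style penalty: adding a single parameter lowers the BIC only when the likelihood-ratio statistic exceeds ln(n), and under the null that statistic is approximately chi-square with 1 degree of freedom. The Python sketch below computes that tail probability using only the standard library. How this probability maps onto the baseline parameter is not spelled out on the slide, so only the probability is computed; the function names and sample sizes are illustrative.

```python
import math


def chi2_1df_upper_tail(x: float) -> float:
    """P(chi-square with 1 df > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))


def prob_bic_favors_extra_term(n: int) -> float:
    """Null probability that a 1-df likelihood-ratio statistic exceeds ln(n)."""
    return chi2_1df_upper_tail(math.log(n))


for n in (2000, 8000):
    print(n, prob_bic_favors_extra_term(n))
```

The probability shrinks as n grows, so the penalty becomes stricter in larger studies.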
3. Simulation 1: Performance of the method
Using external information to enhance
power and specificity
     Disease model: 4 GxG interactions jointly
     cause disease through 4 endophenotypes
         Genotypes simulated for 14 independent SNPs
         $y_{ik} = (1 - b)\,N(s_{ia} \cdot s_{ib}, 1) + b\,U(0, 1)$
         $b \sim \mathrm{Bernoulli}(p)$, where p is the proportion of noise
         24 endophenotypes y used only in the prior
     Disease status determined using a logistic
     model
         $\operatorname{logit} P(Y_i = 1) = \beta_0 + \beta_1 y_{i,01} + \beta_2 y_{i,02} + \beta_3 y_{i,34} + \beta_4 y_{i,35}$
     First 8000 persons reserved as case control
     dataset, remaining 2000 for constructing priors
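The endophenotype model can be sketched as a two-component mixture in Python: with probability p the value is pure noise from U(0, 1), otherwise it is normal with mean equal to the product of the two genotypes. Genotype coding as 0/1/2 allele counts and the allele frequency of 0.3 are assumptions not stated on the slide.

```python
import random


def simulate_genotype(allele_freq, rng):
    """Allele count (0, 1 or 2) from two independent draws."""
    return sum(1 for _ in range(2) if rng.random() < allele_freq)


def simulate_endophenotype(s_a, s_b, p_noise, rng):
    """y = (1 - b) N(s_a * s_b, 1) + b U(0, 1), with b ~ Bernoulli(p_noise)."""
    b = 1 if rng.random() < p_noise else 0
    if b:
        return rng.random()               # noise component
    return rng.gauss(s_a * s_b, 1.0)      # signal driven by the interaction


rng = random.Random(2010)  # fixed seed, for reproducibility only
for p_noise in (0.0, 0.2, 0.8):
    s_a = simulate_genotype(0.3, rng)
    s_b = simulate_genotype(0.3, rng)
    y = simulate_endophenotype(s_a, s_b, p_noise, rng)
    print(p_noise, s_a, s_b, round(y, 3))
```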
Constructing the Z and the A matrices

     Z matrix
         Measures correlation between a model variable and
         each endophenotype among 2000 individuals in the
         prior
         $Z_{kq} = \mathrm{corr}(g_k, y_q)$
     A matrix
         Measures similarity between two variables by
         comparing correlation profiles in Z
         $A_{jk} = \mathrm{corr}(Z_{j\cdot}, Z_{k\cdot})$
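Building Z amounts to correlating each model variable with each endophenotype across the individuals reserved for the prior. A minimal Python sketch follows; the six individuals, the variable names and the endophenotype values are made up for illustration.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


# Hypothetical model variables (one list per variable, one entry per individual).
variables = {
    "g1": [0, 1, 2, 1, 0, 2],
    "g2": [2, 0, 1, 1, 2, 0],
}
# Hypothetical endophenotypes measured on the same individuals.
endophenotypes = {
    "y1": [0.2, 1.1, 1.9, 0.8, -0.1, 2.3],
    "y2": [0.5, 0.4, 0.6, 0.5, 0.3, 0.7],
}

# Z[k][q] = corr(variable k, endophenotype q)
Z = [[pearson(g, y) for y in endophenotypes.values()]
     for g in variables.values()]
for name, row in zip(variables, Z):
    print(name, [round(z, 3) for z in row])
```

Each row of Z is then a correlation profile for one variable, and two rows can be compared to fill the corresponding entry of A.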
Question 1: How do the priors affect
power and specificity?
     The A matrix contains information across all
     24 endophenotypes
     Set up 3 variants of the original Z matrix
         4 causal endophenotypes only (noise parameter
         p = 0)
         4 intermediate endophenotypes only (noise
         parameter p = 0.2)
         4 weakly correlated endophenotypes only (noise
         parameter p = 0.8)
     Models tested: both A and Z, no A or Z, A
     only, Z only (with 3 variants)
Question 1: How do the priors affect
power and specificity?
    At RR=1.5, all prior models perform very well
Question 1: How do the priors affect
power and specificity?
     At RR=1.4, prior models with A, Z, or both
                outperform others
Question 1: How do the priors affect
power and specificity?
   At RR=1.3, prior models with A, Z, or both have
                    > 5% power
Question 1: How do the priors affect
power and specificity?
  At RR=1.2, fully informative prior still retains 80%
  power
Question 1: How do the priors affect
power and specificity?
  At RR=1.1, all prior models perform poorly (∼ 55%
                        power)
Question 2: How do the priors affect
posterior estimates (shrinkage)?
         Posterior estimates of β vs MLE
Question 2: How do the priors affect
posterior estimates (shrinkage)?
      Posterior estimates of SE of β vs MLE
Question 3: How do the priors improve
rankings?
        6,441 interactions tested. 4 causal.
Question 3: How do the priors improve
rankings?
       513,591 interactions tested. 4 causal.
Summary of simulation


     Sensitivity analysis
          All methods perform well at high RRs
          Informative priors improve power at lower RRs but
          not at extremely low RRs
     Like LASSO, shrinkage improves interpretability
     Model averaging can improve robustness of
     rankings
4. Simulation 2: Detecting interactions in a known pathway
Discovering interactions in a known
pathway: Folate
Simulated data set
     14 genes, 2 environmental variables
     8000 individuals in case-control data, remaining
     2000 for constructing priors
     Used a pathway simulation program to
     generate steady-state concentrations
         Reed et al J Nutr. 2006 Oct;136(10):2653-61
         Enzyme kinetics parameters (Km , Vmax ) genotype
         specific
     3 mechanisms believed to be related to disease
     etiology
         Homocysteine concentration
         Pyrimidine synthesis
         Purine synthesis
Estimates of π
      Construct Z and A in same manner as previous
      simulation:
           Z stores genotype-metabolite correlations
           A stores dichotomized correlations between rows of
           Z
      True log relative risk: .18 (RR=1.2)

   Simulated        Second-level coefficients π
   mechanism        homocysteine    pyrimidine       purine
   homocysteine      0.18 (0.13)    -0.09 (0.536)     0.002 (0.38)
   pyrimidine       -0.04 (0.22)     0.22 (0.066)    -0.01 (0.06)
   purine           -0.01 (0.36)     0.16 (0.327)     0.19 (0.07)
Comparison of BMA results to stepwise
regression
      Pyrimidine synthesis
        Interaction     BF   MLE p-value
        FTD*MAT-II      15   0.038
        FTD*MTHFR       20   0.046
        MTCH*MS        534   0.006
        PGT*MS          14   0.018
      → SHMT*CBS      1254   0.133
      → SHMT*Fol      2324   0.036
        TS*MTHFR       227   0.022
      → TS*SHMT       1091   N/S
[Figure: Pyrimidine synthesis. Interactions shown: SHMT*CBS, SHMT*Fol, SHMT*TS]
Comparison of BMA results to stepwise
regression
      Purine synthesis
        Interaction     BF   MLE p-value
      → MTCH*MS       1130   0.008
      → MTCH*PGT      1416   0.026
      → PGT*CBS       1022   0.069
      → PGT*MS        2851   0.007
      → SHMT*Fol      1398   0.022
        SHMT*MAT-II    646   0.012
        TS*MTHFR        57   0.024
[Figure: Purine synthesis. Interactions shown: MTCH*MS, MTCH*PGT, PGT*CBS, PGT*MS, SHMT*Fol]
Comparison of BMA results to stepwise
regression
      Homocysteine
        Interaction     BF   MLE p-value
        CBS*MAT-II      77   0.045
      → CBS*Met       1072   N/S
        FTD*MAT-II      38   0.045
        FTD*MTHFR      213   0.015
      → MS*Met        1129   N/S
        MTCH*MS        978   0.006
        PGT*MS          75   0.044
        TS*MTHFR        41   0.022
[Figure: Homocysteine levels. Interactions shown: CBS*Met, MS*Met]
Summary of folate pathway simulation


     Pathway knowledge can inform model search
     Simulated three plausible disease mechanisms
     Effect of causal metabolite on disease revealed
     in corresponding element of π
     Revealed plausible interactions not found
     through a stepwise regression
5. Application to data from a GWAS
Using gene annotations to inform a search
for interactions
     Proof of concept: GWAS of breast cancer
     Publicly available data from NCI
     (https://caintegrator.nci.nih.gov/cgems/)
     1,145 cases and 1,142 controls of European
     ancestry
     The 22 Gene Ontology terms from Biological
     Process used to define priors in A and Z
     Included 6,078 SNPs, each with a GO annotation
     and the lowest p-value in its gene
Top 10 interactions found
   Interaction       Non-informative prior    Informative prior
                     β (SE)         BF        β (SE)         BF
   PARK2*SORCS1       0.22 (0.06)   1e4        0.27 (0.06)   5e4
   AK5*ARHGAP26       0.16 (0.05)   427        0.17 (0.05)   903
   FGFR2*MAML2       -0.11 (0.04)   1         -0.16 (0.05)   686
   SHC3*KIF13B        N/A           N/A        0.17 (0.05)   621
   PCLO*ME3           N/A           N/A        0.18 (0.05)   528
   CNGA3*CNN1        -0.16 (0.05)   41        -0.17 (0.05)   462
   FGFR2*CDT1         N/A           N/A       -0.16 (0.05)   445
   SHC3*CXCL16        N/A           N/A       -0.18 (0.05)   403
   FGFR2*ABCA1       -0.1 (0.05)    158       -0.11 (0.05)   268
   CYP2J2*SORCS1     -0.11 (0.05)   74        -0.14 (0.05)   266
   FGFR2*SCG5         N/A           N/A        0.21 (0.05)   235
Enrichment analysis
     Are the top interactions (BF > 100) enriched
     for certain GO terms?
     Compute empiric p-value for enrichment
         For each iteration, permute within bins representative of
         non-independence in observed interactions
         Pool bins, compute frequency of a GO term in the
         pool
         p-value: number of iterations in which the permuted
         frequency exceeded the observed frequency, divided by 1 million
     Enriched terms: biological regulation (p=.008), growth
     (p=1e-6), metabolic process (p=.008), and
     regulation of biological process (p=.003).
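The empirical p-value described above is the share of permutation iterations whose GO-term frequency exceeds the observed one. The Python sketch below shows that calculation under a simplified null that draws interactions at random from a single pool and ignores the binning step. The pool composition, the observed count and the 10,000 iterations (rather than 1 million) are assumptions chosen to keep the example fast.

```python
import random


def empirical_p_value(observed_freq, null_freqs):
    """Fraction of iterations whose frequency exceeded the observed frequency."""
    exceed = sum(1 for f in null_freqs if f > observed_freq)
    return exceed / len(null_freqs)


rng = random.Random(1)             # fixed seed, for reproducibility only
has_term = [1] * 50 + [0] * 450    # hypothetical pool: 50 of 500 carry the GO term
n_top = 20                         # hypothetical number of top interactions
observed = 8                       # hypothetical count carrying the term among the top
n_iter = 10_000

# Simplified null: draw n_top interactions at random and count the term.
null_counts = [sum(rng.sample(has_term, n_top)) for _ in range(n_iter)]
p_value = empirical_p_value(observed, null_counts)
print(p_value)
```

With a strict "exceeded" rule the smallest nonzero p-value is one over the number of iterations, which is the resolution limit of the procedure.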
6. Future Extensions
Incorporate gene-expression data into
GWAS analyses
     Developing priors
         Should be more informative (e.g. empirical) and
         granular (e.g. SNP level) than GO
         Obtain genotype-expression paired data: HapMap?
         Apply WGCNA to infer pathway modules
         Genotype-module correlations used in Z matrix
     Incorporate more advanced MCMC techniques
         Evolutionary Monte Carlo
         Multiple-try Metropolis
         Brute-force search for MAP. Use MAP for initial
         values?
Acknowledgements



    James Baurley
    David Conti
    Angela Presson (thanks in advance!)
    Funding: R01 ES016813 and R01 ES015090.

 
DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...
DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...
DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...
 
DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...
DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...
DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...
 
DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...
DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...
DISCRETIZATION OF A MATHEMATICAL MODEL FOR TUMOR-IMMUNE SYSTEM INTERACTION WI...
 


Integration of biological annotations using hierarchical modeling

  • 1. Using Biological Knowledge To Discover Higher Order Interactions In Genetic Association Studies. Gary K. Chen, Duncan C. Thomas. Department of Preventive Medicine, USC. May 19, 2010.
  • 2. Outline 1. Motivation 2. The algorithm: Incorporating biological priors into an MCMC sampler 3. Simulation 1: Performance of the method 4. Simulation 2: Detecting interactions in a known pathway 5. Application to data from a GWAS 6. Future Extensions
  • 3. Common diseases have complex etiology. GWAS have had great success in searching for genetic variants for common diseases. Recent successes: AMD, BMI/obesity, Type 2 diabetes, breast cancer, prostate cancer. Marginal effects from single-SNP analyses do not explain all heritability. Can we move beyond the low-hanging fruit (e.g. CNVs, rare variants, epistatic interactions)? Ideally we would fit a model for all SNPs (and interactions too).
  • 4. Analyzing all SNPs simultaneously. Difficult for GWAS: predictors far exceed observations. Shrinkage methods: LASSO, ridge regression, elastic net, ... The LASSO method (Tibshirani, J Royal Stat. Soc. 96) penalizes the likelihood based on a tuning parameter $\lambda$ and produces sparse (interpretable) models. In GWAS settings: double exponential (Laplace) prior on $\beta$ (Wu and Lange, Bioinf. 2009); normal exponential gamma prior on $\beta$ (Hoggart et al., PLoS Genet 2008). Fast! Provides the maximum a posteriori (MAP) estimates.
  • 5. Fully Bayesian methods for variable selection. Bayesian model averaging assesses uncertainty: probabilistically proposes sub-models from a posterior distribution; summarizes statistics of parameters averaged across all proposed models; controls for multiple comparisons. Disadvantage: computationally expensive. $P(\beta)$ has a normal distribution for conjugacy. “Spike and slab” ensures parsimony. Example: Stochastic Search Variable Selection via Gibbs sampling (George and McCulloch, JASA 93): $\beta_j \mid \gamma_j \sim (1-\gamma_j)\,N(0, \tau_j^2) + \gamma_j\,N(0, c_j^2 \tau_j^2)$, with e.g. $f(\gamma) = \prod_j p_j^{\gamma_j} (1-p_j)^{(1-\gamma_j)}$.
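The spike-and-slab mixture on slide 5 can be made concrete with a small sketch. It assumes binary indicators $\gamma_j \in \{0, 1\}$, treats $\tau_j$ and $c_j \tau_j$ as standard deviations, and uses made-up values for $\tau_j$, $c_j$, and $p_j$; the function names are hypothetical and this is not the authors' sampler.

```python
import math
import random

def log_prior_gamma(gamma, p):
    """Log of f(gamma) = prod_j p_j^gamma_j * (1 - p_j)^(1 - gamma_j)."""
    return sum(
        math.log(pj) if gj == 1 else math.log(1.0 - pj)
        for gj, pj in zip(gamma, p)
    )

def sample_beta(gamma_j, tau_j, c_j, rng):
    """Draw beta_j given gamma_j: sd tau_j for the spike, c_j * tau_j for the slab."""
    sd = c_j * tau_j if gamma_j == 1 else tau_j
    return rng.gauss(0.0, sd)

rng = random.Random(0)
gamma = [1, 0, 0, 1]            # illustrative inclusion indicators
p = [0.5, 0.5, 0.5, 0.5]        # illustrative prior inclusion probabilities
# Illustrative scales: a tight spike (sd 0.01) and a wide slab (sd 1.0).
betas = [sample_beta(g, tau_j=0.01, c_j=100.0, rng=rng) for g in gamma]
print(log_prior_gamma(gamma, p), betas)
```

Coefficients with $\gamma_j = 0$ are drawn close to zero, while those with $\gamma_j = 1$ can take appreciable values, which is how the mixture encodes parsimony.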
  • 6. Searching for interactions. SSVS via Gibbs sampling: for 1000 SNPs, the length of $\gamma$ is $1000 + \frac{(1000)(999)}{2} = 500{,}500$; iterating through each parameter is slow. Reversible jump MCMC: in contrast to SSVS, the “model” is $M = \{j : \gamma_j \neq 0\}$; model size changes at each iteration (similar to stepwise regression). Informative priors: incorporate biological information at the level of each variable; these priors can be used towards a proposal function in a Metropolis-Hastings algorithm.
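The count on slide 6 is the number of main effects plus the number of unordered SNP pairs. A one-function check (the function name is illustrative):

```python
from math import comb

def n_candidate_terms(n_snps):
    """Main effects plus all pairwise interactions among n_snps variables."""
    return n_snps + comb(n_snps, 2)

print(n_candidate_terms(1000))  # 500500
```

The quadratic growth is why visiting every element of $\gamma$ in turn becomes slow as the number of SNPs increases.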
  • 7. Outline 1. Motivation 2. The algorithm: Incorporating biological priors into an MCMC sampler 3. Simulation 1: Performance of the method 4. Simulation 2: Detecting interactions in a known pathway 5. Application to data from a GWAS 6. Future Extensions
  • 8. Posterior density as a two-level hierarchical model. Posterior density: $L(Y \mid \beta, X, M)\,P(\beta \mid \pi, \tau, \sigma, M, Z, A)$. First level as likelihood: a GLM at the subject level, $\operatorname{logit}(P(Y = 1 \mid \beta, X)) \sim \beta_0 + \sum_{k=1}^{K} \beta_k X_k$; X can be G, E, GxG, GxE, etc. Second level as prior: $\beta_k$ as a mixed model, $\beta_k \sim \pi^T Z_k + \phi_k + \theta_k$.
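As a sketch of the first level only, the function below evaluates the Bernoulli log-likelihood implied by the logit model on slide 8. The data layout (one row of X per subject, columns such as G1, G2, G1xG2) and the function name are assumptions for illustration; the second-level prior is not included.

```python
import math

def logistic_log_likelihood(y, X, beta0, beta):
    """Bernoulli log-likelihood with logit P(Y=1) = beta0 + sum_k beta_k * x_k."""
    total = 0.0
    for yi, xi in zip(y, X):
        eta = beta0 + sum(b * x for b, x in zip(beta, xi))
        # log(1 + exp(eta)), computed stably for large |eta|
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += yi * eta - log1pexp
    return total

# Toy data: columns are G1, G2, and the product G1 * G2 (a GxG term).
y = [1, 0, 1]
X = [[0, 1, 0], [1, 1, 1], [2, 0, 0]]
print(logistic_log_likelihood(y, X, beta0=0.0, beta=[0.0, 0.0, 0.0]))
```

With all coefficients at zero every subject has probability 0.5, so the log-likelihood is $3 \ln(0.5)$ for the three toy subjects.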
  • 9. Prior mean on variable in Z.
      Table: The Z matrix
      Intercept  Conservation  Missense  eQTL
      1          20            0         5
      1          10            1         0.01
      1          5             0         1
      1          10            1         4.1
      1          5             0         1.4
      $\beta_k \sim \pi^T Z_k + \phi_k + \theta_k$. $\pi$: regress $\hat\beta$ on $Z$; $\pi \sim N(\hat\pi, \hat\Sigma_\pi)$.
  • 10. Variable connectivity in A matrix.
      Table: Example A matrix for SNP variables
      Variable  1  2  3
      1         0  1  0
      2         1  0  1
      3         0  1  0
  • 11. One approach for populating the A matrix.
      Table: The Z matrix (arrows mark rows 1 and 3)
         Intercept  Conservation  Missense  eQTL
      →  1          20            0         5
         1          10            1         0.01
      →  1          5             0         1
         1          10            1         4.1
         1          5             0         1.4
      Define entry $A_{1,3}$ as $\operatorname{corr}(Z_{1,-}, Z_{3,-})$, then dichotomize A.
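A minimal sketch of the construction on slide 11, using the Z table shown there. The slide does not give the cutoff used to dichotomize, so the threshold of 0.95 is purely hypothetical. Correlating whole rows, intercept column included, is one reading of $\operatorname{corr}(Z_{1,-}, Z_{3,-})$.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def adjacency_from_z(Z, threshold):
    """A[j][k] = 1 if corr(row j, row k) of Z exceeds threshold, else 0; zero diagonal."""
    m = len(Z)
    A = [[0] * m for _ in range(m)]
    for j in range(m):
        for k in range(j + 1, m):
            if pearson(Z[j], Z[k]) > threshold:
                A[j][k] = A[k][j] = 1
    return A

# Rows of the Z table on the slide: intercept, conservation, missense, eQTL.
Z = [
    [1, 20, 0, 5],
    [1, 10, 1, 0.01],
    [1, 5, 0, 1],
    [1, 10, 1, 4.1],
    [1, 5, 0, 1.4],
]
A = adjacency_from_z(Z, threshold=0.95)  # hypothetical cutoff
for row in A:
    print(row)
```

Rows 1 and 3 of the table have a correlation of about 0.98, so with this cutoff the entry $A_{1,3}$ is set to 1. Because the conservation column dominates every row, a lower cutoff would connect nearly all variables.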
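  • 12. $\phi_k$ as mean across k’s neighbors.
      Table: Example A matrix for SNP variables
      Variable  1  2  3
      1         0  1  0
      2         1  0  1
      3         0  1  0
      $\beta_k \sim \pi^T Z_k + \phi_k + \theta_k$, with $\phi_k \sim N(\bar\phi_{-k}, \tau^2/\nu_k)$ and $\bar\phi_{-k} = \frac{\sum_{j=1}^{m} \phi_j A_{jk}}{\sum_{j=1}^{m} A_{jk}}$, where $\nu_k$ is the number of neighbors of variable k. We set $\phi_j = \hat\beta_j$. Example: if $\hat\beta = (0.2, 0.5, 0.4)$, then $\bar\phi_{-2} = (0.2 + 0.4)/2 = 0.3$.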
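The neighbor mean on slide 12 can be checked directly against the slide's own example. The sketch below uses 0-based indices, so variable 2 is index 1; the function name is illustrative.

```python
def neighbor_mean(A, phi, k):
    """Mean of phi over the neighbors of variable k (0-based), weighted by A[j][k]."""
    m = len(phi)
    nu_k = sum(A[j][k] for j in range(m))   # number of neighbors of k
    if nu_k == 0:
        raise ValueError("variable has no neighbors")
    return sum(phi[j] * A[j][k] for j in range(m)) / nu_k

A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
beta_hat = [0.2, 0.5, 0.4]   # phi_j is set to beta_hat_j, as on the slide
print(neighbor_mean(A, beta_hat, 1))   # approximately 0.3
```

Variable 2 is connected to variables 1 and 3, so its prior mean is the average of their estimates and does not involve its own value of 0.5.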
  • 13. How the parameters fit together: $L(Y \mid \beta, X, M)\,P(\beta \mid Z, \pi, A, \tau, \sigma, M)$.
  • 14. A reversible jump MCMC algorithm. Propose a swap, addition, or deletion of a variable. Perform a reversible jump Metropolis-Hastings step comparing posterior probabilities: $r = \frac{L(Y \mid \beta', X, M')\,P(\beta' \mid Z, \pi, A, \tau, \sigma, M')\,P(M \to M')}{L(Y \mid \beta, X, M)\,P(\beta \mid Z, \pi, A, \tau, \sigma, M)\,P(M' \to M)}$. Accept the move with probability $\min(1, r)$.
  • 15. Model transition proposal density. Suppose model $M'$ has 1 newly proposed variable: $P(M \to M') = \Phi^{-1}(z_k)$, $z_k \sim N(\mu_k - \mu_{\text{baseline}}, 1)$. The variable-specific tuning parameter $\mu_k$: a function of the components of $\beta$’s prior, standardized by their residual variances, $\mu_k = \frac{|\pi^T Z_k + \bar\phi_{-k}|}{\sigma^2 + \tau^2/\nu_k}$. Weak empirical support for priors leads to a small numerator and a large denominator.
  • 16. Model transition proposal density. Suppose model $M'$ has 1 newly proposed variable: $P(M \to M') = \Phi^{-1}(z_k)$, $z_k \sim N(\mu_k - \mu_{\text{baseline}}, 1)$. The global penalty tuning parameter: emulate the BIC. $BIC(M') - BIC(M) = \chi_1(\ln(n))$. Probability of accepting $M'$ is $F_\chi^{-1}(\ln(n))$. $\mu_{\text{baseline}} = \Phi(F_\chi^{-1}(\ln(n)))$.
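The BIC-emulating penalty can be illustrated numerically, under a stated reading of the slide. Adding one parameter lowers the BIC only when the 1-df likelihood ratio statistic exceeds $\ln(n)$, so under the null the chance of that is the upper tail of a chi-square with 1 df at $\ln(n)$. The sketch then converts that probability to a standard normal quantile. Treating the slide's $\Phi$ and $F_\chi$ symbols this way (standard CDF and quantile conventions) is an assumption, as are the function names. The sample size is the 1,145 cases plus 1,142 controls quoted for the GWAS application.

```python
import math
from statistics import NormalDist

def chi2_1_upper_tail(x):
    """P(chi-square with 1 df > x), using chi2_1 = Z^2 for a standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

def baseline_shift(n):
    """Normal quantile matching the chance that a null 1-df term beats the penalty ln(n)."""
    p_accept = chi2_1_upper_tail(math.log(n))
    return NormalDist().inv_cdf(1.0 - p_accept)

n = 1145 + 1142   # cases + controls from the GWAS application
print(chi2_1_upper_tail(math.log(n)), baseline_shift(n))
```

Under this reading, a variable with no prior support ($\mu_k = 0$) has a proposal score centered well below zero, and informative priors shift the score upward by $\mu_k$.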
  • 17. Outline 1. Motivation 2. The algorithm: Incorporating biological priors into an MCMC sampler 3. Simulation 1: Performance of the method 4. Simulation 2: Detecting interactions in a known pathway 5. Application to data from a GWAS 6. Future Extensions
  • 18. Using external information to enhance power and specificity. Disease model: 4 GxG interactions jointly cause disease through 4 endophenotypes. Genotypes simulated for 14 independent SNPs. $y_{ik} = (1 - b)\,N(s_{ia} \cdot s_{ib}, 1) + b\,U(0, 1)$, $b \sim \text{Bernoulli}(p)$, where $p$ is the proportion of noise. 24 endophenotypes $y$ used only in the prior. Disease status determined using a logistic model: $\operatorname{logit}(Y_i = 1) = \beta_0 + \beta_1 y_{i01} + \beta_2 y_{i02} + \beta_3 y_{i34} + \beta_4 y_{i35}$. First 8000 persons reserved as the case-control dataset, remaining 2000 for constructing priors.
  • 19. Constructing the Z and the A matrices. Z matrix: measures correlation between a model variable and each endophenotype among the 2000 individuals reserved for the prior, $Z_{kq} = \operatorname{corr}(g_k, y_q)$. A matrix: measures similarity between two variables by comparing correlation profiles in Z, $A_{jk} = \operatorname{corr}(Z_{jq}, Z_{kq})$.
  • 20. Question 1: How do the priors affect power and specificity? The A matrix contains information across all 24 endophenotypes. Set up 3 variants of the original Z matrix: 4 causal endophenotypes only (noise parameter p = 0); 4 intermediate endophenotypes only (noise parameter p = 0.2); 4 weakly correlated endophenotypes only (noise parameter p = 0.8). Models tested: both A and Z, no A or Z, A only, Z only (with 3 variants).
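The endophenotype mixture on slide 18 can be sketched as follows. The slide does not say how $s_{ia}$ and $s_{ib}$ are coded, so the 0/1/2 allele-count coding here is an assumption, as are the seed, sample size, and function name.

```python
import random

def simulate_endophenotype(s_a, s_b, p, rng):
    """One draw of y = (1 - b) * N(s_a * s_b, 1) + b * U(0, 1), with b ~ Bernoulli(p)."""
    b = 1 if rng.random() < p else 0
    if b == 1:
        return rng.uniform(0.0, 1.0)    # pure noise, unrelated to genotype
    return rng.gauss(s_a * s_b, 1.0)    # signal centered on the genotype product

rng = random.Random(2010)
# Hypothetical genotype codes (0, 1, or 2 copies of an allele) for two SNPs.
genotypes = [(rng.choice([0, 1, 2]), rng.choice([0, 1, 2])) for _ in range(5)]
y = [simulate_endophenotype(a, b, p=0.2, rng=rng) for a, b in genotypes]
print(list(zip(genotypes, y)))
```

Raising $p$ replaces more draws with uniform noise, which weakens the genotype-endophenotype correlation that later feeds the Z matrix.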
  • 21. Question 1: How do the priors affect power and specificity? At RR=1.5, all prior models perform very well
  • 22. Question 1: How do the priors affect power and specificity? At RR=1.4, prior models with A, Z, or both outperform others
  • 23. Question 1: How do the priors affect power and specificity? At RR=1.3, prior models with A, Z, or both have > 5% power
  • 24. Question 1: How do the priors affect power and specificity? At RR=1.2, fully informative prior still retains 80% power
  • 25. Question 1: How do the priors affect power and specificity? At RR=1.1, all prior models perform poorly (∼ 55% power)
  • 26. Question 2: How do the priors affect posterior estimates (shrinkage)? Posterior estimates of β vs MLE
  • 27. Question 2: How do the priors affect posterior estimates (shrinkage)? Posterior estimates of SE of β vs MLE
  • 28. Question 3: How do the priors improve rankings? 6,441 interactions tested. 4 causal.
  • 29. Question 3: How do the priors improve rankings? 513,591 interactions tested. 4 causal.
  • 30. Summary of simulation. Sensitivity analysis: all methods perform well at high RRs; informative priors improve power at lower RRs but not at extremely low RRs. Like LASSO, shrinkage improves interpretability. Model averaging can improve robustness of rankings.
  • 31. Outline 1. Motivation 2. The algorithm: Incorporating biological priors into an MCMC sampler 3. Simulation 1: Performance of the method 4. Simulation 2: Detecting interactions in a known pathway 5. Application to data from a GWAS 6. Future Extensions
  • 32. Discovering interactions in a known pathway: Folate
  • 33. Simulated data set. 14 genes, 2 environmental variables. 8000 individuals in the case-control data, remaining 2000 for constructing priors. Used a pathway simulation program to generate steady-state concentrations (Reed et al., J Nutr. 2006 Oct;136(10):2653-61). Enzyme kinetics parameters ($K_m$, $V_{max}$) are genotype specific. 3 mechanisms believed to be related to disease etiology: homocysteine concentration, pyrimidine synthesis, purine synthesis.
  • 34. Estimates of $\pi$. Construct Z and A in the same manner as the previous simulation: Z stores genotype-metabolite correlations; A stores dichotomized correlations between rows of Z. True log relative risk: .18 (RR=1.2).
      Second-level coefficients $\pi$, by simulated mechanism (rows)
      Simulated mechanism  homocysteine  pyrimidine     purine
      homocysteine         0.18(0.13)    -0.09(0.536)   0.002(0.38)
      pyrimidine           -0.04(0.22)   0.22(0.066)    -0.01(0.06)
      purine               -0.01(0.36)   0.16(0.327)    0.19(0.07)
  • 35. Comparison of BMA results to stepwise regression: pyrimidine synthesis.
         Interaction   BF     MLE p-value
         FTD*MAT-II    15     0.038
         FTD*MTHFR     20     0.046
         MTCH*MS       534    0.006
         PGT*MS        14     0.018
      →  SHMT*CBS      1254   0.133
      →  SHMT*Fol      2324   0.036
         TS*MTHFR      227    0.022
      →  TS*SHMT       1091   N/S
  • 36. Pyrimidine synthesis: SHMT*CBS, SHMT*Fol, SHMT*TS.
  • 37. Comparison of BMA results to stepwise regression: purine synthesis.
         Interaction   BF     MLE p-value
      →  MTCH*MS       1130   0.008
      →  MTCH*PGT      1416   0.026
      →  PGT*CBS       1022   0.069
      →  PGT*MS        2851   0.007
      →  SHMT*Fol      1398   0.022
         SHMT*MAT-II   646    0.012
         TS*MTHFR      57     0.024
  • 38. Purine synthesis: MTCH*MS, MTCH*PGT, PGT*CBS, PGT*MS, SHMT*Fol.
  • 39. Comparison of BMA results to stepwise regression: homocysteine.
         Interaction   BF     MLE p-value
         CBS*MAT-II    77     0.045
      →  CBS*Met       1072   N/S
         FTD*MAT-II    38     0.045
         FTD*MTHFR     213    0.015
      →  MS*Met        1129   N/S
         MTCH*MS       978    0.006
         PGT*MS        75     0.044
         TS*MTHFR      41     0.022
  • 40. Homocysteine levels: CBS*Met, MS*Met.
  • 41. Summary of folate pathway simulation. Pathway knowledge can inform model search. Simulated three plausible disease mechanisms. Effect of causal metabolite on disease revealed in the corresponding element of $\pi$. Revealed plausible interactions not found through a stepwise regression.
  • 42. Outline 1. Motivation 2. The algorithm: Incorporating biological priors into an MCMC sampler 3. Simulation 1: Performance of the method 4. Simulation 2: Detecting interactions in a known pathway 5. Application to data from a GWAS 6. Future Extensions
  • 43. Using gene annotations to inform a search for interactions. Proof of concept: GWAS of breast cancer. Publicly available data from NCI (https://caintegrator.nci.nih.gov/cgems/): 1,145 cases and 1,142 controls of European ancestry. The 22 Gene Ontology terms from Biological Process were used to define priors in A and Z. Included 6,078 SNPs, where each SNP had a GO annotation and had the lowest p-value in its gene.
  • 44. Top 10 interactions found.
                      Non-inf prior        Inf prior
      Interaction     β(SE)         BF     β(SE)         BF
      PARK2*SORCS1    0.22(0.06)    1e4    0.27(0.06)    5e4
      AK5*ARHGAP26    0.16(0.05)    427    0.17(0.05)    903
      FGFR2*MAML2     -0.11(0.04)   1      -0.16(0.05)   686
      SHC3*KIF13B     N/A           N/A    0.17(0.05)    621
      PCLO*ME3        N/A           N/A    0.18(0.05)    528
      CNGA3*CNN1      -0.16(0.05)   41     -0.17(0.05)   462
      FGFR2*CDT1      N/A           N/A    -0.16(0.05)   445
      SHC3*CXCL16     N/A           N/A    -0.18(0.05)   403
      FGFR2*ABCA1     -0.1(0.05)    158    -0.11(0.05)   268
      CYP2J2*SORCS1   -0.11(0.05)   74     -0.14(0.05)   266
      FGFR2*SCG5      N/A           N/A    0.21(0.05)    235
  • 45. Enrichment analysis. Are the top interactions (BF > 100) enriched for certain GO terms? Compute an empirical p-value for enrichment: for each iteration, permute within bins representative of non-independence in observed interactions; pool bins and compute the frequency of a GO term in the pool; p-value: number of iterations in which the permuted frequency exceeded the observed frequency, divided by 1 million. Enriched terms: biological regulation (p=.008), growth (p=1e−6), metabolic process (p=.008), and regulation of biological process (p=.003).
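The last step of the enrichment procedure, turning permutation results into an empirical p-value, can be sketched as below. The bin-wise permutation itself is not reproduced, since the slide does not specify how bins are formed. The input list stands in for the per-iteration GO-term frequencies, and the numbers are toy values rather than results from the GWAS.

```python
def empirical_p_value(observed_freq, permuted_freqs):
    """Fraction of permutation iterations whose frequency exceeded the observed one."""
    n_iter = len(permuted_freqs)
    n_exceed = sum(1 for f in permuted_freqs if f > observed_freq)
    return n_exceed / n_iter

# Toy numbers: 8 iterations, 2 of which exceed the observed frequency 0.30.
perm = [0.10, 0.35, 0.20, 0.25, 0.31, 0.15, 0.30, 0.05]
print(empirical_p_value(0.30, perm))  # 0.25
```

With 1 million iterations the smallest nonzero value this estimator can return is 1e−6, which corresponds to a single permuted frequency exceeding the observed one.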
  • 46. Outline 1. Motivation 2. The algorithm: Incorporating biological priors into an MCMC sampler 3. Simulation 1: Performance of the method 4. Simulation 2: Detecting interactions in a known pathway 5. Application to data from a GWAS 6. Future Extensions
  • 47. Incorporate gene-expression data into GWAS analyses. Developing priors: should be more informative (e.g. empirical) and granular (e.g. SNP level) than GO. Obtain genotype-expression paired data: HapMap? Apply WGCNA to infer pathway modules. Genotype-module correlations used in the Z matrix. Incorporate more advanced MCMC techniques: Evolutionary Monte Carlo; Multiple-try Metropolis. Brute-force search for MAP. Use MAP for initial values?
  • 48. Acknowledgements: James Baurley, David Conti, Angela Presson (thanks in advance!). Funding: R01 ES016813 and R01 ES015090.