Probabilistic refinement of
               cellular pathway models
                         Cambridge Statistical Laboratory
                                Networks seminar series
                                             2009 Jan 21


Florian Markowetz
florian.markowetz@cancer.org.uk
What is a signaling pathway?

[Figure: signal flow along a pathway]
Environmental stimuli
-> Receptor in cell membrane
-> Protein cascade
-> Transcription factors regulating target genes
Levels shown: Protein, mRNA, DNA
Pathway reconstruction
Signaling pathways are important
- Deregulation causes many diseases incl. cancer
Signaling pathways are poorly understood
- Only parts-lists
- missing are interactions within and between pathways
Biological research
- So far mostly focused on individual genes
New genome-scale datasets
- Opportunity for data integration and novel methods
What data do we have?
Proteins:
- interactions between proteins
- binding to DNA
mRNA:
- Expression under different stimuli
- binding to DNA
Sequence:
- binding motifs
- epigenetic marks
Morphology
Bulk of data: Microarray
[Figure: levels Protein, mRNA, DNA]
Pathways as graphs
   • Nodes are (mostly) known
   • Goal: infer edges from data
   • Data are heterogeneous
Edges
   • co-expression between genes
   • interactions between proteins
   • binding motifs at genes
   • binding of proteins to DNA
Nodes
   • Protein domains
   • Functional annotation
Paths
   • Cause-effect data: changing environments, experimental perturbations
Pathway reconstruction
“Classical” statistical approaches:
Treat the genes/proteins as random variables and
   explore correlation structure in the data:
   – Correlation graphs
   – Gaussian graphical models (partial correlation)
   – Bayesian networks

Challenges/Problems/Opportunities
1. Correlation may be uninformative
2. Integrate heterogeneous, noisy, and complementary data sources
Review: Markowetz and Spang (2007)
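The difference between a correlation graph and a Gaussian graphical model can be illustrated with a small simulation. The sketch below is only an illustration under assumed data: a hypothetical chain A -> B -> C with Gaussian noise, not data from the talk. It computes plain correlations and partial correlations (from the inverse covariance matrix) with NumPy.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix derived from the inverse covariance matrix."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical chain A -> B -> C (simulated, for illustration only)
rng = np.random.default_rng(0)
n_samples = 5000
a = rng.normal(size=n_samples)
b = a + 0.5 * rng.normal(size=n_samples)
c = b + 0.5 * rng.normal(size=n_samples)
data = np.column_stack([a, b, c])

cor = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)

# A and C are strongly correlated, but almost independent given B
print("cor(A, C)      =", round(float(cor[0, 2]), 3))
print("pcor(A, C | B) =", round(float(pcor[0, 2]), 3))
```

In this toy setting the correlation graph would draw an edge A - C, while the partial correlation correctly suggests that A and C are linked only through B.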
– Part 1 –

Nested Effects Models
Experimental perturbations
Perturbations: Drugs, Small molecules, RNAi, Stress, Knockout
Levels shown: Protein, mRNA, DNA

Readout:
Global gene expression measurements
Drosophila immune response
Columns: perturbed genes
Rows: effects on other genes

1. Silencing tak1 reduces expression of all LPS-inducible transcripts
2. Silencing rel (key) or mkk4/hep reduces expression of subsets of induced transcripts

(Boutros et al, Dev Cell 2002)
(!) Two types of entities

  1. Components of the signaling pathway, which are experimentally perturbed
  2. Downstream effect reporters
(!!) Only indirect information

No direct observation of perturbation effects on other pathway components!

Inference from observed perturbation effects on downstream reporters.
The information gap

Direct information: effects are visible at other pathway components
Indirect information: effects are only visible at down-stream reporters:
- Cell survival or death
- Growth rate
- downstream genes
[Figure: the same pathway over genes A, B, C, D, drawn once for each case]
Correlation won’t do
Pathway (A, B, C, D): “Classical” approach
- Correlation
- Graphical models: Bayes Nets, GGMs
- Mutual Information
Downstream regulated genes: Nested Effects Models
Nested Effects Models
INPUT
1. Set of candidate pathway genes
2. High-dimensional phenotypic profile, e.g. microarray
OUTPUT
Graph representation of information flow explaining the phenotypes
[Figure: gene perturbations A to H -> phenotypic profiles (effects) -> inferred pathway with groups AB, CD, EF, GH]
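The idea of reading a pathway off nested effects can be sketched in a noise-free toy setting. The example below is a simplification, not the NEM scoring itself: it assumes hypothetical, error-free effect sets for three perturbed genes and draws an edge x -> y whenever the effects of perturbing y are a proper subset of the effects of perturbing x. The gene and reporter names are made up for illustration.

```python
def infer_edges_from_nesting(effects):
    """Return edges (x, y) where the effect set of y is a proper subset of that of x."""
    edges = set()
    for x, effects_x in effects.items():
        for y, effects_y in effects.items():
            if x != y and effects_y < effects_x:
                edges.add((x, y))
    return edges

# Hypothetical, noise-free effect sets: perturbed gene -> affected reporters
effects = {
    "X": {"E1", "E2", "E3", "E4", "E5", "E6"},
    "Y": {"E3", "E4", "E5", "E6"},
    "Z": {"E5", "E6"},
}

edges = infer_edges_from_nesting(effects)
print(sorted(edges))
```

Because the subset relation is transitive, the result is a transitively closed graph. Genes with identical effect sets cannot be ordered this way, and real data are noisy, which is why a probabilistic score is needed.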
NEM: model formulation
Pathway genes: X, Y, Z
• core topology
• to be reconstructed = Model M
Effect reporters: E1, …, E6
• states are observed = Data D
• positions in pathway unknown = Parameters θ
[Figure: model M’xyz over X, Y, Z with reporters E1 to E6; expected vs. observed effect patterns, with false negatives (FN) and false positives (FP) marked in the observed pattern]
Posterior: P( M | D ) = 1/Z · P( D | M ) · P( M ), where P( D | M ) is the marginal likelihood
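Turning model scores into a posterior amounts to normalizing over the candidate models. The sketch below assumes that log marginal likelihoods are already available for a handful of candidate models; the model names and values are hypothetical. It normalizes in log space to avoid numerical underflow and uses a uniform prior unless one is given.

```python
import math

def posterior_over_models(log_marginal_likelihoods, log_priors=None):
    """P(M | D) = P(D | M) P(M) / Z, computed stably from log scores."""
    models = list(log_marginal_likelihoods)
    if log_priors is None:
        # Uniform prior over the candidate models (an assumption of this sketch)
        log_priors = {m: -math.log(len(models)) for m in models}
    log_joint = {m: log_marginal_likelihoods[m] + log_priors[m] for m in models}
    peak = max(log_joint.values())
    normalizer = sum(math.exp(v - peak) for v in log_joint.values())
    return {m: math.exp(v - peak) / normalizer for m, v in log_joint.items()}

# Hypothetical scores: M2 is three times as likely as M1 under the data
scores = {"M1": -10.0, "M2": -10.0 + math.log(3.0)}
posterior = posterior_over_models(scores)
print(posterior)
```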
Likelihood P( D | M, θ )

Compare predictions with observations:
Prediction:     E1=0, E2=1
Observation 1:  E1=1, E2=1
Observation 2:  E1=0, E2=1
[Figure: pathway genes X, Y, Z with reporters E1 and E2]

Error probabilities
        e.g. false NEG rate 20%, false POS rate 5%
 Lik = Pr( E1 = 1) ⋅ Pr( E2 = 1) ⋅ Pr( E1 = 0) ⋅ Pr( E2 = 1)
     = 0.05 ⋅ 0.80 ⋅ 0.95 ⋅ 0.80
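The likelihood of this small example can be computed directly from the two error rates. The sketch below assumes independent observations and uses the stated rates: a reporter predicted to be 1 is observed as 1 with probability 1 - 0.20 = 0.80, and a reporter predicted to be 0 is observed as 1 with probability 0.05. The function names are illustrative only.

```python
def reporter_probability(predicted, observed, fp_rate, fn_rate):
    """Probability of one observed reporter state given the predicted state."""
    if predicted == 1:
        return (1.0 - fn_rate) if observed == 1 else fn_rate
    return fp_rate if observed == 1 else (1.0 - fp_rate)

def likelihood(prediction, observations, fp_rate=0.05, fn_rate=0.20):
    """Product over all observations and reporters, assuming independence."""
    lik = 1.0
    for observation in observations:
        for reporter, predicted in prediction.items():
            lik *= reporter_probability(predicted, observation[reporter], fp_rate, fn_rate)
    return lik

prediction = {"E1": 0, "E2": 1}
observations = [
    {"E1": 1, "E2": 1},  # observation 1: E1 is a false positive
    {"E1": 0, "E2": 1},  # observation 2: matches the prediction
]
lik = likelihood(prediction, observations)
print(lik)
```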
Marginal likelihood

 P(D | M) = \int P(D | M, \Theta) \, P(\Theta | M) \, d\Theta
          = \frac{1}{n^m} \prod_{i=1}^{m} \sum_{j=1}^{n} \prod_{k=1}^{l} P(e_{ik} | M, \theta_i = j)

- 1/n^m: uniform prior over positions
- Product over i: all effect reporters
- Sum over j: average over possible positions in the pathway
- Product over k: replicate observations
- P(e_ik | M, θ_i = j): distribution of single effect reporter with known position
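The product-sum-product structure of the marginal likelihood can be written as three nested loops. The sketch below rests on several assumptions that are not spelled out on the slide: binary effects, fixed false positive and false negative rates, a model given as a reachability matrix (with ones on the diagonal), and one perturbed gene per observation k. All names and the toy data are hypothetical.

```python
def marginal_likelihood(reach, perturbed, data, fp_rate=0.05, fn_rate=0.20):
    """(1/n^m) * prod_i sum_j prod_k P(e_ik | M, theta_i = j).

    reach[s][j]  = 1 if perturbing gene s is predicted to affect gene j
    perturbed[k] = index of the gene perturbed in observation k
    data[i][k]   = observed binary effect on reporter i in observation k
    """
    n = len(reach)
    total = 1.0
    for reporter_obs in data:                      # product over reporters i
        position_sum = 0.0
        for j in range(n):                         # sum over positions j
            prod = 1.0
            for k, obs in enumerate(reporter_obs):  # product over observations k
                predicted = reach[perturbed[k]][j]
                if predicted:
                    prod *= (1.0 - fn_rate) if obs else fn_rate
                else:
                    prod *= fp_rate if obs else (1.0 - fp_rate)
            position_sum += prod
        total *= position_sum / n                  # uniform prior over positions
    return total

# Two genes X (index 0) and Y (index 1), each perturbed once
forward = [[1, 1], [0, 1]]   # X -> Y
reverse = [[1, 0], [1, 1]]   # Y -> X
perturbed = [0, 1]
data = [[1, 0],              # reporter reacting only to the X perturbation
        [1, 1]]              # reporter reacting to both perturbations

score_forward = marginal_likelihood(forward, perturbed, data)
score_reverse = marginal_likelihood(reverse, perturbed, data)
print(score_forward, score_reverse)
```

In this toy data set the effects of perturbing Y are nested in those of perturbing X, so the model X -> Y scores higher than Y -> X.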
NEM: inference
Model space: all transitively closed directed graphs
Exhaustive enumeration: score all models to find the one fitting the data best
   (Markowetz et al., Bioinformatics 2005)
MCMC, Simulated Annealing: take small probabilistic steps to explore model space
   (with A. Tresch; in preparation)
Divide and conquer: break a big model into smaller, manageable pieces and then re-assemble
   (Markowetz et al., ISMB 2007)
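Why exhaustive enumeration only works for small pathways can be seen by counting the model space. The sketch below is a brute-force illustration, not the implementation used in the cited papers: it generates every directed graph on n labeled nodes (ignoring self-loops) and keeps those that are transitively closed.

```python
from itertools import product

def is_transitively_closed(adj):
    """True if i -> j and j -> k (with i != k) always implies i -> k."""
    n = len(adj)
    for i in range(n):
        for j in range(n):
            if i == j or not adj[i][j]:
                continue
            for k in range(n):
                if k != i and k != j and adj[j][k] and not adj[i][k]:
                    return False
    return True

def transitively_closed_graphs(n):
    """Yield adjacency matrices of all transitively closed graphs on n nodes."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in product([0, 1], repeat=len(pairs)):
        adj = [[0] * n for _ in range(n)]
        for (i, j), bit in zip(pairs, bits):
            adj[i][j] = bit
        if is_transitively_closed(adj):
            yield adj

counts = {n: sum(1 for _ in transitively_closed_graphs(n)) for n in (2, 3, 4)}
print(counts)
```

The counts grow quickly with the number of pathway genes, which motivates the heuristic search strategies (MCMC, simulated annealing, divide and conquer) for larger models.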
NEM: extensions




- Drop transitivity requirement
- Likelihood based on log-ratios of effects
- Feature selection to concentrate on informative effect reporters

Tresch and Markowetz (2008)
NEMs on Drosophila data
Summary of part 1

1. Gene perturbation screens with gene-expression readouts
2. Perturbation screens suffer from the information gap between pathways and reporters
3. Nested Effects Models reconstruct pathway features from subset relations between observed effects
– Part 2 –

      Data integration and
   probabilistic refinement of
a signaling pathway hypothesis
Pathway refinement
     1. Start from given pathway hypothesis
        Even if our understanding of pathways is poor, that does not mean we have none at all!
     2. Evaluate evidence for hypothesis in data
     3. Identify weakly supported areas and likely extensions
     Not reconstruction from scratch.
     Step 1: assemble pathway hypothesis (KEGG, literature, …) for pheromone response pathway in yeast
Edge data I
              Support for hypothesis in
      protein-protein interaction data
Edge data II
          Support for hypothesis in
              co-expression data
Edge data III
   Why is it so hard to reconstruct the nuclear regulatory network from correlations?
Edge data IV
               Support for hypothesis in
                 TF-DNA binding data
Paths: cause-effect data
         Expression profiling of knock-out mutants
                              (Hughes et al., 2000)




              Result:
              transcriptional response to perturbation
              only visible on down-stream genes
              (information gap!)
Conclusion from data analysis

• Every data source is informative for a specific
  compartment of the pathway
• No data source is informative in all
  compartments
• We expect these observations also to hold for
  other MAPK and signaling pathways.

Need compartment-specific integrative model
 encompassing edge, node, and path data.
Integrative model
- Pathway graph as hidden/latent variables
- Conditional distributions for each data type
- Different data types contribute to each compartment
[Figure labels: Prior, Parameters]

Graphical model defines posterior P(G|data)
-> inference by Gibbs sampler
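How several data types can jointly support a latent edge can be sketched for a single edge. The example below is not the model from the talk, whose details are not given here: it assumes binary evidence per data type, conditional independence of the data types given the edge state, and hypothetical parameter values. It computes the conditional probability of one edge, which is the kind of quantity a Gibbs sampler would draw from when updating that edge.

```python
def edge_posterior(prior, evidence, params):
    """P(edge = 1 | evidence) for one latent edge.

    evidence[data_type] = observed binary support (0 or 1)
    params[data_type]   = (P(support = 1 | edge), P(support = 1 | no edge))
    """
    p_edge, p_none = prior, 1.0 - prior
    for data_type, observed in evidence.items():
        p_if_edge, p_if_none = params[data_type]
        p_edge *= p_if_edge if observed else (1.0 - p_if_edge)
        p_none *= p_if_none if observed else (1.0 - p_if_none)
    return p_edge / (p_edge + p_none)

# Hypothetical parameters for two data types
params = {
    "protein_interaction": (0.8, 0.2),
    "co_expression": (0.6, 0.4),
}
evidence = {"protein_interaction": 1, "co_expression": 1}
probability = edge_posterior(0.5, evidence, params)
print(probability)
```

A compartment-specific model could use a different parameter table for each compartment, so that a data type which is uninformative in one compartment has nearly equal probabilities there.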
Evaluation

1. Fit model parameters on pheromone
   response pathway (training)
2. Use fitted model on other MAPK pathways
   (generalization to closely related examples)
3. Use fitted model on all other Yeast signaling
   pathways (generalization to everything else)

            … work in progress …
Acknowledgements
Nested Effects Models
Rainer Spang (Univ. Regensburg) .:. Dennis Kostka (UCSF) .:. Achim Tresch (Gene Center Munich) .:. Holger Fröhlich (DKFZ Heidelberg) .:. Tim Beißbarth (Univ. Göttingen) .:. Josh Stuart, Charlie Vaske (UCSC)
Data integration
Olga G. Troyanskaya (Princeton) .:. Edoardo Airoldi (Harvard) .:. David Blei (Princeton)
Probabilistic refinement of
            cellular pathway models


        Thank you !
Florian Markowetz
florian.markowetz@cancer.org.uk
