Exploratory analysis of phenotyping screens: enrichment, clustering, ranking. Network Biology - lecture 2
Low-dimensional phenotypes versus high-dimensional phenotypes by microscopy or molecular profiling. [Figure labels: Time, Size]
A challenge for computation and statistics
Today's lecture
High-throughput phenotyping: phenotypes of 100s to 10,000s of perturbed genes, ordered from weak to strong. Genes with strong phenotypes are the hits.
Gene Ontology (GO), www.geneontology.org: a GO term with a gene set annotated to it.
GO over-representation: hyper-geometric test. [Figure: genes ordered by phenotype from weak to strong; marked are all genes, all hits, genes in GO term, and hits in GO term.]
Hyper-geometric distribution. N genes altogether, n hits, m genes in GO term, k hits in GO term.
Probability to observe k hits in GO term:
$$P(X = k) = \frac{\binom{m}{k}\,\binom{N-m}{n-k}}{\binom{N}{n}}$$
$\binom{N}{n}$: number of possibilities to choose n hits out of N genes.
$\binom{m}{k}$: k hits fall into the GO term of size m.
$\binom{N-m}{n-k}$: the other n-k hits are all genes outside the GO term.
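Hyper-geometric test: example. pvalue = phyper(k - 1, m, N - m, n, lower.tail = FALSE) is the probability of k or more hits in the GO term. With lower.tail = FALSE, phyper returns P(X > x), so the first argument has to be k - 1 to include k itself.
Hold these fixed: N = 10 000, m = 200. See what happens if we vary k = 1, 2, …, 30 and n = 50, 100, 200, 300, 400.
[Plot: p-value (log10) against number k of hits in GO term, one curve for each n = 50, 100, 200, 300, 400.]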
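The over-representation test can be sketched in plain Python as follows. This is a minimal illustration rather than the lecture's own code: the function names `hypergeom_pmf` and `hypergeom_pvalue` are made up for this sketch, and the tail is summed explicitly so that "k or more" is visibly inclusive. In R, `phyper(..., lower.tail = FALSE)` returns P(X > x), so the same inclusive tail corresponds to passing k - 1 as the first argument.

```python
from math import comb


def hypergeom_pmf(k, N, m, n):
    """P(X = k): exactly k of the n hits fall into a GO term with m genes."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def hypergeom_pvalue(k, N, m, n):
    """P(X >= k): k or more of the n hits fall into the GO term."""
    upper = min(m, n)  # no more hits in the term than hits or term genes
    return sum(hypergeom_pmf(x, N, m, n) for x in range(k, upper + 1))


# Setting from the slide: N and m fixed, n and k varied.
N, m = 10_000, 200
for n in (50, 100, 200, 300, 400):
    row = [f"{hypergeom_pvalue(k, N, m, n):.2e}" for k in (1, 5, 10)]
    print(f"n = {n:3d}  p-values for k = 1, 5, 10: {row}")
```

For a fixed k the p-value grows with n: the more hits are called, the more of them land in the GO term by chance alone.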
Gene Set Enrichment Analysis (GSEA), Subramanian et al. (2005). Not restricted to GO; could be any collection of gene sets, e.g. MSigDB at http://www.broad.mit.edu/gsea/
[Figure: positions of the gene set members along the phenotype ranking from weak to strong; one example with no significant trend, one with a significant trend.]
 
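GSEA: construction. Walk along the genes ordered by phenotype from weak to strong. At position i, compare two fractions:
$$P_{\text{hit}}(i) = \frac{\text{Nr. hits before } i}{\text{Nr. all hits}} \quad (\text{if } p = 0), \qquad P_{\text{miss}}(i) = \frac{\text{Nr. non-hits before } i}{\text{Nr. all non-hits}}$$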
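A minimal sketch of this construction for the unweighted case p = 0, following the running-sum statistic of Subramanian et al. (2005), where the enrichment score is the largest deviation from zero of the hit fraction minus the non-hit fraction. Three things are assumptions of this sketch: the names `running_sum` and `enrichment_score`; reading "hits" as the members of the gene set under test; and counting position i itself as "before i". Significance testing by permutation is left out.

```python
def running_sum(ranked_genes, gene_set):
    """Running sum along a ranked gene list for p = 0.

    At each position: (fraction of set members seen so far)
    minus (fraction of non-members seen so far).
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one member and one non-member")

    hits_seen = miss_seen = 0
    curve = []
    for member in in_set:
        if member:
            hits_seen += 1
        else:
            miss_seen += 1
        curve.append(hits_seen / n_hits - miss_seen / n_miss)
    return curve


def enrichment_score(curve):
    """Value of the running sum that deviates most from zero."""
    return max(curve, key=abs)


ranked = list("abcdefgh")  # toy ranking of eight genes
top = running_sum(ranked, {"a", "b", "c"})     # members all at one end
mixed = running_sum(ranked, {"a", "d", "h"})   # members spread out
print(enrichment_score(top), enrichment_score(mixed))
```

A gene set concentrated at one end of the ranking drives the curve far from zero, while a set spread evenly over the ranking keeps it close to zero.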
GSEA: examples
PROs and CONs. Each gene set (gene set 1, gene set 2, gene set 3, gene set 4, …; e.g. Wnt pathway, DNA repair, chromosome 1, expressed in liver) gets a p-value (hyper-geometric or GSEA).
Map phenotypes to network Where  do the hits fall in the network and  what  are they connected to?
Sub-networks rich in hits
Sub-networks  with highly correlated phenotypes
Predicting phenotypes: guilt by association. Use known phenotypes in the network. Use edge weights if possible. Success depends on quality and coverage of linkage in the network.
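One simple way to turn guilt by association into a prediction is a weighted vote among network neighbours with known phenotypes. The sketch below is only an illustration of that idea, not the method used in the lecture; the function name `predict_phenotype`, the edge dictionary layout and the toy gene names are all hypothetical.

```python
def predict_phenotype(gene, edges, known):
    """Weighted average of the known phenotypes of a gene's neighbours.

    edges: {(gene_u, gene_v): weight} for an undirected network
    known: {gene: phenotype score} for genes that were measured
    Returns None when no neighbour has a known phenotype.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for (u, v), weight in edges.items():
        if gene not in (u, v):
            continue
        neighbour = v if u == gene else u
        if neighbour in known:
            weighted_sum += weight * known[neighbour]
            total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None


# Toy network: X is strongly linked to A (a hit) and weakly to B (no phenotype).
edges = {("A", "X"): 0.9, ("B", "X"): 0.1, ("B", "Y"): 1.0}
known = {"A": 1.0, "B": 0.0}
for gene in ("X", "Y", "Z"):
    print(gene, predict_phenotype(gene, edges, known))
```

Gene Z has no links at all, which makes the point about coverage concrete: without linkage in the network there is nothing to predict from.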
Which networks? Networks from large-scale experiments; networks from analyzing the experimental literature; networks from probabilistic data integration.
Cluster first, think later. Data: perturbed genes by phenotypic profiles. Features = expression profiles, parameters of cell shape, protein concentration or localization. Genes with similar phenotypic profiles often have similar molecular functions or act in the same pathways. Principle for function prediction: guilt by association.
From data to distances. M: matrix of phenotypic profiles, one row per perturbed gene. D: distance or dissimilarity matrix with D[i,j] = dist(M[i,], M[j,]), D[j,i] = D[i,j], and D[i,i] = 0 for all i. What distance measure dist(. , .) should we use?
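The step from the profile matrix M to the distance matrix D can be sketched as below. The function name `distance_matrix` and the toy profiles are assumptions of this sketch, and the distance function is passed in as a parameter because the choice of measure is exactly the open question.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two profiles of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(M, dist=euclidean):
    """Pairwise distances between the rows of M.

    Only the upper triangle is computed; symmetry fills the rest,
    and the diagonal stays at zero.
    """
    n = len(M)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = dist(M[i], M[j])
    return D


# Toy data: three perturbed genes, two phenotypic features each.
M = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
for row in distance_matrix(M):
    print(row)
```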
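Examples of distances between two profiles a and b:
Euclidean distance: $\mathrm{dist}(a,b) = \sqrt{\sum_{\ell} (a_\ell - b_\ell)^2}$
Manhattan distance: $\mathrm{dist}(a,b) = \sum_{\ell} |a_\ell - b_\ell|$
Cosine distance: $\mathrm{dist}(a,b) = 1 - \dfrac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$
How is this related to correlation?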
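The three distances, and one answer to the question about correlation, can be checked numerically. The sketch uses the standard definitions; the helper names and the toy profiles are assumptions. The point it illustrates is that the cosine similarity of mean-centred profiles equals the Pearson correlation, so the cosine distance of centred profiles is 1 minus the correlation.

```python
from math import sqrt


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def centre(a):
    """Subtract the mean so that the profile sums to zero."""
    mean = sum(a) / len(a)
    return [x - mean for x in a]


def pearson(a, b):
    """Pearson correlation from covariance and variances."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 9.0]
print("Euclidean:", euclidean(a, b))
print("Manhattan:", manhattan(a, b))
print("Cosine   :", cosine_distance(a, b))
print("1 - cosine distance of centred profiles:",
      1.0 - cosine_distance(centre(a), centre(b)))
print("Pearson correlation:", pearson(a, b))
```

Unlike the Euclidean and Manhattan distances, the cosine distance ignores the overall magnitude of a profile: scaling a profile by a positive constant leaves it unchanged.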
Linkage: distances to clusters. D holds the distances between individual genes, e.g. D[3,4]. But what is the distance between a cluster and a gene, D[(2,3),4] = ???
dist(C1, C2) = max { dist(i, j) : i in C1, j in C2 } (complete linkage)
dist(C1, C2) = min { dist(i, j) : i in C1, j in C2 } (single linkage)
dist(C1, C2) = mean { dist(i, j) : i in C1, j in C2 } (average linkage)
[Figure: six genes, numbered 1 to 6, plotted by phenotype 1 and phenotype 2.]
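The three linkage rules translate directly into code once a gene-by-gene distance matrix is available. In this sketch the function name `linkage_distance` and the toy matrix are assumptions, and genes are indexed from 0 rather than from 1 as on the slide.

```python
def linkage_distance(D, C1, C2, method="average"):
    """Distance between clusters C1 and C2 (lists of gene indices).

    D is a matrix of distances between individual genes.
    """
    pairwise = [D[i][j] for i in C1 for j in C2]
    if method == "complete":
        return max(pairwise)
    if method == "single":
        return min(pairwise)
    if method == "average":
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


# Toy distance matrix for four genes (symmetric, zero diagonal).
D = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
for method in ("complete", "single", "average"):
    print(method, linkage_distance(D, [0, 1], [2, 3], method))
```

Single linkage looks only at the closest pair and tends to chain clusters together, complete linkage looks at the farthest pair and favours compact clusters, and average linkage sits in between.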
Hierarchical clustering. Ingredients: data matrix, distance function, linkage function. [Figure: six genes, numbered 1 to 6, plotted by phenotype 1 and phenotype 2, next to the resulting dendrogram.]
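Putting the three ingredients together gives a naive agglomerative procedure: start with every gene as its own cluster and repeatedly merge the two closest clusters. The sketch below assumes Euclidean distance (`math.dist`, Python 3.8+), uses a hypothetical function name `agglomerate`, and favours clarity over speed. The list of merges with their heights is what a dendrogram draws.

```python
from math import dist  # Euclidean distance between two points


def agglomerate(points, linkage=max):
    """Naive agglomerative clustering.

    linkage=max gives complete linkage, linkage=min single linkage.
    Returns the merges in order as (cluster_a, cluster_b, height),
    where clusters are tuples of row indices into points.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                height = linkage(
                    dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or height < best[0]:
                    best = (height, a, b)
        height, a, b = best
        merges.append((clusters[a], clusters[b], height))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Toy data: five genes measured on two phenotypes.
points = [(0, 0), (0, 1), (5, 5), (5, 7), (20, 20)]
for left, right, height in agglomerate(points):
    print(f"merge {left} + {right} at height {height:.2f}")
```

Cutting the list of merges at a chosen height yields a flat clustering, and different linkage functions can give different trees for the same data.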
Phenotypic data: 332 knock-downs in 5 conditions. [Figure: correlation matrix with dendrogram.] Data by Klaas Mulder, CRI.
Clustering: PROs and CONs. Brown et al (2005), Ivanova et al (2006), Bakal et al (2007).
Query-based gene ranking. Often addressed by clustering: which cluster does the query gene fall into? [Figure: query gene shown relative to cluster 1, cluster 2 and cluster 3.]
Ranking by PhenoBLAST, Gunsalus et al (2004). [Figure: perturbed genes by phenotypes, with the genes ordered by similarity to the query gene.]
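The general idea of ordering genes by similarity to a query gene can be sketched as follows. This is not the PhenoBLAST scoring scheme, whose details are not given here. It is a stand-in that ranks by Pearson correlation of phenotypic profiles, and the function names and toy profiles are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two profiles of equal length."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_a * var_b)


def rank_by_similarity(query, profiles):
    """All other genes, most similar to the query gene first."""
    query_profile = profiles[query]
    scores = {
        gene: pearson(query_profile, profile)
        for gene, profile in profiles.items()
        if gene != query
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy phenotypic profiles: four genes, four phenotypes each.
profiles = {
    "query": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.0, 4.0, 6.0, 8.5],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
for gene, score in rank_by_similarity("query", profiles):
    print(f"{gene}: {score:+.2f}")
```

Compared with asking which cluster the query gene falls into, a ranking does not depend on where cluster boundaries happen to be drawn.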
Summary
Three take-home messages
From clusters to pathways. Next lectures: graph-based and probabilistic models to infer (causal) pathway structure from phenotypic data.
Network Biology - lecture 2. ≥ 3 questions!

Network Biology Lent 2010 - lecture 1
