PhDc exam presentation

  1. 1. Functional Characterisation of Metabolic Networks. Carlos Manuel Estévez-Bretón, MSc. Doctorate in Systems Engineering and Computer Sciences. Advisors: Luis Fernando Niño, PhD; Liliana Lopez Kleine, PhD. Intelligent Systems Research Laboratory - LISI, Bioinformatics and Computational Biology research line “BioLisi”. Examining Committee: Dr. Jason Papin, U. of Virginia, Bioengineering; Dr. Andres Gonzalez, U. de los Andes, Chemical Engineering; Dr. Fabio Gonzalez, U. Nacional, Systems Engineering.
  2. 2. Agenda: What... Why... Research Question. How... Goals, Evaluation, Deliverables. Progress ...
  3. 3. What? http://www.impactcommunicationsinc.com/wp-content/uploads/2011/10/11-11_speak_up.jpg
  4. 4. Metabolism is the complete set of metabolic networks and physical processes that determine the physiological and biochemical properties of a cell. With the sequencing of complete genomes, it is now possible to reconstruct the network of biochemical reactions in many organisms, from bacteria to humans...
  5. 5. Introduction: Systems Biology. Ecological Scale. Lucas B. Edelman, James A. Eddy, and Nathan D. Price. Wiley Interdiscip Rev Syst Biol Med. 2010 Jul-Aug; 2(4): 438–459. doi: 10.1002/wsbm.75. PMC 2011 August 17.
  9. 9. Introduction: Systems Biology. Multilevel field. Ecological Scale. Lucas B. Edelman, James A. Eddy, and Nathan D. Price. Wiley Interdiscip Rev Syst Biol Med. 2010 Jul-Aug; 2(4): 438–459. doi: 10.1002/wsbm.75. PMC 2011 August 17.
  10. 10. Introduction: Systems Biology. Multilevel field. Studied: Interdisciplinary. Ecological Scale. Lucas B. Edelman, James A. Eddy, and Nathan D. Price. Wiley Interdiscip Rev Syst Biol Med. 2010 Jul-Aug; 2(4): 438–459. doi: 10.1002/wsbm.75. PMC 2011 August 17.
  11. 11. Introduction: Better and cheaper processing power.
  12. 12. Introduction: Multilevel information. Better and cheaper processing power.
  13. 13. Introduction: Regulatory Networks, Protein-Protein Interaction Networks, Metabolic Networks, Ecological Networks.
  14. 14. Introduction: Regulatory Networks, Protein-Protein Interaction Networks, Metabolic Networks, Ecological Networks. Main Data Sources.
  15. 15. “Techniques such as high-throughput (HT) sequencing and gene/protein profiling have transformed biological research” (Khatri et al., 2012). “In this way, the advent of HT profiling technologies presents a new challenge, that of extracting meaning from a long list of differentially expressed genes and proteins” (Khatri et al., 2012).
  16. 16. “Techniques such as high-throughput (HT) sequencing and gene/protein profiling have transformed biological research” (Khatri et al., 2012). “In this way, the advent of HT profiling technologies presents a new challenge, that of extracting meaning from a long list of differentially expressed genes and proteins” (Khatri et al., 2012). These techniques change the way we study biology. Extracting meaning, analyzing the data, and obtaining information with high levels of confidence and quality requires an interdisciplinary effort.
  17. 17. Intelligent Systems. [Figure excerpt from Jensen & Bateman, 2011. Fig. 1: The growth of supervised machine learning methods in PubMed. Fig. 2: Heatmap showing the co-occurrence of machine learning techniques within articles.] “Hot techniques”: ANN, Markov Models, and “new ones”: SVM and Random Forests (Jensen & Bateman, 2011). Latent Topic Analysis is not in the list of methods.
  18. 18. “In particular, supervised machine learning has been used to great effect in numerous bioinformatics prediction methods” (Jensen & Bateman, 2011). Machine learning is of immense importance in bioinformatics and more generally for biomedical sciences (Larrañaga et al., 2006; Tarca et al., 2007). Because machine learning is not common in metabolic systems analysis, I think it is important to emphasise that:
  19. 19. Intelligent Systems. There are no references in the literature for the analysis of metabolic pathways from a functional approach, or using the proposed machine learning methods.
  20. 20. Machine Learning. Larrañaga et al.
  21. 21. Machine Learning. Larrañaga et al. Bayesian classifiers, feature subset selection, SVM, ANN, classification trees, evolutionary algorithms, tabu search, nearest neighbour, SVM, Bayesian classifier, fuzzy k-NN, Bayesian generalization of the SVM, ANN, linear discriminant analysis, classification trees, ANN, SVM and HMM, linear discriminant analysis, quadratic discriminant analysis, k-NN classifier, bagging and boosting classification trees, SVM and random forest.
  22. 22. Machine Learning. Larrañaga et al. Bayesian classifiers, feature subset selection, SVM, ANN, classification trees, evolutionary algorithms, tabu search, nearest neighbour, SVM, Bayesian classifier, fuzzy k-NN, Bayesian generalization of the SVM, ANN, linear discriminant analysis, classification trees, ANN, probabilistic graphical models, classification trees, boosting with classification trees, SVM and HMM, linear discriminant analysis, quadratic discriminant analysis, k-NN classifier, bagging and boosting classification trees, SVM and random forest.
  23. 23. Why? http://www.perftrends.com/images/why.jpg
  24. 24. ... or methods are not applied to Metabolic Pathways... ...or are based on topological (graph-based) network representations.
  25. 25. Statement • It should be possible to make some advances in understanding the underlying functional conformation of metabolic pathways. http://www.scriptmag.com/wp-content/uploads/BrainStorm-NewColor-12-22_32-1280x980at86.jpg
  26. 26. Statement • Supervised Clustering - useful to test the given representation - by classifying the biochemical reactions. http://www.scriptmag.com/wp-content/uploads/BrainStorm-NewColor-12-22_32-1280x980at86.jpg http://www.ee.ryerson.ca/~courses/ele888/ele_888_pat_class.gif
  27. 27. Statement http://diversity-mining-lab.wikispaces.com/
  28. 28. Statement • Information Retrieval algebraic models, like vector-space-based ones, should “reveal” topics that occur in document collections. • Is it possible to generate new, “really new”, pathways? • ...I’m talking about synthetic biology. http://diversity-mining-lab.wikispaces.com/
  29. 29. Research Question: Is it possible to classify metabolic networks using only functional features?
  30. 30. How? http://www.wired.com/images_blogs/threatlevel/2012/10/harris002.jpg
  31. 31. Goals • To classify metabolic pathways functionally (without considering the topological structure), based on machine learning methods.
  32. 32. Goals • To classify metabolic pathways functionally (without considering the topological structure), based on machine learning methods. • To build or adapt a system of functional representation for metabolic networks.
  33. 33. Goals • To classify metabolic pathways functionally (without considering the topological structure), based on machine learning methods. • To build or adapt a system of functional representation for metabolic networks. • To classify metabolic networks using machine learning methods.
  34. 34. Goals • To classify metabolic pathways functionally (without considering the topological structure), based on machine learning methods. • To build or adapt a system of functional representation for metabolic networks. • To classify metabolic networks using machine learning methods. • To apply (in new ways) machine learning methods in the study of systems biology.
  35. 35. Methodology. Pipeline: Data Source (1. MetaCyc, 2. KEGG), Representation, Classification (Method 1, Method 2), Evaluation (ROC, confusion matrix, entropy, purity, adjusted Rand index, accuracy). Representation: General Metabolic Reaction Model (GMRM): S1 + S2 + … Sn → P1 + P2 + … Pn, with Enzyme, CoFactor, CoEnzyme. Vectorization of GMRM: S1 S2 S3 | Enzyme CoF CoE | P1 P2 P3. (paper, paper, paper) Carlos Manuel Estévez-Bretón R. 2012
  36. 36. Data Sources: 1. MetaCyc, 2. KEGG.
  37. 37. Data Representation. General Metabolic Reaction Model (GMRM): S1 + S2 + … Sn → P1 + P2 + … Pn, with Enzyme, CoFactor, CoEnzyme. Vectorization of GMRM: S1 S2 S3 | Enzyme CoF CoE | P1 P2 P3.
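A minimal sketch of how the GMRM vectorization could look, assuming each reaction is stored as lists of substrates, catalysts (enzyme, cofactors, coenzymes), and products, and that every vector component counts one role-tagged term. The `Reaction` class, the role prefixes, and the second example reaction are hypothetical illustrations, not the representation actually used in the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    """One reaction in the GMRM layout: substrates, catalysts, products."""
    substrates: list
    products: list
    enzyme: str = ""
    cofactors: list = field(default_factory=list)
    coenzymes: list = field(default_factory=list)

    def tokens(self):
        """Return role-tagged terms such as 'S:glucose' or 'P:adp'."""
        tagged = [("S", m) for m in self.substrates]
        if self.enzyme:
            tagged.append(("E", self.enzyme))
        tagged += [("CoF", m) for m in self.cofactors]
        tagged += [("CoE", m) for m in self.coenzymes]
        tagged += [("P", m) for m in self.products]
        return [f"{role}:{name.lower()}" for role, name in tagged]


def build_vocabulary(reactions):
    """Map every distinct role-tagged term to a fixed vector index."""
    terms = sorted({t for r in reactions for t in r.tokens()})
    return {term: i for i, term in enumerate(terms)}


def vectorize(reaction, vocabulary):
    """Count-based vector of one reaction over the shared vocabulary."""
    vector = [0] * len(vocabulary)
    for term in reaction.tokens():
        if term in vocabulary:
            vector[vocabulary[term]] += 1
    return vector


# First reaction uses the names shown on the slides; the second one is
# a made-up placeholder so the vocabulary has more than one reaction.
r1 = Reaction(["Glucose", "ATP"], ["Glucose 6P", "ADP"], enzyme="Hydrolase")
r2 = Reaction(["Glucose 6P"], ["Product X"], enzyme="Enzyme X")
vocab = build_vocabulary([r1, r2])
v1 = vectorize(r1, vocab)
v2 = vectorize(r2, vocab)
```

Tagging each term with its role keeps "Glucose 6P as substrate" and "Glucose 6P as product" in separate dimensions, which is one way to preserve the direction of a reaction without using the network topology.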
  38. 38. Classification. Method 1: Supervised Classification.
  39. 39. Method 2 • Let’s think about clustering without any prior knowledge... • Applying Information Retrieval methods to Metabolic Pathways data.
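As a rough illustration of what applying Information Retrieval methods to pathway data could mean, the sketch below treats each pathway as a "document" whose "words" are metabolite names, weights them with TF-IDF, and compares pathways by cosine similarity. TF-IDF is only one common vector-space weighting chosen here for illustration; the slides do not say which model is used, and the three toy pathways are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (a list of terms)."""
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weighted = []
    for doc in documents:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "pathways", each written as a bag of metabolite names.
pathways = [
    ["glucose", "atp", "glucose6p", "adp"],
    ["glucose6p", "fructose6p", "atp", "adp"],
    ["acetylcoa", "malonylcoa", "nadph"],
]
weights = tf_idf(pathways)
sim_01 = cosine(weights[0], weights[1])  # pathways sharing metabolites
sim_02 = cosine(weights[0], weights[2])  # pathways with nothing in common
```

A similarity matrix built this way could then feed any standard clustering algorithm, with no prior pathway labels involved.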
  40. 40. Evaluation: ROC, confusion matrix, entropy, purity, adjusted Rand index, accuracy. Confusion matrix (“Classified as” versus “Really is”, each Positive or Negative): True Positive, False Negative, False Positive, True Negative. http://www.intechopen.com/source/html/38584/media/image56.jpeg
  41. 41. Evaluation: ROC, confusion matrix, entropy, purity, adjusted Rand index, accuracy. Confusion matrix (“Classified as” versus “Really is”, each Positive or Negative): True Positive, False Negative, False Positive, True Negative. Derived measures: Error Rate, Recall/Sensitivity, Specificity/True Negative Rate, Precision, 1-Specificity/False Alarm Rate. http://www.intechopen.com/source/html/38584/media/image56.jpeg
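The measures named on this slide follow directly from the four confusion-matrix cells. The sketch below uses the standard textbook definitions; the function names and the toy labels are assumptions made for illustration.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN, TN for one class treated as 'positive'."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluation_measures(tp, fp, fn, tn):
    """Derive the slide's measures from the confusion-matrix cells."""
    def ratio(numerator, denominator):
        # Guard against empty classes instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    total = tp + fp + fn + tn
    return {
        "accuracy": ratio(tp + tn, total),
        "error_rate": ratio(fp + fn, total),
        "recall": ratio(tp, tp + fn),            # sensitivity
        "specificity": ratio(tn, tn + fp),       # true negative rate
        "precision": ratio(tp, tp + fp),
        "false_alarm_rate": ratio(fp, fp + tn),  # 1 - specificity
    }


# Hypothetical labels for eight reactions.
actual = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
predicted = ["pos", "pos", "neg", "pos", "neg", "neg", "neg", "neg"]
cells = confusion_counts(actual, predicted, positive="pos")
measures = evaluation_measures(*cells)
```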
  42. 42. Evaluation: ROC, confusion matrix, entropy, purity, adjusted Rand index, accuracy. http://www.intechopen.com/source/html/38584/media/image56.jpeg http://wwww.cbgstat.com/v2/method_ROC_curve_MedCalc/images/ROC_curve_MedCalc_Snap17.gif
  43. 43. Deliverables: A computational metabolic representation proposal. A computational metabolic classification method. A generative metabolic pathways model. A pipeline for metabolic pathways analysis.
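For the clustering side of the evaluation, entropy and purity compare cluster assignments against known labels. The sketch below uses the usual definitions (size-weighted average over clusters); the function names and the example assignment are hypothetical.

```python
import math
from collections import Counter, defaultdict


def group_by_cluster(cluster_ids, true_labels):
    """Collect the true labels of the items that fell in each cluster."""
    clusters = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        clusters[cluster].append(label)
    return clusters


def purity(cluster_ids, true_labels):
    """Fraction of items that carry their cluster's majority label."""
    clusters = group_by_cluster(cluster_ids, true_labels)
    majority = sum(max(Counter(labels).values())
                   for labels in clusters.values())
    return majority / len(true_labels)


def entropy(cluster_ids, true_labels):
    """Size-weighted average of per-cluster label entropy, in bits."""
    clusters = group_by_cluster(cluster_ids, true_labels)
    total = len(true_labels)
    result = 0.0
    for labels in clusters.values():
        size = len(labels)
        h = -sum((n / size) * math.log2(n / size)
                 for n in Counter(labels).values())
        result += (size / total) * h
    return result


# Hypothetical clustering of six reactions from two pathway classes.
cluster_ids = [0, 0, 0, 1, 1, 1]
true_labels = ["carb", "carb", "lipid", "lipid", "lipid", "lipid"]
p = purity(cluster_ids, true_labels)
h = entropy(cluster_ids, true_labels)
```

Higher purity and lower entropy both indicate clusters that agree better with the reference classes.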
  44. 44. Progress ... http://desktop.freewallpaper4.me/view/original/3714/the-lonely-man.jpg
  45. 45. Preliminary Results. Pipeline: Data Source (1. MetaCyc, 2. KEGG), Representation, Classification (Method 1, Method 2), Evaluation (ROC, confusion matrix, entropy, purity, adjusted Rand index, accuracy). Representation: General Metabolic Reaction Model (GMRM): S1 + S2 + … Sn → P1 + P2 + … Pn, with Enzyme, CoFactor, CoEnzyme. Vectorization of GMRM: S1 S2 S3 | Enzyme CoF CoE | P1 P2 P3. (paper, paper, paper) Carlos Manuel Estévez-Bretón R. 2012
  46. 46. Linguistic Analogy. Complexity (metabolic): Metabolic Pathway, Reaction, Metabolites/ome, Metabolic Switch. Complexity (linguistic): Document, Phrase, Paragraph, Vocabulary. Words correspond to molecules. Example words: the, Murder, for, a, jar, of, red, rum, frog, soap. Example phrase: Murder for a jar of red rum. Example molecules: Glucose, Glucose 6P, ATP, Hydrolase, Pyrophosphate. Example reaction: Glucose + ATP → Glucose 6P + ADP (Hydrolase). General Metabolic Reaction Model (GMRM): S1 + S2 + … Sn → P1 + P2 + … Pn, with Enzyme, CoFactor, CoEnzyme. Vectorization of GMRM: S1 S2 S3 | Enzyme CoF CoE | P1 P2 P3.
  47. 47. Representation. General Metabolic Reaction Model (GMRM): S1 + S2 + … Sn → P1 + P2 + … Pn, with Enzyme, CoFactor, CoEnzyme. Vectorization of GMRM: S1 S2 S3 | Enzyme CoF CoE | P1 P2 P3.
  48. 48. Classification. Method 1: Supervised. 4 pathways (2 from carbohydrate metabolism, 1 from lipid metabolism, 1 from nucleotide metabolism), 24 organisms. Classifiers: Support Vector Machines, Classification Tree, K Nearest Neighbour, CN2, Naive Bayes.
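To make the supervised step concrete, here is a minimal sketch of one of the listed classifiers, K Nearest Neighbour, applied to reaction vectors. The toy vectors and class labels are hypothetical, and Euclidean distance with majority voting is an assumed configuration; the slides do not give the settings that were actually used.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical reaction vectors (one column per vocabulary term).
train_vectors = [
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0],   # carbohydrate reactions
    [0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 1],   # lipid reactions
]
train_labels = ["carbohydrate"] * 3 + ["lipid"] * 3

label_a = knn_predict(train_vectors, train_labels, [1, 1, 0, 0])
label_b = knn_predict(train_vectors, train_labels, [0, 0, 0, 1])
```

Because the classifier sees only the vector components, the prediction depends on which metabolites and enzymes take part in a reaction, not on where the reaction sits in the network graph.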
  49. 49. Pipeline
  50. 50. Review - Proposing a vector representation of biochemical reactions, based on a linguistic analogy. I’m going to classify metabolic networks using only functional features, to find patterns that suggest constitution rules for metabolic pathways. - Searching for patterns by clustering.
  51. 51. Thanks @karelman
