Syntactic Pattern Discovery as a Generic Tool in Systems Biology
Or: How I learned to stop worrying and love biology.
Kyle L. Jensen
20 December 2001

Outline
Part I:   Introduction
Pattern Discovery
Primitive streams:
integers - 0 12 13 0 2 1 7 8 9 10
characters - A B C D E F
amino acids - MSKNIVLLPGDHVGPEVVA
nucleotides - ATGAGCATCGATCGATCGAATCTA
Basic question: When are two events the same?
Patterns: V[HDV].[ST]K, 12 . . 1 . 7, TCGATCGA

A Little History
Primitives: a b c d e
submedian: babcbabdacad
telocentric: ebabcbab
Example patterns:
H.....HRD.K..N (Teiresias, serine kinase)
ATCATACTATACGA (AlignACE, yeast promoter)
RP[VI]ILDPx[DE]PT (Prosite, family classifier)

An Illustrative Example
Given sequences:
recnacotenummiylbaborpsikcaj • recnacotelbitpecsusylbaborpsinhoj • retalsetebaidpolevedylbaborplliwllij
dabsirecnac • elbirrohsaweugalp • noosrecnacevahotylekilsikcaj • retalsetebaiddlimdepolevedllij
eugalpdlimotelbitpecsusylbaborpsinhoj • wolebtonlliwesnopserenummina • retalpolevedylbaborplliwsetebaid
noosrecnacpolevedylekillliwllij • eugalpdabevahlliwnhojretal • setebaidotelbitpecsussawnhoj
polevedtonlliwesnopserenummina • enajnipolevedotylekilsirecnac • eugalppolevedlliwkcaj
recnacotenummisienaj • setebaidotelbitpecsusebnooslliwkcaj • eugalppolevedlliwylbaborpkcaj
elbirrohsirecnac • ylekiltonsiretalesnopserenummina • setebaidotelbitpecsussinhoj
recnacpolevedylekilnooslliwnhoj • ylekilebtonlliwsetebaid • tceffenaevahtonlliwrecnac
eugalpotenummisillij • elbirroheblliwesnopsereht
Strings with 4+ chars occurring 3+ times:
lliw, recnac, poleved, elbi, ylbaborp, enummi, eugalp, setebaid, ylekil, otelbitpecsus, kcaj, nhoj, ylbaborpsi, llij, esnopsere, noos, retal, esnopserenummina, sire, polevedyl, recnacote, tonlliw, otenummi, otelbitpecsusylbaborpsi, sikcaj, sirecnac, polevedlli, lliwesnopsere, otylekilsi, setebaidotelbitpecsus, wnhoj, evah, alpo, sinhoj, elbirroh
Given sequences (readable version):
jackisprobablyimmunetocancer • johnisprobablysusceptibletocancer • marywillprobablydevelopdiabeteslater
cancerisbad • plaguewashorrible • jackislikelytohavecancersoon • marydevelopedmilddiabeteslater
johnisprobablysusceptibletomildplague • animmuneresponsewillnotbelow • diabeteswillprobablydeveloplater
marywilllikelydevelopcancersoon • laterjohnwillhavebadplague • maryisprobablysusceptibletocancer
animmuneresponseislikelytodevelopsoon • jackisprobablyimmunetoplague • johnwassusceptibletodiabetes
animmuneresponsewillnotdevelop • cancerislikelytodevelopinjane • jackwilldevelopplague • janeisimmunetocancer
jackwillsoonbesusceptibletodiabetes • jackprobablywilldevelopplague • cancerishorrible
animmuneresponselaterisnotlikely • johnissusceptibletodiabetes • johnwillsoonlikelydevelopcancer
diabeteswillnotbelikely • cancerwillnothaveaneffect • maryisimmunetoplague • therebsponsewillbehorrible
Strings with 4+ chars occurring 3+ times (readable version):
will, cancer, develop, ible, probably, immune, plague, diabetes, likely, susceptibleto, jack, john, isprobably, jill, eresponse, soon, later, animmuneresponse, eris, lydevelop, etocancer, willnot, immuneto, isprobablysusceptibleto, jackis, canceris, illdevelop, eresponsewill, islikelyto, susceptibletodiabetes, johnw, have, opla, johnis, horrible
… things that occur many times…
… find important features…
… but, what is “important”…
How do we know these are important?
John is probably susceptible to cancer.

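The counting step on this slide (strings with 4+ characters occurring 3+ times) can be sketched as follows. This is a brute-force illustration and not Teiresias itself. The function name `frequent_substrings` and the rule that drops a string when a longer string occurs equally often are assumptions made for the example; only three of the slide's sentences are used as input.

```python
from collections import Counter


def frequent_substrings(sequences, min_len=4, min_support=3):
    """Return {substring: count} for substrings that are long and frequent enough."""
    counts = Counter()
    for seq in sequences:
        for start in range(len(seq)):
            for end in range(start + min_len, len(seq) + 1):
                counts[seq[start:end]] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    # Assumed maximality rule: drop s if a longer frequent string contains it
    # and occurs just as often, since s then adds no new information.
    return {
        s: c
        for s, c in frequent.items()
        if not any(len(t) > len(s) and s in t and frequent[t] == c for t in frequent)
    }


SENTENCES = [
    "jackisprobablyimmunetocancer",
    "johnisprobablysusceptibletocancer",
    "marywillprobablydevelopdiabeteslater",
]
print(frequent_substrings(SENTENCES))
```

Lowering `min_support` to 2 on the same input also reports strings such as `isprobably` and `etocancer`, both of which appear in the slide's list.
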
Teiresias Overview
Example output (6/15/2 patterns):
AFGLYEPC......L
HQ.G.ET[ST]NS
L.....A....SLKII.KA (density = 9/19)
LFPCFY (density = 6/6)
Each "." is a wildcard.

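The density figures on this slide (9/19 and 6/6) read as the number of non-wildcard positions over the total pattern length. A minimal sketch of that count is below; the function name `pattern_density` is hypothetical, and treating a bracketed group such as `[ST]` as a single non-wildcard position is an assumption.

```python
import re


def pattern_density(pattern):
    """Return (literal_positions, total_positions) for a Teiresias-style pattern."""
    # One token per position: a bracketed group such as [ST], or a single character.
    tokens = re.findall(r"\[[^\]]+\]|.", pattern)
    literals = sum(1 for tok in tokens if tok != ".")
    return literals, len(tokens)


for pat in ["L.....A....SLKII.KA", "LFPCFY", "HQ.G.ET[ST]NS"]:
    print(pat, pattern_density(pat))
```
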
Teiresias Example
>protein 0
MSKNIVLLPGDHVGPEVVAEAVKVLEAVSSAIGVKFNFSKHLIGGASIDAYGVPLSDEALEAAKK
>protein 1
MSKQILVLPGDGIGPEIMAEAVKVLELANDRFQLGFELAEDVIGGAAIDKHGVP
>protein 2
MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLT
TEIRESIAS 5/8/2: all patterns with at least 5 characters, density 5/8, and support 2
pattern             location
GPE..AEAVKVLE       (0,13) (1,13)
IGGA.ID..GVP        (0,42) (1,42)
MSK.I..LPGD..GPE    (0,00) (1,00)
A.D.HGV             (1,46) (2,17)
Take away point: Given sequences, Teiresias finds possibly important patterns in them.

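The location column pairs a sequence index with a zero-based offset. The sketch below does not discover patterns; it only checks where an already known pattern occurs, using Python's `re` module because "." and bracketed groups mean the same thing in both notations. The helper name `find_pattern` is hypothetical.

```python
import re

PROTEINS = [
    "MSKNIVLLPGDHVGPEVVAEAVKVLEAVSSAIGVKFNFSKHLIGGASIDAYGVPLSDEALEAAKK",
    "MSKQILVLPGDGIGPEIMAEAVKVLELANDRFQLGFELAEDVIGGAAIDKHGVP",
    "MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLT",
]


def find_pattern(pattern, sequences):
    """Return (sequence index, offset) for every match, overlaps included."""
    # The lookahead lets matches overlap instead of consuming the sequence.
    regex = re.compile(f"(?=({pattern}))")
    return [(i, m.start()) for i, seq in enumerate(sequences) for m in regex.finditer(seq)]


for pat in ["GPE..AEAVKVLE", "IGGA.ID..GVP", "MSK.I..LPGD..GPE", "A.D.HGV"]:
    print(pat, find_pattern(pat, PROTEINS))
```

Run on the three proteins above, each pattern is found at the two locations the slide lists for it.
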
Part II:   Proposed Problems
Biological Sequences
Proposed Problems
Expression and Physiology
Association Discovery Example
63  1  145  233  1  2  150  0  3  0  6  0
67  1  160  286  0  2  108  1  2  3  3  2
67  1  120  229  0  2  129  1  2  2  7  1
37  1  130  250  0  0  187  0  3  0  3  0
41  0  130  204  0  2  172  0  1  0  3  0
Column labels (as listed on the slide): age sex blood pres. pain type choles. blood sug. ekg exercise ekg depress. fluoroscopy +’s ekg anomaly #>50% clogged
Find conserved motifs in the rows.
Patients with type 2 EKG anomaly, with positive fluoroscopy results and high blood pressure are likely to have more than one critically clogged artery.

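One way to read "conserved motifs in the rows" is as sets of column values shared by several patients. The sketch below enumerates groups of rows and keeps the (column, value) items they all agree on. It is a brute-force illustration under stated assumptions: the function name `shared_items` is hypothetical, columns are referred to by index because the slide's labels are not aligned to the data, and exact matching on raw values stands in for whatever discretization a real analysis would need.

```python
from itertools import combinations

# The five patient rows shown on the slide, one list per patient.
ROWS = [
    [63, 1, 145, 233, 1, 2, 150, 0, 3, 0, 6, 0],
    [67, 1, 160, 286, 0, 2, 108, 1, 2, 3, 3, 2],
    [67, 1, 120, 229, 0, 2, 129, 1, 2, 2, 7, 1],
    [37, 1, 130, 250, 0, 0, 187, 0, 3, 0, 3, 0],
    [41, 0, 130, 204, 0, 2, 172, 0, 1, 0, 3, 0],
]


def shared_items(rows, min_support=2):
    """Map each group of row indices to the {column: value} items they all share."""
    motifs = {}
    for size in range(min_support, len(rows) + 1):
        for group in combinations(range(len(rows)), size):
            first = rows[group[0]]
            common = {
                col: val
                for col, val in enumerate(first)
                if all(rows[r][col] == val for r in group)
            }
            if common:
                motifs[group] = common
    return motifs


motifs = shared_items(ROWS)
print(motifs[(0, 3)])
print(motifs[(1, 2)])
```
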
Proposed Problems
How does the genome relate to the “physiome”? Are there any recurring motifs? … biological significance?
physiological, 1 2 3 4 by samples A B C D: 1 16 10 15 5 26 45 65 45 16 7 54 14 9 8 23
gene expression, 1 2 3 4 by samples A B C D: 0 -2 7 9 2 3 4 -1 1 5 5 -2 -2 3 -1 2
Example associations: “Genes 1 and 4 are associated with pathway ” or “Up-regulation of genes {4,6,10,…} gives rise to phenotype ”

Part III:   Work To Date
Motivation
[Figure: sequence, database, scoring matrix, sequence alignments]
But what do we mean by similar?

Scoring Matrix Basics
RKISWMEIYTGEKSTKVYGQDVWLPAETLDLIREYRVAIKGPLTTPVGGGIRSLNVALRQ
::: :.:.: :::.:. : .. ::: :::....::.:.::::::::::::  :::::.::
RKIEWLEVYAGEKATQMYDSETWLPEETLNILQEYKVSIKGPLTTPVGGGMSSLNVAIRQ
Highest score is the “best” alignment.
For detecting homology the matrix should capture evolutionary processes.
… but how do we describe evolution?
[Figure: scoring matrix indexed by amino acid (A R N D C M E G H I L K Q), with the score for a K-Q alignment highlighted]

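An ungapped alignment score is the sum of the matrix entries for each aligned pair. The sketch below applies that idea to the two sequences on the slide. The scoring function `toy_score` (+5 for a match, -3 for a mismatch) is a hypothetical stand-in for a real scoring-matrix lookup and is not taken from the slide.

```python
SEQ_A = "RKISWMEIYTGEKSTKVYGQDVWLPAETLDLIREYRVAIKGPLTTPVGGGIRSLNVALRQ"
SEQ_B = "RKIEWLEVYAGEKATQMYDSETWLPEETLNILQEYKVSIKGPLTTPVGGGMSSLNVAIRQ"


def toy_score(a, b):
    """Hypothetical stand-in for a scoring-matrix lookup: +5 match, -3 mismatch."""
    return 5 if a == b else -3


def alignment_score(seq_a, seq_b, score=toy_score):
    """Sum the pair scores column by column for an ungapped alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(score(a, b) for a, b in zip(seq_a, seq_b))


identities = sum(a == b for a, b in zip(SEQ_A, SEQ_B))
print(identities, alignment_score(SEQ_A, SEQ_B))
```

Passing a different `score` callable, for example one backed by a substitution matrix, changes which alignment comes out as "best" without touching the summation.
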
Protein Evolution
ancestral protein:
ILHLV G PN G A GK S TL LARMA
descendant proteins:
IVTLI G AN G A GK S TL LMTLC
MAFLT G HS G A GK S TL LKLIC
VVVII G PS G S GK S TL VRCIN
NIMVV G PS G S GK S TL LRCIN
VTAFI G PS G C GK T TL LRTFN
NIMVV G QS G L GK S TL INTLF
descendant proteins (not functional):
MAFLT G HS G A GK S PL LKLIC
VVVII G PS V S GK S TL VRCIN
active site: G .. G . GK . TL
The distribution of amino acids in the changing positions describes the evolutionary process…
… use syntactic pattern discovery to find these conserved motifs.

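The conserved motif on this slide can be recovered by keeping only the columns on which every descendant agrees. The sketch below does this for sequences that are already aligned; it is a column-wise consensus and not the Teiresias algorithm. The function name `conserved_pattern` is hypothetical, and excluding the two descendants that break the motif (taken here to be the "not functional" ones) is an assumption.

```python
# Descendant sequences from the slide, excluding the two that break the motif
# (assumed here to be the ones the slide labels "not functional").
DESCENDANTS = [
    "IVTLIGANGAGKSTLLMTLC",
    "MAFLTGHSGAGKSTLLKLIC",
    "VVVIIGPSGSGKSTLVRCIN",
    "NIMVVGPSGSGKSTLLRCIN",
    "VTAFIGPSGCGKTTLLRTFN",
    "NIMVVGQSGLGKSTLINTLF",
]


def conserved_pattern(aligned):
    """Keep a residue where every sequence agrees; write '.' elsewhere."""
    columns = zip(*aligned)
    full = "".join(col[0] if len(set(col)) == 1 else "." for col in columns)
    return full.strip(".")  # trim the unconserved flanks


print(conserved_pattern(DESCENDANTS))
```
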
Discovering Patterns
Given a database, we can use Teiresias to find the conserved motifs…
>sp|Q07698|ABCA_AERSA ABC transporter protein
MSEPVLAVSGVNKSFPIYRSPWQALWHALNPKADVKVFQALRDIELTVYRGETIGIV
GHNGAGKSTLLQLITGVMQPDCGQITRTGRVVGLLELGSGFNPEFTGRENIFFNGAI
LGMSQREMDDRLERILSFAAIGDFIDQPVKNYSSGMMVRLAFSVIINTDPDVLIIDE
ALAVGDDAFQRKCYARLKQLQSQGVTILLVSHAAGSVIELCDRAVLLDRGEVLLQGE
PKAVVHNYHKLLHMEGDERARFRYHLRQTGRGDSYISDESTSEPKIKSAPGILSVDL
QPQSTVWYESKGAVLSDVHIESF
>sp|Q02856|ABCX_ANTSP Probable ATP-dependent transporter
MNNRILLNIKNLDVTIGETQILNSLNLSIKPGEIHAIMGKNGSGKSTLAKVIAGHPSYKI
TNGQILFENQDVTEIEPEDRSHLGIFLAFQYPVEIPGVTNADFLRIAYNAKRAFDNKEEL
DPLSFFSFIENKISNIDLNSTFLSRNVNEGFSGGEKKKNEILQMSLLNSKLAILDETDSG
LDIDALKTIAKQINSLKTQENSIILITHYQRLLDYIKPDYIHVMQKGEIIYTGGSDTAMK
LEKYGYDYLNK
>sp|P07655|PSTB_ECOLI ATP-BINDING PROTEIN PSTB
MSMVETAPSKIQVRNLNFYYGKFHALKNINLDIAKNQVTAFIGPSGCGKSTLLRTFNKMFELYPEQRAEGEILLDGDNILTNSQDIALLRAKVGMVFQKPTPFPMSIYDNIAFGVRLFEKLSRADMDERVQWALTKAALWNETKDKLHQSGYSLSGGQQQRLCIARGIAIRPEVLLLDEPCSALDPISTGRIEELITELKQDYTVVIVTHNMQQAARCSDHTAFMYLGELIEFSNTDDLFTKPAKKQTEDYITGRYG
>sp|P10346|GLNQ_ECOLI ATP-BINDING PROTEIN GLNQ
GPTQVLHNIDLNIAQGEVVVIIGPSGSGKSTLLRCINKLEEITSGDLIVDGLKVNDPKVDERLIRQEAGMVFQQFYLFPHLTALENVMFGPLRVRGANKEEAKLARELLAKVGLAERAHHYPSELSGGQQQRVAIARALAVKPKMMLFDEPTSALDPELRHEVLKVMQDLAEEGMTMVIVTHEIGFAEKVASRLIFIDKGRIAEDGNPQVLIKNPPSQRLQEFLQHVS
ATP binding motif G..G.GK[ST]TL (the ATP binding signature) was “discovered” in 2500 sequences in SWISS-PROT/TrEMBL.
… how do we construct the scoring matrix?

Patterns to Matrix
Example pattern: L..F.L..CI...L
Seq A: IINSSLWWIIKGPILISI L VN F I L FI CI IRI L VQKLRPPDIG
Seq B: LTLITRVGLA L SL F C L LL CI LTF L LVRPIQGSRTTIHLHLCICLFVG
Seq C: IKTPILVSI L RN F I L FI CI IRI L VQKLHSPDVGHNE
How many AA pairs are there at each position?
Pairs at the first wildcard position: VS 1, VR 1, SR 1
Pairs at the wildcard after the second L: FF 1, LF 2
Count AA pairs for all patterns and construct a table of pair counts.
[Table: AA pair frequency table, indexed by amino acid]

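The pair counting on this slide can be sketched as follows: take the residues that the pattern instances place at one position, and count every unordered pair among them. The function names `column_pairs` and `pair_table` are hypothetical, and counting pairs at every position (fixed positions included) is an assumption about how the full table is built.

```python
from collections import Counter
from itertools import combinations

# The three aligned instances of L..F.L..CI...L taken from Seq A, B and C.
INSTANCES = ["LVNFILFICIIRIL", "LSLFCLLLCILTFL", "LRNFILFICIIRIL"]


def column_pairs(instances, column):
    """Count unordered residue pairs among the instances at one pattern position."""
    residues = [inst[column] for inst in instances]
    return Counter("".join(sorted(pair)) for pair in combinations(residues, 2))


def pair_table(instances):
    """Accumulate pair counts over every position of the pattern."""
    table = Counter()
    for column in range(len(instances[0])):
        table += column_pairs(instances, column)
    return table


print(column_pairs(INSTANCES, 1))  # first wildcard: V, S, R
print(column_pairs(INSTANCES, 6))  # wildcard after the second L: F, L, F
```

Pairs are stored with their letters sorted, so the slide's "VS" appears as `SV` and "LF" as `FL`.
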
Patterns to Matrix
AA pair frequency table -> MATH -> AA log-of-odds scoring matrix
odds that an AA pair does not occur by chance = (probability of seeing AA pair in our patterns) / (probability of seeing AA pair by chance)
Positive values mean these pairs are more prevalent in our patterns than by chance…
… and negative values are less prevalent.
Take away point: The evolutionary information contained in the patterns is stored in terms of the scoring matrix.

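The odds ratio above can be turned into scores by taking a logarithm. The sketch below does so for a table of unordered pair counts. The slide only gives the ratio, so two choices here are assumptions: the chance probability is estimated BLOSUM-style from the residue frequencies implied by the pairs themselves, and the logarithm is base 2. The function name `log_odds_matrix` and the two-letter toy alphabet are hypothetical.

```python
import math
from collections import Counter


def log_odds_matrix(pair_counts):
    """Turn unordered pair counts like {'AB': 2} into log2-odds scores."""
    total = sum(pair_counts.values())
    observed = {pair: n / total for pair, n in pair_counts.items()}
    # Background frequency of each residue, estimated from the pairs themselves.
    background = Counter()
    for (a, b), q in observed.items():
        background[a] += q / 2
        background[b] += q / 2
    scores = {}
    for (a, b), q in observed.items():
        # Chance probability of an unordered pair: p_a^2 or 2 * p_a * p_b.
        expected = background[a] * background[b] * (1 if a == b else 2)
        scores[a + b] = math.log2(q / expected)
    return scores


print(log_odds_matrix({"AA": 6, "AB": 2, "BB": 2}))
```

In this toy input the identical pairs score above zero and the mixed pair scores below zero, matching the slide's reading of positive and negative values.
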
Basic Idea
BDSUM: Bio-Dictionary AA Substitution Matrices
database -> TEIRESIAS -> patterns -> MATRIX ENGINE -> scoring matrix
Example patterns: HQ.G.ET..STNS, RP..K.TSTP.NS, L.S.DF.SLKS.DKIS, V...EG.A..YPDVEL, A..YPDVEL.NS, EG.A, K.T
Take away point: Given a set of sequences, we use Teiresias to discover important patterns and construct a scoring matrix which captures the way these patterns are evolving.

Example Results
Experiment: Using each sequence from the family, try to detect the other 99 sequences in the Swiss-Prot/TrEMBL database.
Results (columns: win loss tie): 100 0 0; 30 17 53; 47 9 44
Matrices compared: BDSUM(PS00470), BLOSUM62(PS00470), BLOSUM62(Prosite), BLOSUM50(Prosite)

Current Work
… and the oligo probes…

Acknowledgements
Group members: Mike, Maciek, Bill, Daehee, Jatin, Vipin, Maria, Javier, Maria, Matt, Gary, Saliya, Juan, Angelo, Chris, Dan, Giovanna, Joanne, Hyun-Tae, Patrick, Kyongbum…

Artificial Intelligence Chap.5 : Uncertainty
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 

Kyle Jensen's MIT Ph.D. Thesis Proposal

  • 1. Syntactic Pattern Discovery as a Generic Tool in Systems Biology. Kyle L. Jensen, 20 December 2001. Or: How I learned to stop worrying and love biology.
  • 2.
  • 3. Part I: Introduction
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9. Part II: Proposed Problems
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15. Part III: Work To Date
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22. Basic Idea (work to date -> aa scoring matrices -> basic idea). BDSUM: Bio-Dictionary AA Substitution Matrices. Take-away point: given a set of sequences, we use Teiresias to discover important patterns and construct a scoring matrix which captures the way these patterns are evolving. [Figure: database of sequences -> TEIRESIAS -> patterns (HQ.G.ET..STNS, RP..K.TSTP.NS, L.S.DF.SLKS.DKIS, V...EG.A..YPDVEL, A..YPDVEL.NS, EG.A, K.T) -> MATRIX ENGINE -> scoring matrix]
  • 23.
  • 24.
  • 25.
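
Slide 22 describes a pipeline from discovered patterns to an amino acid scoring matrix. The sketch below is one possible reading of that idea, not the method from the proposal: the slides do not say how BDSUM scores are computed, so the BLOSUM-style log-odds scoring here is an assumption. Pattern discovery (Teiresias) is not implemented; patterns are taken as given, with `.` as a wildcard. The function names `find_instances`, `pair_counts`, and `log_odds`, and the toy sequences, are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def find_instances(pattern, sequences):
    """Return every substring matching a pattern in which '.' is a wildcard."""
    # A lookahead wrapper lets overlapping occurrences be reported too.
    regex = re.compile("(?=(" + pattern + "))")
    return [m.group(1) for seq in sequences for m in regex.finditer(seq)]


def pair_counts(patterns, sequences):
    """Count unordered residue pairs aligned at the wildcard columns.

    Assumption: two instances of the same pattern are treated as aligned,
    so residues sharing a wildcard column count as an observed substitution.
    """
    counts = Counter()
    for pattern in patterns:
        instances = find_instances(pattern, sequences)
        wildcard_cols = [i for i, ch in enumerate(pattern) if ch == "."]
        for a, b in combinations(instances, 2):
            for col in wildcard_cols:
                counts[tuple(sorted((a[col], b[col])))] += 1
    return counts


def log_odds(counts):
    """Turn pair counts into BLOSUM-style log-odds scores (an assumption)."""
    total = sum(counts.values())
    residue_freq = Counter()
    for (x, y), n in counts.items():
        residue_freq[x] += n
        residue_freq[y] += n
    n_residues = 2 * total
    scores = {}
    for (x, y), n in counts.items():
        observed = n / total
        px = residue_freq[x] / n_residues
        py = residue_freq[y] / n_residues
        expected = px * py if x == y else 2 * px * py
        scores[(x, y)] = math.log2(observed / expected)
    return scores


# Toy data, invented for illustration only.
toy_sequences = ["AKCT", "ARCT", "AKCA"]
toy_scores = log_odds(pair_counts(["A.C"], toy_sequences))
for pair, score in sorted(toy_scores.items()):
    print(pair, round(score, 3))
```

Scores are only produced for residue pairs that were actually observed. A real matrix would also need pseudocounts and far more data than this toy input.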