SlideShare una empresa de Scribd logo
1 de 18
HUMAN GUIDED FORESTS
 Su lab meeting March 30, 2012
       Benjamin Good
AGENDA


1.     Motivation
      2. Idea
     3. Game
CHALLENGE


Need to build biological class predictors that:
          1. Have high accuracy
      2. Use relatively few variables

  To do this we have to use datasets that:
            1. Are very noisy
2. Contain enormous numbers of variables
EXAMPLE: BREAST CANCER
                     PROGNOSIS


                    Van‟tVeer 2002 Nature
                  98 breast cancer samples:
      34 developed metastases within 5 years, 44 did not
     18 had BRCA1 mutations, 2 had BRCA2 mutations
         expression levels of 25,000 genes measured
5,000 genes “significantly regulated across the sample groups”
98 tumors
genes that
coregulate
with ER




                                     5000
                                     genes




co-regulated
genes
indicating
lymphocytic
infiltrate




               70% bad     30% bad
METASTASIS PREDICTOR


231 genes were found to be significantly associated with disease outcome
Using leave-one-out cross-validation they empirically selected the 70 best
                individual genes to build their predictor
                  Of the 78 samples in the training set
               the predictor correctly classified 65 (83%)

                      ->MammaPrint test from Agendia
    Still in clinical trials (10 years since original study) (MINDACT)
WE CAN DO BETTER


                   This signature does not take advantage of:
                        • interactions between genes
 (together two variables may be much more predictive then either one alone)
                            • biological knowledge
(this signature leaves out several known cancer predictors and does not make
                    use of biological knowledge in any way)
THERE ARE MANY MANY CHALLENGES LIKE THIS
               IN BIOLOGY
AGENDA


1.     Motivation
      2. Idea
     3. Game
WE CAN DO BETTER BY INTEGRATING
            MACHINE LEARNING WITH BIOLOGICAL
                        EXPERTISE




         The standard signature does not take advantage of:
                   • interactions between genes
 machine learning algorithms can find and use these but can have
           problems when faced with large feature spaces
                      • biological knowledge
     can be used to guide the machine learning process towards
meaningful features in the data and thus reduce chances of overfitting
MACHINE LEARNING
ALGORITHM OF THE MOMENT



            • In each of many iterations, a small subset
              of features are chosen randomly and used
              to build one decision tree

            • Decision trees are stored and
              classifications are made based on the
              majority vote of all of the trees.

            • Good classifier!

            • But you get different forests every time
              you run it and it faces the same challenges
              of generalizability as any other learning
              algorithm.
NETWORK GUIDED FOREST (NGF)

                                                                                    Same algorithm except each tree is
                                                                                    constructed from a particular area
                                                                                    of a relevant protein-protein
                                                                                    interaction network.

                                                                                    1) Pick a gene randomly
                                                                                    2) Walk out along the network to
                                                                                       get the other N genes to use to
                                                                                       build that tree
                                                                                    3) repeat

                                                                                    The premise is that biologically
                                                                                    coherent modules will give better
                                                                                    signal than individual genes
                                                                                    randomly grouped together


Dutkowski & Ideker (2011) Protein Networks as Logic Functions in Development in Development and Cancer. PLoS Computational Biology
NGF RESULTS



A) Identical performance to random forest and
   random network guided forest as assessed by 5
   fold cross-validation repeated 100 times.




B) More known breast cancer genes show up in the
   forest




C) Similar genes selected for forests in two different
   training sets (different patient cohorts)
HUMAN GUIDED RANDOM
    FOREST (HGF)
                 Same algorithm again except each
                 trees are constructed from a
                 manually selected subset of genes
                 (or other features).

                 1) Find a person
                 2) Let them select what they
                    think is an optimal feature set
                 3) back to step one, N times
                 4) aggregate

                 The premise is that biological
                 knowledge can produce better
                 than random decision modules
                 and that not all biological
                 knowledge is captured in
                 interaction networks
HGF CHALLENGES


1) Find a person
2) Let them select what they think is an optimal feature set
3) back to step one, N times

• N may be large (e.g. 1,000)
• Need many knowledgeable people to work hard... for free
AGENDA


1.     Motivation
      2. Idea
     3. Game
COMBO!
                               COMPONENTS




                                                  Game interface(s)
       Java server        server provides         • manages game events:
•   hosts training data   features for games        users, moves, etc.
•   executes decision     (e.g. gene cards)       • two implementations so
    tree algorithm                                  far
•   performs cross-                                   • server side (JSP) card
    validation tests      client sends groups            game
•   logs data generated   of features („hands‟)       • client side (javascript)
    by game               to the server for              table game
                          scoring.
COMBO CODE AND DEMO


                                   Next steps
1.   Better preprocessing of training data
•    map contigs to genes where possible, filter out clearly useless genes
•    identify individually predictive genes
•    Game
•    build domain-specific boards, let players pick their knowledge area
•    Real two player feeling with robot partner
•    Special cards: robber card, any-gene selector card
•    High scores
•    ??????????????

Más contenido relacionado

Destacado

First oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoyFirst oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoyCominvent AS
 
Gene Wiki at Phenotype RCN annual meeting
Gene Wiki at Phenotype RCN annual meetingGene Wiki at Phenotype RCN annual meeting
Gene Wiki at Phenotype RCN annual meetingBenjamin Good
 
Buyer Remorse
Buyer RemorseBuyer Remorse
Buyer Remorsesmfox
 
Open source breakfast norge findwise
Open source breakfast norge findwiseOpen source breakfast norge findwise
Open source breakfast norge findwiseCominvent AS
 
2016 bd2k bgood_wikidata
2016 bd2k bgood_wikidata2016 bd2k bgood_wikidata
2016 bd2k bgood_wikidataBenjamin Good
 
Dagens Næringslivs overgang til Lucene/Solr søk
Dagens Næringslivs overgang til Lucene/Solr søkDagens Næringslivs overgang til Lucene/Solr søk
Dagens Næringslivs overgang til Lucene/Solr søkCominvent AS
 
Computing on the shoulders of giants
Computing on the shoulders of giantsComputing on the shoulders of giants
Computing on the shoulders of giantsBenjamin Good
 
The National Society For The Protection Of Hmmm
The National Society For The Protection Of HmmmThe National Society For The Protection Of Hmmm
The National Society For The Protection Of Hmmmguest0233e9d0
 
Light steel villa catalogue log
Light steel villa catalogue logLight steel villa catalogue log
Light steel villa catalogue logeishimachinery
 
B2B Branding Explained
B2B Branding ExplainedB2B Branding Explained
B2B Branding Explainedcsadhy
 
Eishi Company Profile 修改好的
Eishi Company Profile 修改好的Eishi Company Profile 修改好的
Eishi Company Profile 修改好的eishimachinery
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaCominvent AS
 
Gene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KGene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KBenjamin Good
 
Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1 Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1 schelby
 

Destacado (18)

First oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoyFirst oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoy
 
genegames.org
genegames.orggenegames.org
genegames.org
 
Gene Wiki at Phenotype RCN annual meeting
Gene Wiki at Phenotype RCN annual meetingGene Wiki at Phenotype RCN annual meeting
Gene Wiki at Phenotype RCN annual meeting
 
Buyer Remorse
Buyer RemorseBuyer Remorse
Buyer Remorse
 
2to3
2to32to3
2to3
 
Open source breakfast norge findwise
Open source breakfast norge findwiseOpen source breakfast norge findwise
Open source breakfast norge findwise
 
2016 mem good
2016 mem good2016 mem good
2016 mem good
 
2016 bd2k bgood_wikidata
2016 bd2k bgood_wikidata2016 bd2k bgood_wikidata
2016 bd2k bgood_wikidata
 
Dagens Næringslivs overgang til Lucene/Solr søk
Dagens Næringslivs overgang til Lucene/Solr søkDagens Næringslivs overgang til Lucene/Solr søk
Dagens Næringslivs overgang til Lucene/Solr søk
 
Computing on the shoulders of giants
Computing on the shoulders of giantsComputing on the shoulders of giants
Computing on the shoulders of giants
 
The National Society For The Protection Of Hmmm
The National Society For The Protection Of HmmmThe National Society For The Protection Of Hmmm
The National Society For The Protection Of Hmmm
 
Light steel villa catalogue log
Light steel villa catalogue logLight steel villa catalogue log
Light steel villa catalogue log
 
B2B Branding Explained
B2B Branding ExplainedB2B Branding Explained
B2B Branding Explained
 
Eishi Company Profile 修改好的
Eishi Company Profile 修改好的Eishi Company Profile 修改好的
Eishi Company Profile 修改好的
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alpha
 
Gene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KGene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2K
 
(Bio)Hackathons
(Bio)Hackathons(Bio)Hackathons
(Bio)Hackathons
 
Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1 Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1
 

Similar a Human Guided Forests (HGF)

An online game for human phenotype prediction
An online game for human phenotype predictionAn online game for human phenotype prediction
An online game for human phenotype predictionBenjamin Good
 
Novel network pharmacology methods for drug mechanism of action identificatio...
Novel network pharmacology methods for drug mechanism of action identificatio...Novel network pharmacology methods for drug mechanism of action identificatio...
Novel network pharmacology methods for drug mechanism of action identificatio...laserxiong
 
Genevestigator
GenevestigatorGenevestigator
GenevestigatorBITS
 
BITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsJason Riedy
 
An interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patternsAn interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patternsRavi Kumar
 
Branch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBranch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBenjamin Good
 
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018David Cook
 
Curation Introduction - Apollo Workshop
Curation Introduction - Apollo WorkshopCuration Introduction - Apollo Workshop
Curation Introduction - Apollo WorkshopMonica Munoz-Torres
 
Large scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biologyLarge scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biologyMaté Ongenaert
 
Building Neural Network Through Neuroevolution
Building Neural Network Through NeuroevolutionBuilding Neural Network Through Neuroevolution
Building Neural Network Through Neuroevolutionbergel
 
Friend NRNB 2012-12-13
Friend NRNB 2012-12-13Friend NRNB 2012-12-13
Friend NRNB 2012-12-13Sage Base
 
Stephen Friend Cytoscape Retreat 2011-05-20
Stephen Friend Cytoscape Retreat 2011-05-20Stephen Friend Cytoscape Retreat 2011-05-20
Stephen Friend Cytoscape Retreat 2011-05-20Sage Base
 
Knowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learningKnowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learningjaumebp
 
Bio-inspired Active Vision System
Bio-inspired Active Vision SystemBio-inspired Active Vision System
Bio-inspired Active Vision SystemMartin Peniak
 

Similar a Human Guided Forests (HGF) (20)

An online game for human phenotype prediction
An online game for human phenotype predictionAn online game for human phenotype prediction
An online game for human phenotype prediction
 
Novel network pharmacology methods for drug mechanism of action identificatio...
Novel network pharmacology methods for drug mechanism of action identificatio...Novel network pharmacology methods for drug mechanism of action identificatio...
Novel network pharmacology methods for drug mechanism of action identificatio...
 
Genevestigator
GenevestigatorGenevestigator
Genevestigator
 
BITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics data
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 
An interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patternsAn interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patterns
 
Branch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBranch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiers
 
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
 
Curation Introduction - Apollo Workshop
Curation Introduction - Apollo WorkshopCuration Introduction - Apollo Workshop
Curation Introduction - Apollo Workshop
 
Large scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biologyLarge scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biology
 
Content-based Image Retrieval
Content-based Image RetrievalContent-based Image Retrieval
Content-based Image Retrieval
 
Building Neural Network Through Neuroevolution
Building Neural Network Through NeuroevolutionBuilding Neural Network Through Neuroevolution
Building Neural Network Through Neuroevolution
 
Complete Human Genome Sequencing
Complete Human Genome SequencingComplete Human Genome Sequencing
Complete Human Genome Sequencing
 
Friend NRNB 2012-12-13
Friend NRNB 2012-12-13Friend NRNB 2012-12-13
Friend NRNB 2012-12-13
 
Stephen Friend Cytoscape Retreat 2011-05-20
Stephen Friend Cytoscape Retreat 2011-05-20Stephen Friend Cytoscape Retreat 2011-05-20
Stephen Friend Cytoscape Retreat 2011-05-20
 
Knowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learningKnowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learning
 
Bio-inspired Active Vision System
Bio-inspired Active Vision SystemBio-inspired Active Vision System
Bio-inspired Active Vision System
 
NRNB EAC Report 2011
NRNB EAC Report 2011NRNB EAC Report 2011
NRNB EAC Report 2011
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Thesis ppt
Thesis pptThesis ppt
Thesis ppt
 

Más de Benjamin Good

Representing and reasoning with biological knowledge
Representing and reasoning with biological knowledgeRepresenting and reasoning with biological knowledge
Representing and reasoning with biological knowledgeBenjamin Good
 
Integrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity ModelsIntegrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity ModelsBenjamin Good
 
Pathways2GO: Converting BioPax pathways to GO-CAMs
Pathways2GO: Converting BioPax pathways to GO-CAMsPathways2GO: Converting BioPax pathways to GO-CAMs
Pathways2GO: Converting BioPax pathways to GO-CAMsBenjamin Good
 
Building a Biomedical Knowledge Garden
Building a Biomedical Knowledge Garden Building a Biomedical Knowledge Garden
Building a Biomedical Knowledge Garden Benjamin Good
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of FoodBenjamin Good
 
Gene Wiki and Wikimedia Foundation SPARQL workshop
Gene Wiki and Wikimedia Foundation SPARQL workshopGene Wiki and Wikimedia Foundation SPARQL workshop
Gene Wiki and Wikimedia Foundation SPARQL workshopBenjamin Good
 
Opportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationOpportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationBenjamin Good
 
Wikidata workshop for ISB Biocuration 2016
Wikidata workshop for ISB Biocuration 2016Wikidata workshop for ISB Biocuration 2016
Wikidata workshop for ISB Biocuration 2016Benjamin Good
 
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery (Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery Benjamin Good
 
2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbioBenjamin Good
 
Building a massive biomedical knowledge graph with citizen science
Building a massive biomedical knowledge graph with citizen scienceBuilding a massive biomedical knowledge graph with citizen science
Building a massive biomedical knowledge graph with citizen scienceBenjamin Good
 
Serious games for bioinformatics education. ISMB 2014 education workshop
Serious games for bioinformatics education.  ISMB 2014 education workshopSerious games for bioinformatics education.  ISMB 2014 education workshop
Serious games for bioinformatics education. ISMB 2014 education workshopBenjamin Good
 
The Cure: Making a game of gene selection for breast cancer survival prediction
The Cure: Making a game of gene selection for breast cancer survival predictionThe Cure: Making a game of gene selection for breast cancer survival prediction
The Cure: Making a game of gene selection for breast cancer survival predictionBenjamin Good
 
Poster: Microtask crowdsourcing for disease mention annotation in PubMed abst...
Poster: Microtask crowdsourcing for disease mention annotation in PubMed abst...Poster: Microtask crowdsourcing for disease mention annotation in PubMed abst...
Poster: Microtask crowdsourcing for disease mention annotation in PubMed abst...Benjamin Good
 
Microtask crowdsourcing for disease mention annotation in PubMed abstracts
Microtask crowdsourcing for disease mention annotation in PubMed abstractsMicrotask crowdsourcing for disease mention annotation in PubMed abstracts
Microtask crowdsourcing for disease mention annotation in PubMed abstractsBenjamin Good
 
Mark2Cure: a crowdsourcing platform for biomedical literature annotation
Mark2Cure: a crowdsourcing platform for biomedical literature annotationMark2Cure: a crowdsourcing platform for biomedical literature annotation
Mark2Cure: a crowdsourcing platform for biomedical literature annotationBenjamin Good
 
Short update on The Cure game first week
Short update on The Cure game first weekShort update on The Cure game first week
Short update on The Cure game first weekBenjamin Good
 

Más de Benjamin Good (19)

Representing and reasoning with biological knowledge
Representing and reasoning with biological knowledgeRepresenting and reasoning with biological knowledge
Representing and reasoning with biological knowledge
 
Integrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity ModelsIntegrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity Models
 
Pathways2GO: Converting BioPax pathways to GO-CAMs
Pathways2GO: Converting BioPax pathways to GO-CAMsPathways2GO: Converting BioPax pathways to GO-CAMs
Pathways2GO: Converting BioPax pathways to GO-CAMs
 
Knowledge Beacons
Knowledge BeaconsKnowledge Beacons
Knowledge Beacons
 
Building a Biomedical Knowledge Garden
Building a Biomedical Knowledge Garden Building a Biomedical Knowledge Garden
Building a Biomedical Knowledge Garden
 
Science Game Lab
Science Game LabScience Game Lab
Science Game Lab
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of Food
 
Gene Wiki and Wikimedia Foundation SPARQL workshop
Gene Wiki and Wikimedia Foundation SPARQL workshopGene Wiki and Wikimedia Foundation SPARQL workshop
Gene Wiki and Wikimedia Foundation SPARQL workshop
 
Opportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationOpportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocuration
 
Wikidata workshop for ISB Biocuration 2016
Wikidata workshop for ISB Biocuration 2016Wikidata workshop for ISB Biocuration 2016
Wikidata workshop for ISB Biocuration 2016
 
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery (Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
 
2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio
 
Building a massive biomedical knowledge graph with citizen science
Building a massive biomedical knowledge graph with citizen scienceBuilding a massive biomedical knowledge graph with citizen science
Building a massive biomedical knowledge graph with citizen science
 
Serious games for bioinformatics education. ISMB 2014 education workshop
Serious games for bioinformatics education.  ISMB 2014 education workshopSerious games for bioinformatics education.  ISMB 2014 education workshop
Serious games for bioinformatics education. ISMB 2014 education workshop
 
The Cure: Making a game of gene selection for breast cancer survival prediction
The Cure: Making a game of gene selection for breast cancer survival predictionThe Cure: Making a game of gene selection for breast cancer survival prediction
The Cure: Making a game of gene selection for breast cancer survival prediction
 
Poster: Microtask crowdsourcing for disease mention annotation in PubMed abst...
Poster: Microtask crowdsourcing for disease mention annotation in PubMed abst...Poster: Microtask crowdsourcing for disease mention annotation in PubMed abst...
Poster: Microtask crowdsourcing for disease mention annotation in PubMed abst...
 
Microtask crowdsourcing for disease mention annotation in PubMed abstracts
Microtask crowdsourcing for disease mention annotation in PubMed abstractsMicrotask crowdsourcing for disease mention annotation in PubMed abstracts
Microtask crowdsourcing for disease mention annotation in PubMed abstracts
 
Mark2Cure: a crowdsourcing platform for biomedical literature annotation
Mark2Cure: a crowdsourcing platform for biomedical literature annotationMark2Cure: a crowdsourcing platform for biomedical literature annotation
Mark2Cure: a crowdsourcing platform for biomedical literature annotation
 
Short update on The Cure game first week
Short update on The Cure game first weekShort update on The Cure game first week
Short update on The Cure game first week
 

Último

Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 

Último (20)

Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 

Human Guided Forests (HGF)

  • 1. HUMAN GUIDED FORESTS Su lab meeting March 30, 2012 Benjamin Good
  • 2. AGENDA 1. Motivation 2. Idea 3. Game
  • 3. CHALLENGE Need to build biological class predictors that: 1. Have high accuracy 2. Use relatively few variables To do this we have to use datasets that: 1. Are very noisy 2. Contain enormous numbers of variables
  • 4. EXAMPLE: BREAST CANCER PROGNOSIS Van‟tVeer 2002 Nature 98 breast cancer samples: 34 developed metastases within 5 years, 44 did not 18 had BRCA1 mutations, 2 had BRCA2 mutations expression levels of 25,000 genes measured 5,000 genes “significantly regulated across the sample groups”
  • 5. 98 tumors genes that coregulate with ER 5000 genes co-regulated genes indicating lymphocytic infiltrate 70% bad 30% bad
  • 6. METASTASIS PREDICTOR 231 genes were found to be significantly associated with disease outcome Using leave-one-out cross-validation they empirically selected the 70 best individual genes to build their predictor Of the 78 samples in the training set the predictor correctly classified 65 (83%) ->MammaPrint test from Agendia Still in clinical trials (10 years since original study) (MINDACT)
  • 7. WE CAN DO BETTER This signature does not take advantage of: • interactions between genes (together two variables may be much more predictive then either one alone) • biological knowledge (this signature leaves out several known cancer predictors and does not make use of biological knowledge in any way)
  • 8. THERE ARE MANY MANY CHALLENGES LIKE THIS IN BIOLOGY
  • 9. AGENDA 1. Motivation 2. Idea 3. Game
  • 10. WE CAN DO BETTER BY INTEGRATING MACHINE LEARNING WITH BIOLOGICAL EXPERTISE The standard signature does not take advantage of: • interactions between genes machine learning algorithms can find and use these but can have problems when faced with large feature spaces • biological knowledge can be used to guide the machine learning process towards meaningful features in the data and thus reduce chances of overfitting
  • 11. MACHINE LEARNING ALGORITHM OF THE MOMENT • In each of many iterations, a small subset of features are chosen randomly and used to build one decision tree • Decision trees are stored and classifications are made based on the majority vote of all of the trees. • Good classifier! • But you get different forests every time you run it and it faces the same challenges of generalizability as any other learning algorithm.
  • 12. NETWORK GUIDED FOREST (NGF) Same algorithm except each tree is constructed from a particular area of a relevant protein-protein interaction network. 1) Pick a gene randomly 2) Walk out along the network to get the other N genes to use to build that tree 3) repeat The premise is that biologically coherent modules will give better signal than individual genes randomly grouped together Dutkowski & Ideker (2011) Protein Networks as Logic Functions in Development in Development and Cancer. PLoS Computational Biology
  • 13. NGF RESULTS A) Identical performance to random forest and random network guided forest as assessed by 5 fold cross-validation repeated 100 times. B) More known breast cancer genes show up in the forest C) Similar genes selected for forests in two different training sets (different patient cohorts)
  • 14. HUMAN GUIDED RANDOM FOREST (HGF) Same algorithm again except each trees are constructed from a manually selected subset of genes (or other features). 1) Find a person 2) Let them select what they think is an optimal feature set 3) back to step one, N times 4) aggregate The premise is that biological knowledge can produce better than random decision modules and that not all biological knowledge is captured in interaction networks
  • 15. HGF CHALLENGES 1) Find a person 2) Let them select what they think is an optimal feature set 3) back to step one, N times • N may be large (e.g. 1,000) • Need many knowledgeable people to work hard... for free
  • 16. AGENDA 1. Motivation 2. Idea 3. Game
  • 17. COMBO! COMPONENTS Game interface(s) Java server server provides • manages game events: • hosts training data features for games users, moves, etc. • executes decision (e.g. gene cards) • two implementations so tree algorithm far • performs cross- • server side (JSP) card validation tests client sends groups game • logs data generated of features („hands‟) • client side (javascript) by game to the server for table game scoring.
  • 18. COMBO CODE AND DEMO Next steps 1. Better preprocessing of training data • map contigs to genes where possible, filter out clearly useless genes • identify individually predictive genes • Game • build domain-specific boards, let players pick their knowledge area • Real two player feeling with robot partner • Special cards: robber card, any-gene selector card • High scores • ??????????????

Notas del editor

  1. " at least a twofold difference and a P-value of less than 0.01 in more than five tumours”
  2. a, Two-dimensional presentation of transcript ratios for 98 breast tumours. There were 4,968 significant genes across the group. Each row represents a tumour and each column a single gene. As shown in the colour bar, red indicates upregulation, green downregulation, black no change, and grey no data available. The yellow line marks the subdivision into two dominant tumour clusters. b, Selected clinical data for the 98 patients in a: BRCA1 germline mutation carrier (or sporadic patient), ER expression, tumour grade 3 (versus grade 1 and 2), lymphocytic infiltrate, angioinvasion, and metastasis status. White indicates positive, black negative and grey denotes tumours derived from BRCA1 germline carriers who were excluded from the metastasis evaluation. The cluster below the yellow line consists of 36 tumours, of which 34 are ER negative (total 39 ER-negative) and 16 are carriers of the BRCA1 mutation (total 18). c, Enlarged portion from a containing a group of genes that co-regulate with the ER- gene (ESR1). Each gene is labelled by its gene name or accession number from GenBank. Contig ESTs ending with RC are reverse-complementary of the named contig EST. d, Enlarged portion from a containing a group of co-regulated genes that are the molecular reflection of extensive lymphocytic infiltrate, and comprise a set of genes expressed in T and B cells. (Gene annotation as in c.)
  3. (A) Average area under the ROC curve for NGF, RF, NGF applied to permuted networks (NGF**), and Naïve Bayes, compared to reported scores for representative previous methods (error bars denote standard deviation estimated over 100 runs). (B) General cancer and breast cancer associated genes identified among the 100 top-scoring genes or 100 most abundant genes in the forest created using RF or NGF. using the real network or networks with permuted edges (average over 100 permutations is shown). (C) Genes ranked by their importance for classification in two independent breast cancer patient cohorts (y vs. x axis). Network-Guided Forest, blue points; regular Random Forest, green points.