Using networks to derive function
Lars Juhl Jensen
STRING
Jensen, Kuhn et al., Nucleic Acids Research, 2009
functional associations
Frishman et al., Modern Genome Annotation, 2009
common basis
630 genomes
model organism databases
Ensembl
RefSeq
genomic context methods
gene fusion
Korbel et al., Nature Biotechnology, 2004
conserved neighborhood
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
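As a rough illustration of the phylogenetic-profile idea, the sketch below represents each gene as a presence/absence vector across genomes and scores gene pairs by how similar their vectors are. The gene names, the profiles, and the choice of Jaccard similarity are assumptions made for this example; the slides do not state which measure is actually used.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of each gene across six genomes.
profiles = {
    "geneA": (1, 1, 0, 1, 0, 1),
    "geneB": (1, 1, 0, 1, 0, 1),
    "geneC": (0, 1, 1, 0, 1, 0),
}

def jaccard(p, q):
    """Genomes containing both genes divided by genomes containing either."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0

def profile_similarities(profiles):
    """Score every gene pair by the similarity of their profiles."""
    return {
        (g1, g2): jaccard(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }

scores = profile_similarities(profiles)
```

Genes that are gained and lost together across genomes (geneA and geneB here) get a high score, which is the signal a phylogenetic-profile method looks for.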
primary experimental data
protein interactions
yeast two-hybrid
affinity purification
fragment complementation
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
BIND Biomolecular Interaction Network Database
BioGRID General Repository for Interaction Datasets
DIP Database of Interacting Proteins
IntAct
MINT Molecular Interactions Database
HPRD Human Protein Reference Database
PDB Protein Data Bank
inferred associations
gene coexpression
 
GEO Gene Expression Omnibus
expression compendia
curated knowledge
complexes
MIPS Munich Information center for Protein Sequences
Gene Ontology
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
KEGG Kyoto Encyclopedia of Genes and Genomes
MetaCyc
Reactome
PID NCI-Nature Pathway Interaction Database
literature mining
MEDLINE
SGD Saccharomyces Genome Database
The Interactive Fly
OMIM Online Mendelian Inheritance in Man
co-mentioning
statistical methods
NLP Natural Language Processing
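A minimal sketch of statistical co-mentioning, assuming each abstract has already been reduced to the set of gene names it mentions. The abstracts below are made up, and the observed/expected ratio is one simple scoring choice picked for illustration, not necessarily the statistic used in STRING.

```python
from collections import Counter
from itertools import combinations

# Hypothetical abstracts, each reduced to the set of gene names it mentions.
abstracts = [
    {"CDC28", "CLN2"},
    {"CDC28", "CLN2", "SIC1"},
    {"CDC28"},
    {"SIC1"},
]

def comention_scores(docs):
    """Observed / expected co-mention frequency for each gene pair."""
    n = len(docs)
    single = Counter(g for doc in docs for g in doc)
    pair = Counter(p for doc in docs for p in combinations(sorted(doc), 2))
    return {
        (a, b): (count / n) / ((single[a] / n) * (single[b] / n))
        for (a, b), count in pair.items()
    }

scores = comention_scores(abstracts)
```

A ratio above 1 means the two names appear together more often than expected if they were mentioned independently; a ratio below 1 means less often.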
easy in theory …
… but not in practice
many data types
not comparable
variable quality
many sources
different file formats
different gene identifiers
partially redundant
spread over 630 genomes
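The identifier and redundancy problems listed above can be sketched as follows, assuming an alias table that maps every source-specific name to one canonical identifier. All names and identifiers here are hypothetical placeholders.

```python
# Hypothetical alias table: source-specific names -> one canonical identifier.
alias_to_id = {
    "geneA": "P0001", "GA1": "P0001", "P0001": "P0001",
    "geneB": "P0002", "P0002": "P0002",
}

def normalize(interactions, alias_to_id):
    """Map both partners to canonical identifiers and drop duplicate pairs."""
    unique = set()
    for a, b in interactions:
        if a in alias_to_id and b in alias_to_id:
            # Sort so that (x, y) and (y, x) collapse to the same pair.
            unique.add(tuple(sorted((alias_to_id[a], alias_to_id[b]))))
    return unique

# The same interaction reported by three sources under different names,
# plus one pair with an unknown identifier that is skipped.
raw = [("geneA", "geneB"), ("P0002", "GA1"), ("GA1", "geneB"), ("geneA", "geneX")]
merged = normalize(raw, alias_to_id)
```

Pairs with an unmapped partner are silently skipped in this sketch; a real pipeline would probably log them.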
quality scores
reproducibility
von Mering et al., Nucleic Acids Research, 2005
benchmarking
von Mering et al., Nucleic Acids Research, 2005
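One way to picture benchmarking is to bin the raw scores from a single evidence type and measure, in each bin, the fraction of pairs that a trusted reference set also links. The sketch below assumes such a gold standard is available as a set of pairs; the data, bin edges, and function name are illustrative only.

```python
def calibrate(scored_pairs, gold_standard, bin_edges):
    """Fraction of gold-standard pairs within each raw-score bin.

    scored_pairs: dict mapping (a, b) -> raw score
    gold_standard: set of pairs treated as true associations
    bin_edges: ascending bin boundaries; bin i covers [edge_i, edge_i+1)
    """
    totals = [0] * (len(bin_edges) - 1)
    hits = [0] * (len(bin_edges) - 1)
    for pair, score in scored_pairs.items():
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= score < bin_edges[i + 1]:
                totals[i] += 1
                hits[i] += pair in gold_standard
                break
    # None marks bins that received no pairs.
    return [h / t if t else None for h, t in zip(hits, totals)]

# Hypothetical raw scores and reference set.
raw_scores = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.3,
              ("b", "d"): 0.2, ("c", "d"): 0.1}
gold = {("a", "b"), ("a", "c"), ("c", "d")}
precision = calibrate(raw_scores, gold, [0.0, 0.5, 1.0])
```

The per-bin fractions turn a method-specific raw score into something comparable across evidence types, which is the point of benchmarking every source against a common reference.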
orthology
von Mering et al., Nucleic Acids Research, 2005
two modes
COG mode
von Mering et al., Nucleic Acids Research, 2005
protein mode
von Mering et al., Nucleic Acids Research, 2005
combine all evidence
Frishman et al., Modern Genome Annotation, 2009
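Once every evidence type carries a benchmarked probability-like score, a simple way to combine them is to assume the channels are independent and compute the probability that at least one of them is right. The sketch below shows only that basic rule; the channel names and values are made up, and any further corrections in the published scoring scheme are not shown.

```python
from math import prod

def combine_scores(scores):
    """Combine independent probabilistic scores: 1 - product of (1 - s)."""
    return 1.0 - prod(1.0 - s for s in scores)

# Hypothetical per-channel scores for one protein pair.
evidence = {"neighborhood": 0.4, "coexpression": 0.5, "textmining": 0.5}
combined = combine_scores(evidence.values())
```

With this rule the combined score is never lower than the best single channel, and several weak but independent lines of evidence can add up to a confident association.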
Acknowledgments
Using networks to derive function from multiple data sources