SlideShare una empresa de Scribd logo
1 de 85
Reflect and friends
Tools and resources for mining biomedical text




>10 km



               Lars Juhl Jensen
exponential growth
some things are constant
~45 seconds per paper
computer
as smart as a dog
teach it specific tricks
named entity recognition
identify the concepts
comprehensive lexicon
small molecules
proteins
cellular components
tissues
organisms
phenotypes
diseases
orthographic variation
singular vs. plural
flexible matching
spaces and hyphens
conflict resolution
“black list”
scalable implementation
>10 km
<10 hours
augmented browsing
Reflect
reflect.ws
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
            O’Donoghue et al., Journal of Web Semantics, 2010
browser add-on
Firefox
Google Chrome
Safari
Internet Explorer
PDF viewer
Utopia Documents
collaborate with publishers
SciVerse application
corpora
~20 mio. abstracts
1.8 mio. full-text articles
collaborate with publishers
web-centric databases
guilt by association
information extraction
co-mentioning
STRING
string-db.org
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
~2.6 million proteins
STITCH
stitch-db.org
Kuhn et al., Nucleic Acids Research, 2012
~300,000 small molecules
COMPARTMENTS
subcellular localization
compartments.jensenlab.org
search for proteins
DISEASES
>8,000 disease terms
diseases.jensenlab.org
search for diseases
search for proteins
evidence viewers
disease behavior
Thank you!
Reflect                STRING and STITCH      COMPARTMENT
Sune Frankild          Damian Szklarczyk      S
                                              and DISEASES
Heiko Horn             Andrea Franceschini
                                              Sune Frankild
Evangelos Pafilis      Michael Kuhn
                                              Janos Binder
Janos Binder           Milan Simonovic
Reinhardt Schneider    Alexander Roth
Sean O’Donoghue        Pablo Minguez
                       Tobias Doerks
                       Manuel Stark
                       Jean Muller
                       Peer Bork
                       Christian von Mering

Más contenido relacionado

Similar a Reflect and friends - Tools and resources for mining biomedical text

Advanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomicsAdvanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomicsLars Juhl Jensen
 
Advanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomicsAdvanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomicsLars Juhl Jensen
 
Systems biology - Bioinformatics on complete biological systems
Systems biology - Bioinformatics on complete biological systemsSystems biology - Bioinformatics on complete biological systems
Systems biology - Bioinformatics on complete biological systemsLars Juhl Jensen
 
Advanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomicsAdvanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomicsLars Juhl Jensen
 
Large-scale integration of data and text
Large-scale integration of data and textLarge-scale integration of data and text
Large-scale integration of data and textLars Juhl Jensen
 
Mining heaps of data and piles of papers
Mining heaps of data and piles of papersMining heaps of data and piles of papers
Mining heaps of data and piles of papersLars Juhl Jensen
 
Mining literature and medical records
Mining literature and medical recordsMining literature and medical records
Mining literature and medical recordsLars Juhl Jensen
 

Similar a Reflect and friends - Tools and resources for mining biomedical text (8)

Advanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomicsAdvanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomics
 
Advanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomicsAdvanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomics
 
Systems biology - Bioinformatics on complete biological systems
Systems biology - Bioinformatics on complete biological systemsSystems biology - Bioinformatics on complete biological systems
Systems biology - Bioinformatics on complete biological systems
 
Advanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomicsAdvanced bioinformatics methods for proteomics
Advanced bioinformatics methods for proteomics
 
Large-scale integration of data and text
Large-scale integration of data and textLarge-scale integration of data and text
Large-scale integration of data and text
 
Mining heaps of data and piles of papers
Mining heaps of data and piles of papersMining heaps of data and piles of papers
Mining heaps of data and piles of papers
 
Mining biomedical texts
Mining biomedical textsMining biomedical texts
Mining biomedical texts
 
Mining literature and medical records
Mining literature and medical recordsMining literature and medical records
Mining literature and medical records
 

Más de Lars Juhl Jensen

One tagger, many uses: Illustrating the power of dictionary-based named entit...
One tagger, many uses: Illustrating the power of dictionary-based named entit...One tagger, many uses: Illustrating the power of dictionary-based named entit...
One tagger, many uses: Illustrating the power of dictionary-based named entit...Lars Juhl Jensen
 
One tagger, many uses: Simple text-mining strategies for biomedicine
One tagger, many uses: Simple text-mining strategies for biomedicineOne tagger, many uses: Simple text-mining strategies for biomedicine
One tagger, many uses: Simple text-mining strategies for biomedicineLars Juhl Jensen
 
Extract 2.0: Text-mining-assisted interactive annotation
Extract 2.0: Text-mining-assisted interactive annotationExtract 2.0: Text-mining-assisted interactive annotation
Extract 2.0: Text-mining-assisted interactive annotationLars Juhl Jensen
 
Network visualization: A crash course on using Cytoscape
Network visualization: A crash course on using CytoscapeNetwork visualization: A crash course on using Cytoscape
Network visualization: A crash course on using CytoscapeLars Juhl Jensen
 
STRING & STITCH : Network integration of heterogeneous data
STRING & STITCH: Network integration of heterogeneous dataSTRING & STITCH: Network integration of heterogeneous data
STRING & STITCH : Network integration of heterogeneous dataLars Juhl Jensen
 
Biomedical text mining: Automatic processing of unstructured text
Biomedical text mining: Automatic processing of unstructured textBiomedical text mining: Automatic processing of unstructured text
Biomedical text mining: Automatic processing of unstructured textLars Juhl Jensen
 
Medical network analysis: Linking diseases and genes through data and text mi...
Medical network analysis: Linking diseases and genes through data and text mi...Medical network analysis: Linking diseases and genes through data and text mi...
Medical network analysis: Linking diseases and genes through data and text mi...Lars Juhl Jensen
 
Network Biology: A crash course on STRING and Cytoscape
Network Biology: A crash course on STRING and CytoscapeNetwork Biology: A crash course on STRING and Cytoscape
Network Biology: A crash course on STRING and CytoscapeLars Juhl Jensen
 
Cellular Network Biology: Large-scale integration of data and text
Cellular Network Biology: Large-scale integration of data and textCellular Network Biology: Large-scale integration of data and text
Cellular Network Biology: Large-scale integration of data and textLars Juhl Jensen
 
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...Lars Juhl Jensen
 
STRING & related databases: Large-scale integration of heterogeneous data
STRING & related databases: Large-scale integration of heterogeneous dataSTRING & related databases: Large-scale integration of heterogeneous data
STRING & related databases: Large-scale integration of heterogeneous dataLars Juhl Jensen
 
Tagger: Rapid dictionary-based named entity recognition
Tagger: Rapid dictionary-based named entity recognitionTagger: Rapid dictionary-based named entity recognition
Tagger: Rapid dictionary-based named entity recognitionLars Juhl Jensen
 
Network Biology: Large-scale integration of data and text
Network Biology: Large-scale integration of data and textNetwork Biology: Large-scale integration of data and text
Network Biology: Large-scale integration of data and textLars Juhl Jensen
 
Medical text mining: Linking diseases, drugs, and adverse reactions
Medical text mining: Linking diseases, drugs, and adverse reactionsMedical text mining: Linking diseases, drugs, and adverse reactions
Medical text mining: Linking diseases, drugs, and adverse reactionsLars Juhl Jensen
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textLars Juhl Jensen
 
Medical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactionsMedical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactionsLars Juhl Jensen
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textLars Juhl Jensen
 
Biomarker bioinformatics: Network-based candidate prioritization
Biomarker bioinformatics: Network-based candidate prioritizationBiomarker bioinformatics: Network-based candidate prioritization
Biomarker bioinformatics: Network-based candidate prioritizationLars Juhl Jensen
 

Más de Lars Juhl Jensen (20)

One tagger, many uses: Illustrating the power of dictionary-based named entit...
One tagger, many uses: Illustrating the power of dictionary-based named entit...One tagger, many uses: Illustrating the power of dictionary-based named entit...
One tagger, many uses: Illustrating the power of dictionary-based named entit...
 
One tagger, many uses: Simple text-mining strategies for biomedicine
One tagger, many uses: Simple text-mining strategies for biomedicineOne tagger, many uses: Simple text-mining strategies for biomedicine
One tagger, many uses: Simple text-mining strategies for biomedicine
 
Extract 2.0: Text-mining-assisted interactive annotation
Extract 2.0: Text-mining-assisted interactive annotationExtract 2.0: Text-mining-assisted interactive annotation
Extract 2.0: Text-mining-assisted interactive annotation
 
Network visualization: A crash course on using Cytoscape
Network visualization: A crash course on using CytoscapeNetwork visualization: A crash course on using Cytoscape
Network visualization: A crash course on using Cytoscape
 
STRING & STITCH : Network integration of heterogeneous data
STRING & STITCH: Network integration of heterogeneous dataSTRING & STITCH: Network integration of heterogeneous data
STRING & STITCH : Network integration of heterogeneous data
 
Biomedical text mining: Automatic processing of unstructured text
Biomedical text mining: Automatic processing of unstructured textBiomedical text mining: Automatic processing of unstructured text
Biomedical text mining: Automatic processing of unstructured text
 
Medical network analysis: Linking diseases and genes through data and text mi...
Medical network analysis: Linking diseases and genes through data and text mi...Medical network analysis: Linking diseases and genes through data and text mi...
Medical network analysis: Linking diseases and genes through data and text mi...
 
Network Biology: A crash course on STRING and Cytoscape
Network Biology: A crash course on STRING and CytoscapeNetwork Biology: A crash course on STRING and Cytoscape
Network Biology: A crash course on STRING and Cytoscape
 
Cellular networks
Cellular networksCellular networks
Cellular networks
 
Cellular Network Biology: Large-scale integration of data and text
Cellular Network Biology: Large-scale integration of data and textCellular Network Biology: Large-scale integration of data and text
Cellular Network Biology: Large-scale integration of data and text
 
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
 
STRING & related databases: Large-scale integration of heterogeneous data
STRING & related databases: Large-scale integration of heterogeneous dataSTRING & related databases: Large-scale integration of heterogeneous data
STRING & related databases: Large-scale integration of heterogeneous data
 
Tagger: Rapid dictionary-based named entity recognition
Tagger: Rapid dictionary-based named entity recognitionTagger: Rapid dictionary-based named entity recognition
Tagger: Rapid dictionary-based named entity recognition
 
Network Biology: Large-scale integration of data and text
Network Biology: Large-scale integration of data and textNetwork Biology: Large-scale integration of data and text
Network Biology: Large-scale integration of data and text
 
Medical text mining: Linking diseases, drugs, and adverse reactions
Medical text mining: Linking diseases, drugs, and adverse reactionsMedical text mining: Linking diseases, drugs, and adverse reactions
Medical text mining: Linking diseases, drugs, and adverse reactions
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and text
 
Medical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactionsMedical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactions
 
Cellular Network Biology
Cellular Network BiologyCellular Network Biology
Cellular Network Biology
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and text
 
Biomarker bioinformatics: Network-based candidate prioritization
Biomarker bioinformatics: Network-based candidate prioritizationBiomarker bioinformatics: Network-based candidate prioritization
Biomarker bioinformatics: Network-based candidate prioritization
 

Reflect and friends - Tools and resources for mining biomedical text