Text-mining practical
Lars Juhl Jensen
1.
Text-mining practical Lars Juhl Jensen
2.
unix primer
3.
the command line
4.
some useful commands
5.
cat
6.
less
7.
head -10
8.
tail -10
9.
grep 'needle'
10.
cut -f 2
11.
sort
12.
sort -nr
13.
uniq -c
14.
redirecting output
15.
write to file
16.
command > filename
17.
using pipes
18.
command1 | command2
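The commands, redirection and pipes introduced so far can be tried on a small file. The following is a minimal sketch; the file names and the two-column sample data are made up for illustration and are not part of the practical.

```shell
# Create a scratch directory and a tiny tab-delimited sample file
# (hypothetical data: document identifier, gene name).
workdir=$(mktemp -d)
printf 'doc1\tBRCA1\ndoc2\tTP53\ndoc3\tBRCA1\ndoc4\tAKT1\n' > "$workdir/sample.tsv"

# Print the whole file, then its first two lines, then its last line.
cat "$workdir/sample.tsv"
head -2 "$workdir/sample.tsv"
tail -1 "$workdir/sample.tsv"

# Print only the lines that contain the string BRCA1.
grep 'BRCA1' "$workdir/sample.tsv"

# Redirect: write the second column to a file instead of the screen.
cut -f 2 "$workdir/sample.tsv" > "$workdir/genes.txt"

# Pipe: send the output of cut straight into sort, then save the result.
cut -f 2 "$workdir/sample.tsv" | sort > "$workdir/genes_sorted.txt"
cat "$workdir/genes_sorted.txt"
```

The scratch directory is left in place so the intermediate files can be inspected with less.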
19.
putting it all together
20.
cut -f 4 infile | sort | uniq -c | sort -nr | head -100 > outfile
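To see what each stage of this pipeline contributes, it can be run on a toy input. This is a sketch under assumptions: the four-column layout (document, start, end, gene) and the values are invented here, chosen only so that the fourth column holds something worth counting.

```shell
# Toy input with the value of interest in column 4 (hypothetical layout and data).
workdir=$(mktemp -d)
printf 'doc1\t0\t4\tTP53\ndoc1\t10\t15\tBRCA1\ndoc2\t3\t7\tTP53\n' > "$workdir/infile"
printf 'doc3\t8\t12\tTP53\ndoc3\t20\t25\tBRCA1\ndoc4\t1\t5\tAKT1\n' >> "$workdir/infile"

# cut -f 4  : keep only the fourth column
# sort      : bring identical values next to each other
# uniq -c   : collapse adjacent duplicates and prefix each with its count
# sort -nr  : order by that count, numerically, largest first
# head -100 : keep at most the top 100 lines
cut -f 4 "$workdir/infile" | sort | uniq -c | sort -nr | head -100 > "$workdir/outfile"
cat "$workdir/outfile"
```

The first sort is needed because uniq only merges identical lines that are adjacent; without it the same value could be counted in several separate runs.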
21.
the task
22.
disease gene finding
23.
named entity recognition
24.
human genes
25.
gene prioritization
26.
what I have done
27.
information retrieval
28.
two diseases
29.
prostate cancer
30.
schizophrenia
31.
two sets of documents
32.
62,755 abstracts
33.
65,588 abstracts
34.
one directory with each set
35.
one file with each abstract
36.
dictionary
37.
tab-delimited file
38.
human genes
39.
22,523 entities
40.
synonyms
41.
from many databases
42.
orthographic variation
43.
prefixes and suffixes
44.
automatically generated
45.
2,726,495 names
46.
tagdir program
47.
flexible matching
48.
upper- and lower-case
49.
spaces and hyphens
50.
tab-delimited output
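The idea of flexible matching can be approximated with grep. This is not how tagdir works internally, and it does not use tagdir's interface; it is only a rough sketch that assumes case is ignored and that a space, a hyphen, or nothing at all may separate the parts of a name. The sample sentences are invented.

```shell
# Invented sentences that spell the same name in different ways.
workdir=$(mktemp -d)
printf 'Protein kinase B is activated.\nprotein-kinase b levels rose.\n' > "$workdir/abstract.txt"
printf 'PROTEINKINASE B was inhibited.\nAn unrelated sentence.\n' >> "$workdir/abstract.txt"

# -E : extended regular expressions
# -i : ignore upper- versus lower-case
# [ -]? : accept a space, a hyphen, or nothing between the words
grep -E -i 'protein[ -]?kinase[ -]?b' "$workdir/abstract.txt"

# Count the matching lines (-c prints the number of lines that match).
n_matching=$(grep -E -i -c 'protein[ -]?kinase[ -]?b' "$workdir/abstract.txt")
echo "$n_matching"
```

A pattern this loose would also match inside longer words, so a real tagger has to check word boundaries as well.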
51.
what you will do
52.
named entity recognition
53.
find unfortunate names
54.
create “black list”
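One way to look for unfortunate names is to count how often each matched name occurs and inspect the most frequent ones by hand. The sketch below assumes a hypothetical three-column match file (document, matched name, gene identifier); the actual tagdir output columns may differ, and the names put on the black list here are purely illustrative.

```shell
# Hypothetical tagger output: document, matched name, gene identifier.
workdir=$(mktemp -d)
printf 'doc1\tAKT1\tG1\ndoc1\tT\tG2\ndoc2\tT\tG2\ndoc3\tT\tG2\n' > "$workdir/matches.tsv"
printf 'doc3\tPKB\tG1\ndoc4\tCAT\tG3\ndoc5\tCAT\tG3\n' >> "$workdir/matches.tsv"

# Most frequently matched names first; these are the candidates to inspect.
cut -f 2 "$workdir/matches.tsv" | sort | uniq -c | sort -nr > "$workdir/name_counts.txt"
cat "$workdir/name_counts.txt"

# After manual inspection, list the unfortunate names one per line
# (illustrative choices, not a recommendation).
printf 'T\nCAT\n' > "$workdir/blacklist.txt"

# Keep only the matches whose name (column 2) is not on the black list.
# While reading the first file, awk stores each name; for the second file
# it prints the lines whose second column was not stored.
awk -F '\t' 'NR == FNR { bad[$1] = 1; next } !($2 in bad)' \
    "$workdir/blacklist.txt" "$workdir/matches.tsv" > "$workdir/filtered.tsv"
cat "$workdir/filtered.tsv"
```

Filtering on the name column only, rather than on the whole line, avoids throwing away a match just because a blacklisted string appears elsewhere in the record.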
55.
information extraction
56.
co-mentioning
57.
within abstracts
58.
rank genes for each disease
59.
find shared gene
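Ranking by co-mentioning and finding shared genes can be sketched with the commands from the primer plus comm. The assumptions here are loud ones: each disease has a hypothetical two-column match file (abstract, gene), every abstract in a set counts as mentioning that disease, and the data are toy values, not results.

```shell
# Toy match files, one per disease: abstract identifier, gene (invented data).
workdir=$(mktemp -d)
printf 'p1\tAR\np1\tAR\np2\tAR\np2\tAKT1\np3\tAKT1\np3\tAR\n' > "$workdir/prostate.tsv"
printf 's1\tDISC1\ns2\tDISC1\ns2\tAKT1\ns3\tCOMT\n' > "$workdir/schizophrenia.tsv"

# Rank genes by the number of abstracts that mention them.
# sort -u removes repeated (abstract, gene) pairs so that a gene is
# counted once per abstract, however often it is mentioned there.
rank_genes() {
    sort -u "$1" | cut -f 2 | sort | uniq -c | sort -nr
}

rank_genes "$workdir/prostate.tsv" > "$workdir/prostate_rank.txt"
rank_genes "$workdir/schizophrenia.tsv" > "$workdir/schizophrenia_rank.txt"
cat "$workdir/prostate_rank.txt"
cat "$workdir/schizophrenia_rank.txt"

# Shared genes: comm -12 prints the lines common to two sorted files.
awk '{ print $2 }' "$workdir/prostate_rank.txt" | sort > "$workdir/prostate_genes.txt"
awk '{ print $2 }' "$workdir/schizophrenia_rank.txt" | sort > "$workdir/schizophrenia_genes.txt"
comm -12 "$workdir/prostate_genes.txt" "$workdir/schizophrenia_genes.txt" > "$workdir/shared.txt"
cat "$workdir/shared.txt"
```

In practice one would probably apply head to each ranking before intersecting, so that only highly ranked genes are compared.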
60.
61.
a helping hand
62.
“black list”
63.
100+ matches
64.
10+ matches
65.
66.
wrap up
67.
Protein kinase B
68.
PKB
69.
Akt
70.
AKT1
71.
same protein
72.
synonyms matter
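Why synonyms matter can be shown by counting mentions before and after mapping names to one identifier. This is a sketch that assumes a hypothetical two-column dictionary (identifier, name); the identifiers GENE1 and GENE2 and the mention list are invented, and the real dictionary file may be laid out differently.

```shell
# Hypothetical dictionary: identifier, name (one line per synonym).
workdir=$(mktemp -d)
printf 'GENE1\tAKT1\nGENE1\tAkt\nGENE1\tPKB\nGENE1\tProtein kinase B\nGENE2\tTP53\n' > "$workdir/dictionary.tsv"

# Names as they were found in text, one mention per line (invented).
printf 'PKB\nAkt\nTP53\nProtein kinase B\nAKT1\n' > "$workdir/mentions.txt"

# Counting the raw names treats every synonym as a separate thing.
sort "$workdir/mentions.txt" | uniq -c

# Map each name to its identifier first, then count identifiers.
# awk reads the dictionary into a lookup table, then translates each mention.
awk -F '\t' 'NR == FNR { id[$2] = $1; next } ($1 in id) { print id[$1] }' \
    "$workdir/dictionary.tsv" "$workdir/mentions.txt" \
    | sort | uniq -c | sort -nr > "$workdir/entity_counts.txt"
cat "$workdir/entity_counts.txt"
```

With the mapping, the four spellings add up to four mentions of one entity instead of one mention each of four apparent entities.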
73.
“black list” is crucial
74.
text mining is useful
75.
not black magic
76.
Thanks for your attention!