3. Dictionary, Thesaurus
English
Manually developed
Relations between words
synonym
hyponym, hypernym
meronym, holonym
antonym
WordNet – ISD312 NLTK and Python
4. Using the nltk package
from nltk.corpus import wordnet as wn
Accessing the synonym sets (synsets) of a word
wn.synsets('motorcar')
wn.synset('car.n.01').lemma_names()
wn.synset('car.n.01').lemmas()
wn.synset('car.n.01').definition()
wn.synset('car.n.01').examples()
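In NLTK 3 the synset accessors above are methods, so each needs parentheses. The following is a minimal sketch, assuming NLTK 3 with the WordNet corpus downloaded (`nltk.download('wordnet')`); the helper names `summarize_synset` and `summarize_word` are hypothetical, not part of NLTK.

```python
def summarize_synset(synset):
    """Collect the main fields of a WordNet synset into a dict.

    `synset` is expected to behave like an NLTK 3 Synset, where
    name(), lemma_names(), definition() and examples() are methods.
    """
    return {
        "name": synset.name(),
        "lemma_names": list(synset.lemma_names()),
        "definition": synset.definition(),
        "examples": list(synset.examples()),
    }


def summarize_word(word):
    """Summarize every synset of `word` (needs the WordNet corpus)."""
    # Imported lazily so the module loads even where NLTK is absent.
    from nltk.corpus import wordnet as wn
    return [summarize_synset(s) for s in wn.synsets(word)]
```

With the corpus installed, `summarize_word('motorcar')` should return a single summary, for `car.n.01`, consistent with the `wn.synsets('motorcar')` result shown in these slides.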
5. >>> from nltk.corpus import wordnet as wn
>>> wn.synsets('motorcar')
[Synset('car.n.01')]
'motorcar' is a member of the synonym set 'car.n.01'
All members (lemma names) of the synonym set 'car.n.01':
>>> wn.synset('car.n.01').lemma_names()
['car', 'auto', 'automobile', 'machine', 'motorcar']
6. Lemma: a synset name combined with a word
synset: car.n.01
word: car
lemma: car.n.01.car
Getting all lemmas of the synonym set 'car.n.01':
>>> wn.synset('car.n.01').lemmas()
[Lemma('car.n.01.car'), Lemma('car.n.01.auto'),
Lemma('car.n.01.automobile'),
Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
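The lemma naming convention above (synset name, then the word) can be illustrated by splitting a lemma name back into its two parts. This is a hypothetical helper for illustration only; in NLTK itself a `Lemma` object already exposes `lemma.synset()` and `lemma.name()`. It assumes the sense number has at least two digits, as in `car.n.01`.

```python
import re

# "<head word>.<pos>.<sense number>.<word>", e.g. "car.n.01.car".
# The non-greedy head group stops at the first ".<pos>.<nn>." boundary,
# so multi-part head words such as "cable_car" are kept whole.
LEMMA_NAME = re.compile(
    r"^(?P<head>.+?)\.(?P<pos>[nvasr])\.(?P<num>\d{2,})\.(?P<word>.+)$"
)


def split_lemma_name(lemma_name):
    """Split a WordNet lemma name into (synset_name, word)."""
    match = LEMMA_NAME.match(lemma_name)
    if match is None:
        raise ValueError(f"not a lemma name: {lemma_name!r}")
    synset_name = f"{match['head']}.{match['pos']}.{match['num']}"
    return synset_name, match["word"]
```

For example, `split_lemma_name('cable_car.n.01.car')` gives `('cable_car.n.01', 'car')`: the same word 'car' attached to a different synset.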
7. >>> wn.synset('car.n.01').definition()
'a motor vehicle with four wheels; usually propelled by
an internal combustion engine'
>>> wn.synset('car.n.01').examples()
['he needs a car to get to work']
Lemmas can also be looked up by word, across all synsets:
>>> wn.lemmas('car')
[Lemma('car.n.01.car'), Lemma('car.n.02.car'),
Lemma('car.n.03.car'),
Lemma('car.n.04.car'), Lemma('cable_car.n.01.car')]
8. The word 'car' belongs to several different synsets
>>> wn.synsets('car')
[Synset('car.n.01'), Synset('car.n.02'),
Synset('car.n.03'), Synset('car.n.04'),
Synset('cable_car.n.01')]
11. >>> motorcar = wn.synset('car.n.01')
>>> types_of_motorcar = motorcar.hyponyms()
>>> sorted([lemma.name() for synset in types_of_motorcar
...         for lemma in synset.lemmas()])
['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer',
'ambulance', 'beach_waggon',
...]
14. Parts of a 'tree' include its 'trunk'
>>> wn.synset('tree.n.01').part_meronyms()
[Synset('burl.n.02'), Synset('crown.n.07'),
Synset('stump.n.01'),
Synset('trunk.n.01'), Synset('limb.n.02')]
>>> wn.synset('tree.n.01').substance_meronyms()
[Synset('heartwood.n.01'), Synset('sapwood.n.01')]
>>> wn.synset('tree.n.01').member_holonyms()
[Synset('forest.n.01')]
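Meronymy (part of) and holonymy (whole of) are inverse relations: if 'trunk' is a part meronym of 'tree', then 'tree' is a part holonym of 'trunk'. NLTK exposes the inverse direction through methods such as `part_holonyms()`. The sketch below only shows the inversion on a small hypothetical dictionary modeled on the slide's output; it does not read WordNet.

```python
from collections import defaultdict

# Hypothetical toy data modeled on tree.n.01's part meronyms above.
PART_MERONYMS = {
    "tree": ["burl", "crown", "stump", "trunk", "limb"],
}


def invert(relation):
    """Turn a whole -> parts map into the matching part -> wholes map."""
    inverse = defaultdict(list)
    for whole, parts in relation.items():
        for part in parts:
            inverse[part].append(whole)
    return dict(inverse)


PART_HOLONYMS = invert(PART_MERONYMS)
```

Here `PART_HOLONYMS['trunk']` is `['tree']`.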
15. >>> for synset in wn.synsets('mint', wn.NOUN):
...     print(synset.name() + ':', synset.definition())
...
batch.n.02: (often followed by `of') a large
number or amount or extent
mint.n.02: any north temperate plant of the genus
Mentha with aromatic leaves and
small mauve flowers
mint.n.03: any member of the mint family of plants
mint.n.04: the leaves of a mint plant used fresh
or candied
mint.n.05: a candy that is flavored with a mint
oil
mint.n.06: a plant where money is coined by
authority of the government
19. Synsets are linked by lexical relations
Given a synset, navigate WordNet to find semantically similar synsets
Useful for building an index
Useful for query processing
A query for 'vehicle' also retrieves documents about
'limousine'
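One way to make a 'vehicle' query reach 'limousine' is query expansion: add every term reachable through hyponym links. The sketch below uses a small hypothetical hyponym map and naive whitespace tokenization instead of WordNet; with NLTK, the analogous expansion could come from `wn.synset('vehicle.n.01').closure(lambda s: s.hyponyms())`.

```python
# Hypothetical toy hyponym map standing in for WordNet's hyponym links.
HYPONYMS = {
    "vehicle": ["motor_vehicle", "bicycle"],
    "motor_vehicle": ["car", "truck"],
    "car": ["limousine", "cab"],
}


def expand_query(term):
    """Return the term plus every term reachable through hyponym links."""
    expanded, stack, seen = [], [term], {term}
    while stack:
        current = stack.pop()
        expanded.append(current)
        for child in HYPONYMS.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return expanded


def search(query, documents):
    """Return documents sharing at least one word with the expanded query."""
    terms = set(expand_query(query))
    return [doc for doc in documents if terms & set(doc.lower().split())]


DOCS = [
    "A limousine waited outside",
    "The bicycle lane is closed",
    "Fresh mint leaves",
]
```

`search('vehicle', DOCS)` returns the limousine and bicycle documents even though neither contains the word 'vehicle'.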
20. The shorter the path between two synsets, the more
similar their meanings
>>> right = wn.synset('right_whale.n.01')
>>> orca = wn.synset('orca.n.01')
>>> minke = wn.synset('minke_whale.n.01')
>>> tortoise = wn.synset('tortoise.n.01')
>>> novel = wn.synset('novel.n.01')
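With the synsets above, NLTK computes this score as `right.path_similarity(minke)`, which is 1 / (shortest is-a path length + 1). The sketch below applies the same scoring rule to a small hypothetical is-a tree, so its numbers are not WordNet's; only the ranking idea carries over.

```python
from collections import deque

# Hypothetical toy is-a (hypernym) links; the real WordNet tree is deeper.
HYPERNYM = {
    "right_whale": "baleen_whale", "minke_whale": "baleen_whale",
    "baleen_whale": "whale", "orca": "toothed_whale",
    "toothed_whale": "whale", "whale": "mammal", "mammal": "vertebrate",
    "tortoise": "turtle", "turtle": "reptile", "reptile": "vertebrate",
    "vertebrate": "animal", "animal": "organism", "organism": "entity",
    "novel": "fiction", "fiction": "entity",
}


def neighbours(node):
    """Nodes one is-a edge away, in either direction."""
    up = [HYPERNYM[node]] if node in HYPERNYM else []
    down = [child for child, parent in HYPERNYM.items() if parent == node]
    return up + down


def shortest_path_length(a, b):
    """Breadth-first search over the (tree-shaped) is-a graph."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """1 / (shortest path length + 1); None if no path exists."""
    dist = shortest_path_length(a, b)
    return None if dist is None else 1 / (dist + 1)
```

In this toy tree, right_whale scores 1/3 against minke_whale, 1/5 against orca, 1/8 against tortoise and 1/10 against novel.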
24. Indonesian WordNet
Indonesian thesaurus
kateglo
Building a WordNet automatically
identifying newly emerging slang
'sesuatu banget', 'rempong', 'jablay', 'lebay'
VerbNet
nltk.corpus.verbnet
25. Find all senses of the word 'dish'
based on your own knowledge of English
using the method discussed in class
26. Exercise 27: The polysemy of a word is the number
of senses it has. Using WordNet, we can determine
that the noun dog has seven senses with
len(wn.synsets('dog', 'n')). Compute the average
polysemy of nouns, verbs, adjectives, and adverbs
according to WordNet.
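A possible starting point for the exercise, as a sketch: average the number of senses over all lemma names of one part of speech. It assumes NLTK 3 with the WordNet corpus installed and relies on `wn.all_lemma_names(pos)` and `wn.synsets(name, pos)`; the function names are hypothetical, and no result values are claimed here.

```python
def average_polysemy(lemma_names, senses_of):
    """Mean number of senses per lemma name.

    `senses_of(name)` must return the senses (synsets) of `name`.
    """
    names = list(lemma_names)
    if not names:
        return 0.0
    return sum(len(senses_of(name)) for name in names) / len(names)


def wordnet_average_polysemy(pos):
    """Average polysemy for one WordNet POS: 'n', 'v', 'a' or 'r'.

    Requires NLTK with the WordNet corpus installed.
    """
    # Imported lazily so average_polysemy stays usable without NLTK.
    from nltk.corpus import wordnet as wn
    return average_polysemy(wn.all_lemma_names(pos),
                            lambda name: wn.synsets(name, pos))
```

Calling `wordnet_average_polysemy` once for each of 'n', 'v', 'a' and 'r' gives the four averages the exercise asks for. Keeping the averaging separate from the WordNet lookup makes it easy to check on a hand-made example.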
27. http://www.nltk.org/book
Kateglo: KAmus, TEsaurus, dan GLOsarium (dictionary, thesaurus, and glossary),
http://bahtera.org/kateglo
http://www.sinonimkata.com/
http://tjerdastangkas.blogspot.com/search/label/isd312
28. Lexical richness
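Ratio of the number of tokens to the number of distinct words
from nltk.book import text1
len(text1) / len(set(text1))
In Python 2, / between integers is integer division; enable true division with
from __future__ import division
(not needed in Python 3, where / always performs true division)
Number of occurrences of a token
text1.count('whale')
Percentage of the text made up of that token
100 * text1.count('whale') / len(text1)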
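The two measures above can be wrapped as small functions that work on any token sequence, including an NLTK `Text` such as `text1`. A minimal sketch in Python 3 (so `/` is already true division); the sample sentence is made up for illustration.

```python
def lexical_diversity(tokens):
    """Tokens per distinct word type: len(tokens) / len(set(tokens))."""
    return len(tokens) / len(set(tokens))


def percentage(count, total):
    """Share of `total` represented by `count`, as a percentage."""
    return 100 * count / total


# Made-up sample text, split on whitespace.
tokens = "the whale saw the other whale near the ship".split()
```

Here `lexical_diversity(tokens)` is 1.5 (9 tokens, 6 distinct words) and `percentage(tokens.count('whale'), len(tokens))` is about 22.2.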