International Journal of Instrumentation and Control Systems (IJICS) Vol.3, No.4, October 2013

AN APPROACH TO SPEED-UP THE WORD SENSE
DISAMBIGUATION PROCEDURE THROUGH SENSE
FILTERING
Alok Ranjan Pal (1), Anupam Munshi (1) and Diganta Saha (2)

(1) Dept. of Computer Science and Engineering, College of Engineering and Management,
Kolaghat, West Bengal, India.
(2) Dept. of Computer Science and Engineering, Jadavpur University, Kolkata, India

Abstract
This paper focuses on speeding up the Word Sense Disambiguation procedure by filtering the
relevant senses of an ambiguous word through Part-of-Speech Tagging. First, the proposed approach
performs Part-of-Speech Tagging before the disambiguation procedure, using the Bigram
approximation. As a result, the Part-of-Speech of the ambiguous word at a particular text instance
is derived. In the next stage, only those dictionary definitions (glosses) that are associated with
that particular Part-of-Speech are retrieved from an online dictionary to disambiguate the sense of
the ambiguous word.
In the training phase, we have used the Brown Corpus for Part-of-Speech Tagging and WordNet as the
online dictionary. The proposed approach reduces the execution time to as little as about half of
the normal execution time for a text containing around 200 sentences. In addition, we have found
several instances where the correct sense of an ambiguous word is found because Part-of-Speech
Tagging is applied before the disambiguation procedure.

Key words
Word Sense Disambiguation (WSD), Part-of-Speech Tagging (POS), WordNet, Lesk Algorithm, Brown
Corpus.

1. INTRODUCTION

In human languages all over the world, many words have different meanings depending on the
context. Word Sense Disambiguation (WSD) [1-8] is the process of identifying the actual meaning of
an ambiguous word in a given context. For example, the word “Bank” has several meanings, such as
“place for monetary transaction”, “reservoir”, “turning point of a river”, and so on. Such words
with multiple meanings are ambiguous in nature. People have an inborn ability to sense the actual
meaning of an ambiguous word in a particular context, whereas machines do this job using
pre-defined rules or statistical methods.

DOI : 10.5121/ijics.2013.3403

Two types of learning procedures are commonly used for Word Sense Disambiguation.
The first is Supervised Learning, where the system is given a learning set made of a few
sentences, each carrying a specific meaning of the ambiguous word. The system finds the actual
meaning of an ambiguous word in a particular context based on that learning set. In this method,
the learning set is created manually, so it cannot provide fixed rules that hold for all systems.
Therefore, the actual meaning of an ambiguous word in a given context cannot always be detected.
Supervised learning derives a partially correct result if the learning set does not contain
sufficient information for all possible senses of the ambiguous word, and it fails to produce any
result if there is no information in the predefined database.
In Unsupervised Learning, an online dictionary is taken as the learning set, which avoids this
limitation of Supervised Learning. “WordNet” [9-15] is the most widely used online dictionary,
maintaining “words and related meanings” as well as “relations among different words”.
However, the Unsupervised Learning procedures commonly used for sense disambiguation consider all
the dictionary definitions (glosses) of the ambiguous word. These glosses belong to different
Parts-of-Speech (POS), such as noun, verb, adjective and adverb. A commonly used procedure such as
the Lesk Algorithm [16, 17] considers the glosses of every Part-of-Speech, which costs unnecessary
additional execution time. As an ambiguous word carries one specific Part-of-Speech in a particular
context, we perform Part-of-Speech Tagging [18-31] before the WSD procedure. As a result, only the
glosses of the relevant Part-of-Speech are considered.
Using this approach, we have observed two improvements in the output. First, the disambiguation
procedure executes faster. Second, as only the relevant glosses are retained, the accuracy of the
disambiguated sense is increased.
The rest of the paper is organized as follows: Section 2 covers the Theoretical Background of
the proposed approach; Section 3 describes the Implementation Background; Section 4 describes
the Proposed Approach in detail; Section 5 presents the experimental results along with a
comparison; Section 6 concludes the paper.

2. THEORETICAL BACKGROUND

The most common Unsupervised WSD algorithm is the Lesk Algorithm, which is commonly applied with
WordNet as the online dictionary. The algorithm is described briefly below:

2.1 Preliminaries of Lesk Algorithm

The typical Lesk Algorithm selects a short phrase from the sentence containing an ambiguous word.
Then, the dictionary definition (gloss) of each sense of the ambiguous word is compared with the
glosses of the other words in that phrase. The ambiguous word is assigned the sense whose gloss
has the highest number of overlaps (number of common words) with the glosses of the other words
of the phrase.
Example 1: “Ram and Sita everyday go to bank for withdrawal of money.”
Here, the phrase is chosen according to the window size (number of consecutive words). If the
window size is 3, the phrase would be “go bank withdrawal”. All other words are discarded as
“stop words”.
Consider the glosses of all the words present in that phrase to be as follows:
Suppose “Bank” has 2 senses, ‘X’ and ‘Y’ (refer Table 1).
“Go” has 2 senses, ‘A’ and ‘B’ (refer Table 2).
And “Withdrawal” has 2 senses, ‘M’ and ‘N’ (refer Table 3).
Keyword       Probable sense
Bank          X
              Y

Table 1. Probable Senses of “Bank”.

Word          Probable sense
Go            A
              B

Table 2. Probable Senses of “Go”.

Word          Probable sense
Withdrawal    M
              N

Table 3. Probable Senses of “Withdrawal”.

Consider the word “Bank” as the keyword. The number of common words is counted between each pair
of sentences (glosses), one taken from the keyword and one from another word of the phrase.


Pair of Sentences    Number of Common Words
X and A              A'
X and B              B'
Y and A              A''
Y and B              B''
X and M              M'
X and N              N'
Y and M              M''
Y and N              N''

Table 4. Comparison chart between pair of sentences and common number of words within a particular
pair.

Table 4 lists every possible pair of sentences from Table 1, Table 2 and Table 3, together with
the number of words common to each pair.
Finally, the two senses of the keyword “Bank” have the following counter readings (refer Table 4):
X counter, XC = A' + B' + M' + N'.
Y counter, YC = A'' + B'' + M'' + N''.
The sense with the higher counter value is assigned to the keyword “Bank” in the particular
sentence. This strategy assumes that the surrounding words carry the same sense as the keyword.
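
The counter computation above can be sketched in a few lines of Python. This is only an illustrative sketch: the glosses, the stop-word list and the helper names (`content_words`, `overlap`, `lesk`) are hypothetical stand-ins chosen for this example, not WordNet entries and not the authors' implementation.

```python
# Hypothetical stop words and toy glosses (assumptions, not WordNet data).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "that", "from", "or"}

GLOSSES = {
    "bank": {
        "X": "financial institution that accepts deposits of money",
        "Y": "sloping land beside a river",
    },
    "go": {
        "A": "move from one place to another",
        "B": "visit an institution to do business",
    },
    "withdrawal": {
        "M": "removal of money from an account at a financial institution",
        "N": "retreat of troops from a position",
    },
}


def content_words(text):
    """Return the set of non-stop words of a gloss."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def overlap(gloss_a, gloss_b):
    """Number of common words between two glosses."""
    return len(content_words(gloss_a) & content_words(gloss_b))


def lesk(keyword, window):
    """Sum, for each sense of the keyword, the overlaps with every gloss
    of every other word in the window (the XC and YC counters)."""
    counters = {}
    for sense, gloss in GLOSSES[keyword].items():
        counters[sense] = sum(
            overlap(gloss, other_gloss)
            for word in window
            if word != keyword
            for other_gloss in GLOSSES[word].values()
        )
    best = max(counters, key=counters.get)
    return best, counters


best_sense, counters = lesk("bank", ["go", "bank", "withdrawal"])
print(best_sense, counters)
```

With these toy glosses, the X counter collects its overlaps from senses B and M, while the Y counter stays at zero, so sense X is selected.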

2.2 Simple (unsmoothed) N-gram and Bigram Model

The N-gram model [32] is used to compute the probability of a complete string of words (which can
be represented either as $w_1 \ldots w_n$ or $w_1^n$). If each word, occurring in its correct
location, is considered as an independent event, the probability is represented as:
$P(w_1, w_2, \ldots, w_{n-1}, w_n)$.
Using the chain rule, this probability is decomposed as:
$P(w_1^n) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1^2) \cdots P(w_n \mid w_1^{n-1})$.
However, computing a probability such as $P(w_n \mid w_1^{n-1})$ is not easy for a long sequence of
preceding words. The Bigram model approximates the probability of a word given all the previous
words, $P(w_n \mid w_1^{n-1})$, by the conditional probability given only the immediately preceding
word, $P(w_n \mid w_{n-1})$.
For example, instead of computing the probability P(rabbit | Just the other day I saw a), the
probability is approximated by P(rabbit | a).
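
As an illustration of the Bigram approximation, the sketch below estimates P(w_n | w_{n-1}) by relative frequency, that is, count(w_{n-1} w_n) / count(w_{n-1}), which is the usual unsmoothed estimate. The three-sentence corpus, the sentence boundary markers `<s>` and `</s>`, and the function names are assumptions made for this example only.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word):
    """Unsmoothed estimate of P(word | prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


def sentence_prob(unigrams, bigrams, sentence):
    """Product of the bigram probabilities along the sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(unigrams, bigrams, prev, word)
    return prob


# Toy corpus (hypothetical).
corpus = ["i saw a rabbit", "i saw a dog", "a rabbit ran"]
unigrams, bigrams = train_bigram(corpus)

p_rabbit_given_a = bigram_prob(unigrams, bigrams, "a", "rabbit")
p_sentence = sentence_prob(unigrams, bigrams, "i saw a rabbit")
print(p_rabbit_given_a, p_sentence)
```

In this toy corpus the word "a" occurs three times and is followed by "rabbit" twice, so the estimate of P(rabbit | a) is 2/3. Because the model is unsmoothed, any bigram that never occurs in the corpus receives probability zero.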

3. IMPLEMENTATION BACKGROUND

This paper adopts the basic ideas of the typical Lesk algorithm and introduces some modifications.

3.1 Simplified Lesk Approach

In this approach, only the glosses of the keyword are considered for a specific sentence, instead
of the glosses of all words. The number of common words is calculated between the specific
sentence and each dictionary definition of the keyword.
• Consider the sentence of “Example 1” mentioned earlier: “Ram and Sita everyday
go to bank for withdrawal of money.”
• The instance sentence would be “Ram Sita everyday go bank withdrawal money” after
discarding the “stop words” like “to”, “for” and so on.
• If “Bank” is considered as the keyword and its two senses are X and Y (refer Table 1),
then the number of common words is calculated between the instance sentence and each
probable sense of “Bank”.
• The number of common words found is assigned to the counter of that sense of
“Bank”. Consider that the X-counter has the value I’ and the Y-counter has the value I”.
• Finally, the sense with the higher counter value is assigned to the keyword for the
particular instance sentence.
• The dictionary definition (gloss) of the keyword is taken from “WordNet”.
• This approach also assumes that the entire sentence represents a particular sense of the
keyword.
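
The steps listed above can be sketched as follows. The two glosses for “Bank”, the stop-word list and the function names are hypothetical and chosen only to make the example self-contained; a real system would read the glosses from WordNet.

```python
import re

# Hypothetical stop words and toy glosses (assumptions, not WordNet data).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "that", "or"}

BANK_GLOSSES = {
    "X": "financial institution that accepts deposits and handles withdrawal of money",
    "Y": "sloping land beside a river",
}


def content_words(text):
    """Lower-case the text, drop punctuation and stop words, return a word set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def simplified_lesk(sentence, sense_glosses):
    """Count the words shared by the instance sentence and each gloss of the
    keyword, then pick the sense with the highest counter."""
    context = content_words(sentence)
    counters = {
        sense: len(context & content_words(gloss))
        for sense, gloss in sense_glosses.items()
    }
    best = max(counters, key=counters.get)
    return best, counters


sentence = "Ram and Sita everyday go to bank for withdrawal of money."
best_sense, counters = simplified_lesk(sentence, BANK_GLOSSES)
print(best_sense, counters)
```

Here the instance sentence shares “withdrawal” and “money” with the toy gloss of sense X and nothing with sense Y, so the X-counter is higher and X is selected. Compared with the typical Lesk Algorithm, only the glosses of the keyword are looked up, which is where the saving in comparisons comes from.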

4. PROPOSED APPROACH

The proposed approach derives the actual sense of an ambiguous word in two steps. First, the
input text is passed through the POS Tagging module, where the POS of the ambiguous word is
derived. Second, the input sentence containing the ambiguous word, along with its derived POS, is
passed to the WSD module, where the disambiguation is performed using the Simplified Lesk
Algorithm.
As the POS of the ambiguous word is derived before the WSD operation, the dictionary definitions
(glosses) of that POS are filtered out of all the instances present in WordNet (Noun, Verb,
Adjective and Adverb instances). As a result, the disambiguation procedure becomes faster.
In addition, as the POS of the ambiguous word is derived before the WSD operation, the
disambiguation algorithm is applied to the relevant glosses only. As a result, the accuracy of the
disambiguation algorithm is increased. A detailed explanation of the proposed approach is given
below:
Input text
  -> Module 1: POS Tagging using Brown Corpus
  -> POS of the ambiguous word for the given context is derived.
  -> Module 2: Simplified Lesk Algorithm is applied to find the actual sense of the
     ambiguous word, taking the derived POS into account.
  -> Output: Disambiguated sense of the ambiguous word.

Figure 1. Modular representation of the overall approach

Algorithm 1: This algorithm (refer Figure 1) describes the overall approach. The first module is
responsible for POS Tagging and the second module is responsible for the WSD task.
Input: Input text containing the ambiguous word.
Output: Disambiguated sense of the ambiguous word.
Step 1: The input text containing the ambiguous word is passed to Module 1 to find the POS of
the ambiguous word.
Step 2: The Simplified Lesk Algorithm is applied to find the actual sense of the ambiguous word,
taking the derived POS into account.
Step 3: Stop.

Module 1: POS Tagging using Brown Corpus.
  -> Sentence containing the ambiguous word is taken.
  -> Bigram approximation of the ambiguous word is found using the XML data source
     of the Brown Corpus.
  -> POS of the ambiguous word is derived.

Figure 2. Implementation detail of Module 1 for POS Tagging

Module 1: Algorithm 2:
This algorithm (refer Figure 2) finds the POS of the ambiguous word using the Brown Corpus. The
maximum Time Complexity of the algorithm is O(n^2), which arises at Step 2.
Input: Sentence containing the ambiguous word.
Output: POS of the ambiguous word.
Step 1: The input sentence containing the ambiguous word is taken.
Step 2: The Bigram approximation of the ambiguous word is found using the XML data source of the
Brown Corpus.
Step 3: The POS of the ambiguous word is derived from the largest approximation value.
Step 4: Stop.
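
The text does not spell out how the Bigram approximation is turned into a POS decision, so the following is one plausible reading, sketched under stated assumptions: for every tag with which the ambiguous word follows the same preceding word in a tagged corpus, the relative frequency count(previous word, word/tag) / count(previous word) is computed, and the tag with the largest value is returned. The tiny tagged corpus, the tag names and the function names are hypothetical; the Brown Corpus and its XML data source are not used here.

```python
from collections import Counter

# Hypothetical miniature tagged corpus standing in for the Brown Corpus.
TAGGED_CORPUS = [
    [("the", "DET"), ("plant", "NOUN"), ("grows", "VERB")],
    [("a", "DET"), ("plant", "NOUN"), ("needs", "VERB"), ("water", "NOUN")],
    [("they", "PRON"), ("plant", "VERB"), ("trees", "NOUN")],
    [("we", "PRON"), ("plant", "VERB"), ("seeds", "NOUN")],
    [("the", "DET"), ("bank", "NOUN"), ("opens", "VERB")],
]


def train(tagged_sentences):
    """Count each preceding word and each (preceding word, word, tag) triple."""
    prev_counts, triple_counts = Counter(), Counter()
    for sent in tagged_sentences:
        words = ["<s>"] + [w for w, _ in sent]
        for i, (word, tag) in enumerate(sent):
            prev = words[i]  # word immediately before sent[i]
            prev_counts[prev] += 1
            triple_counts[(prev, word, tag)] += 1
    return prev_counts, triple_counts


def derive_pos(sentence, target, prev_counts, triple_counts):
    """Return the tag with the largest bigram estimate for the target word,
    or None if the (previous word, target) pair was never seen."""
    tokens = ["<s>"] + sentence.lower().split()
    prev = tokens[tokens.index(target) - 1]
    scores = {
        tag: count / prev_counts[prev]
        for (p, w, tag), count in triple_counts.items()
        if p == prev and w == target
    }
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores


prev_counts, triple_counts = train(TAGGED_CORPUS)
noun_tag, noun_scores = derive_pos("the plant is green", "plant", prev_counts, triple_counts)
verb_tag, verb_scores = derive_pos("they plant rice", "plant", prev_counts, triple_counts)
print(noun_tag, noun_scores, verb_tag, verb_scores)
```

An unsmoothed estimate like this returns no tag when the preceding word was never observed before the target, so a practical tagger would need a fallback such as the most frequent tag of the word; that fallback is outside what the paper describes.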

Module 2: WSD using Simplified Lesk Algorithm
  -> The ambiguous word is taken.
  -> Only those dictionary definitions (glosses) are considered for WSD, which belong to
     the same POS domain w. r. t. the POS of the ambiguous word.
  -> Overlaps are counted between the glosses and the input sentence.
  -> Maximum number of overlaps for an instance represents the disambiguated sense of the
     ambiguous word.
  -> Derived sense of the ambiguous word is represented as output.

Figure 3. Implementation detail of Module 2 for WSD procedure

Module 2: Algorithm 3: This algorithm (refer Figure 3) derives the actual sense of an ambiguous
word using the Simplified Lesk Algorithm. The Time Complexity of the algorithm is O(n^3), as
finding the total number of overlaps between a particular gloss and the input sentence is of O(n^2)
complexity and this procedure is performed for all n glosses.
Input: Ambiguous word with derived POS.
Output: Disambiguated sense of the ambiguous word.
Step 1: The ambiguous word is taken.
Step 2: Only those dictionary definitions (glosses) are considered from WordNet, which belong to
the same POS domain w. r. t. the POS of the ambiguous word.
Step 3: Overlaps are counted between the glosses and the input sentence itself.
Step 4: The actual sense of the ambiguous word is derived from the maximum number of
overlaps for an instance.
Step 5: Stop.
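
A minimal sketch of Steps 2 to 4 is given below, assuming a small hand-written sense inventory for “plant” in place of WordNet. The sense labels, glosses, example sentence and function names are all hypothetical, and the sentence is deliberately contrived so that the effect of the POS filter is visible.

```python
import re

# Hypothetical stop words and sense inventory (assumptions, not WordNet data).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "that", "or"}

# Each entry: (sense label, POS, gloss).
PLANT_SENSES = [
    ("living organism", "noun", "a living organism lacking the power of locomotion"),
    ("industrial plant", "noun", "buildings for carrying on industrial labor"),
    ("put in ground", "verb", "put or set seeds or seedlings into the ground"),
    ("fix firmly", "verb", "fix or set securely or deeply"),
]


def content_words(text):
    """Lower-case the text, drop punctuation and stop words, return a word set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def disambiguate(sentence, senses, pos=None):
    """Simplified Lesk over the glosses of the given POS only.
    With pos=None every gloss is compared (no sense filtering)."""
    candidates = [s for s in senses if pos is None or s[1] == pos]
    context = content_words(sentence)
    scores = {
        label: len(context & content_words(gloss))
        for label, _, gloss in candidates
    }
    best = max(scores, key=scores.get)
    return best, len(candidates)


sentence = "A green plant is a living organism that may set seeds in the ground."
unfiltered_sense, unfiltered_count = disambiguate(sentence, PLANT_SENSES)
filtered_sense, filtered_count = disambiguate(sentence, PLANT_SENSES, pos="noun")
print(unfiltered_sense, unfiltered_count)
print(filtered_sense, filtered_count)
```

With this toy data the unfiltered run compares four glosses and is misled by a verb gloss that happens to share more words with the sentence, while the filtered run compares only the two noun glosses and selects the noun sense. This mirrors, on contrived data, the two effects the approach aims for: fewer gloss comparisons and fewer wrong-POS senses.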


The proposed approach gives better results with respect to both execution time and accuracy, as
described in the next section.

5. OUTPUT AND DISCUSSION

The algorithm was tested on more than 100 texts of different lengths and categories. The average
length of the texts is 200 sentences, and two ambiguous words were selected for testing: "Bank"
and "Plant".
First, the POS of the ambiguous word is derived by the POS Tagging module (Module 1). Next, the
Simplified Lesk Algorithm is applied to the input text containing the POS-tagged ambiguous word.
As the POS of the ambiguous word has already been derived, only those dictionary definitions that
belong to the same POS as the ambiguous word are selected from WordNet for the WSD process. As a
result, the execution time of the WSD process is reduced (refer Table 5).
It is also observed that, as only the relevant glosses are considered for the WSD process, the
accuracy of the disambiguated sense is increased (refer Text no. 10).
Some of the results for the target word "Bank" are given in Table 5. All the sample texts are taken
from "www.wikipedia.com".

Table 5. Speed up analysis of WSD procedure for target word "Bank".


Note 1: D-sense means Disambiguated sense, E-time means Execution time, and ms means millisecond.
The following sample test (Text no. 10) shows that, in the given input text, the ambiguous word
"Plant" carries the actual sense "Living Organism" (as decided by a human), which is a noun sense.
When the algorithm runs without POS Tagging, however, it derives the sense "Contact", which is a
verb sense and is wrong for this context according to human judgment. This text is also taken from
"www.wikipedia.com".
The accuracy measurement of the WSD procedure using the proposed approach is described below with
a sample text.
Text no. 10:
Plants, also called green plants, are living organisms of the kingdom Plantae including such multi
cellular groups as flowering plants, conifers, ferns and mosses, as well as, depending on
definition, the green algae, but not red or brown seaweeds like kelp, nor fungi or bacteria.
Green plants have cell walls with cellulose and characteristically obtain most of their energy from
sunlight via photosynthesis using chlorophyll contained in chloroplasts, which gives them their
green color. Some plants are parasitic and may not produce normal amounts of chlorophyll or
photosynthesize. Plants are also characterized by sexual reproduction, modular and indeterminate
growth, and an alternation of generations, although asexual reproduction is common, and some
plants bloom only once while others bears only one bloom.
Precise numbers are difficult to determine, but as of 2010, there are thought to be 300–315
thousand species of plants, of which the great majority, some 260–290 thousand, are seed plants.
Green plants provide most of the world's molecular oxygen and are the basis of most of the earth's
ecologies, especially on land. Plants described as grains, fruits and vegetables form mankind's
basic foodstuffs, and have been domesticated for millennia. Plants enrich our lives as flowers and
ornaments. Until recently and in great variety they have served as the source of most of our
medicines and drugs. Their scientific study is known as botany.
Output:
Target word: Plant.
Actual sense: Living Organism (Noun).
Derived sense with POS Tagging: Living Organism (Noun).
Derived sense without POS Tagging: Contact (Verb).

6. CONCLUSION AND FUTURE WORK

The proposed approach speeds up the WSD procedure by retaining only the relevant glosses, and it
increases the accuracy of the WSD procedure as well. The difference in execution time between the
two cases (with and without the POS Tagging procedure) might increase further if a few system
calls and other system-related tasks are handled properly. The routine operations (loops, memory
allocation, condition checks, function calls, etc.) required for POS Tagging took some time, which
is included in the reported results; without this overhead, the measured time difference would
have been larger.

REFERENCES
[1] R. S. Cucerzan, C. Schafer, D. Yarowsky, “Combining classifiers for word sense disambiguation,” Natural Language Engineering, Vol. 8, No. 4, 2002, Cambridge University Press, pp. 327-341.
[2] M. S. Nameh, M. Fakhrahmad, M. Z. Jahromi, “A New Approach to Word Sense Disambiguation Based on Context Similarity,” Proceedings of the World Congress on Engineering, Vol. I, 2011.
[3] Gaizauskas, “Gold Standard Datasets for Evaluating Word Sense Disambiguation Programs,” Computer Speech and Language, Vol. 12, No. 3, Special Issue on Evaluation of Speech and Language Technology, pp. 453-472.
[4] R. Navigli, “Word Sense Disambiguation: a Survey,” ACM Computing Surveys, Vol. 41, No. 2, 2009, ACM Press, pp. 1-69.
[5] N. Ide, J. Véronis, “Word Sense Disambiguation: The State of the Art,” Computational Linguistics, Vol. 24, No. 1, 1998, pp. 1-40.
[6] W. Xiaojie, Y. Matsumoto, “Chinese word sense disambiguation by combining pseudo training data,” Proceedings of The International Conference on Natural Language Processing and Knowledge Engineering, 2003, pp. 138-143.
[7] C. Santamaria, J. Gonzalo, F. Verdejo, “Automatic Association of WWW Directories to Word Senses,” Computational Linguistics, Vol. 3, Issue 3, Special Issue on the Web as Corpus, 2003, pp. 485-502.
[8] J. Heflin, J. Hendler, “A Portrait of the Semantic Web in Action,” IEEE Intelligent Systems, Vol. 16, No. 2, 2001, pp. 54-59.
[9] S. G. Kolte, S. G. Bhirud, “Word Sense Disambiguation Using WordNet Domains,” First International Conference on Digital Object Identifier, 2008, pp. 1187-1191.
[10] G. Miller, “WordNet: An on-line lexical database,” International Journal of Lexicography, Vol. 3, No. 4, 1991.
[11] Y. Liu, P. Scheuermann, X. Li, X. Zhu, “Using WordNet to Disambiguate Word Senses for Text Classification,” Proceedings of the 7th International Conference on Computational Science, Springer-Verlag, 2007, pp. 781-789.
[12] A. J. Cañas, A. Valerio, J. Lalinde-Pulido, M. Carvalho, M. Arguedas, “Using WordNet for Word Sense Disambiguation to Support Concept Map Construction,” String Processing and Information Retrieval, 2003, pp. 350-359.
[13] G. A. Miller, “WordNet: A Lexical Database,” Comm. ACM, Vol. 38, No. 11, 1993, pp. 39-41.
[14] H. Seo, H. Chung, H. Rim, H. Myaeng, S. Kim, “Unsupervised word sense disambiguation using WordNet relatives,” Computer Speech and Language, Vol. 18, No. 3, 2004, pp. 253-273.
[15] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, K. J. Miller, “WordNet An on-line lexical database,” International Journal of Lexicography, Vol. 3, No. 4, 1990, pp. 235-244.
[16] S. Banerjee, T. Pedersen, “An adapted Lesk algorithm for word sense disambiguation using WordNet,” In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, February, 2002.
[17] M. Lesk, “Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone,” Proceedings of SIGDOC, 1986.
[18] S. Dandapat, “Part Of Speech Tagging and Chunking with Maximum Entropy Model,” in Proceedings of the IJCAI Workshop on Shallow Parsing for South Asian Languages, (Hyderabad, India), pp. 29–32, 2007.
[19] A. Ekbal, R. Haque, and S. Bandyopadhyay, “Maximum Entropy based Bengali Part of Speech Tagging,” in A. Gelbukh (Ed.), Advances in Natural Language Processing and Applications, Research in Computing Science (RCS) Journal, vol. 33, pp. 67–78.
[20] A. Ekbal, R. Haque, and S. Bandyopadhyay, “Bengali Part of Speech Tagging using Conditional Random Field,” in Proceedings of the seventh International Symposium on Natural Language Processing, SNLP-2007, 2007.
[21] A. Ratnaparkhi, “A maximum entropy part-of-speech tagger,” in Proc. of EMNLP’96, 1996.
[22] PVS. Avinesh, G. Karthik, “Part Of Speech Tagging and Chunking using Conditional Random Fields and Transformation Based Learning,” In Proc. of SPSAL 2007, IJCAI, India, 21-24.
[23] T. Brants, “TnT-A Statistical Part of Speech Tagger,” In Proc. of the 6th ANLP Conference, 224-231.
[24] D. Cutting, J. Kupiec, J. Pederson and P. Sibun, “A Practical Part of Speech Tagger,” In Proc. of the 3rd ANLP Conference, 133-140, 1992.
[25] A. Ekbal, and S. Bandyopadhyay, “Lexicon Development and POS tagging using a Tagged Bengali News Corpus,” In Proc. of FLAIRS-2007, Florida, 261-263.
[26] M. Mcteer, R. Schwartz and R. Weischedel, “Empirical studies in part-of-speech labeling,” Proceedings of the 4th DARPA Workshop on Speech and Natural Language, pp. 331-336, 1991.
[27] B. Merialdo, “Tagging English text with a probabilistic model,” Computational Linguistics, 20(2):155-171, 1994.
[28] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen, “The infinite hidden markov model,” Advances in Neural Information Processing Systems, 14:577–584, 2002.
[29] Chris Biemann, Claudio Giuliano, and Alfio Gliozzo, “Unsupervised part-of-speech tagging supporting supervised methods,” In Proceedings of RANLP. Eric Brill. 1994. Some advances in transformation-based part of speech tagging. In National Conference on Artificial Intelligence, pages 722–727, 2007.
[30] J. Van Gael, Y. Saatci, Y. W. Teh, and Z. Ghahramani, “Beam sampling for the infinite hidden markov model,” In Proceedings of the 25th international conference on Machine learning, volume 25, Helsinki, 2007.
[31] Jianfeng Gao and Mark Johnson, “A comparison of bayesian estimators for unsupervised hidden markov model pos taggers,” In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 344–352.
[32] Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice Hall, 2000.

Authors

Alok Ranjan Pal has been working as an Assistant Professor in the Computer Science and
Engineering Department of College of Engineering and Management, Kolaghat since
2006. He completed his Bachelor's and Master's degrees under WBUT. He is currently
working on Natural Language Processing.

Mr. Anupam Munshi is a student in the Information Technology Department of College of
Engineering and Management, Kolaghat. His fields of interest are AI, Soft Computing and
NLP.

Dr. Diganta Saha is an Associate Professor in the Department of Computer Science &
Engineering, Jadavpur University. His fields of specialization are Machine Translation,
Natural Language Processing, Mobile Computing and Pattern Classification.

41

Más contenido relacionado

La actualidad más candente

ANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTKANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTKijnlc
 
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESGENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESijnlc
 
Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...
Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...
Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...Algoscale Technologies Inc.
 
Using ontology based context in the
Using ontology based context in theUsing ontology based context in the
Using ontology based context in theijaia
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Quinsulon Israel
 
STOCKGRAM : DEEP LEARNING MODEL FOR DIGITIZING FINANCIAL COMMUNICATIONS VIA N...
STOCKGRAM : DEEP LEARNING MODEL FOR DIGITIZING FINANCIAL COMMUNICATIONS VIA N...STOCKGRAM : DEEP LEARNING MODEL FOR DIGITIZING FINANCIAL COMMUNICATIONS VIA N...
STOCKGRAM : DEEP LEARNING MODEL FOR DIGITIZING FINANCIAL COMMUNICATIONS VIA N...ijnlc
 
STOCKGRAM : DEEP LEARNING MODEL FOR DIGITIZING FINANCIAL COMMUNICATIONS VIA N...
STOCKGRAM : DEEP LEARNING MODEL FOR DIGITIZING FINANCIAL COMMUNICATIONS VIA N...STOCKGRAM : DEEP LEARNING MODEL FOR DIGITIZING FINANCIAL COMMUNICATIONS VIA N...
STOCKGRAM : DEEP LEARNING MODEL FOR DIGITIZING FINANCIAL COMMUNICATIONS VIA N...kevig
 
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONSEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONijnlc
 
Cross lingual similarity discrimination with translation characteristics
Cross lingual similarity discrimination with translation characteristicsCross lingual similarity discrimination with translation characteristics
Cross lingual similarity discrimination with translation characteristicsijaia
 
Machine translation with statistical approach
Machine translation with statistical approachMachine translation with statistical approach
Machine translation with statistical approachvini89
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...acijjournal
 
Sentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic RelationsSentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic RelationsEditor IJCATR
 
Extraction Based automatic summarization
Extraction Based automatic summarizationExtraction Based automatic summarization
Extraction Based automatic summarizationAbdelaziz Al-Rihawi
 
Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Dhabal Sethi
 
7 probability and statistics an introduction
7 probability and statistics an introduction7 probability and statistics an introduction
7 probability and statistics an introductionThennarasuSakkan
 
Neural Network in Knowledge Bases
Neural Network in Knowledge BasesNeural Network in Knowledge Bases
Neural Network in Knowledge BasesKushal Arora
 
PRONOUN DISAMBIGUATION: WITH APPLICATION TO THE WINOGRAD SCHEMA CHALLENGE
PRONOUN DISAMBIGUATION: WITH APPLICATION TO THE WINOGRAD SCHEMA CHALLENGEPRONOUN DISAMBIGUATION: WITH APPLICATION TO THE WINOGRAD SCHEMA CHALLENGE
PRONOUN DISAMBIGUATION: WITH APPLICATION TO THE WINOGRAD SCHEMA CHALLENGEkevig
 

La actualidad más candente (20)

ANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTKANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTK
 
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESGENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
 
An approach to speed up the word sense disambiguation procedure through sense filtering

International Journal of Instrumentation and Control Systems (IJICS) Vol.3, No.4, October 2013

AN APPROACH TO SPEED-UP THE WORD SENSE DISAMBIGUATION PROCEDURE THROUGH SENSE FILTERING

Alok Ranjan Pal,1 Anupam Munshi1 and Diganta Saha2
1 Dept. of Computer Science and Engineering, College of Engineering and Management, Kolaghat, West Bengal, India.
2 Dept. of Computer Science and Engineering, Jadavpur University, Kolkata, India

Abstract
This paper focuses on speeding up the Word Sense Disambiguation procedure by filtering the relevant senses of an ambiguous word through Part-of-Speech Tagging. The proposed approach first performs Part-of-Speech Tagging, using Bigram approximation, before the disambiguation procedure. As a result, the Part-of-Speech of the ambiguous word at a particular text instance is derived. In the next stage, only those dictionary definitions (glosses) that are associated with that particular Part-of-Speech are retrieved from an online dictionary to disambiguate the sense of the ambiguous word.
In the training phase, we used the Brown Corpus for Part-of-Speech Tagging and WordNet as the online dictionary. The proposed approach reduces the execution time by up to approximately half of the normal execution time for a text containing around 200 sentences. In addition, we found several instances where the correct sense of an ambiguous word was found because Part-of-Speech Tagging was performed before the disambiguation procedure.

Key words
Word Sense Disambiguation (WSD), Part-of-Speech Tagging (POS), WordNet, Lesk Algorithm, Brown Corpus.

1. INTRODUCTION
In human languages all over the world, many words have different meanings depending on the context. Word Sense Disambiguation (WSD) [1-8] is the process of identifying the actual meaning of an ambiguous word in a given situation.
    For example, the word “Bank” has several meanings, such as “place for monetary transaction”, “reservoir”, “turning point of a river”, and so on. Such words with multiple meanings are ambiguous in nature. The process of deciding the appropriate meaning of an ambiguous word for a particular context is known as Word Sense Disambiguation. People have an inborn ability to sense the actual meaning of an ambiguous word in a particular context.

    DOI: 10.5121/ijics.2013.3403

  • 2. Machines, in contrast, do this job using pre-defined rules or statistical methods.

    Two types of learning procedures are commonly used for Word Sense Disambiguation. The first is Supervised Learning, in which the system is given a learning set: a few sentences in which the ambiguous word carries a known, specific meaning. The system then finds the actual meaning of the ambiguous word in a particular context based on that learning set. Because the learning set is created manually, it cannot provide fixed rules that hold for all systems, so the actual meaning of an ambiguous word in a given context cannot always be detected. Supervised Learning gives only partially correct results if the learning set does not contain sufficient information for all possible senses of the ambiguous word, and it fails to produce any result if the predefined database contains no relevant information.

    In Unsupervised Learning, an online dictionary is taken as the learning set, which avoids this inefficiency of Supervised Learning. “WordNet” [9-15] is the most widely used online dictionary; it maintains “words and related meanings” as well as “relations among different words”. However, the Unsupervised Learning procedures commonly used for sense disambiguation consider all the dictionary definitions (glosses) of the ambiguous word. These glosses belong to different Parts-of-Speech (POS), such as noun, verb, adjective and adverb. Commonly used Unsupervised Learning procedures such as the Lesk Algorithm [16, 17] consider the glosses of every Part-of-Speech, which costs unnecessary additional execution time.
    As an ambiguous word carries a specific Part-of-Speech in a particular context, we perform Part-of-Speech Tagging [18-31] before the WSD procedure. As a result, only the glosses of the relevant Part-of-Speech are considered. With this approach, we have observed two improvements in the output. First, the disambiguation procedure executes faster; second, because only the relevant glosses are retained, the accuracy of the disambiguated sense increases.

    The rest of the paper is organized as follows: Section 2 covers the Theoretical Background of the proposed approach; Section 3 describes the Implementation Background; Section 4 describes the Proposed Approach in detail; Section 5 presents the experimental results along with a comparison; Section 6 concludes the paper.

    2. THEORETICAL BACKGROUND
    The most common Unsupervised WSD algorithm is the Lesk Algorithm, which uses WordNet as an online dictionary. The algorithm is described briefly below.

    2.1 Preliminaries of Lesk Algorithm
    The typical Lesk Algorithm selects a short phrase from the sentence containing an ambiguous word. Then, the dictionary definition (gloss) of each sense of the ambiguous word is compared with the glosses of the other words in that phrase. The ambiguous word is assigned the sense whose gloss has the highest number of overlaps (number of common words) with the glosses of the other words of the phrase.

  • 3. Example 1: “Ram and Sita everyday go to bank for withdrawal of money.”

    Here, the phrase is taken depending on the window size (number of consecutive words). If the window size is 3, the phrase would be “go bank withdrawal”. All other words are discarded as “stop words”.

    Suppose the senses of the words in that phrase are as follows. “Bank” has 2 senses, ‘X’ and ‘Y’ (refer Table 1). “Go” has 2 senses, ‘A’ and ‘B’ (refer Table 2). “Withdrawal” has 2 senses, ‘M’ and ‘N’ (refer Table 3).

    Keyword       Probable sense
    Bank          X
                  Y
    Table 1. Probable Senses of “Bank”.

    Word          Probable sense
    Go            A
                  B
    Table 2. Probable Senses of “Go”.

    Word          Probable sense
    Withdrawal    M
                  N
    Table 3. Probable Senses of “Withdrawal”.

    Consider the word “Bank” as the keyword. The number of common words is measured between each pair of sentences.
  • 4. Pair of Sentences    Common number of Words
    X and A              A’
    X and B              B’
    Y and A              A’’
    Y and B              B’’
    X and M              M’
    X and N              N’
    Y and M              M’’
    Y and N              N’’
    Table 4. Comparison chart between pair of sentences and common number of words within a particular pair.

    Table 4 shows all possible pairs of sentences from Table 1, Table 2 and Table 3, and the number of words common to each pair. Finally, the two senses of the keyword “Bank” have their counter readings (refer Table 4) as follows:

    X counter, XC = A’ + B’ + M’ + N’.
    Y counter, YC = A’’ + B’’ + M’’ + N’’.

    The sense with the higher counter value is assigned as the sense of the keyword “Bank” in the particular sentence. This strategy assumes that the surrounding words share the sense of the keyword.

    2.2 Simple (unsmoothed) N-gram and Bigram Model
    The N-gram model [32] is used to compute the probability of a complete string of words, which can be represented either as $w_1 \ldots w_n$ or as $w_1^n$. If each word, occurring in its correct location, is considered as an independent event, the probability is represented as $P(w_1, w_2, \ldots, w_{n-1}, w_n)$. The chain rule decomposes this probability as:

    $P(w_1^n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1^2) \cdots P(w_n \mid w_1^{n-1})$

  • 5. However, computing a probability such as $P(w_n \mid w_1^{n-1})$ is not easy for a long sequence of preceding words. The Bigram model approximates the probability of a word given all the previous words, $P(w_n \mid w_1^{n-1})$, by the conditional probability given only the immediately preceding word, $P(w_n \mid w_{n-1})$. For example, instead of computing the probability P(rabbit | Just the other day I saw a), the probability is approximated by P(rabbit | a).

    3. IMPLEMENTATION BACKGROUND
    This paper adopts the basic ideas of the typical Lesk Algorithm and introduces some modifications.

    3.1 Simplified Lesk Approach
    In this approach, only the glosses of the keyword are considered for a specific sentence, instead of the glosses of all words. The number of common words is calculated between the specific sentence and each dictionary definition of the keyword.
    • Consider the sentence of “Example 1” mentioned earlier: “Ram and Sita everyday go to bank for withdrawal of money.”
    • After discarding “stop words” such as “to”, “for” and so on, the instance sentence becomes “Ram Sita everyday go bank withdrawal money”.
    • If “Bank” is considered as the keyword and its two senses are X and Y (refer Table 1), the number of common words is calculated between the instance sentence and each probable sense of “Bank”.
    • The number of common words found is assigned to the counter of that sense of “Bank”. Suppose the X-counter has the value I’ and the Y-counter has the value I’’.
    • Finally, the sense with the higher counter value is assigned as the sense of the keyword for the particular instance sentence.
    • The dictionary definition (gloss) of the keyword is taken from “WordNet”.
    • This approach also assumes that the entire sentence represents a particular sense of the keyword.

    4. PROPOSED APPROACH
    The proposed approach derives the actual sense of an ambiguous word in two steps. First, the input text is passed through the POS Tagging module, where the POS of the ambiguous word is derived. Second, the input sentence, containing the ambiguous word with its derived POS, is passed to the WSD module, where disambiguation is performed using the Simplified Lesk Algorithm. As the POS of the ambiguous word is derived before the WSD operation, only the dictionary definitions (glosses) of that POS are selected from all the entries present in WordNet (Noun, Verb, Adjective and Adverb entries). As a result, the disambiguation procedure becomes faster.
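The Simplified Lesk step used by the WSD module can be illustrated with a minimal sketch on the sentence of Example 1. The two glosses for senses X and Y, the stop-word list and the function names below are hypothetical stand-ins chosen for illustration; they are not WordNet entries and not the authors' implementation. The sketch also assumes that an overlap is counted once per distinct word.

```python
# Toy Simplified Lesk: score each gloss of the keyword by its word overlap
# with the instance sentence. Glosses and stop words are hypothetical.
STOP_WORDS = {"and", "to", "for", "of", "a", "an", "the", "in", "or"}

def content_words(text):
    """Lowercase, strip punctuation, and drop stop words."""
    words = (w.strip(".,;:\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}

def simplified_lesk(sentence, glosses):
    """Return (best_sense, counters), where counters maps sense -> overlap count."""
    context = content_words(sentence)
    counters = {sense: len(context & content_words(gloss))
                for sense, gloss in glosses.items()}
    best = max(counters, key=counters.get)
    return best, counters

# Hypothetical glosses for the two senses X and Y of "bank" (cf. Table 1).
glosses = {
    "X": "a financial institution that accepts deposits and handles withdrawal of money",
    "Y": "sloping land beside a river or lake",
}
sentence = "Ram and Sita everyday go to bank for withdrawal of money."
best, counters = simplified_lesk(sentence, glosses)
print(best, counters)
```

With these toy glosses, the X-counter and Y-counter play the roles of I’ and I’’ from Section 3.1, and the sense with the larger counter is selected.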
  • 6. In addition, as the POS of the ambiguous word is derived before the WSD operation, the disambiguation algorithm is applied only to the relevant glosses. As a result, the accuracy of the disambiguation algorithm increases. A detailed explanation of the proposed approach is given below.

    Input text
      -> Module 1: POS Tagging using Brown Corpus. The POS of the ambiguous word for the given context is derived.
      -> Module 2: Simplified Lesk Algorithm is applied to find the actual sense of the ambiguous word, taking the derived POS into account.
      -> Output: Disambiguated sense of the ambiguous word.
    Figure 1. Modular representation of the overall approach

    Algorithm 1: This algorithm (refer Figure 1) describes the overall approach. The first module is responsible for POS Tagging and the second module is responsible for the WSD task.
    Input: Input text, containing the ambiguous word.
    Output: Disambiguated sense of the ambiguous word.
    Step 1: The input text, containing the ambiguous word, is passed to Module 1 to find the POS of the ambiguous word.
    Step 2: The Simplified Lesk Algorithm is applied to find the actual sense of the ambiguous word, taking the derived POS into account.
    Step 3: Stop.
  • 7. Module 1: POS Tagging using Brown Corpus.

    The sentence containing the ambiguous word is taken.
      -> The Bigram approximation of the ambiguous word is found using the XML data source of the Brown Corpus.
      -> The POS of the ambiguous word is derived.
    Figure 2. Implementation detail of Module 1 for POS Tagging

    Module 1: Algorithm 2: This algorithm (refer Figure 2) finds the POS of the ambiguous word using the Brown Corpus. The worst-case time complexity of the algorithm is O(n^2), which arises in Step 2.
    Input: Sentence, containing the ambiguous word.
    Output: POS of the ambiguous word.
    Step 1: The input sentence, containing the ambiguous word, is taken.
    Step 2: The Bigram approximation of the ambiguous word is found using the XML data source of the Brown Corpus.
    Step 3: The POS of the ambiguous word is derived from the largest approximation value.
    Step 4: Stop.
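One possible reading of Algorithm 2 is sketched below: the tag of the ambiguous word is chosen as the one seen most often for that word after the same preceding word in a tagged corpus. This is an assumption about how the Bigram approximation is applied, not the authors' code. The tiny hand-made corpus stands in for the Brown Corpus, and the tag names, the `train` and `tag_word` helpers, and the fallback to per-word counts when the word pair is unseen are all hypothetical.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus standing in for the Brown Corpus (hypothetical data).
TAGGED_SENTENCES = [
    [("the", "DET"), ("plant", "NOUN"), ("grows", "VERB")],
    [("a", "DET"), ("plant", "NOUN"), ("needs", "VERB"), ("water", "NOUN")],
    [("they", "PRON"), ("plant", "VERB"), ("trees", "NOUN")],
    [("we", "PRON"), ("plant", "VERB"), ("seeds", "NOUN")],
    [("the", "DET"), ("bank", "NOUN"), ("opens", "VERB")],
]

def train(tagged_sentences):
    """Count the tags of each word given the preceding word, plus per-word counts."""
    bigram = defaultdict(Counter)   # (previous word, word) -> Counter of tags
    unigram = defaultdict(Counter)  # word -> Counter of tags
    for sent in tagged_sentences:
        prev = "<s>"                # sentence-start marker
        for word, tag in sent:
            bigram[(prev, word)][tag] += 1
            unigram[word][tag] += 1
            prev = word
    return bigram, unigram

def tag_word(sentence, target, bigram, unigram):
    """Return the most frequent tag of `target` at its first occurrence, or None."""
    words = [w.strip(".,;:").lower() for w in sentence.split()]
    if target not in words:
        return None
    i = words.index(target)
    prev = words[i - 1] if i > 0 else "<s>"
    # Prefer counts conditioned on the preceding word; fall back to the word alone.
    counts = bigram.get((prev, target)) or unigram.get(target)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

bigram, unigram = train(TAGGED_SENTENCES)
print(tag_word("The plant needs sunlight.", "plant", bigram, unigram))
print(tag_word("They plant rice.", "plant", bigram, unigram))
```

Conditioning only on the immediately preceding word mirrors the Bigram approximation of Section 2.2; a real tagger would also need smoothing for unseen word pairs.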
  • 8. Module 2: WSD using Simplified Lesk Algorithm

    The ambiguous word is taken.
      -> Only those dictionary definitions (glosses) that belong to the same POS as the ambiguous word are considered for WSD.
      -> Overlaps are counted between the glosses and the input sentence.
      -> The gloss with the maximum number of overlaps represents the disambiguated sense of the ambiguous word.
      -> The derived sense of the ambiguous word is given as output.
    Figure 3. Implementation detail of Module 2 for WSD procedure

    Module 2: Algorithm 3: This algorithm (refer Figure 3) derives the actual sense of an ambiguous word using the Simplified Lesk Algorithm. The time complexity of the algorithm is O(n^3), as finding the total number of overlaps between a particular gloss and the input sentence is of O(n^2) complexity and this procedure is performed for all n glosses.
    Input: Ambiguous word with derived POS.
    Output: Disambiguated sense of the ambiguous word.
    Step 1: The ambiguous word is taken.
    Step 2: Only those dictionary definitions (glosses) that belong to the same POS as the ambiguous word are considered from WordNet.
    Step 3: Overlaps are counted between the glosses and the input sentence itself.
    Step 4: The actual sense of the ambiguous word is derived from the gloss with the maximum number of overlaps.
    Step 5: Stop.
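The effect of Step 2 can be sketched with a toy sense inventory in which each gloss carries its POS. The senses, glosses, stop words and the `disambiguate` helper below are hypothetical and only illustrate the filtering idea; a real system would read the glosses from WordNet. The function also returns how many glosses were scored, as a rough proxy for the work saved by filtering.

```python
STOP_WORDS = {"a", "an", "the", "of", "in", "or", "and", "to", "are", "as", "that", "into"}

# Hypothetical sense inventory for "plant": (POS, sense label, gloss).
SENSES = [
    ("noun", "living organism", "a living organism lacking the power of locomotion"),
    ("noun", "industrial plant", "buildings for carrying on industrial labor"),
    ("verb", "put in ground", "put or set seeds or seedlings firmly into the ground"),
    ("verb", "establish", "set up or lay the groundwork for"),
]

def tokens(text):
    """Lowercase, strip punctuation, and drop stop words."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}

def disambiguate(sentence, senses, pos=None):
    """Simplified Lesk; if `pos` is given, only glosses of that POS are scored.

    Returns (best sense label, number of glosses scored).
    """
    candidates = [s for s in senses if pos is None or s[0] == pos]
    context = tokens(sentence)
    best_label, best_score = None, -1
    for _, label, gloss in candidates:
        score = len(context & tokens(gloss))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, len(candidates)

sentence = "Plants are living organisms that grow from seeds set in the ground"
print(disambiguate(sentence, SENSES))              # all glosses scored
print(disambiguate(sentence, SENSES, pos="noun"))  # only noun glosses scored
```

In this toy case the unfiltered run scores four glosses and is drawn to a verb gloss by incidental word overlap, while the filtered run scores two glosses and stays within the noun senses.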
  • 9. The proposed approach gives better results in terms of both execution time and accuracy, as described in the next section.

    5. OUTPUT AND DISCUSSION
    The algorithm is tested on more than 100 texts of different lengths and categories. The average length of the texts is 200 sentences, and two ambiguous words are selected for testing: "Bank" and "Plant".

    The Simplified Lesk Algorithm is applied to the input text containing the POS-tagged ambiguous word. As the POS of the ambiguous word is derived beforehand, only those dictionary definitions that belong to the same POS as the ambiguous word are selected from WordNet for the WSD process. As a result, the execution time of the WSD process is reduced (refer Table 5). It is also observed that, as only the relevant glosses are considered for the WSD process, the accuracy of the disambiguated sense increases (refer Text no. 10).

    Some of the results for the target word "Bank" are given in Table 5. All the sample texts are taken from "www.wikipedia.com".

    Table 5. Speed up analysis of WSD procedure for target word "Bank".
  • 10. Note 1: D-sense means Disambiguated sense, E-time means Execution time, ms means millisecond.

    The following sample test (Text no. 10) shows that, in the given input text, the ambiguous word "Plant" carries the actual sense (decided by a human) "Living Organism", which is a noun sense. When the algorithm runs without POS Tagging, it derives the sense "Contact", which is a verb sense and is clearly wrong for this context according to human judgment. This text is also taken from "www.wikipedia.com".

    The accuracy of the WSD procedure using the proposed approach is illustrated below with a sample text.

    Text no. 10: Plants, also called green plants, are living organisms of the kingdom Plantae including such multi cellular groups as flowering plants, conifers, ferns and mosses, as well as, depending on definition, the green algae, but not red or brown seaweeds like kelp, nor fungi or bacteria. Green plants have cell walls with cellulose and characteristically obtain most of their energy from sunlight via photosynthesis using chlorophyll contained in chloroplasts, which gives them their green color. Some plants are parasitic and may not produce normal amounts of chlorophyll or photosynthesize. Plants are also characterized by sexual reproduction, modular and indeterminate growth, and an alternation of generations, although asexual reproduction is common, and some plants bloom only once while others bears only one bloom. Precise numbers are difficult to determine, but as of 2010, there are thought to be 300–315 thousand species of plants, of which the great majority, some 260–290 thousand, are seed plants. Green plants provide most of the world's molecular oxygen and are the basis of most of the earth's ecologies, especially on land. Plants described as grains, fruits and vegetables form mankind's basic foodstuffs, and have been domesticated for millennia. Plants enrich our lives as flowers and ornaments. Until recently and in great variety they have served as the source of most of our medicines and drugs. Their scientific study is known as botany.

    Output:
    Target word: Plant.
    Actual sense: Living Organism (Noun).
    Derived sense with POS Tagging: Living Organism (Noun).
    Derived sense without POS Tagging: Contact (Verb).

    6. CONCLUSION AND FUTURE WORK
    The proposed approach speeds up the WSD procedure by retaining only the relevant glosses, and it increases the accuracy of the WSD procedure as well. The difference in execution time between the two cases (with and without the POS Tagging procedure) might be larger if a few system calls and other system-related tasks were handled properly. The routine operations for POS Tagging (loops, memory allocation, condition checks, function calls etc.) took some time, which is included in the reported results; without this overhead, the actual time difference could have been greater.
  • 11. REFERENCES
    [1] R. S. Cucerzan, C. Schafer, D. Yarowsky, “Combining classifiers for word sense disambiguation,” Natural Language Engineering, Vol. 8, No. 4, 2002, Cambridge University Press, pp. 327-341.
    [2] M. S. Nameh, M. Fakhrahmad, M. Z. Jahromi, “A New Approach to Word Sense Disambiguation Based on Context Similarity,” Proceedings of the World Congress on Engineering, Vol. I, 2011.
    [3] Gaizauskas, “Gold Standard Datasets for Evaluating Word Sense Disambiguation Programs,” Computer Speech and Language, Vol. 12, No. 3, Special Issue on Evaluation of Speech and Language Technology, pp. 453-472.
    [4] R. Navigli, “Word Sense Disambiguation: a Survey,” ACM Computing Surveys, Vol. 41, No. 2, 2009, ACM Press, pp. 1-69.
    [5] N. Ide, J. Véronis, “Word Sense Disambiguation: The State of the Art,” Computational Linguistics, Vol. 24, No. 1, 1998, pp. 1-40.
    [6] W. Xiaojie, Y. Matsumoto, “Chinese word sense disambiguation by combining pseudo training data,” Proceedings of The International Conference on Natural Language Processing and Knowledge Engineering, 2003, pp. 138-143.
    [7] C. Santamaria, J. Gonzalo, F. Verdejo, “Automatic Association of WWW Directories to Word Senses,” Computational Linguistics, Vol. 3, Issue 3, Special Issue on the Web as Corpus, 2003, pp. 485-502.
    [8] J. Heflin, J. Hendler, “A Portrait of the Semantic Web in Action,” IEEE Intelligent Systems, Vol. 16, No. 2, 2001, pp. 54-59.
    [9] S. G. Kolte, S. G. Bhirud, “Word Sense Disambiguation Using WordNet Domains,” First International Conference on Digital Object Identifier, 2008, pp. 1187-1191.
    [10] G. Miller, “WordNet: An on-line lexical database,” International Journal of Lexicography, Vol. 3, No. 4, 1991.
    [11] Y. Liu, P. Scheuermann, X. Li, X. Zhu, “Using WordNet to Disambiguate Word Senses for Text Classification,” Proceedings of the 7th International Conference on Computational Science, Springer-Verlag, 2007, pp. 781-789.
    [12] A. J. Cañas, A. Valerio, J. Lalinde-Pulido, M. Carvalho, M. Arguedas, “Using WordNet for Word Sense Disambiguation to Support Concept Map Construction,” String Processing and Information Retrieval, 2003, pp. 350-359.
    [13] G. A. Miller, “WordNet: A Lexical Database,” Comm. ACM, Vol. 38, No. 11, 1993, pp. 39-41.
    [14] H. Seo, H. Chung, H. Rim, H. Myaeng, S. Kim, “Unsupervised word sense disambiguation using WordNet relatives,” Computer Speech and Language, Vol. 18, No. 3, 2004, pp. 253-273.
    [15] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, K. J. Miller, “WordNet An on-line lexical database,” International Journal of Lexicography, Vol. 3, No. 4, 1990, pp. 235-244.
    [16] S. Banerjee, T. Pedersen, “An adapted Lesk algorithm for word sense disambiguation using WordNet,” In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, February, 2002.
    [17] M. Lesk, “Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone,” Proceedings of SIGDOC, 1986.
    [18] S. Dandapat, “Part Of Speech Tagging and Chunking with Maximum Entropy Model,” in Proceedings of the IJCAI Workshop on Shallow Parsing for South Asian Languages, (Hyderabad, India), pp. 29–32, 2007.
    [19] A. Ekbal, R. Haque, and S. Bandyopadhyay, “Maximum Entropy based Bengali Part of Speech Tagging,” in A. Gelbukh (Ed.), Advances in Natural Language Processing and Applications, Research in Computing Science (RCS) Journal, vol. 33, pp. 67–78.
  • 12. [20] A. Ekbal, R. Haque, and S. Bandyopadhyay, “Bengali Part of Speech Tagging using Conditional Random Field,” in Proceedings of the seventh International Symposium on Natural Language Processing, SNLP-2007, 2007.
    [21] A. Ratnaparkhi, “A maximum entropy part-of-speech tagger,” in Proc. of EMNLP’96, 1996.
    [22] PVS. Avinesh, G. Karthik, “Part Of Speech Tagging and Chunking using Conditional Random Fields and Transformation Based Learning,” In Proc. of SPSAL 2007, IJCAI, India, 21-24.
    [23] T. Brants, “TnT-A Statistical Part of Speech Tagger,” In Proc. of the 6th ANLP Conference, 224-231.
    [24] D. Cutting, J. Kupiec, J. Pederson and P. Sibun, “A Practical Part of Speech Tagger,” In Proc. of the 3rd ANLP Conference, 133-140, 1992.
    [25] A. Ekbal, and S. Bandyopadhyay, “Lexicon Development and POS tagging using a Tagged Bengali News Corpus,” In Proc. of FLAIRS-2007, Florida, 261-263.
    [26] M. Mcteer, R. Schwartz and R. Weischedel, “Empirical studies in part-of-speech labeling,” Proceedings of the 4th DARPA Workshop on Speech and Natural Language, pp. 331-336, 1991.
    [27] B. Merialdo, “Tagging English text with a probabilistic model,” Computational Linguistics, 20(2):155-171, 1994.
    [28] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen, “The infinite hidden markov model,” Advances in Neural Information Processing Systems, 14:577–584, 2002.
    [29] Chris Biemann, Claudio Giuliano, and Alfio Gliozzo, “Unsupervised part-of-speech tagging supporting supervised methods,” In Proceedings of RANLP. Eric Brill. 1994. Some advances in transformation-based part of speech tagging. In National Conference on Artificial Intelligence, pages 722–727, 2007.
    [30] J. Van Gael, Y. Saatci, Y. W. Teh, and Z. Ghahramani, “Beam sampling for the infinite hidden markov model,” In Proceedings of the 25th international conference on Machine learning, volume 25, Helsinki, 2007.
    [31] Jianfeng Gao and Mark Johnson, “A comparison of bayesian estimators for unsupervised hidden markov model pos taggers,” In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 344–352.
    [32] Daniel Jurafsky and James H. Martin. Speech and Language Processing, Prentice Hall, 2000.
  • 13. Authors

    Alok Ranjan Pal has been working as an Assistant Professor in the Computer Science and Engineering Department of College of Engineering and Management, Kolaghat since 2006. He completed his Bachelor's and Master's degrees under WBUT. He is now working on Natural Language Processing.

    Mr. Anupam Munshi is a student of the Information Technology Department of College of Engineering and Management, Kolaghat. His fields of interest are AI, Soft Computing and NLP.

    Dr. Diganta Saha is an Associate Professor in the Department of Computer Science & Engineering, Jadavpur University. His fields of specialization are Machine Translation, Natural Language Processing, Mobile Computing and Pattern Classification.