Word Level Analysis
POS (Continued)
L3 NLP (Elective)
TJS
Words and Word Classes
• Words are classified into categories called parts of speech (also known as word classes or lexical categories).
Part of Speech
NN    noun          student, chair, proof, mechanism
VB    verb          study, increase, produce
JJ    adjective     large, high, tall, few
RB    adverb        carefully, slowly, uniformly
IN    preposition   in, on, to, of
PRP   pronoun       I, me, they
DT    determiner    the, a, an, this, those
• open vs. closed word classes
Part of Speech Tagging
• The process of assigning a part of speech (noun, verb, pronoun, preposition, adverb, adjective, etc.) to each word in a sentence.
• A POS tagger takes the words and a tag set as input and outputs a POS tag for each word:
  words + tag set → POS tagger → POS tags
Speech/NN sounds/NNS were/VBD sampled/VBN by/IN a/DT microphone/NN.
Another possible tagging for the sentence is:
Speech/NN sounds/VBZ were/VBD sampled/VBN by/IN a/DT microphone/NN.
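For a quick look at Penn Treebank tags in practice, the sketch below runs NLTK's off-the-shelf tagger on the same sentence. It assumes NLTK is installed and its tokenizer and tagger resources have been downloaded; it is not the rule-based or HMM tagger described in these slides, just a convenient reference point.

import nltk

sentence = "Speech sounds were sampled by a microphone."
tokens = nltk.word_tokenize(sentence)   # ['Speech', 'sounds', 'were', ...]
print(nltk.pos_tag(tokens))
# Roughly: [('Speech', 'NN'), ('sounds', 'NNS'), ('were', 'VBD'), ('sampled', 'VBN'),
#           ('by', 'IN'), ('a', 'DT'), ('microphone', 'NN'), ('.', '.')]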
Part of Speech Tagging Methods
• Rule-based (linguistic)
• Stochastic (data-driven)
• TBL (Transformation-Based Learning)
Rule-based (linguistic)
Steps:
1. Dictionary lookup → potential tags
2. Hand-coded rules
Example: The show must go on.
Step 1 → show: NN, VB
Step 2 → discard the incorrect tag
Rule: IF the preceding word is a determiner THEN eliminate the VB tag.
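A minimal sketch of these two steps, using a toy lexicon and the determiner rule above. The lexicon entries and their tag sets are illustrative assumptions, not a real dictionary.

# Step 1: dictionary lookup -> set of potential tags for each word.
# Step 2: hand-coded rules prune the candidate sets.

LEXICON = {                      # toy lexicon, assumed for illustration
    "the": {"DT"},
    "show": {"NN", "VB"},
    "must": {"MD"},
    "go": {"VB"},
    "on": {"IN", "RP"},
}

def rule_based_tag(words):
    candidates = [set(LEXICON.get(w.lower(), {"NN"})) for w in words]
    for i in range(1, len(words)):
        # Rule: IF the preceding word is a determiner THEN eliminate the VB tag.
        if "DT" in candidates[i - 1] and len(candidates[i]) > 1:
            candidates[i].discard("VB")
    return list(zip(words, candidates))

print(rule_based_tag("The show must go on".split()))
# 'show' keeps only NN because it follows the determiner 'The'.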
 Morphological information
  Rule: IF the word ends in -ing and the preceding word is a verb THEN label it a verb (VB).
• Capitalization information
+ Speed
+ Deterministic
− Requires manual work
− Usable for only one language
Stochastic Tagger
• The standard stochastic tagging algorithm is the Hidden Markov Model (HMM) tagger.
• A Markov model applies the simplifying assumption that the probability of a chain of symbols can be approximated in terms of its parts, or n-grams.
• The simplest n-gram model is the unigram model, which assigns the most likely tag (part of speech) to each token.
• The unigram model requires tagged training data from which to gather these frequency statistics. The only context used by a unigram tagger is the text of the word itself. For example, it will assign the tag JJ to each occurrence of fast if fast is used as an adjective more frequently than as a noun, verb, or adverb:
  (1) She had a fast.
  (2) Muslim fast during Ramadan.
  (3) Those who are injured need medical help fast.
• We would expect more accurate predictions if we took more context into account when making a tagging decision.
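As a concrete baseline before adding context, the sketch below builds such a unigram tagger by counting tags per word over NLTK's Penn Treebank sample; the corpus choice is an assumption made for illustration, and any tagged corpus would do.

from collections import Counter, defaultdict
import nltk

# Count how often each word form carries each tag in a tagged corpus.
counts = defaultdict(Counter)
for word, tag in nltk.corpus.treebank.tagged_words():
    counts[word.lower()][tag] += 1

def unigram_tag(word, default="NN"):
    """Assign the single most frequent tag ever seen for this word."""
    tags = counts.get(word.lower())
    return tags.most_common(1)[0][0] if tags else default

print([(w, unigram_tag(w)) for w in "She had a fast".split()])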
 A bi-gram tagger uses the current word and the
tag of the previous word in the tagging process.
As the tag sequence “DT NN” is more likely than the tag sequence “DT JJ”, a bi-gram model will assign the correct tag to the word fast in sentence (1).
• Similarly, it is more likely that an adverb (rather than a noun or an adjective) follows a verb. Hence, in sentence (3), the tag assigned to fast will be RB (adverb).
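NLTK ships n-gram taggers that implement exactly this idea: a bigram tagger conditioned on the previous tag, backing off to a unigram tagger (and finally to a default tag) when the bigram context was never seen in training. A sketch, again assuming the Treebank sample corpus is available:

import nltk

train_sents = nltk.corpus.treebank.tagged_sents()

# Back-off chain: bigram -> unigram -> default NN.
t0 = nltk.DefaultTagger("NN")
t1 = nltk.UnigramTagger(train_sents, backoff=t0)
t2 = nltk.BigramTagger(train_sents, backoff=t1)

print(t2.tag("Those who are injured need medical help fast .".split()))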
N-gram Model
• An n-gram model considers the current word and the tags of the previous n−1 words when assigning a tag to a word.
Fig.: Context used by a tri-gram model (figure not reproduced).
HMM Tagger
• Given a sequence of words (a sentence), the objective is to find the most probable tag sequence for the sentence.
• Let W be the sequence of words:
  W = w1, w2, …, wn
• The task is to find the tag sequence
  T = t1, t2, …, tn
  which maximizes P(T|W), i.e.,
  T' = argmaxT P(T|W)
• Applying Bayes' rule, P(T|W) can be rewritten as:
  P(T|W) = P(W|T) * P(T) / P(W)
• As the probability of the word sequence, P(W), remains the same for every tag sequence, we can drop it. The expression for the most likely tag sequence becomes:
  T' = argmaxT P(W|T) * P(T)
 Using the Markov assumption, the probability of
a tag sequence can be estimated as the product
of the probability of its constituent n-grams, i.e.,
  P(T) = P(t1) * P(t2|t1) * P(t3|t1t2) * … * P(tn|t1 … tn-1)
• P(W|T) is the probability of seeing a word sequence given a tag sequence.
• For example: what is the probability of seeing 'The egg is rotten' given 'DT NN VBZ JJ'?
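As an aside on estimating P(T): under a bigram approximation it reduces to a product of tag-transition probabilities estimated from counts, P(ti|ti-1) ≈ C(ti-1, ti) / C(ti-1). A counting sketch follows; the Treebank sample and the add-one smoothing are assumptions made so the toy estimate never hits a zero count.

from collections import Counter
import math
import nltk

tag_bigrams, tag_unigrams = Counter(), Counter()
for sent in nltk.corpus.treebank.tagged_sents():
    tags = ["<s>"] + [t for _, t in sent]      # <s> marks the sentence start
    tag_unigrams.update(tags)
    tag_bigrams.update(zip(tags, tags[1:]))

def log_p_tag_seq(tags):
    """log P(T) under the bigram approximation, with add-one smoothing."""
    V = len(tag_unigrams)
    tags = ["<s>"] + list(tags)
    return sum(math.log((tag_bigrams[(a, b)] + 1) / (tag_unigrams[a] + V))
               for a, b in zip(tags, tags[1:]))

print(log_p_tag_seq(["DT", "NN", "VBZ", "JJ"]))   # e.g. for 'The egg is rotten'
print(log_p_tag_seq(["DT", "JJ", "VBZ", "NN"]))   # an alternative tagging to compare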
• We make the following two assumptions:
1. The words are independent of each other, and
2. The probability of a word depends only on its tag.
Using these assumptions, P(W|T) can be expressed as:
  P(W|T) = P(w1|t1) * P(w2|t2) * … * P(wn|tn)
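Combining the two pieces, an HMM tagger scores a candidate tag sequence as P(W|T) * P(T) and finds the argmax with the Viterbi algorithm rather than enumerating every possible sequence. Below is a compact, self-contained sketch under the same assumptions as above (Treebank sample for the counts, add-one smoothing standing in for proper smoothing); it is a toy decoder, not a production tagger.

import math
from collections import Counter
import nltk

# --- Estimate P(tag | previous tag) and P(word | tag) from a tagged corpus ---
trans, emit, tag_counts = Counter(), Counter(), Counter()
for sent in nltk.corpus.treebank.tagged_sents():
    prev = "<s>"
    tag_counts[prev] += 1
    for word, tag in sent:
        trans[(prev, tag)] += 1
        emit[(tag, word.lower())] += 1
        tag_counts[tag] += 1
        prev = tag

TAGS = [t for t in tag_counts if t != "<s>"]
VOCAB = len({w for _, w in emit})

def log_trans(prev, tag):            # log P(tag | prev), add-one smoothed
    return math.log((trans[(prev, tag)] + 1) / (tag_counts[prev] + len(TAGS)))

def log_emit(tag, word):             # log P(word | tag), add-one smoothed
    return math.log((emit[(tag, word)] + 1) / (tag_counts[tag] + VOCAB))

def viterbi(words):
    """Return the tag sequence T maximizing P(W|T) * P(T)."""
    words = [w.lower() for w in words]
    # best[t] = (log score, tag path) of the best sequence ending in tag t
    best = {t: (log_trans("<s>", t) + log_emit(t, words[0]), [t]) for t in TAGS}
    for w in words[1:]:
        new = {}
        for t in TAGS:
            score, path = max((best[p][0] + log_trans(p, t), best[p][1]) for p in TAGS)
            new[t] = (score + log_emit(t, w), path + [t])
        best = new
    return list(zip(words, max(best.values())[1]))

print(viterbi("The egg is rotten .".split()))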
• Some of the possible tag sequences:
  DT NNP NNP NNP
  DT NNP MD VB or DT NNP MD NNP (output → most likely)
Brill Tagger
• Initial state: each word is assigned its most likely tag.
• Transformation: the text is then passed through an ordered list of transformations. Each transformation is a pair of a rewrite rule and a contextual condition.
Learning Rules
Rules are learned in the following manner:
1. Each rule, i.e. each possible transformation, is applied to each matching word–tag pair.
2. The number of tagging errors is measured against the correct tag sequences of the training corpus ("truth").
3. The transformation which yields the greatest error reduction is chosen.
4. Learning stops when no transformation can be found whose application reduces the error by more than some given threshold.
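A sketch of one greedy learning step under a deliberately simplified rule space: a candidate transformation is a triple (from_tag, to_tag, prev_tag), meaning "change from_tag to to_tag when the previous tag is prev_tag". The data layout (lists of (word, tag) sentences for the truth and for the tagger's current output) is an assumption for illustration.

from collections import Counter

def best_transformation(truth_sents, current_sents):
    """Score every (from_tag, to_tag, prev_tag) rule by its net error reduction
    on the training corpus and return the best one."""
    gain = Counter()
    all_tags = {t for sent in truth_sents for _, t in sent}
    for truth, current in zip(truth_sents, current_sents):
        for i in range(1, len(truth)):
            prev_tag, cur_tag, gold_tag = current[i - 1][1], current[i][1], truth[i][1]
            if cur_tag != gold_tag:
                gain[(cur_tag, gold_tag, prev_tag)] += 1      # rule would fix an error
            else:
                for other in all_tags - {cur_tag}:            # rule would break a correct tag
                    gain[(cur_tag, other, prev_tag)] -= 1
    rule, reduction = gain.most_common(1)[0]
    return rule, reduction   # the outer loop stops once reduction falls below a threshold

# Toy data: gold tags vs. the output of an initial most-likely-tag annotator.
truth   = [[("the", "DT"), ("show", "NN"), ("must", "MD"), ("go", "VB")]]
current = [[("the", "DT"), ("show", "VB"), ("must", "MD"), ("go", "VB")]]
print(best_transformation(truth, current))
# -> (('VB', 'NN', 'DT'), 1): change VB to NN when the previous tag is DT.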
• The set of possible 'transforms' is infinite, e.g., "transform NN to VB if the previous word was MicrosoftWindoze & the word braindead occurs between 17 and 158 words before that".
• To limit it: start with a small set of abstracted transforms, or templates.
Templates used: Change a to b when …
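For reference, NLTK's Brill implementation expresses such templates directly; the sketch below trains a small Brill tagger on top of a unigram baseline. It assumes a recent NLTK 3.x, where Template, Pos and Word live in nltk.tag.brill and the trainer in nltk.tag.brill_trainer, and it deliberately uses only a handful of templates and rules to keep the run short.

import nltk
from nltk.tag import DefaultTagger, UnigramTagger
from nltk.tag.brill import Template, Pos, Word
from nltk.tag.brill_trainer import BrillTaggerTrainer

train = nltk.corpus.treebank.tagged_sents()[:3000]

# Initial state: assign every word its most likely tag (unigram baseline).
baseline = UnigramTagger(train, backoff=DefaultTagger("NN"))

# Templates: "change tag a to b when the tag/word at these offsets is ...".
templates = [
    Template(Pos([-1])),             # condition on the previous tag
    Template(Pos([-1]), Pos([1])),   # previous and following tag
    Template(Word([-1])),            # previous word (a lexicalized template)
]

trainer = BrillTaggerTrainer(baseline, templates, trace=0)
brill = trainer.train(train, max_rules=10)
for rule in brill.rules():
    print(rule)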
Rules learned by TBL tagger
Lexicalized Transformations
Brill complements the rule schemes with so-called lexicalized rules, which refer to particular words in the condition part of the transformation:
Change a to b if
1. the preceding (following, current) word is C;
2. the preceding (following, current) word is C and the preceding (following) word is tagged d;
etc.
Unknown Words
• In handling unknown words, a POS tagger can adopt the following strategies:
  - assign all possible tags to the unknown word
  - assign the most probable tag to the unknown word
  - assume the unknown word has the same distribution as 'things seen once' (an estimator of 'things never seen')
  - use word features, i.e. how the word is spelled (prefixes, suffixes, word length, capitalization), to guess a (set of) word class(es) -- the most powerful strategy
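A sketch of that last, feature-based strategy: guess a set of plausible tags for an out-of-vocabulary word from its spelling alone. The suffix patterns and the tag sets they suggest are illustrative assumptions, not a learned model.

import re

# (suffix pattern, suggested tags) pairs -- illustrative, not exhaustive.
SUFFIX_RULES = [
    (re.compile(r".+(ion|ment|ness|ity)$"), {"NN"}),
    (re.compile(r".+ed$"),                  {"VBD", "VBN"}),
    (re.compile(r".+ing$"),                 {"VBG"}),
    (re.compile(r".+ly$"),                  {"RB"}),
    (re.compile(r".+(able|ous|ful|ive)$"),  {"JJ"}),
]

def guess_tags(word, sentence_initial=False):
    """Guess a set of plausible tags for an unknown word from its spelling."""
    if word[0].isupper() and not sentence_initial:
        return {"NNP"}                    # mid-sentence capitalization -> proper noun
    if "-" in word:
        return {"JJ", "NN"}               # hyphenated forms are often adjectives/compounds
    for pattern, tags in SUFFIX_RULES:
        if pattern.match(word.lower()):
            return tags
    return {"NN"}                         # default guess: open-class noun

for w in ["recalibration", "flummoxed", "Grzlwski", "ultra-quiet"]:
    print(w, guess_tags(w))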
Most Powerful Unknown-Word Detectors
• 32 derivational endings (-ion, etc.)
• capitalization; hyphenation
• More generally: should use morphological analysis! (and some kind of machine-learning approach)