1
Morphology
September
2009
Lecture #4
2
What is Morphology?
• The study of how words are composed of
morphemes (the smallest meaning-bearing units of a
language)
– Stems – core meaning units in a lexicon
– Affixes (prefixes, suffixes, infixes, circumfixes) – bits and
pieces that combine with stems to modify their meanings
and grammatical functions (can have multiple ones)
• Immaterial
• Trying
• Absobl**dylutely
• agoing
• Unreadable
3
Why is Morphology Important to the
Lexicon?
Full listing versus Minimal Redundancy
• true, truer, truest, truly, untrue, truth, truthful,
truthfully, untruthfully, untruthfulness
• Untruthfulness = un- + true + -th + -ful + -ness
• These morphemes appear to be productive
• By representing knowledge about the internal
structure of words and the rules of word formation,
we can save room and search time.
4
Need to do Morphological Parsing
Morphological Parsing (or Stemming)
• Taking a surface input and breaking it down into its
morphemes
• foxes breaks down into the morphemes fox (noun
stem) and –es (plural suffix)
• rewrites breaks down into re- (prefix) and write (stem)
and –s (suffix)
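The two breakdowns above can be reproduced with a brute-force affix stripper. This is only a minimal sketch: the tiny lexicon, the affix lists, and the name `parse` are hypothetical and are not part of the lecture.

```python
# Hypothetical toy lexicon and affix lists (assumptions for illustration only).
STEMS = {"fox": "noun", "write": "verb"}
PREFIXES = ["re"]
SUFFIXES = ["es", "s"]

def parse(word):
    """Return every (prefix, stem, suffix) split licensed by the toy lexicon."""
    analyses = []
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            # Strip the suffix (if any) and check what is left against the lexicon.
            stem = rest[:len(rest) - len(suffix)] if suffix else rest
            if stem in STEMS:
                analyses.append((prefix, stem, suffix))
    return analyses

print(parse("foxes"))     # [('', 'fox', 'es')]
print(parse("rewrites"))  # [('re', 'write', 's')]
```

Trying every split does not scale and ignores spelling changes, which is part of the motivation for the finite-state machinery later in the lecture.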
5
Two Broad Classes of Morphology
• Inflectional Morphology
– Combination of stem and morpheme resulting in word of
same class
– Usually fills a syntactic feature such as agreement
– E.g., plural –s, past tense -ed
• Derivational Morphology
– Combination of stem and morpheme usually results in a
word of a different class
– Meaning of the new word may be hard to predict
– E.g., +ation in words such as computerization
6
Word Classes
• By word class, we have in mind familiar notions like
noun and verb that we discussed a bit in the previous
lecture.
• We’ll go into more details when we get to parsing
(Chapter 12).
• Right now we’re concerned with word classes
because the way that stems and affixes combine is
based to a large degree on the word class of the
stem.
7
English Inflectional Morphology
• Word stem combines with grammatical morpheme
– Usually produces word of same class
– Usually serves a syntactic function (e.g., agreement)
like → likes or liked
bird → birds
• Nominal morphology
– Plural forms
• s or es
• Irregular forms (next slide)
• Mass vs. count nouns (email or emails)
– Possessives
8
Complication in Morphology
• Ok so it gets a little complicated by the fact that some
words misbehave (refuse to follow the rules)
• The terms regular and irregular will be used to refer
to words that follow the rules and those that don’t.
Regular (Nouns)
• Singular (cat, thrush)
• Plural (cats, thrushes)
• Possessive (cat’s, thrushes’)
Irregular (Nouns)
• Singular (mouse, ox)
• Plural (mice, oxen)
9
• Verbal inflection
– Main verbs (sleep, like, fear) are relatively regular
• -s, ing, ed
• And productive: Emailed, instant-messaged, faxed,
homered
• But eat/ate/eaten, catch/caught/caught
– Primary (be, have, do) and modal verbs (can, will, must) are
often irregular and not productive
• Be: am/is/are/were/was/been/being
– Irregular verbs few (~250) but frequently occurring
– English verbal inflection is much simpler than e.g. Latin
10
Regular and Irregular Verbs
• Regulars…
– Walk, walks, walking, walked, walked
• Irregulars
– Eat, eats, eating, ate, eaten
– Catch, catches, catching, caught, caught
– Cut, cuts, cutting, cut, cut
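One common way to model the regular/irregular split is a small exception table consulted before a default rule. The sketch below assumes that design; the table contents come from the slide, while the function name `past_forms` is hypothetical, and spelling changes such as consonant doubling are deliberately ignored.

```python
# Irregular verbs are listed explicitly: base -> (past, past participle).
IRREGULAR = {
    "eat": ("ate", "eaten"),
    "catch": ("caught", "caught"),
    "cut": ("cut", "cut"),
}

def past_forms(verb):
    """Return (past, past participle), preferring the exception table."""
    if verb in IRREGULAR:
        return IRREGULAR[verb]
    # Regular default: add -ed (only -d after a final e). No other spelling rules.
    suffix = "d" if verb.endswith("e") else "ed"
    return (verb + suffix, verb + suffix)

print(past_forms("walk"))  # ('walked', 'walked')
print(past_forms("eat"))   # ('ate', 'eaten')
```

Because the default rule applies to anything not in the table, new verbs such as email or fax are inflected regularly without touching the lexicon.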
11
Derivational Morphology
• Derivational morphology is the messy stuff that no
one ever taught you.
– Quasi-systematicity
– Irregular meaning change
– Changes of word class
12
English Derivational Morphology
• Word stem combines with grammatical morpheme
– Usually produces a word of a different class
– More complicated than inflectional
• Example: nominalization
– -ize verbs → -ation nouns
– generalize, realize → generalization, realization
– verb → -er nouns
– murder, spell → murderer, speller
• Example: verbs, nouns → adjectives
– embrace, pity → embraceable, pitiable
– care, wit → careless, witless
13
• Example: adjective → adverb
– happy → happily
• More complicated to model than inflection
– Less productive: *science-less, *concern-less, *go-able,
*sleep-able
– Meanings of derived terms harder to predict by rule
• clueless, careless, nerveless
14
Derivational Examples
• Verb/Adj to Noun
Suffix   Base          Derived
-ation   computerize   computerization
-ee      appoint       appointee
-er      kill          killer
-ness    fuzzy         fuzziness
15
Derivational Examples
• Noun/Verb to Adj
Suffix   Base          Derived
-al      computation   computational
-able    embrace       embraceable
-less    clue          clueless
16
Compute
• Many paths are possible…
• Start with compute
– Computer -> computerize -> computerization
– Computation -> computational
– Computer -> computerize -> computerizable
– Compute -> computee
17
How do people represent words?
• Hypotheses:
– Full listing hypothesis: words listed
– Minimum redundancy hypothesis: morphemes listed
• Experimental evidence:
– Priming experiments (Does seeing/hearing one word
facilitate recognition of another?) suggest neither
• Regularly inflected forms prime stem but not derived forms
• But spoken derived words can prime stems if they are
semantically close (e.g. government/govern but not
department/depart)
18
• Speech errors suggest affixes must be represented
separately in the mental lexicon
– easy enoughly
19
Parsing
• Taking a surface input and identifying its components
and underlying structure
• Morphological parsing: parsing a word into stem and
affixes and identifying the parts and their
relationships
– Stem and features:
• goose → goose +N +SG or goose +V
• geese → goose +N +PL
• gooses → goose +V +3SG
– Bracketing: indecipherable → [in [[de [cipher]] able]]
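The stem-and-features output format can be illustrated with a lookup-based analyzer that returns every analysis of a surface form. This is a sketch under assumptions: the one-entry lexicon and the name `analyze` are hypothetical, and only the goose examples from the slide are covered.

```python
LEXICON = {"goose": ["N", "V"]}          # stem -> word classes (toy data)
IRREGULAR_PLURALS = {"geese": "goose"}   # surface plural -> stem

def analyze(word):
    """Return all stem+feature analyses of a surface form as strings."""
    analyses = []
    if word in IRREGULAR_PLURALS:
        analyses.append(f"{IRREGULAR_PLURALS[word]} +N +PL")
    for pos in LEXICON.get(word, []):
        analyses.append(f"{word} +N +SG" if pos == "N" else f"{word} +V")
    if word.endswith("s"):
        stem = word[:-1]
        # Only the verb reading takes -s here; the noun has an irregular plural.
        if "V" in LEXICON.get(stem, []):
            analyses.append(f"{stem} +V +3SG")
    return analyses

print(analyze("goose"))   # ['goose +N +SG', 'goose +V']
print(analyze("geese"))   # ['goose +N +PL']
print(analyze("gooses"))  # ['goose +V +3SG']
```

Returning a list makes the ambiguity of goose explicit: the parser reports both readings and leaves the choice to later processing.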
20
Why parse words?
• For spell-checking
– Is muncheble a legal word?
• To identify a word’s part-of-speech (pos)
– For sentence parsing, for machine translation, …
• To identify a word’s stem
– For information retrieval
21
What do we need to build a
morphological parser?
• Lexicon: stems and affixes (w/ corresponding pos)
• Morphotactics of the language: model of how
morphemes can be affixed to a stem. E.g., plural
morpheme follows noun in English
• Orthographic rules: spelling modifications that occur
when affixation occurs
– in → il in context of l (in- + legal → illegal)
22
Morphotactic Models
• English nominal inflection
– q0 → q1 on reg-n
– q1 → q2 on plural (-s)
– q0 → q2 on irreg-sg-n
– q0 → q2 on irreg-pl-n
• Inputs: cats, goose, geese
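A morphotactic FSA of this kind can be written as a transition table over morpheme classes. The sketch below assumes the usual textbook layout (regular nouns optionally followed by the plural, irregular forms accepted as they are) and treats both q1 and q2 as accepting. The word lists and the names `label_of` and `accepts` are hypothetical.

```python
REG_N = {"cat", "dog", "bird"}
IRREG_SG_N = {"goose", "mouse", "ox"}
IRREG_PL_N = {"geese", "mice", "oxen"}

# (state, morpheme class) -> next state
TRANSITIONS = {
    ("q0", "reg-n"): "q1",
    ("q0", "irreg-sg-n"): "q2",
    ("q0", "irreg-pl-n"): "q2",
    ("q1", "plural"): "q2",
}
ACCEPTING = {"q1", "q2"}  # assumption: a bare regular noun is also a word

def label_of(morpheme):
    """Map a morpheme to its class label, or None if it is unknown."""
    if morpheme == "-s":
        return "plural"
    if morpheme in REG_N:
        return "reg-n"
    if morpheme in IRREG_SG_N:
        return "irreg-sg-n"
    if morpheme in IRREG_PL_N:
        return "irreg-pl-n"
    return None

def accepts(morphemes):
    """Run the FSA over a pre-segmented list of morphemes."""
    state = "q0"
    for morpheme in morphemes:
        state = TRANSITIONS.get((state, label_of(morpheme)))
        if state is None:
            return False
    return state in ACCEPTING

print(accepts(["cat", "-s"]))    # True
print(accepts(["geese"]))        # True
print(accepts(["goose", "-s"]))  # False
```

The input is already split into morphemes, so this only checks ordering (morphotactics); finding the split and handling spelling are separate problems.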
23
Antworth data on English Adjectives
• Big, bigger, biggest
• Cool, cooler, coolest, coolly
• Red, redder, reddest
• Clear, clearer, clearest, clearly, unclear, unclearly
• Happy, happier, happiest, happily
• Unhappy, unhappier, unhappiest, unhappily
• Real, unreal, really
29
• Derivational morphology: adjective fragment
– q0 → q1 on un- or ε
– q1 → q2 on adj-root
– q2 → q5 on -er, -ly, -est
• Adj-root: clear, happy, real, big, red
30
• Derivational morphology: adjective fragment
– q0 → q1 on un- or ε
– q1 → q2 on adj-root
– q2 → q5 on -er, -ly, -est
• Adj-root: clear, happy, real, big, red
• BUT this also accepts: *unbig, *redly, *realest
33
• Derivational morphology: adjective fragment
[Figure: FSA over states q0–q5 with arcs labeled un-, ε, adj-root1 (two arcs), adj-root2, -er/-ly/-est, and -er/-est]
• Adj-root1: clear, happy, real
• Adj-root2: big, red
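The effect of splitting the roots into two classes can be checked directly. The sketch below writes both adjective fragments as plain membership tests rather than explicit state tables. It assumes, from the arc labels, that adj-root1 allows un- and all three suffixes while adj-root2 allows only -er and -est with no prefix. The function names are hypothetical.

```python
ROOT1 = {"clear", "happy", "real"}
ROOT2 = {"big", "red"}
ALL_SUFFIXES = ("", "-er", "-ly", "-est")

def accepts_single_class(prefix, root, suffix):
    """First fragment: one adj-root class, any prefix/suffix combination."""
    return prefix in ("", "un-") and root in ROOT1 | ROOT2 and suffix in ALL_SUFFIXES

def accepts_two_classes(prefix, root, suffix):
    """Refined fragment: affix options depend on the root class."""
    if root in ROOT1:
        return prefix in ("", "un-") and suffix in ALL_SUFFIXES
    if root in ROOT2:
        return prefix == "" and suffix in ("", "-er", "-est")
    return False

print(accepts_single_class("un-", "big", ""))   # True  (overgenerates *unbig)
print(accepts_two_classes("un-", "big", ""))    # False
print(accepts_two_classes("", "red", "-ly"))    # False (*redly is blocked)
print(accepts_two_classes("", "real", "-est"))  # True  (*realest still accepted)
```

With these class assignments the refinement blocks unbig and redly but still lets realest through, so further subclasses would be needed to match the data exactly.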
34
FSAs and the Lexicon
• First we’ll capture the morphotactics
– The rules governing the ordering of affixes in a language.
• Then we’ll add in the actual words
35
Using FSAs to Represent the
Lexicon and Do Morphological
Recognition
• Lexicon: We can expand each non-terminal in our
NFSA into each stem in its class (e.g. adj_root2 =
{big, red}) and expand each such stem to the letters it
includes (e.g. red → r e d, big → b i g)
[Figure: letter-level FSA over states q0–q7 with arcs labeled ε, r, e, d, b, i, g, and -er, -est]
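Expanding stems into letters amounts to building a trie-shaped FSA. The following sketch assumes that construction and then tries each suffix in turn; it is not the exact machine from the figure, and the names `build_letter_fsa` and `recognize` are hypothetical.

```python
def build_letter_fsa(stems):
    """Return (transitions, finals) for a trie-shaped letter FSA; state 0 is the start."""
    transitions = {}   # (state, letter) -> state
    finals = set()     # states reached at the end of a complete stem
    next_state = 1
    for stem in stems:
        state = 0
        for letter in stem:
            key = (state, letter)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        finals.add(state)
    return transitions, finals

def recognize(word, transitions, finals, suffixes=("", "er", "est")):
    """Accept word if it is a listed stem followed by one of the suffixes."""
    for suffix in suffixes:
        if suffix and not word.endswith(suffix):
            continue
        stem = word[:len(word) - len(suffix)]
        state = 0
        for letter in stem:
            state = transitions.get((state, letter))
            if state is None:
                break
        else:
            # Loop finished without a dead end: check the stem is complete.
            if state in finals:
                return True
    return False

transitions, finals = build_letter_fsa(["red", "big"])
print(recognize("red", transitions, finals))     # True
print(recognize("bigest", transitions, finals))  # True
print(recognize("bigger", transitions, finals))  # False
```

The machine accepts plain concatenations such as bigest and rejects the real spelling bigger, which shows why orthographic rules have to be layered on top of the lexicon.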
36
Limitations
• To cover all of e.g. English will require very large
FSAs with consequent search problems
– Adding new items to the lexicon means recomputing the
FSA
– Non-determinism
• FSAs can only tell us whether a word is in the
language or not – what if we want to know more?
– What is the stem?
– What are the affixes and what sort are they?
– We used this information to build our FSA: can we get it
back?
37
Parsing/Generation
vs. Recognition
• Recognition is usually not quite what we need.
– Usually if we find some string in the language we need to find the
structure in it (parsing)
– Or we have some structure and we want to produce a surface form
(production/generation)
• Example
– From “cats” to “cat +N +PL”
38
Finite State Transducers
• The simple story
– Add another tape
– Add extra symbols to the transitions
– On one tape we read “cats”, on the other we write “cat +N
+PL”
39
Parsing with Finite State
Transducers
• cats → cat +N +PL
• Kimmo Koskenniemi’s two-level morphology
– Words represented as correspondences between lexical
level (the morphemes) and surface level (the orthographic
word)
– Morphological parsing: building mappings between the
lexical and surface levels
Lexical: c a t +N +PL
Surface: c a t s
40
Finite State Transducers
• FSTs map between one set of symbols and another
using an FSA whose alphabet Σ is composed of pairs
of symbols from input and output alphabets
• In general, FSTs can be used for
– Translator (Hello:Ciao)
– Parser/generator (Hello:How may I help you?)
– To map between the lexical and surface levels of Kimmo’s
2-level morphology
41
• FST is a 5-tuple consisting of
– Q: set of states {q0,q1,q2,q3,q4}
– Σ: an alphabet of complex symbols, each an i/o pair s.t. i ∈ I
(an input alphabet) and o ∈ O (an output alphabet), so Σ ⊆ I × O
– q0: a start state
– F: a set of final states in Q {q4}
– δ(q,i:o): a transition function mapping Q × Σ to Q
– Emphatic Sheep → Quizzical Cow
q0 → q1 on b:m
q1 → q2 on a:o
q2 → q3 on a:o
q3 → q3 on a:o
q3 → q4 on !:?
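The sheep-to-cow machine is small enough to run by hand or in a few lines of code. The sketch below encodes δ as a dictionary from (state, input symbol) to (next state, output symbol). The self-loop on q3 is an assumption read off the arc labels, and the name `transduce` is hypothetical.

```python
# delta: (state, input symbol) -> (next state, output symbol)
DELTA = {
    ("q0", "b"): ("q1", "m"),
    ("q1", "a"): ("q2", "o"),
    ("q2", "a"): ("q3", "o"),
    ("q3", "a"): ("q3", "o"),  # assumed self-loop for extra a's
    ("q3", "!"): ("q4", "?"),
}
FINAL = {"q4"}

def transduce(text):
    """Return the output string, or None if the input is rejected."""
    state = "q0"
    output = []
    for symbol in text:
        step = DELTA.get((state, symbol))
        if step is None:
            return None
        state, out_symbol = step
        output.append(out_symbol)
    return "".join(output) if state in FINAL else None

print(transduce("baaa!"))  # mooo?
print(transduce("ba!"))    # None
```

Dropping the output symbols from DELTA gives back an ordinary FSA for the sheep language, which is the sense in which an FST is an FSA over pairs.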
42
Transitions
• c:c means read a c on one tape and write a c on the other
• +N:ε means read a +N symbol on one tape and write nothing on
the other
• +PL:s means read +PL and write an s
c:c a:a t:t +N:ε +PL:s
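The pair notation can be made concrete with a single-path transducer stored as a list of input:output pairs, using the empty string for ε. This is a minimal sketch, not a general FST implementation; the names `PAIRS`, `apply_path`, and `invert` are hypothetical.

```python
# One path through the transducer; "" stands for ε.
PAIRS = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", ""), ("+PL", "s")]

def apply_path(pairs, symbols):
    """Match the input side of each pair against symbols; collect the output side."""
    output = []
    position = 0
    for in_symbol, out_symbol in pairs:
        if in_symbol == "":
            pass  # ε on the input side consumes nothing
        elif position < len(symbols) and symbols[position] == in_symbol:
            position += 1
        else:
            return None
        if out_symbol:
            output.append(out_symbol)
    return output if position == len(symbols) else None

def invert(pairs):
    """Swap input and output sides, turning a generator into a parser."""
    return [(out_symbol, in_symbol) for in_symbol, out_symbol in pairs]

print(apply_path(PAIRS, ["c", "a", "t", "+N", "+PL"]))  # ['c', 'a', 't', 's']
print(apply_path(invert(PAIRS), list("cats")))          # ['c', 'a', 't', '+N', '+PL']
```

Swapping the two sides of every pair is all it takes to go from generation (lexical to surface) to parsing (surface to lexical).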
43
FST for a 2-level Lexicon
• E.g.
Reg-n: c a t
Irreg-pl-n: g o:e o:e s e
Irreg-sg-n: g o o s e
[Figure: letter-level transducers for cat and for goose/geese, with arcs labeled c, a, t and g, e:o, e:o, s, e]
44
FST for English Nominal
Inflection
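• Transitions (start state q0, final state q7):
– q0 → q1 on reg-n
– q0 → q2 on irreg-n-sg
– q0 → q3 on irreg-n-pl
– q1 → q4 on +N:ε
– q2 → q5 on +N:ε
– q3 → q6 on +N:ε
– q4 → q7 on +PL:^s#
– q4 → q7 on +SG:#
– q5 → q7 on +SG:#
– q6 → q7 on +PL:#
• Combining (cascade or composition) this FSA with FSAs for each noun type replaces e.g. reg-n with every regular noun representation in the lexicon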
45
The Gory Details
• Of course, it’s not as easy as
– “cat +N +PL” <-> “cats”
• Or even dealing with the irregulars geese, mice and
oxen
• But there are also a whole host of
spelling/pronunciation changes that go along with
inflectional changes
46
Multi-Tape Machines
• To deal with this we can simply add more tapes and
use the output of one tape machine as the input to
the next
• So to handle irregular spelling changes we’ll add
intermediate tapes with intermediate symbols
47
Multi-Level Tape Machines
• We use one machine to transduce between the lexical and the
intermediate level, and another to handle the spelling changes
to the surface tape
48
Orthographic Rules and FSTs
• Define additional FSTs to implement rules such as
consonant doubling (beg → begging), ‘e’ deletion
(make → making), ‘e’ insertion (watch → watches),
etc.
Lexical: f o x +N +PL
Intermediate: f o x ^ s #
Surface: f o x e s
49
Lexical to Intermediate Level
50
Intermediate to Surface
• The “add an e” rule, as in fox^s# <-> foxes
51
Note
• A key feature of this machine is that it doesn’t do
anything to inputs to which it doesn’t apply.
• Meaning that they are written out unchanged to the
output tape.
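The intermediate-to-surface step, including the pass-through behaviour just noted, can be imitated with a regular-expression rewrite. This swaps a regex in for the FST, and it assumes the e-insertion rule fires after x, s, z, ch, or sh before the suffix s; that context set and the name `intermediate_to_surface` are assumptions, not taken from the slides.

```python
import re

def intermediate_to_surface(form):
    """Apply e-insertion, then drop the ^ and # boundary symbols."""
    # Insert e between a stem-final x, s, z, ch, or sh and the suffix s.
    form = re.sub(r"(x|s|z|ch|sh)\^s#", r"\1es#", form)
    # Forms the rule does not apply to pass through unchanged apart from the boundaries.
    return form.replace("^", "").replace("#", "")

print(intermediate_to_surface("fox^s#"))    # foxes
print(intermediate_to_surface("cat^s#"))    # cats
print(intermediate_to_surface("watch^s#"))  # watches
```

In a full system each spelling rule would be its own transducer and the rules would be composed or run in parallel, rather than applied as string substitutions.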
52
53
54
• Note: These FSTs can be used for generation as well
as recognition by simply exchanging the input and
output alphabets (e.g. ^s#:+PL)
55
Summing Up
• FSTs provide a useful tool for implementing a
standard model of morphological analysis, Kimmo’s
two-level morphology
– Key is to provide an FST for each of multiple levels of
representation and then to combine those FSTs using a
variety of operators (cf AT&T FSM Toolkit)
– Other (older) approaches are still widely used, e.g. the rule-
based Porter Stemmer

Más contenido relacionado

Similar a lect4-morphology.ppt

Lexicology -summary
Lexicology  -summary Lexicology  -summary
Lexicology -summary Dieu Dang
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
語言學概論Morphology 2
語言學概論Morphology 2語言學概論Morphology 2
語言學概論Morphology 2Ja-Jun Liao
 
what_is_language.ppt.pdf
what_is_language.ppt.pdfwhat_is_language.ppt.pdf
what_is_language.ppt.pdfking969492
 
Intro. to Linguistics_9 Morphology
Intro. to Linguistics_9 MorphologyIntro. to Linguistics_9 Morphology
Intro. to Linguistics_9 MorphologyEdi Brata
 
13-Ling-21---Lecture-12b---Language-Thought-and-Culture.ppt
13-Ling-21---Lecture-12b---Language-Thought-and-Culture.ppt13-Ling-21---Lecture-12b---Language-Thought-and-Culture.ppt
13-Ling-21---Lecture-12b---Language-Thought-and-Culture.pptAvneeshKumar164042
 
Stylistics introduction, Definitions of Stylistics
Stylistics introduction, Definitions of StylisticsStylistics introduction, Definitions of Stylistics
Stylistics introduction, Definitions of StylisticsAngel Ortega
 
Gender_grammar_Schnoebelen.ppt
Gender_grammar_Schnoebelen.pptGender_grammar_Schnoebelen.ppt
Gender_grammar_Schnoebelen.pptNINASULISYOWATI
 
1. level of language study.pptx
1. level of language study.pptx1. level of language study.pptx
1. level of language study.pptxAlkadumiHamletto
 
リーディング研究会2014年6月輪読_最終版(関西学院大学大学院・金澤)
リーディング研究会2014年6月輪読_最終版(関西学院大学大学院・金澤)リーディング研究会2014年6月輪読_最終版(関西学院大学大学院・金澤)
リーディング研究会2014年6月輪読_最終版(関西学院大学大学院・金澤)Yu Kanazawa / Osaka University
 

Similar a lect4-morphology.ppt (20)

sept5-07.ppt
sept5-07.pptsept5-07.ppt
sept5-07.ppt
 
Morphology
Morphology Morphology
Morphology
 
Morphemes
MorphemesMorphemes
Morphemes
 
Lexicology -summary
Lexicology  -summary Lexicology  -summary
Lexicology -summary
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
語言學概論Morphology 2
語言學概論Morphology 2語言學概論Morphology 2
語言學概論Morphology 2
 
what_is_language.ppt.pdf
what_is_language.ppt.pdfwhat_is_language.ppt.pdf
what_is_language.ppt.pdf
 
Part of speech- noun
Part of speech- nounPart of speech- noun
Part of speech- noun
 
Morphology
MorphologyMorphology
Morphology
 
Intro. to Linguistics_9 Morphology
Intro. to Linguistics_9 MorphologyIntro. to Linguistics_9 Morphology
Intro. to Linguistics_9 Morphology
 
13-Ling-21---Lecture-12b---Language-Thought-and-Culture.ppt
13-Ling-21---Lecture-12b---Language-Thought-and-Culture.ppt13-Ling-21---Lecture-12b---Language-Thought-and-Culture.ppt
13-Ling-21---Lecture-12b---Language-Thought-and-Culture.ppt
 
Vocabulary ii slide 1
Vocabulary ii   slide 1Vocabulary ii   slide 1
Vocabulary ii slide 1
 
Linguistics and grammar
Linguistics and grammarLinguistics and grammar
Linguistics and grammar
 
Stylistics introduction, Definitions of Stylistics
Stylistics introduction, Definitions of StylisticsStylistics introduction, Definitions of Stylistics
Stylistics introduction, Definitions of Stylistics
 
Effective business communication presentation.ppt
Effective business communication presentation.pptEffective business communication presentation.ppt
Effective business communication presentation.ppt
 
Gender_grammar_Schnoebelen.ppt
Gender_grammar_Schnoebelen.pptGender_grammar_Schnoebelen.ppt
Gender_grammar_Schnoebelen.ppt
 
1. level of language study.pptx
1. level of language study.pptx1. level of language study.pptx
1. level of language study.pptx
 
リーディング研究会2014年6月輪読_最終版(関西学院大学大学院・金澤)
リーディング研究会2014年6月輪読_最終版(関西学院大学大学院・金澤)リーディング研究会2014年6月輪読_最終版(関西学院大学大学院・金澤)
リーディング研究会2014年6月輪読_最終版(関西学院大学大学院・金澤)
 
Semantics 1
Semantics 1Semantics 1
Semantics 1
 
Functional English 03. Noun, Number, Gender
Functional English 03. Noun, Number, GenderFunctional English 03. Noun, Number, Gender
Functional English 03. Noun, Number, Gender
 

Más de Krismalita

MORPHOLOGY.ppt
MORPHOLOGY.pptMORPHOLOGY.ppt
MORPHOLOGY.pptKrismalita
 
Reported speech.pptx
Reported speech.pptxReported speech.pptx
Reported speech.pptxKrismalita
 
This, That, These, Those.pptx
This, That, These, Those.pptxThis, That, These, Those.pptx
This, That, These, Those.pptxKrismalita
 
Singluar and Plural.pptx
Singluar and Plural.pptxSingluar and Plural.pptx
Singluar and Plural.pptxKrismalita
 
Adjetive Clause.pptx
Adjetive Clause.pptxAdjetive Clause.pptx
Adjetive Clause.pptxKrismalita
 
Narrative Text.pptx
Narrative Text.pptxNarrative Text.pptx
Narrative Text.pptxKrismalita
 
The Ice Hotel.pptx
The Ice Hotel.pptxThe Ice Hotel.pptx
The Ice Hotel.pptxKrismalita
 

Más de Krismalita (7)

MORPHOLOGY.ppt
MORPHOLOGY.pptMORPHOLOGY.ppt
MORPHOLOGY.ppt
 
Reported speech.pptx
Reported speech.pptxReported speech.pptx
Reported speech.pptx
 
This, That, These, Those.pptx
This, That, These, Those.pptxThis, That, These, Those.pptx
This, That, These, Those.pptx
 
Singluar and Plural.pptx
Singluar and Plural.pptxSingluar and Plural.pptx
Singluar and Plural.pptx
 
Adjetive Clause.pptx
Adjetive Clause.pptxAdjetive Clause.pptx
Adjetive Clause.pptx
 
Narrative Text.pptx
Narrative Text.pptxNarrative Text.pptx
Narrative Text.pptx
 
The Ice Hotel.pptx
The Ice Hotel.pptxThe Ice Hotel.pptx
The Ice Hotel.pptx
 

Último

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxPooja Bhuva
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxPooja Bhuva
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 

Último (20)

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 

lect4-morphology.ppt

  • 2. 2 What is Morphology? • The study of how words are composed of morphemes (the smallest meaning-bearing units of a language) – Stems – core meaning units in a lexicon – Affixes (prefixes, suffixes, infixes, circumfixes) – bits and pieces that combine with stems to modify their meanings and grammatical functions (can have multiple ones) • Immaterial • Trying • Absobl**dylutely • agoing • Unreadable
  • 3. 3 Why is Morphology Important to the Lexicon? Full listing versus Minimal Redundancy • true, truer, truest, truly, untrue, truth, truthful, truthfully, untruthfully, untruthfulness • Untruthfulness = un- + true + -th + -ful + -ness • These morphemes appear to be productive • By representing knowledge about the internal structure of words and the rules of word formation, we can save room and search time.
  • 4. 4 Need to do Morphological Parsing Morphological Parsing (or Stemming) • Taking a surface input and breaking it down into its morphemes • foxes breaks down into the morphemes fox (noun stem) and –es (plural suffix) • rewrites breaks down into re- (prefix) and write (stem) and –s (suffix)
  • 5. 5 Two Broad Classes of Morphology • Inflectional Morphology – Combination of stem and morpheme resulting in word of same class – Usually fills a syntactic feature such as agreement – E.g., plural –s, past tense -ed • Derivational Morphology – Combination of stem and morpheme usually results in a word of a different class – Meaning of the new word may be hard to predict – E.g., +ation in words such as computerization
  • 6. 6 Word Classes • By word class, we have in mind familiar notions like noun and verb that we discussed a bit in the previous lecture. • We’ll go into more details when we get to parsing (Chapter 12). • Right now we’re concerned with word classes because the way that stems and affixes combine is based to a large degree on the word class of the stem.
  • 7. 7 English Inflectional Morphology • Word stem combines with grammatical morpheme – Usually produces word of same class – Usually serves a syntactic function (e.g., agreement) like  likes or liked bird  birds • Nominal morphology – Plural forms • s or es • Irregular forms (next slide) • Mass vs. count nouns (email or emails) – Possessives
  • 8. 8 Complication in Morphology • Ok so it gets a little complicated by the fact that some words misbehave (refuse to follow the rules) • The terms regular and irregular will be used to refer to words that follow the rules and those that don’t. Regular (Nouns) • Singular (cat, thrush) • Plural (cats, thrushes) • Possessive (cat’s thrushes’) Irregular (Nouns) • Singular (mouse, ox) • Plural (mice, oxen)
  • 9. 9 • Verbal inflection – Main verbs (sleep, like, fear) are relatively regular • -s, ing, ed • And productive: Emailed, instant-messaged, faxed, homered • But eat/ate/eaten, catch/caught/caught – Primary (be, have, do) and modal verbs (can, will, must) are often irregular and not productive • Be: am/is/are/were/was/been/being – Irregular verbs few (~250) but frequently occurring – English verbal inflection is much simpler than e.g. Latin
  • 10. 10 Regular and Irregular Verbs • Regulars… – Walk, walks, walking, walked, walked • Irregulars – Eat, eats, eating, ate, eaten – Catch, catches, catching, caught, caught – Cut, cuts, cutting, cut, cut
  • 11. 11 Derivational Morphology • Derivational morphology is the messy stuff that no one ever taught you. – Quasi-systematicity – Irregular meaning change – Changes of word class
  • 12. 12 English Derivational Morphology • Word stem combines with grammatical morpheme – Usually produces a word of a different class – More complicated than inflectional • Example: nominalization – -ize verbs → -ation nouns – generalize, realize → generalization, realization – verb → -er nouns – murder, spell → murderer, speller • Example: verbs, nouns → adjectives – embrace, pity → embraceable, pitiable – care, wit → careless, witless
  • 13. 13 • Example: adjective → adverb – happy → happily • More complicated to model than inflection – Less productive: *science-less, *concern-less, *go-able, *sleep-able – Meanings of derived terms harder to predict by rule • clueless, careless, nerveless
  • 14. 14 Derivational Examples • Verb/Adj to Noun – -ation: computerize → computerization – -ee: appoint → appointee – -er: kill → killer – -ness: fuzzy → fuzziness
  • 15. 15 Derivational Examples • Noun/Verb to Adj – -al: computation → computational – -able: embrace → embraceable – -less: clue → clueless
  • 16. 16 Compute • Many paths are possible… • Start with compute – Computer -> computerize -> computerization – Computation -> computational – Computer -> computerize -> computerizable – Compute -> computee
  • 17. 17 How do people represent words? • Hypotheses: – Full listing hypothesis: words listed – Minimum redundancy hypothesis: morphemes listed • Experimental evidence: – Priming experiments (Does seeing/hearing one word facilitate recognition of another?) suggest neither • Regularly inflected forms prime stem but not derived forms • But spoken derived words can prime stems if they are semantically close (e.g. government/govern but not department/depart)
  • 18. 18 • Speech errors suggest affixes must be represented separately in the mental lexicon – easy enoughly
  • 19. 19 Parsing • Taking a surface input and identifying its components and underlying structure • Morphological parsing: parsing a word into stem and affixes and identifying the parts and their relationships – Stem and features: • goose → goose +N +SG or goose +V • geese → goose +N +PL • gooses → goose +V +3SG – Bracketing: indecipherable → [in [[de [cipher]] able]]
  • 20. 20 Why parse words? • For spell-checking – Is muncheble a legal word? • To identify a word’s part-of-speech (pos) – For sentence parsing, for machine translation, … • To identify a word’s stem – For information retrieval
  • 21. 21 What do we need to build a morphological parser? • Lexicon: stems and affixes (w/ corresponding pos) • Morphotactics of the language: model of how morphemes can be affixed to a stem. E.g., plural morpheme follows noun in English • Orthographic rules: spelling modifications that occur when affixation occurs – in- → il- in the context of l (in- + legal → illegal)
  • 22. 22 Morphotactic Models • English nominal inflection (FSA diagram: states q0, q1, q2; arcs reg-n, irreg-sg-n, irreg-pl-n, plural (-s)) • Inputs: cats, goose, geese
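The nominal-inflection machine can be sketched at the level of morpheme classes. This is a minimal sketch, not the lecture's implementation: the word lists are illustrative, and the wiring (reg-n leads to an accepting state that may be followed by plural -s; the irregular forms go straight to a final state) is an assumption, since the diagram itself is not recoverable from the transcript.

```python
# Hypothetical toy lexicon; the word lists are illustrative assumptions.
REG_N = {"cat", "fox", "dog"}
IRREG_SG_N = {"goose", "mouse"}
IRREG_PL_N = {"geese", "mice"}


def accepts_noun(word):
    """Return True if `word` follows a path from q0 to an accepting state."""
    # q0 --irreg-sg-n / irreg-pl-n--> q2 (accepting)
    if word in IRREG_SG_N or word in IRREG_PL_N:
        return True
    # q0 --reg-n--> q1 (accepting: bare regular singular)
    if word in REG_N:
        return True
    # q0 --reg-n--> q1 --plural (-s)--> q2
    return word.endswith("s") and word[:-1] in REG_N


for w in ["cats", "goose", "geese", "gooses"]:
    print(w, accepts_noun(w))
```

Note that this sketch knows nothing about spelling, so it accepts "foxs" and rejects "foxes"; orthographic rules are a separate component.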
  • 23. 23 Antworth data on English Adjectives • Big, bigger, biggest • Cool, cooler, coolest, coolly • Red, redder, reddest • Clear, clearer, clearest, clearly, unclear, unclearly • Happy, happier, happiest, happily • Unhappy, unhappier, unhappiest, unhappily • Real, unreal, really
  • 29. 29 • Derivational morphology: adjective fragment (FSA diagram: states q0, q1, q2, q5; arcs un-, ε, adj-root, and -er, -ly, -est) • Adj-root: clear, happy, real, big, red
  • 30. 30 • Derivational morphology: adjective fragment (FSA diagram: states q0, q1, q2, q5; arcs un-, ε, adj-root, and -er, -ly, -est) • Adj-root: clear, happy, real, big, red • BUT: this machine also accepts *unbig, *redly, *realest
  • 33. 33 • Derivational morphology: adjective fragment (FSA diagram: states q0 to q5; arcs un-, ε, adj-root1 (two arcs) followed by -er, -ly, -est, and adj-root2 followed by -er, -est) • Adj-root1: clear, happy, real • Adj-root2: big, red
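Splitting the roots into two classes can be sketched as a transition table over morphemes. This is a sketch under assumptions: the exact wiring and the set of accepting states below are guesses at the diagram (un- is only allowed before adj-root1, and only adj-root1 takes -ly), and the input is assumed to be already segmented into morphemes.

```python
ROOT1 = {"clear", "happy", "real"}   # take un-, -er, -est, -ly
ROOT2 = {"big", "red"}               # take only -er, -est
SUFFIXES = {"-er", "-est", "-ly"}


def classify(morpheme):
    """Map a morpheme to the arc label it can match, or None."""
    if morpheme == "un-":
        return "un-"
    if morpheme in ROOT1:
        return "adj-root1"
    if morpheme in ROOT2:
        return "adj-root2"
    if morpheme in SUFFIXES:
        return morpheme
    return None


# Assumed wiring of the six-state fragment.
TRANSITIONS = {
    ("q0", "un-"): "q1",
    ("q0", "adj-root1"): "q2",
    ("q1", "adj-root1"): "q2",
    ("q0", "adj-root2"): "q3",
    ("q2", "-er"): "q5", ("q2", "-est"): "q5", ("q2", "-ly"): "q5",
    ("q3", "-er"): "q4", ("q3", "-est"): "q4",
}
ACCEPTING = {"q2", "q3", "q4", "q5"}


def accepts(morphemes):
    state = "q0"
    for m in morphemes:
        state = TRANSITIONS.get((state, classify(m)))
        if state is None:          # no arc for this morpheme: reject
            return False
    return state in ACCEPTING


print(accepts(["un-", "happy", "-er"]))  # unhappier
print(accepts(["un-", "big"]))           # *unbig
print(accepts(["red", "-ly"]))           # *redly
```

The two-class split rules out *unbig and *redly, but this sketch still accepts real + -est, so finer root classes would be needed to block *realest.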
  • 34. 34 FSAs and the Lexicon • First we’ll capture the morphotactics – The rules governing the ordering of affixes in a language. • Then we’ll add in the actual words
  • 35. 35 Using FSAs to Represent the Lexicon and Do Morphological Recognition • Lexicon: We can expand each non-terminal in our NFSA into each stem in its class (e.g. adj_root2 = {big, red}) and expand each such stem to the letters it includes (e.g. red → r e d, big → b i g) (FSA diagram: states q0 to q7; arcs ε, r, e, d, b, i, g, and -er, -est)
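Expanding stems into letters amounts to building a letter-by-letter automaton, which can be sketched as a trie. This is an illustrative sketch only: the function names are hypothetical, and the suffixes are matched as plain strings with no spelling rules.

```python
END = None  # end-of-stem marker; never collides with a real character key


def build_trie(stems):
    """Expand each stem into a chain of single-letter arcs."""
    root = {}
    for stem in stems:
        node = root
        for ch in stem:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def recognize(word, trie, suffixes=("", "er", "est")):
    """Accept stem + optional suffix by walking the letter arcs."""
    node = trie
    for i, ch in enumerate(word):
        # At the end of a stem, the remainder may be one of the suffixes.
        if END in node and word[i:] in suffixes:
            return True
        if ch not in node:
            return False
        node = node[ch]
    return END in node and "" in suffixes


adj_root2 = build_trie(["big", "red"])
print(recognize("red", adj_root2))
print(recognize("bigest", adj_root2))
print(recognize("bigger", adj_root2))
```

Because there is no consonant-doubling rule here, the sketch accepts the intermediate-looking "bigest" and rejects the real surface form "bigger", which is one motivation for the orthographic rules later in the lecture.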
  • 36. 36 Limitations • To cover all of e.g. English will require very large FSAs with consequent search problems – Adding new items to the lexicon means recomputing the FSA – Non-determinism • FSAs can only tell us whether a word is in the language or not – what if we want to know more? – What is the stem? – What are the affixes and what sort are they? – We used this information to build our FSA: can we get it back?
  • 37. 37 Parsing/Generation vs. Recognition • Recognition is usually not quite what we need. – Usually if we find some string in the language we need to find the structure in it (parsing) – Or we have some structure and we want to produce a surface form (production/generation) • Example – From “cats” to “cat +N +PL”
  • 38. 38 Finite State Transducers • The simple story – Add another tape – Add extra symbols to the transitions – On one tape we read “cats”, on the other we write “cat +N +PL”
  • 39. 39 Parsing with Finite State Transducers • cats → cat +N +PL • Kimmo Koskenniemi’s two-level morphology – Words represented as correspondences between lexical level (the morphemes) and surface level (the orthographic word) – Morphological parsing: building mappings between the lexical and surface levels • Lexical: c a t +N +PL • Surface: c a t s
  • 40. 40 Finite State Transducers • FSTs map between one set of symbols and another using an FSA whose alphabet Σ is composed of pairs of symbols from input and output alphabets • In general, FSTs can be used for – Translator (Hello:Ciao) – Parser/generator (Hello:How may I help you?) – To map between the lexical and surface levels of Kimmo’s 2-level morphology
  • 41. 41 • FST is a 5-tuple consisting of – Q: set of states {q0,q1,q2,q3,q4} – Σ: an alphabet of complex symbols, each an i/o pair s.t. i ∈ I (an input alphabet) and o ∈ O (an output alphabet), so Σ ⊆ I × O – q0: a start state – F: a set of final states in Q {q4} – δ(q, i:o): a transition function mapping Q × Σ to Q – Emphatic Sheep → Quizzical Cow (diagram arcs: q0→q1 on b:m, q1→q2 on a:o, q2→q3 on a:o, q3→q3 on a:o, q3→q4 on !:?)
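The 5-tuple can be written down almost literally. The following is a minimal sketch of the sheep-to-cow transducer; it assumes the usual sheep-talk wiring (one b, two a's, a self-loop for further a's, then !), and the names DELTA and transduce are illustrative.

```python
# delta: (state, input symbol) -> (next state, output symbol)
DELTA = {
    ("q0", "b"): ("q1", "m"),
    ("q1", "a"): ("q2", "o"),
    ("q2", "a"): ("q3", "o"),
    ("q3", "a"): ("q3", "o"),   # self-loop: any number of extra a's
    ("q3", "!"): ("q4", "?"),
}
START = "q0"
FINAL = {"q4"}


def transduce(text):
    """Read `text` on the input tape; return the output tape or None."""
    state, output = START, []
    for ch in text:
        step = DELTA.get((state, ch))
        if step is None:            # no transition: input rejected
            return None
        state, out = step
        output.append(out)
    return "".join(output) if state in FINAL else None


print(transduce("baaa!"))
```

Since this machine is deterministic on its input side, a single pass over the string is enough; a nondeterministic FST would need search or backtracking.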
  • 42. 42 Transitions • c:c means read a c on one tape and write a c on the other • +N:ε means read a +N symbol on one tape and write nothing on the other • +PL:s means read +PL and write an s c:c a:a t:t +N:ε +PL:s
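These transitions can be traced in a few lines. The sketch below assumes the lexical tape is given as a list of symbols (so that +N and +PL are single symbols) and represents ε as the empty string; the names are illustrative.

```python
EPS = ""  # epsilon: write nothing on the output tape

# The single path c:c a:a t:t +N:ε +PL:s as (input, output) pairs.
ARCS = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPS), ("+PL", "s")]


def generate(lexical_symbols):
    """Follow the path, reading lexical symbols and writing surface ones."""
    if len(lexical_symbols) != len(ARCS):
        return None
    surface = []
    for symbol, (inp, out) in zip(lexical_symbols, ARCS):
        if symbol != inp:           # symbol does not match the arc: reject
            return None
        surface.append(out)
    return "".join(surface)


print(generate(["c", "a", "t", "+N", "+PL"]))
```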
  • 43. 43 FST for a 2-level Lexicon • E.g. – Reg-n: c a t – Irreg-pl-n: g o:e o:e s e – Irreg-sg-n: g o o s e (state diagrams for the c a t and goose/geese paths not recoverable from the transcript)
  • 44. 44 FST for English Nominal Inflection • Arcs: q0→q1 on reg-n, q0→q2 on irreg-n-sg, q0→q3 on irreg-n-pl; q1→q4, q2→q5, q3→q6 on +N:ε; q4→q7 on +PL:^s# or +SG:#; q5→q7 on +SG:#; q6→q7 on +PL:# • Combining (cascade or composition) this FST with FSTs for each noun type replaces e.g. reg-n with every regular noun representation in the lexicon
  • 45. 45 The Gory Details • Of course, it’s not as easy as – “cat +N +PL” <-> “cats” • Or even dealing with the irregulars geese, mice and oxen • But there are also a whole host of spelling/pronunciation changes that go along with inflectional changes
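The lexical-to-intermediate step can be sketched as a small function. This is a sketch under assumptions, not the lecture's machine: the word lists are toy data, the function name is hypothetical, and the irregular plurals are looked up in a table rather than produced by o:e letter pairs.

```python
# Hypothetical toy lexicon (illustrative only).
REG_N = {"cat", "fox"}
IRREG_SG_N = {"goose", "mouse"}
IRREG_PL_N = {"goose": "geese", "mouse": "mice"}   # stem -> plural form


def lexical_to_intermediate(stem, number):
    """Map e.g. ("cat", "+PL") to "cat^s#"; +N is realised as epsilon."""
    if number not in ("+SG", "+PL"):
        return None
    if stem in REG_N:
        # +PL:^s#  or  +SG:#
        return stem + ("^s#" if number == "+PL" else "#")
    if number == "+SG" and stem in IRREG_SG_N:
        return stem + "#"                    # +SG:#
    if number == "+PL" and stem in IRREG_PL_N:
        return IRREG_PL_N[stem] + "#"        # +PL:#
    return None


print(lexical_to_intermediate("fox", "+PL"))
print(lexical_to_intermediate("goose", "+PL"))
```

The ^ marks a morpheme boundary and # a word boundary; both are meant to be consumed by later spelling-rule machines.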
  • 46. 46 Multi-Tape Machines • To deal with this we can simply add more tapes and use the output of one tape machine as the input to the next • So to handle irregular spelling changes we’ll add intermediate tapes with intermediate symbols
  • 47. 47 Multi-Level Tape Machines • We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape
  • 48. 48 Orthographic Rules and FSTs • Define additional FSTs to implement rules such as consonant doubling (beg → begging), ‘e’ deletion (make → making), ‘e’ insertion (watch → watches), etc. • Lexical: f o x +N +PL • Intermediate: f o x ^ s # • Surface: f o x e s
  • 50. 50 Intermediate to Surface • The “add an e” rule, as in fox^s# <-> foxes
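The e-insertion rule can be approximated with a regular-expression rewrite. This is a sketch, not the lecture's transducer: it assumes the rule fires after x, s, z, ch or sh at a morpheme boundary followed by s#, and it uses Python's re module in place of an explicit state machine.

```python
import re


def intermediate_to_surface(form):
    """Apply e-insertion, then erase the boundary symbols ^ and #."""
    # Insert e at the morpheme boundary after x, s, z, ch, sh before "s#".
    form = re.sub(r"(x|s|z|ch|sh)\^(?=s#)", r"\1e", form)
    # Forms the rule does not apply to pass through unchanged,
    # apart from dropping the boundary markers.
    return form.replace("^", "").replace("#", "")


for f in ["fox^s#", "cat^s#", "watch^s#", "geese#"]:
    print(f, "->", intermediate_to_surface(f))
```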
  • 51. 51 Note • A key feature of this machine is that it doesn’t do anything to inputs to which it doesn’t apply. • Meaning that they are written out unchanged to the output tape.
  • 54. 54 • Note: These FSTs can be used for generation as well as recognition by simply exchanging the input and output alphabets (e.g. ^s#:+PL)
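Swapping the two sides of every arc can be sketched directly. The toy machine below covers only cat; the arc table, state numbers and function names are illustrative assumptions, ε is the empty string, and the recursive search assumes there are no ε-loops.

```python
# state -> list of (input, output, next state); "" stands for epsilon
ARCS = {
    0: [("c", "c", 1)],
    1: [("a", "a", 2)],
    2: [("t", "t", 3)],
    3: [("+N", "", 4)],
    4: [("+PL", "s", 5), ("+SG", "", 5)],
}
FINAL = {5}


def invert(arcs):
    """Exchange input and output on every arc (e.g. +PL:s becomes s:+PL)."""
    return {q: [(o, i, nxt) for (i, o, nxt) in out] for q, out in arcs.items()}


def run(arcs, symbols, state=0):
    """Return every output sequence the machine can produce for `symbols`."""
    results = []
    if not symbols and state in FINAL:
        results.append([])
    for inp, out, nxt in arcs.get(state, []):
        if inp == "":                          # epsilon: consume nothing
            rests = run(arcs, symbols, nxt)
        elif symbols and symbols[0] == inp:    # consume one symbol
            rests = run(arcs, symbols[1:], nxt)
        else:
            continue
        results.extend(([out] if out else []) + rest for rest in rests)
    return results


print(run(ARCS, ["c", "a", "t", "+N", "+PL"]))   # generation
print(run(invert(ARCS), list("cats")))           # parsing
```

After inversion the ε symbols sit on the input side, which is why the parsing direction needs a search over alternatives rather than a single deterministic pass.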
  • 55. 55 Summing Up • FSTs provide a useful tool for implementing a standard model of morphological analysis, Kimmo’s two-level morphology – Key is to provide an FST for each of multiple levels of representation and then to combine those FSTs using a variety of operators (cf. AT&T FSM Toolkit) – Other (older) approaches are still widely used, e.g. the rule-based Porter Stemmer