SlideShare una empresa de Scribd logo
1 de 29
Descargar para leer sin conexión
Learning Morphological Rules for
           Amharic Verbs
 Using Inductive Logic Programming
                                  1


 Won dwossen M uluge ta                       M icha el Ga sser
Addis Ab a b a U n iversi ty,              In dia n a U n iversit y,
 Addis Ab a b a , Ethio p ia                Bloomin g t on , U S A
 won disho @ya h o o. c o m              ga sser@cs.in d ia n a . ed u



                           LREC Con feren ce
                    S ALTM IL- Af La T Worksh op
         N orma liza t i on of U n der-Reso u rc es La n gua ges
                            Ista n b ul, Turkey
                              May 22, 2012
Agenda
                     2


 Introduction
 Amharic Verb Morphology
 Machine Learning (ILP) of Morphology
 Experiment Setup and Data
 Experiments and Result
 Challenges
 Future work
Introduction
                                         3

 Amharic is a Semitic language (27+ speakers)
     Written using Amharic Fidel, ፊደል , which is syllabic script
             (be=በ, bu=ቡ, bi=ቢ, ba=ባ, bE=ቤ, b=ብ, bo=ቦ)
 Many Computational Approaches of Morphology:
   Rule based and Machine learning for many languages
         HornMorpho: Finite State Transducer (Amharic, Tigrigna and Oromo)
     Amharic morphology has so far been attempted using only rule-
      based methods.
 We have applied machine learning approach to the task
   This is a work on progress
 The work is a contribution to learning of Morphology in
  general
Amharic Verbs
                                     4

 Amharic Verbs convey:
    lexical information, subject and object person, number, and
     gender; tense, aspect, and mood; various derivational
     categories such as passive, causative, and reciprocal; polarity
     (affirmative/negative); relativization; and a range of
     prepositions and conjunctions.
 Amharic Verb Morphology:
    affixation, reduplication, and compounding (common to most)
    The stems consist of a root + vowels + template merger
        (e.g., sbr + ee + CVCVC, which leads to the stem seber ‘broke’)
 It is a non-concatenative process
Amharic Verbs…contd
                                       5

 Amharic verbs: 4 prefixes and 5 suffixes.
 The affixes have an intricate set of co-occurrence rule
 Grammatical features are shown using:
     Affixes, Vowel sequence, Root template (CV pattern)

        ?-sebr-alehu (እሰብራለሁ) 1s pers. sing. simplex imperfective
        ?-seber-alehu (እሰበራለሁ) 1st pers. sing. passive imperfective

        te-deres-ku (ተደረስኩ)  1st pers. sing. passive perfective
        deres-ku (ደረስኩ) 1st pers. sing. simplex perfective

 The Geez script has been Romanized using the standard SERA for
 this experiment using our own Prolog script (lookup dictionary).
Amharic Verbs…contd
         6
Amharic Verbs…contd
         7
Amharic Verbs…contd
                                    8

 Amharic morphology has alternation rules:
     stem affix intersection points or within the stem itself
               Word             Root                   Feature
      gdel (ግደል)              gdl(ግድል)   2nd person sing. masc. imperative

      gdey (gdel-i) (ግደይ)     gdl(ግድል)   2nd person sing. fem. imperative

      t-gedl-aleh (ትገድላለህ)    gdl(ግድል)   2nd person sing. masc. imperfect

      t-gedy-alex (ትገድያለሽ)    gdl(ግድል)   2nd person sing. fem. imperfect




Another Example                          I won’t be
Rule Based Morphology
                                     9

 Finite State for morphology dominant after
 Koskenniemi’s two level morphology .
    Rule based
    HornMorpho developed to analyze Amharic, Tigrigna and
     Oromiffa words
    All rules need to de enumerated
    knowledge-based: (HornMorpho experience)
      difficult to debug,
      hard to modify (to add new findings),
      Difficult to adapt to other similar languages
ILP Framework
                                  10
 ILP combines Logic and
  Programming
 Hypothesis is drawn                            Representation
                                  Input                &               Output
  from background                                  Processing
  knowledge and
  examples.
  The examples (E), background
                                                   Learning
   knowledge (B) and hypothesis                    Algorithm
   (H) are all logic programs.


                                          Examples             Background
 Prolog Rules:                             (+ and -)
                                                   )           (how much?)
ILP for Morphology
                                  11

 ILP learning systems is Supervised
    Supervised morphology learning systems are usually based on
     two-level morphology
    These approaches differ in the level of supervision they employ
     to capture the rules.
      word pairs
      segmentation of input words
      a stem or root and a set of grammatical features
Machine Learning (ILP) of Morphology
                                        12

 Attempts to apply ILP to morphology
   Kazakov, 2000, Manandhar et al, 1998, Zdravkova et al, 2005,
         English, Macedonian
     dealt with languages with relatively simple morphology
     No root template issue (Vital for Amharic and similar languages)
     Was possible to list all (most) examples?


 We have used CLOG ILP tool for our experiment
     CLOG is a Prolog based ILP system,
     Developed by Manandhar et al (1998),
     Learn first order decision lists (rules)
     Use only positive examples.
       CLOG relies on output completeness
Experiment Setup and Data
                                      13

 Learning Amharic morphological rules with ILP:
    Training data prepared with the help of HornMorpho
    Used only 216 Amharic verbs
    background knowledge and the learning aspect.
        1)   To handle stem extraction by identifying affixes,
        2)   To identify root and vowel sequence
        3)   To handle orthographic alternations
        4)   To associate grammatical feature with word constituents
     Training Data Format:
             stem(Word, Stem, Root, Grammatical Features)
     Training Example:
         Stem([s,e,b,e,r,k,u],[s,e,b,e,r],[s,b,r] [1,1]).
         stem([s,e,b,e,r,k],[s,e,b,e,r],[s,b,r], [1,2]).
         stem([s,e,b,e,r,x],[s,e,b,e,r],[s,b,r], [1,3]).
Experiment Setup and Data
                                         14

 Learning stem extraction:
   set_affix predicate
    Identify the prefix and suffixes of the input word.
    Takes the Word and Stem and learns the affixes

           set_affix(Word, Stem, P1,P2,S1,S2):-
             split(Word, P1, W11),
             split(Stem, P2, W22),
             split(W11, X, S1),
             split(W22, X, S2),
             not( (P1=[],P2=[],S1=[],S2=[])).

      The utility predicate ‘split’ segments any input string into all possible
       pairs of substrings.
      sebr {([]-[sebr]), ([s]-[ebr]), ([se]-[br]), ([seb]-[r]), or ([sebr]-[])}.
Experiment Setup and Data
                                 15


 Affix extraction example:
              teseberku (seber) Vs tegedelku (gedel)
                       {break}                {kill}

[],[teseberku],[]                                    [],[tegedelku],[]
[t],[eseberku],[]          [te],[seber],[ku]         [t],[egedelku],[]
[te],[seberku],[]          [te],[gedel],[ku]         [te],[gedelku],[]
        :                                                    :
[te],[seber],[ku]    Segment: [te], STEM, [ku]       [te],[gedel],[ku]
        :            CV Pattern: CVCVC, ee                   :
[te],[seber],[ku]                                    [te],[gedel],[ku]
[te],[seber],[ku]                                    [te],[gedel],[ku]
[],[teseber],[ku]                                    [],[tegedel],[ku]
[],[teseberk],[k]       More general rule that can   [],[tegedelk],[u]
                         be derived for the two
                                 words
Experiment Setup and Data
                                    16

 Learning Roots:
 root_vocal, Predicate                    root_vocal(Stem,Root,Vowel):-
    Used to extract the root from                 merge(Stem,Root,Vowel).
     examples by taking only the Stem
     and the Root (the second and third    merge([X,Y,Z|T],[X,Y|R],[Z|V]):-
     arguments)                                   merge(T,R,V).
    The performs unconstrained            merge([X,Y|T],R,[X,Y|V]):-
     permutation of the characters in             merge(T,R,V).
     the Stem until the first segment of   merge([X|Y],[X|Z],W) :-
     the permutated string matches the            merge(Y,Z,W).
     Root character pattern provided in    merge([X|Y],Z,[X|W]) :-
     the example.                                  merge(Y,Z,W).
Experiment Setup and Data
                                17

    Root Vocal and Template extraction”
             seber (sbr) Vs gedl (gdl)

seber                                      gdel
sebre                                      gdle
  :        Vowel Sequence: ee                     Vowel Sequence: e
           CV Pattern: CVCVC
                                             :    CV Pattern: CVCC
  :                                          :
sbree                                      gdle
  :                                          :
  :                                          :
seebr                                      gedl
ebres                                      edlg
esber                                      egdl
eesbr                                      egld
Experiment Setup and Data
                                          18

 Learning stem internal alternations:
    set_internal_alter predicate
        This predicate works much like the ‘set_affix’ predicate except that it
         replaces a substring which is found in the middle of Stem by another
         substring from Valid_Stem.
                                                          alter([h,e,d],[h,y,e,d]).
        Required a different set of training data:       alter([m,o,t],[m,e,w,o,t]).
                                                          alter([s,a,m],[s,e,?,a,m]).

     set_internal_alter(Stem,Valid_Stem,St1,St2):-
          split(Stem,P1,X1),
          split(Valid_Stem,P1,X2),
          split(X1,St1,Y1),
          split(X2,St2,Y1).
Experiments and Result
                                 19

 To learn a set of rules, the predicate and arity for the rules
  must be provided for CLOG.
   predicate schemas
     rule(stem(_,_,_,_)) for set_affix and root_vocal, and
     rule(alter(_,_)) for set_internal_alter.

 The training set contains 216 Amharic verbs.
 The example contains all possible combinations of tense
  and subject features.
 Each word is first Romanized, then segmented into the
  stem and grammatical features
Experiments and Result
                       20



 Verb     • Stem-Affix Extraction


          • Stem-Internal-Alternation
 Stem


          • Template Extraction
 Stem


          • Grammatical Feature
Feature     Assignment
Experiments and Result
                                         21

 The training took less than a minute for Affix extraction
 108 rules for affix extraction, one example
        stem(Word, Stem, [2, 7]):-
               set_affix(Word, Stem, [y], [], [u], []),
               feature([2, 7], [imperfective, tppn]),
               template(Stem, [1, 0, 1, 1]).

Input Word: [y,m,e,k,r,u] {advise}
Set_affix results in: Stem=[m,e,k,r] as to the above rule (removing[y] and [u])
Template will generate: 1,0,1,1 from Stem which is the same as in the rule
Thus, Feature will declare the word is Imperfective, Third Person Plural Neuter

Input Word: *[y,m,e,k,e,r,u] {advise} (valid but no such examples in the training)
Set_affix will result in: Stem=[m,e,k,e,r] according to the above rule
Template will generate: 1,0,1,0,1 from Stem which will fail the rule
Experiments and Result
                                         22

 18 rules for root template extraction: one example
        root(Stem, Root):-
                root_vocal(Stem, Root, [e, e]),
                 template(Stem, [1, 0, 1, 0, 1]) .

Input Stem: [g,e,r,e,f] {beat}
root_vocal results in:
         Root=[m,k,r] based on the number and type of vowels in the rule
Template will generate: 1,0,1,0,1 from Stem which is the same as in the rule

Input Stem: *[g,a,r,e,f]
root_vocal results in: Root=[g,r,f] [a,e], which fails to meet the rule
Experiments and Result
                                          23

 3 rules for internal stem alternation
         alter(Stem,Valid_Stem):-
           set_internal_alter(Stem,Valid_Stem, [o], [e, w, o]).


Input Stem: [m,o,k] {hot}
Set_internal_alter results in:
         Valid_Stem=[m,e,w,o,k] which will be further analyzed for validity later

Input Stem: [m,o,k,e,r] {try} (wrong alternation)
Set_internal_alter results in:
         Valid_Stem=[m,e,w,o,k,e,r] but it will fail letter in the template extraction
         as [1,0,1,0,1,0,1] is not among the learned templates
Experiments and Result
                                     24

 Test Date:
    verbs in their third person singular masculine form
    Source: Online, Armbruster (1908)
 The verbs are inflected for the eight subjects and four
 tense-aspect-mood features of Amharic
    Total Test set:
        1,784 distinct verb forms
 Accuracy:
    1,552=86.9%
Experiments and Result
                                       25


 Error Analysis
     absence of similar examples
     inappropriate alternation rule

  Test Word        Stem        Root         Feature

[s,e,m,a,c,h,u] [s,e,m,a,?] [s,m,?] perfective, sppn     Correct Analysis


[l,e,g,u,m,u]   [l,e,g,u,m]   NA      NA                No Similar Example


[s,e,m,a,c,h,u] [s,e,y,e,m]   [s,y,m] gerundive, sppn   Wrong Alternation
Challenge
                                    26

The current learning trend demands examples for all
combination of features for word formation.
     Every subject for alltense and voice
     For various root groups/radicals

     For all object/negative/applicative/conj combinations

     For all forms with alternation

 Can we (do we have to) exhaustively list all the
  combination?
 What correlation do we have between the constituents?
 How can we learn these interactions?
........
Future work
                                        27

 We can not give all possible combinations for Amharic
 Won’t be generic otherwise
More Generic Approach: (Genetic Programming)
Generalizing from partial data....some morphemes can decide some
feature of the word independent of the other constituents!
    Rules like..if it has the prefix 'te' then the word is passive despite all
      the other morphemes and template structure

stem(A, B, C, D):-                             S, B, C, Ten and Sub are
     set_affix(A, B, [t,e], [], S, []),        variables and they can take
     root_temp(B, C, [e, e]),                  any forms but the word (if
     template(B, [1, 0, 1, 0, 1]),             it a valid Amharic word)
                                               will be Passive
     feature(D, [Ten, passive, Sub]).   ]).
Acknowledgement
                                  28

 Special Thanks to:
    IDRC-ICT4D project hosted by Nairobi University, Kenya
      For sponsoring my conference participation
      For partially supporting this project
29



    አመሰግናችኋለሁ
a-mesegn-a-chwa-lehu
    I Thank you all
      wondisho@yahoo.com
     wondgewe@indiana.edu
             and
     gasser@cs.indiana.edu

Más contenido relacionado

La actualidad más candente

Status of 3GPP NR Rel.16 NR bands and band combinations
Status of 3GPP NR Rel.16 NR bands and band combinationsStatus of 3GPP NR Rel.16 NR bands and band combinations
Status of 3GPP NR Rel.16 NR bands and band combinationsEiko Seidel
 
Chap 4. call processing and handover.eng
Chap 4. call processing and handover.engChap 4. call processing and handover.eng
Chap 4. call processing and handover.engsivakumar D
 
LTE Reviews - PCI Analysis
LTE Reviews - PCI AnalysisLTE Reviews - PCI Analysis
LTE Reviews - PCI Analysispaulo_campolina
 
RF Module Design - [Chapter 8] Phase-Locked Loops
RF Module Design - [Chapter 8] Phase-Locked LoopsRF Module Design - [Chapter 8] Phase-Locked Loops
RF Module Design - [Chapter 8] Phase-Locked LoopsSimen Li
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCCHira Shaukat
 
Genex assistant operation guide (lte)
Genex assistant operation guide (lte)Genex assistant operation guide (lte)
Genex assistant operation guide (lte)Roel Gabon
 
Top 9 sip interview questions answers
Top 9 sip interview questions answersTop 9 sip interview questions answers
Top 9 sip interview questions answersjonhmart036
 
Model-Driven Development for Safety-Critical Software
Model-Driven Development for Safety-Critical SoftwareModel-Driven Development for Safety-Critical Software
Model-Driven Development for Safety-Critical Softwaregjuljo
 
RISC-V Summit 2020: The Next Ten Years
RISC-V Summit 2020: The Next Ten YearsRISC-V Summit 2020: The Next Ten Years
RISC-V Summit 2020: The Next Ten YearsRISC-V International
 
RF Module Design - [Chapter 1] From Basics to RF Transceivers
RF Module Design - [Chapter 1] From Basics to RF TransceiversRF Module Design - [Chapter 1] From Basics to RF Transceivers
RF Module Design - [Chapter 1] From Basics to RF TransceiversSimen Li
 
[ Capella Day 2019 ] Capella integration with Teamcenter
[ Capella Day 2019 ] Capella integration with Teamcenter[ Capella Day 2019 ] Capella integration with Teamcenter
[ Capella Day 2019 ] Capella integration with TeamcenterObeo
 
Agilent ADS 模擬手冊 [實習1] 基本操作與射頻放大器設計
Agilent ADS 模擬手冊 [實習1] 基本操作與射頻放大器設計Agilent ADS 模擬手冊 [實習1] 基本操作與射頻放大器設計
Agilent ADS 模擬手冊 [實習1] 基本操作與射頻放大器設計Simen Li
 
射頻電子實驗手冊 - [實驗8] 低雜訊放大器模擬
射頻電子實驗手冊 - [實驗8] 低雜訊放大器模擬射頻電子實驗手冊 - [實驗8] 低雜訊放大器模擬
射頻電子實驗手冊 - [實驗8] 低雜訊放大器模擬Simen Li
 

La actualidad más candente (20)

Status of 3GPP NR Rel.16 NR bands and band combinations
Status of 3GPP NR Rel.16 NR bands and band combinationsStatus of 3GPP NR Rel.16 NR bands and band combinations
Status of 3GPP NR Rel.16 NR bands and band combinations
 
LTE KPI
LTE KPILTE KPI
LTE KPI
 
Chap 4. call processing and handover.eng
Chap 4. call processing and handover.engChap 4. call processing and handover.eng
Chap 4. call processing and handover.eng
 
LTE Reviews - PCI Analysis
LTE Reviews - PCI AnalysisLTE Reviews - PCI Analysis
LTE Reviews - PCI Analysis
 
RF Module Design - [Chapter 8] Phase-Locked Loops
RF Module Design - [Chapter 8] Phase-Locked LoopsRF Module Design - [Chapter 8] Phase-Locked Loops
RF Module Design - [Chapter 8] Phase-Locked Loops
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCC
 
Genex assistant operation guide (lte)
Genex assistant operation guide (lte)Genex assistant operation guide (lte)
Genex assistant operation guide (lte)
 
IMS Signaling Details
IMS Signaling DetailsIMS Signaling Details
IMS Signaling Details
 
Top 9 sip interview questions answers
Top 9 sip interview questions answersTop 9 sip interview questions answers
Top 9 sip interview questions answers
 
EVERYTHING IN LTE
EVERYTHING IN LTEEVERYTHING IN LTE
EVERYTHING IN LTE
 
Lte kpis
Lte kpisLte kpis
Lte kpis
 
Model-Driven Development for Safety-Critical Software
Model-Driven Development for Safety-Critical SoftwareModel-Driven Development for Safety-Critical Software
Model-Driven Development for Safety-Critical Software
 
RISC-V Summit 2020: The Next Ten Years
RISC-V Summit 2020: The Next Ten YearsRISC-V Summit 2020: The Next Ten Years
RISC-V Summit 2020: The Next Ten Years
 
Advancements in Neural Vocoders
Advancements in Neural VocodersAdvancements in Neural Vocoders
Advancements in Neural Vocoders
 
RF Module Design - [Chapter 1] From Basics to RF Transceivers
RF Module Design - [Chapter 1] From Basics to RF TransceiversRF Module Design - [Chapter 1] From Basics to RF Transceivers
RF Module Design - [Chapter 1] From Basics to RF Transceivers
 
[ Capella Day 2019 ] Capella integration with Teamcenter
[ Capella Day 2019 ] Capella integration with Teamcenter[ Capella Day 2019 ] Capella integration with Teamcenter
[ Capella Day 2019 ] Capella integration with Teamcenter
 
Telecom KPIs definitions
Telecom KPIs definitionsTelecom KPIs definitions
Telecom KPIs definitions
 
Dsp lab manual
Dsp lab manualDsp lab manual
Dsp lab manual
 
Agilent ADS 模擬手冊 [實習1] 基本操作與射頻放大器設計
Agilent ADS 模擬手冊 [實習1] 基本操作與射頻放大器設計Agilent ADS 模擬手冊 [實習1] 基本操作與射頻放大器設計
Agilent ADS 模擬手冊 [實習1] 基本操作與射頻放大器設計
 
射頻電子實驗手冊 - [實驗8] 低雜訊放大器模擬
射頻電子實驗手冊 - [實驗8] 低雜訊放大器模擬射頻電子實驗手冊 - [實驗8] 低雜訊放大器模擬
射頻電子實驗手冊 - [實驗8] 低雜訊放大器模擬
 

Similar a Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming

MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYAMORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYAijnlc
 
Hps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodHps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodijnlc
 
Learning phoneme mappings for transliteration without parallel data
Learning phoneme mappings for transliteration without parallel dataLearning phoneme mappings for transliteration without parallel data
Learning phoneme mappings for transliteration without parallel dataAttaporn Ninsuwan
 
english-grammar
english-grammarenglish-grammar
english-grammarcjsmann
 
Lecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfLecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfDeptii Chaudhari
 
Modeling Improved Syllabification Algorithm for Amharic
Modeling Improved Syllabification Algorithm for AmharicModeling Improved Syllabification Algorithm for Amharic
Modeling Improved Syllabification Algorithm for AmharicGuy De Pauw
 
Stemming algorithms
Stemming algorithmsStemming algorithms
Stemming algorithmsRaghu nath
 
INFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptINFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptLamhotNaibaho3
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.pptbutest
 
DEVELOPING A SIMPLIFIED MORPHOLOGICAL ANALYZER FOR ARABIC PRONOMINAL SYSTEM
DEVELOPING A SIMPLIFIED MORPHOLOGICAL ANALYZER FOR ARABIC PRONOMINAL SYSTEMDEVELOPING A SIMPLIFIED MORPHOLOGICAL ANALYZER FOR ARABIC PRONOMINAL SYSTEM
DEVELOPING A SIMPLIFIED MORPHOLOGICAL ANALYZER FOR ARABIC PRONOMINAL SYSTEMkevig
 
Lecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdfLecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdfDeptii Chaudhari
 
Contemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingContemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingKaterina Vylomova
 
Prolog Programming Language
Prolog Programming  LanguageProlog Programming  Language
Prolog Programming LanguageReham AlBlehid
 
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...csandit
 
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...cscpconf
 

Similar a Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming (20)

MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYAMORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
 
UWB semeval2016-task5
UWB semeval2016-task5UWB semeval2016-task5
UWB semeval2016-task5
 
Inteligencia artificial
Inteligencia artificialInteligencia artificial
Inteligencia artificial
 
Hps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodHps a hierarchical persian stemming method
Hps a hierarchical persian stemming method
 
Learning phoneme mappings for transliteration without parallel data
Learning phoneme mappings for transliteration without parallel dataLearning phoneme mappings for transliteration without parallel data
Learning phoneme mappings for transliteration without parallel data
 
english-grammar
english-grammarenglish-grammar
english-grammar
 
Nlp
NlpNlp
Nlp
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Lecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfLecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdf
 
Modeling Improved Syllabification Algorithm for Amharic
Modeling Improved Syllabification Algorithm for AmharicModeling Improved Syllabification Algorithm for Amharic
Modeling Improved Syllabification Algorithm for Amharic
 
Stemming algorithms
Stemming algorithmsStemming algorithms
Stemming algorithms
 
INFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptINFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.ppt
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.ppt
 
nlp (1).pptx
nlp (1).pptxnlp (1).pptx
nlp (1).pptx
 
DEVELOPING A SIMPLIFIED MORPHOLOGICAL ANALYZER FOR ARABIC PRONOMINAL SYSTEM
DEVELOPING A SIMPLIFIED MORPHOLOGICAL ANALYZER FOR ARABIC PRONOMINAL SYSTEMDEVELOPING A SIMPLIFIED MORPHOLOGICAL ANALYZER FOR ARABIC PRONOMINAL SYSTEM
DEVELOPING A SIMPLIFIED MORPHOLOGICAL ANALYZER FOR ARABIC PRONOMINAL SYSTEM
 
Lecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdfLecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdf
 
Contemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingContemporary Models of Natural Language Processing
Contemporary Models of Natural Language Processing
 
Prolog Programming Language
Prolog Programming  LanguageProlog Programming  Language
Prolog Programming Language
 
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
 
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
 

Más de Guy De Pauw

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Guy De Pauw
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Guy De Pauw
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingGuy De Pauw
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageGuy De Pauw
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguageGuy De Pauw
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)Guy De Pauw
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Guy De Pauw
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusGuy De Pauw
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of SantomeGuy De Pauw
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Guy De Pauw
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTGuy De Pauw
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionGuy De Pauw
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishGuy De Pauw
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsGuy De Pauw
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersGuy De Pauw
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentGuy De Pauw
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersGuy De Pauw
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemGuy De Pauw
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemGuy De Pauw
 
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Guy De Pauw
 

Más de Guy De Pauw (20)

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech Tagging
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh Language
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik Language
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News Corpus
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of Santome
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFST
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic Inflection
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken Irish
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 years
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound Analysers
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource Development
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá Characters
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation System
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription System
 
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming

  • 1. Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming 1 Won dwossen M uluge ta M icha el Ga sser Addis Ab a b a U n iversi ty, In dia n a U n iversit y, Addis Ab a b a , Ethio p ia Bloomin g t on , U S A won disho @ya h o o. c o m ga sser@cs.in d ia n a . ed u LREC Con feren ce S ALTM IL- Af La T Worksh op N orma liza t i on of U n der-Reso u rc es La n gua ges Ista n b ul, Turkey May 22, 2012
  • 2. Agenda 2  Introduction  Amharic Verb Morphology  Machine Learning (ILP) of Morphology  Experiment Setup and Data  Experiments and Result  Challenges  Future work
  • 3. Introduction 3  Amharic is a Semitic language (27+ speakers)  Written using Amharic Fidel, ፊደል , which is syllabic script  (be=በ, bu=ቡ, bi=ቢ, ba=ባ, bE=ቤ, b=ብ, bo=ቦ)  Many Computational Approaches of Morphology:  Rule based and Machine learning for many languages  HornMorpho: Finite State Transducer (Amharic, Tigrigna and Oromo)  Amharic morphology has so far been attempted using only rule- based methods.  We have applied machine learning approach to the task  This is a work on progress  The work is a contribution to learning of Morphology in general
  • 4. Amharic Verbs 4  Amharic Verbs convey:  lexical information, subject and object person, number, and gender; tense, aspect, and mood; various derivational categories such as passive, causative, and reciprocal; polarity (affirmative/negative); relativization; and a range of prepositions and conjunctions.  Amharic Verb Morphology:  affixation, reduplication, and compounding (common to most)  The stems consist of a root + vowels + template merger  (e.g., sbr + ee + CVCVC, which leads to the stem seber ‘broke’)  It is a non-concatenative process
  • 5. Amharic Verbs…contd 5  Amharic verbs: 4 prefixes and 5 suffixes.  The affixes have an intricate set of co-occurrence rule  Grammatical features are shown using:  Affixes, Vowel sequence, Root template (CV pattern) ?-sebr-alehu (እሰብራለሁ) 1s pers. sing. simplex imperfective ?-seber-alehu (እሰበራለሁ) 1st pers. sing. passive imperfective te-deres-ku (ተደረስኩ)  1st pers. sing. passive perfective deres-ku (ደረስኩ) 1st pers. sing. simplex perfective The Geez script has been Romanized using the standard SERA for this experiment using our own Prolog script (lookup dictionary).
  • 8. Amharic Verbs…contd 8  Amharic morphology has alternation rules:  stem affix intersection points or within the stem itself Word Root Feature gdel (ግደል) gdl(ግድል) 2nd person sing. masc. imperative gdey (gdel-i) (ግደይ) gdl(ግድል) 2nd person sing. fem. imperative t-gedl-aleh (ትገድላለህ) gdl(ግድል) 2nd person sing. masc. imperfect t-gedy-alex (ትገድያለሽ) gdl(ግድል) 2nd person sing. fem. imperfect Another Example I won’t be
  • 9. Rule Based Morphology 9  Finite State for morphology dominant after Koskenniemi’s two level morphology .  Rule based  HornMorpho developed to analyze Amharic, Tigrigna and Oromiffa words  All rules need to de enumerated  knowledge-based: (HornMorpho experience)  difficult to debug,  hard to modify (to add new findings),  Difficult to adapt to other similar languages
  • 10. ILP Framework 10  ILP combines Logic and Programming  Hypothesis is drawn Representation Input & Output from background Processing knowledge and examples.  The examples (E), background Learning knowledge (B) and hypothesis Algorithm (H) are all logic programs. Examples Background Prolog Rules: (+ and -) ) (how much?)
  • 11. ILP for Morphology 11  ILP learning systems is Supervised  Supervised morphology learning systems are usually based on two-level morphology  These approaches differ in the level of supervision they employ to capture the rules.  word pairs  segmentation of input words  a stem or root and a set of grammatical features
  • 12. Machine Learning (ILP) of Morphology 12  Attempts to apply ILP to morphology  Kazakov, 2000, Manandhar et al, 1998, Zdravkova et al, 2005,  English, Macedonian  dealt with languages with relatively simple morphology  No root template issue (Vital for Amharic and similar languages)  Was possible to list all (most) examples?  We have used CLOG ILP tool for our experiment  CLOG is a Prolog based ILP system,  Developed by Manandhar et al (1998),  Learn first order decision lists (rules)  Use only positive examples.  CLOG relies on output completeness
  • 13. Experiment Setup and Data 13  Learning Amharic morphological rules with ILP:  Training data prepared with the help of HornMorpho  Used only 216 Amharic verbs  background knowledge and the learning aspect. 1) To handle stem extraction by identifying affixes, 2) To identify root and vowel sequence 3) To handle orthographic alternations 4) To associate grammatical feature with word constituents Training Data Format: stem(Word, Stem, Root, Grammatical Features) Training Example: Stem([s,e,b,e,r,k,u],[s,e,b,e,r],[s,b,r] [1,1]). stem([s,e,b,e,r,k],[s,e,b,e,r],[s,b,r], [1,2]). stem([s,e,b,e,r,x],[s,e,b,e,r],[s,b,r], [1,3]).
  • 14. Experiment Setup and Data 14  Learning stem extraction:  set_affix predicate  Identify the prefix and suffixes of the input word.  Takes the Word and Stem and learns the affixes set_affix(Word, Stem, P1,P2,S1,S2):- split(Word, P1, W11), split(Stem, P2, W22), split(W11, X, S1), split(W22, X, S2), not( (P1=[],P2=[],S1=[],S2=[])).  The utility predicate ‘split’ segments any input string into all possible pairs of substrings.  sebr {([]-[sebr]), ([s]-[ebr]), ([se]-[br]), ([seb]-[r]), or ([sebr]-[])}.
  • 15. Experiment Setup and Data 15  Affix extraction example: teseberku (seber) Vs tegedelku (gedel) {break} {kill} [],[teseberku],[] [],[tegedelku],[] [t],[eseberku],[] [te],[seber],[ku] [t],[egedelku],[] [te],[seberku],[] [te],[gedel],[ku] [te],[gedelku],[] : : [te],[seber],[ku] Segment: [te], STEM, [ku] [te],[gedel],[ku] : CV Pattern: CVCVC, ee : [te],[seber],[ku] [te],[gedel],[ku] [te],[seber],[ku] [te],[gedel],[ku] [],[teseber],[ku] [],[tegedel],[ku] [],[teseberk],[k] More general rule that can [],[tegedelk],[u] be derived for the two words
  • 16. Experiment Setup and Data 16  Learning Roots:  root_vocal, Predicate root_vocal(Stem,Root,Vowel):-  Used to extract the root from merge(Stem,Root,Vowel). examples by taking only the Stem and the Root (the second and third merge([X,Y,Z|T],[X,Y|R],[Z|V]):- arguments) merge(T,R,V).  The performs unconstrained merge([X,Y|T],R,[X,Y|V]):- permutation of the characters in merge(T,R,V). the Stem until the first segment of merge([X|Y],[X|Z],W) :- the permutated string matches the merge(Y,Z,W). Root character pattern provided in merge([X|Y],Z,[X|W]) :- the example. merge(Y,Z,W).
  • 17. Experiment Setup and Data 17  Root Vocal and Template extraction” seber (sbr) Vs gedl (gdl) seber gdel sebre gdle : Vowel Sequence: ee Vowel Sequence: e CV Pattern: CVCVC : CV Pattern: CVCC : : sbree gdle : : : : seebr gedl ebres edlg esber egdl eesbr egld
  • 18. Experiment Setup and Data 18  Learning stem internal alternations:  set_internal_alter predicate  This predicate works much like the ‘set_affix’ predicate except that it replaces a substring which is found in the middle of Stem by another substring from Valid_Stem. alter([h,e,d],[h,y,e,d]).  Required a different set of training data: alter([m,o,t],[m,e,w,o,t]). alter([s,a,m],[s,e,?,a,m]). set_internal_alter(Stem,Valid_Stem,St1,St2):- split(Stem,P1,X1), split(Valid_Stem,P1,X2), split(X1,St1,Y1), split(X2,St2,Y1).
  • 19. Experiments and Result 19  To learn a set of rules, the predicate and arity for the rules must be provided for CLOG.  predicate schemas  rule(stem(_,_,_,_)) for set_affix and root_vocal, and  rule(alter(_,_)) for set_internal_alter.  The training set contains 216 Amharic verbs.  The example contains all possible combinations of tense and subject features.  Each word is first Romanized, then segmented into the stem and grammatical features
  • 20. Experiments and Result 20 Verb • Stem-Affix Extraction • Stem-Internal-Alternation Stem • Template Extraction Stem • Grammatical Feature Feature Assignment
  • 21. Experiments and Result 21  The training took less than a minute for Affix extraction  108 rules for affix extraction, one example stem(Word, Stem, [2, 7]):- set_affix(Word, Stem, [y], [], [u], []), feature([2, 7], [imperfective, tppn]), template(Stem, [1, 0, 1, 1]). Input Word: [y,m,e,k,r,u] {advise} Set_affix results in: Stem=[m,e,k,r] as to the above rule (removing[y] and [u]) Template will generate: 1,0,1,1 from Stem which is the same as in the rule Thus, Feature will declare the word is Imperfective, Third Person Plural Neuter Input Word: *[y,m,e,k,e,r,u] {advise} (valid but no such examples in the training) Set_affix will result in: Stem=[m,e,k,e,r] according to the above rule Template will generate: 1,0,1,0,1 from Stem which will fail the rule
  • 22. Experiments and Result 22  18 rules for root template extraction: one example root(Stem, Root):- root_vocal(Stem, Root, [e, e]), template(Stem, [1, 0, 1, 0, 1]) . Input Stem: [g,e,r,e,f] {beat} root_vocal results in: Root=[m,k,r] based on the number and type of vowels in the rule Template will generate: 1,0,1,0,1 from Stem which is the same as in the rule Input Stem: *[g,a,r,e,f] root_vocal results in: Root=[g,r,f] [a,e], which fails to meet the rule
  • 23. Experiments and Result 23  3 rules for internal stem alternation alter(Stem,Valid_Stem):- set_internal_alter(Stem,Valid_Stem, [o], [e, w, o]). Input Stem: [m,o,k] {hot} Set_internal_alter results in: Valid_Stem=[m,e,w,o,k] which will be further analyzed for validity later Input Stem: [m,o,k,e,r] {try} (wrong alternation) Set_internal_alter results in: Valid_Stem=[m,e,w,o,k,e,r] but it will fail letter in the template extraction as [1,0,1,0,1,0,1] is not among the learned templates
  • 24. Experiments and Result 24  Test Date:  verbs in their third person singular masculine form  Source: Online, Armbruster (1908)  The verbs are inflected for the eight subjects and four tense-aspect-mood features of Amharic  Total Test set:  1,784 distinct verb forms  Accuracy:  1,552=86.9%
  • 25. Experiments and Result 25  Error Analysis  absence of similar examples  inappropriate alternation rule Test Word Stem Root Feature [s,e,m,a,c,h,u] [s,e,m,a,?] [s,m,?] perfective, sppn Correct Analysis [l,e,g,u,m,u] [l,e,g,u,m] NA NA No Similar Example [s,e,m,a,c,h,u] [s,e,y,e,m] [s,y,m] gerundive, sppn Wrong Alternation
  • 26. Challenge 26 The current learning trend demands examples for all combination of features for word formation.  Every subject for alltense and voice  For various root groups/radicals  For all object/negative/applicative/conj combinations  For all forms with alternation  Can we (do we have to) exhaustively list all the combination?  What correlation do we have between the constituents?  How can we learn these interactions? ........
  • 27. Future work 27  We can not give all possible combinations for Amharic  Won’t be generic otherwise More Generic Approach: (Genetic Programming) Generalizing from partial data....some morphemes can decide some feature of the word independent of the other constituents!  Rules like..if it has the prefix 'te' then the word is passive despite all the other morphemes and template structure stem(A, B, C, D):- S, B, C, Ten and Sub are set_affix(A, B, [t,e], [], S, []), variables and they can take root_temp(B, C, [e, e]), any forms but the word (if template(B, [1, 0, 1, 0, 1]), it a valid Amharic word) will be Passive feature(D, [Ten, passive, Sub]). ]).
  • 28. Acknowledgement 28  Special Thanks to:  IDRC-ICT4D project hosted by Nairobi University, Kenya  For sponsoring my conference participation  For partially supporting this project
  • 29. 29 አመሰግናችኋለሁ a-mesegn-a-chwa-lehu I Thank you all wondisho@yahoo.com wondgewe@indiana.edu and gasser@cs.indiana.edu