Adding morphological information to a connectionist
            Part-Of-Speech tagger

     F. Zamora-Martínez              M.J. Castro-Bleda S. España-Boquera
                                     S. Tortajada-Velert

                      Departamento de Sistemas Informáticos y Computación
                           Universidad Politécnica de Valencia, Spain

                         Escuela Superior de Enseñanzas Técnicas
            Universidad CEU-Cardenal Herrera, Alfara del Patriarca, Valencia, Spain

                                10-12 November 2009, Sevilla

F. Zamora et al (UPV/CEU-UCH)             CAEPIA 2009            10-12 November 2009, Sevilla   1 / 33

1   POS tagging

2   Probabilistic tagging

3   Connectionist tagging

4   The Penn Treebank Corpus

5   The connectionist POS taggers

6   Conclusions

1   POS tagging

What is Part-Of-Speech (POS) tagging?

      T = {τ1 , τ2 , . . . , τk }: a set of POS tags
      Ω = {ω1 , ω2 , . . . , ωm }: the vocabulary of the application

The goal of a Part-Of-Speech tagger is to associate each word in a text
with its correct lexical-syntactic category (represented by a tag).

The    grand        jury     commented    on      a    number   of      other       topics
DT       JJ         NN          VBD       IN     DT      NN     IN       JJ          NNS

Ambiguity and applications

Words often have more than one POS tag: lower
      Europe proposed lower rate increases . . . = JJR
      To push the pound even lower . . . = RBR
      . . . should be able to lower long-term . . . = VB


Applications: speech synthesis, speech recognition, information
retrieval, word-sense disambiguation, machine translation, ...

How hard is POS tagging? Measuring ambiguity

                          Penn Treebank (45-tag corpus)
                  Unambiguous (1 tag) 36,678 (84%)
                  Ambiguous (2-7 tags)      7,088 (16%)
                  Details: 2 tags           5,475
                           3 tags           1,313 (lower)
                           4 tags             250
                           5 tags              41
                           6 tags               7
                           7 tags               2 (bet, open)

A simple approach which assigns only the most common tag to each
word performs with 90% accuracy!
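
The most-common-tag baseline mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the toy training data, the function names and the default tag for unseen words (`NN`) are assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Count how often each word carries each tag and keep the most common tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_baseline(words, most_common_tag, default_tag="NN"):
    """Tag each word with its most frequent training tag.

    default_tag is a hypothetical fallback for words never seen in training.
    """
    return [most_common_tag.get(w, default_tag) for w in words]

# Toy training data (hypothetical, not taken from the Penn Treebank).
train = [
    [("the", "DT"), ("lower", "JJR"), ("rate", "NN")],
    [("a", "DT"), ("lower", "JJR"), ("price", "NN")],
    [("to", "TO"), ("lower", "VB"), ("rates", "NNS")],
]
model = train_baseline(train)
print(tag_baseline(["the", "lower", "price"], model))
```

Such a tagger ignores context entirely, which is why ambiguous words like "lower" always receive the same tag.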

Unknown Words

How can one assign a tag to a given word if that word is unknown to
the tagger?

Unknown words are the hardest problem for POS tagging!

2   Probabilistic tagging

Probabilistic model

We are given a sentence: what is the best sequence of tags which
corresponds to the sequence of words?

Probabilistic view: Consider all possible sequences of tags and out of
this universe of sequences, choose the tag sequence which is most
probable given the observation sequence of words.

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n) = \operatorname*{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n).$$

Probabilistic model: Simplifications

To simplify, two assumptions are made:
  1   Words are independent of each other and a word's identity only
      depends on its tag → lexical probabilities:
      $$P(w_1^n \mid t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)$$

  2   The probability of a tag only depends on its predecessor tag(s)
      (bigram, trigram, ...) → contextual probabilities; for bigrams:
      $$P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1}).$$
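
As a rough illustration of the two simplifications, the sketch below scores one candidate tag sequence as the product of lexical and contextual (bigram) probabilities, computed in log space. All probability values are invented for the example, the function name is hypothetical, and the first transition is taken as P(t_1 | t_0) = 1.

```python
import math

def sequence_log_prob(words, tags, lexical, contextual):
    """Return log prod_i P(w_i | t_i) * P(t_i | t_{i-1}), with P(t_1 | t_0) = 1.

    lexical[(w, t)]    = P(w | t)
    contextual[(t, u)] = P(t | u), where u is the previous tag
    """
    total = 0.0
    previous = None
    for word, tag in zip(words, tags):
        total += math.log(lexical[(word, tag)])
        if previous is not None:
            total += math.log(contextual[(tag, previous)])
        previous = tag
    return total

# Toy probabilities (hypothetical values, not estimated from any corpus).
lexical = {("the", "DT"): 0.5, ("lower", "JJR"): 0.01,
           ("lower", "VB"): 0.002, ("rate", "NN"): 0.001}
contextual = {("JJR", "DT"): 0.05, ("VB", "DT"): 0.001,
              ("NN", "JJR"): 0.4, ("NN", "VB"): 0.1}

words = ["the", "lower", "rate"]
score_jjr = sequence_log_prob(words, ["DT", "JJR", "NN"], lexical, contextual)
score_vb = sequence_log_prob(words, ["DT", "VB", "NN"], lexical, contextual)
print(score_jjr > score_vb)
```

With these toy numbers the reading of "lower" as JJR scores higher than the reading as VB, because both its lexical and its contextual terms are larger.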

Probabilistic model: Limitations

With these assumptions, a typical probabilistic model is expressed as:
$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n) \approx \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}),$$

where $\hat{t}_1^n$ is the best estimation of POS tags for the given sentence
$w_1^n = w_1 w_2 \ldots w_n$, considering that $P(t_1 \mid t_0) = 1$.

  1   It does not model long-distance relationships.
  2   The contextual information takes into account the context on the
      left while the context on the right is not considered.
Both limitations can be overcome using ANN models.
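
The argmax over all tag sequences of a bigram model is commonly computed with Viterbi dynamic programming. The slides do not say which search was used, so the following is only a sketch under that assumption; the probability tables are invented and missing entries are treated as probability 0.

```python
import math

def viterbi(words, tags, lexical, contextual):
    """Most probable tag sequence under the bigram model, with P(t_1 | t_0) = 1.

    lexical[(w, t)]    = P(w | t)
    contextual[(t, u)] = P(t | u), where u is the previous tag
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t]: best log-score of any tag sequence for the prefix that ends in tag t
    best = {t: log(lexical.get((words[0], t), 0.0)) for t in tags}
    backpointers = []
    for word in words[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda u: best[u] + log(contextual.get((t, u), 0.0)))
            new_best[t] = (best[prev]
                           + log(contextual.get((t, prev), 0.0))
                           + log(lexical.get((word, t), 0.0)))
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)

    # Recover the path by following the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy probabilities (hypothetical values).
lexical = {("the", "DT"): 0.5, ("lower", "JJR"): 0.01,
           ("lower", "VB"): 0.002, ("rate", "NN"): 0.001}
contextual = {("JJR", "DT"): 0.05, ("VB", "DT"): 0.001,
              ("NN", "JJR"): 0.4, ("NN", "VB"): 0.1}
result = viterbi(["the", "lower", "rate"], ["DT", "JJR", "VB", "NN"], lexical, contextual)
print(result)
```

Note that each step only looks at the previous tag, which is exactly the left-context limitation listed above.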

3   Connectionist tagging

Basic connectionist model

                  Europe          proposed         lower       rate     increases
                   NNP              VBD            ?????       NN          NNS

MLPs as POS tag classifiers:
   MLP input:
             lower ($w_i$): the ambiguous input word, locally coded → projection layer
             NNP, VBD, NN, NNS ($c_i$): the tags of the words surrounding the
             ambiguous word to be tagged (past and future context), locally coded
   MLP output:
             the probability of each tag given the input:
             Pr(JJR|input)=0.6, Pr(RBR|input)=0.2, Pr(VB|input)=0.1, . . .
Therefore, the network learns the following mapping:
             $$F(w_i, c_i, t_i, \Theta) = \Pr_\Theta(t_i \mid w_i, c_i)$$
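
A small sketch of how such an input could be assembled with local (one-hot) coding. The tag set and vocabulary below are toy assumptions, and the projection layer that the word code feeds into is left out.

```python
def one_hot(index, size):
    """Local coding: a vector of zeros with a single 1 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def encode_input(word, past_tags, future_tags, vocab, tagset):
    """Concatenate the one-hot word code with the one-hot codes of the context tags."""
    features = one_hot(vocab.index(word), len(vocab))
    for tag in past_tags + future_tags:
        features += one_hot(tagset.index(tag), len(tagset))
    return features

tagset = ["NNP", "VBD", "NN", "NNS", "JJR", "RBR", "VB"]  # toy tag set (hypothetical)
vocab = ["lower", "open", "bet"]                           # toy ambiguous-word vocabulary
x = encode_input("lower", ["NNP", "VBD"], ["NN", "NNS"], vocab, tagset)
print(len(x), sum(x))
```

The vector has one block for the word and one block per context tag, so exactly five units are active for two past and two future tags.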

Morphological extended connectionist model

      Europe              proposed            lower             rate             increases
     NNP-Cap             VBD-NCap            ?????            NN-NCap            NNS-NCap
                                            NCap, -er

MLPs as POS tag classifiers:
   MLP input:
             lower ($w_i$): the ambiguous input word, locally coded → projection layer
             NCap, -er ($m_i$): morphological information related to the ambiguous input word
             NNP-Cap, VBD-NCap, NN-NCap, NNS-NCap ($c_i$): the tags of the
             words surrounding the ambiguous word to be tagged (past and
             future context), extended with morphological information, locally coded
   MLP output:
             the probability of each tag given the input.
Therefore, the network learns the following mapping:
             $$F(w_i, m_i, c_i, t_i, \Theta) = \Pr_\Theta(t_i \mid w_i, m_i, c_i)$$

And what about Unknown Words?

When evaluating the model, some words have never been seen during
training; they belong neither to the vocabulary of known ambiguous
words nor to the vocabulary of known non-ambiguous words. These
“unknown words” are the hardest problem for the network to tag correctly.

Proposed solution
A combination of two specialized models:
      MLPKnow : the MLP specialized in known ambiguous words
      MLPUnk : the MLP specialized in unknown words

MLPKnow for known ambiguous words

      $w_i$: known ambiguous input word, locally coded at the input of the projection layer
      $m_i$: morphological info related to the ambiguous input word
      Context: two labels of past context and one label of future context,
      extended with morphological info.

$$F_{Know}(w_i, m_i, c_i, t_i, \Theta_K) = \Pr_{\Theta_K}(t_i \mid w_i, m_i, c_i).$$

MLPUnk for unknown words
      $m_i$: morphological info related to the unknown input word (the same as for MLPKnow)
      $s_i$: more specific morphological info related to the unknown input word (different from ...)
      Context: three labels of past context and one label of future context,
      extended with morphological info.

$$F_{Unk}(m_i, s_i, c_i, t_i, \Theta_U) = \Pr_{\Theta_U}(t_i \mid m_i, s_i, c_i),$$
where $s_i$ corresponds to additional morphological information related
to the i-th unknown input word.
Twi table with the POS tags

                                 minutes          NNS, NNPS
                                 magnification     NN
                                 strikes          NNS, VBZ
                                 size             VBP, NN
                                 layoff           NN
                                 cohens           NNPS
                                 ...              ...

 Tminutes             =      {NNS, NNPS}        Known ambiguous word
 Tmagnification        =      {NN}               Known non-ambiguous word
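
A table of this kind can be collected from (word, tag) observations in a few lines. The sketch below uses the entries shown above as its observations; the function names are illustrative only.

```python
from collections import defaultdict

def build_tag_table(tagged_words):
    """Map each word to the set of POS tags observed with it in training."""
    table = defaultdict(set)
    for word, tag in tagged_words:
        table[word].add(tag)
    return dict(table)

def is_ambiguous(word, table):
    """A known word is ambiguous when more than one tag was observed for it."""
    return len(table[word]) > 1

observations = [
    ("minutes", "NNS"), ("minutes", "NNPS"),
    ("magnification", "NN"),
    ("strikes", "NNS"), ("strikes", "VBZ"),
    ("layoff", "NN"),
]
table = build_tag_table(observations)
print(sorted(table["minutes"]), is_ambiguous("magnification", table))
```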

Final connectionist model

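For each possible known word (ambiguous and non-ambiguous) we
have a $T_{w_i}$ table with the POS tags observed in training for word $w_i$:

$$F(w_i, m_i, s_i, c_i, t_i, \Theta_K, \Theta_U) =
\begin{cases}
0 & \text{if } t_i \notin T_{w_i}, \\
1 & \text{if } T_{w_i} = \{t_i\}, \\
F_{Know}(w_i, m_i, c_i, t_i, \Theta_K) & \text{if } w_i \in \Omega \wedge t_i \in T_{w_i}, \\
F_{Unk}(m_i, s_i, c_i, t_i, \Theta_U) & \text{in other case,}
\end{cases}$$

where $\Omega$ is the vocabulary of ambiguous words.

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \Pr(t_1^n \mid w_1^n) \approx \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} F(w_i, m_i, s_i, c_i, t_i, \Theta_K, \Theta_U)$$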
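
The piecewise definition of F can be read as a dispatch: consult the tag table first, and only call a network when the table does not settle the question. The sketch below assumes that the first case returns 0 for tags never observed with a known word; the two MLPs are replaced by stub functions with made-up outputs, and all names are hypothetical.

```python
def combined_score(word, tag, tag_table, ambiguous_vocab, f_know, f_unk, features):
    """Piecewise F: tag-table lookup first, then MLPKnow or MLPUnk.

    f_know and f_unk stand in for the two trained MLPs (hypothetical callables).
    """
    allowed = tag_table.get(word)
    if allowed is not None:
        if tag not in allowed:
            return 0.0                     # assumed first case: tag never seen with this word
        if allowed == {tag}:
            return 1.0                     # known non-ambiguous word
        if word in ambiguous_vocab:
            return f_know(word, tag, features)
    return f_unk(tag, features)            # "in other case": unknown words

# Stubs with fixed outputs, only to exercise the dispatch.
f_know = lambda word, tag, features: 0.7
f_unk = lambda tag, features: 0.2

tag_table = {"minutes": {"NNS", "NNPS"}, "magnification": {"NN"}}
ambiguous_vocab = {"minutes"}
for word, tag in [("magnification", "NN"), ("minutes", "NNS"), ("cohens", "NNPS")]:
    print(word, tag, combined_score(word, tag, tag_table, ambiguous_vocab, f_know, f_unk, None))
```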

4   The Penn Treebank Corpus

The Penn Treebank Corpus

      This corpus consists of a set of English texts from the Wall Street
      Journal distributed in 25 directories containing 100 files, each
      with several sentences.
      The total number of words is about one million, being 49 000
      The whole corpus was labeled with POS and syntactic tags.
      The POS tag labeling consists of a set of 45 different categories.
      Two more tags were added to mark the beginning and end of a
      sentence, resulting in a total of 47 different POS tags.

The Penn Treebank Corpus: Partitions

       Dataset     Directories     Sentences         Words     Vocabulary size
       Training       00-18           38 219       912 344              34 064
       Tuning         19-21            5 527       131 768              12 389
       Test           22-24            5 462       129 654              11 548
       Total          00-24           49 208     1 173 766              38 452

The Penn Treebank Corpus: Preprocessing

The corpus is large and so is its vocabulary of ambiguous words.
Preprocessing to reduce the vocabulary:
      The training set was split into ten random partitions of equal size.
      Words that appeared in just one partition were considered unknown words.
      For each word, POS tags that accounted for less than 1% of the
      word's tag occurrences were eliminated (treated as tagging errors).
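
Both preprocessing steps can be sketched briefly. This is one possible reading of the slide, not the authors' implementation: the 1% threshold is applied to the share of a word's occurrences that carry a given tag, and the partitions are passed in as ready-made word lists.

```python
from collections import Counter, defaultdict

def prune_rare_tags(tagged_words, min_share=0.01):
    """Drop, for each word, the tags seen in less than min_share of its occurrences."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    table = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        table[word] = {t for t, c in tag_counts.items() if c / total >= min_share}
    return table

def words_in_single_partition(partitions):
    """Words that occur in exactly one partition are treated as unknown words."""
    seen_in = Counter()
    for part in partitions:
        for word in set(part):
            seen_in[word] += 1
    return {w for w, n in seen_in.items() if n == 1}

# Toy data (hypothetical): "the" is tagged NN once in 200 occurrences.
data = [("the", "DT")] * 199 + [("the", "NN")] + [("lower", "JJR"), ("lower", "VB")]
pruned = prune_rare_tags(data)
unknown = words_in_single_partition([["a", "b"], ["a", "c"], ["a", "b"]])
print(pruned["the"], unknown)
```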

The Penn Treebank Corpus: Morph. information

Two morphological preprocessing filters:
      Deleting the prefixes from compound words (using a set of the
      125 most common English prefixes). In this way, some unknown
      words were converted to known words.
pre-, electro-, tele-, . . .

      All the cardinal and ordinal numbers (except “one” and “second”,
      which are polysemous) were replaced with the special token *CD*.
 twenty-years-old                ⇒   *CD*-years-old
 post-1987                       ⇒   post-*CD*
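
The two filters might look like the sketch below. Several details are assumptions: only the three prefixes named above are listed (the full set of 125 is not given), the list of number words is a tiny placeholder, and a prefix is stripped only when the remaining stem is a known word.

```python
import re

PREFIXES = ("electro", "tele", "pre")               # the three prefixes named in the slide
NUMBER_WORDS = {"two", "ten", "twenty", "third"}    # hypothetical, deliberately tiny list
POLYSEMOUS = {"one", "second"}                      # kept as ordinary words

def replace_numbers(token):
    """Replace each numeric part of a (possibly hyphenated) token with *CD*."""
    parts = []
    for part in token.split("-"):
        low = part.lower()
        is_digits = re.fullmatch(r"\d+([.,]\d+)*", part) is not None
        if low not in POLYSEMOUS and (is_digits or low in NUMBER_WORDS):
            parts.append("*CD*")
        else:
            parts.append(part)
    return "-".join(parts)

def strip_prefix(word, known_vocab):
    """Remove a known prefix when doing so turns an unknown word into a known one."""
    if word in known_vocab:
        return word
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        for head in (prefix + "-", prefix):
            if word.startswith(head) and word[len(head):] in known_vocab:
                return word[len(head):]
    return word

print(replace_numbers("twenty-years-old"), replace_numbers("post-1987"))
print(strip_prefix("pre-trial", {"trial"}))
```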

The Penn Treebank Corpus: Morph. information

Morphological information added to the MLPs:
      Three input units ⇒ the input word has a capitalized first letter, is all
      caps, or has a subset of its letters capitalized. This is an important
      morphological characteristic and it was also added to the POS tags of
      the context (both MLPs).
      A unit indicating whether the word contains a dash “-” (both MLPs).
      A unit indicating whether the word contains a period “.” (both MLPs).
      Suffix analysis to deal with unknown words (only MLPUnk):
             Compute the probability distribution of tags for suffixes of length
             less than or equal to 10 ⇒ 709 suffixes found.
             An agglomerative hierarchical clustering process was followed, and
             an empirical set of clusters was chosen.
             Finally, a set of the 21 most common grammatical suffixes was added.
      MLPUnk needs 209 units to take into account the presence of
      suffixes in words.
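
The capitalization, dash and period units shared by both MLPs could be computed as in this sketch. The exact definition of the three capitalization units is not spelled out in the slides, so the reading used here (first letter capital, all letters capital, some but not all letters capital) is an assumption.

```python
def morph_units(word):
    """Return five 0/1 units: first-cap, all-caps, some-caps, has-dash, has-period."""
    letters = [ch for ch in word if ch.isalpha()]
    first_cap = word[:1].isupper()
    all_caps = bool(letters) and all(ch.isupper() for ch in letters)
    some_caps = any(ch.isupper() for ch in letters) and not all_caps
    flags = [first_cap, all_caps, some_caps, "-" in word, "." in word]
    return [1.0 if flag else 0.0 for flag in flags]

for word in ["Europe", "lower", "IBM", "U.S.", "mid-July"]:
    print(word, morph_units(word))
```

The suffix units used only by MLPUnk are not covered here, since they depend on the clustering step described above.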

The Penn Treebank Corpus: after preprocessing

  Dataset         Num. of words      Unambiguous      Ambiguous             Unknown
  Training               912 344          549 272        361 704               1 368
  Tuning                 131 768           77 347         51 292               3 129
  Test                   129 654           75 758         51 315               2 581
  Total                1 173 766          702 377        464 311               7 078

                                 Vocabulary in Training

      6 239 ambiguous words.
      25 798 unambiguous words.

5   The connectionist POS taggers

The connectionist POS taggers
      Projection layer.
      Error backpropagation algorithm for training.
      The topology and parameters of the multilayer perceptrons were
      selected in previous experimentation.
      For the experiments we used a toolkit for pattern recognition
      tasks developed by our research group.
      MLPKnow was trained with the words of the ambiguous vocabulary.
      MLPUnk was trained with words that appear fewer than 4 times.

     Parameter                   MLPKnow                          MLPUnk
     Input layer size            |T + M|(p + f) + 50 + |M|        |T + M|(p + f) + |M| + |S|
     Output layer size           |T|                              |T|
     Projection layer size       |Ω| → 50                         -
     Hidden layer(s) size        100-75                           175-100
     Hidden layer act. func.     Hyperbolic tangent               Hyperbolic tangent
     Output layer act. func.     Softmax                          Softmax
     Learning rate               0.005                            0.005
     Momentum                    0.001                            0.001
     Weight decay                0.0000001                        0.0000001
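
To make the topology concrete, here is a minimal forward pass with hyperbolic tangent hidden layers and a softmax output, using the 100-75 hidden sizes and 47 output tags. The input size of 20, the random weights and the omission of the projection layer are assumptions made to keep the sketch short; it does not reproduce the authors' toolkit or training.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer: weights is a list of rows, one per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def softmax(values):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, layers):
    """tanh on every hidden layer, softmax on the output layer."""
    for weights, biases in layers[:-1]:
        x = [math.tanh(v) for v in dense(x, weights, biases)]
    weights, biases = layers[-1]
    return softmax(dense(x, weights, biases))

def random_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
sizes = [20, 100, 75, 47]   # toy input size; hidden sizes and tag count from the slides
layers = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probs = forward([1.0] * sizes[0], layers)
print(len(probs), round(sum(probs), 6))
```

The output is a probability distribution over the 47 tags, which is what the combined model multiplies across the sentence.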

Performance on the tuning set

POS tagging error rate for the tuning set varying the context (p is the
past context, and f is the future context).

MLPKnow error
    Past (p) \ Future (f)       2        3        4        5
    2                         6.30     6.26     6.25     6.31
    3                         6.28     6.22     6.20     6.31
    4                         6.28     6.27     6.28     6.31

MLPUnk error
    Past (p) \ Future (f)       1        2        3
    1                        12.56    12.46    12.40
    2                        12.27    12.08    12.37
    3                        12.59    11.95    12.24
    4                        12.72    12.34    12.46

Test POS tagging performance

POS tagging error rate for the tuning and test sets for the global
system. Comparison of our connectionist system with morphological
information versus our previous system without morphological
information.

             Partition           With morph. info.       Without morph. info.
             Tuning                    3.2%                    4.2%
             Test                      3.3%                    4.3%
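
The one-point absolute gain in the table corresponds to a relative error reduction of a little under a quarter. The short sketch below computes it from the figures above; the function name is illustrative.

```python
def relative_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_error - new_error) / baseline_error

results = [("Tuning", 3.2, 4.2), ("Test", 3.3, 4.3)]
for partition, with_morph, without_morph in results:
    print(partition, f"{100 * relative_reduction(without_morph, with_morph):.1f}%")
```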

6   Conclusions

Conclusions: Comparison with other tagging systems
POS tagging error rate for the test set. KnownAmb refers to the
disambiguation error for known ambiguous words. Unknown refers to the
POS tag error for unknown words. Total is the total POS tag error, with
ambiguous, non-ambiguous, and unknown words.

              Model              KnownAmb         Unknown        Total
              SVMs                  6.1             11.0          2.8
              MT                     -              23.5          3.5
              TnT                   7.8             14.1          3.5
              NetTagger              -               -            3.8
              HMM Tagger             -               -            5.8
              RANN                   -               -            8.0
              Our approach          6.7             10.3          3.3

                Results comparable with state-of-the-art systems.

Conclusions: Future work

      Increase the amount of morphological information.
      Test the models in a graph based approach.
             Introduce a language model of POS tags to improve the results.

Thank you!

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB

Último (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL

Adding morphological information to a connectionist Part-Of-Speech tagger

  • 1. Adding morphological information to a connectionist Part-Of-Speech tagger
       F. Zamora-Martínez, M.J. Castro-Bleda, S. España-Boquera, S. Tortajada-Velert
       Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Spain
       Escuela Superior de Enseñanzas Técnicas, Universidad CEU-Cardenal Herrera, Alfara del Patriarca, Valencia, Spain
       10-12 November 2009, Sevilla
       F. Zamora et al (UPV/CEU-UCH), CAEPIA 2009, 10-12 November 2009, Sevilla
  • 2. Index
       1 POS tagging
       2 Probabilistic tagging
       3 Connectionist tagging
       4 The Penn Treebank Corpus
       5 The connectionist POS taggers
       6 Conclusions
  • 4. What is Part-Of-Speech (POS) tagging?
       $T = \{\tau_1, \tau_2, \ldots, \tau_k\}$: a set of POS tags
       $\Omega = \{\omega_1, \omega_2, \ldots, \omega_m\}$: the vocabulary of the application
       The goal of a Part-Of-Speech tagger is to associate each word in a text with its correct lexical-syntactic category (represented by a tag).
       Example:
         The  grand  jury  commented  on  a   number  of  other  topics
         DT   JJ     NN    VBD        IN  DT  NN      IN  JJ     NNS
  • 5. Ambiguity and applications
       Words often have more than one POS tag, for example "lower":
         Europe proposed lower rate increases ... = JJR
         To push the pound even lower ... = RBR
         ... should be able to lower long-term ... = VB
       Ambiguity!
       Applications: speech synthesis, speech recognition, information retrieval, word-sense disambiguation, machine translation, ...
  • 6. How hard is POS tagging? Measuring ambiguity
       Penn Treebank (45-tag corpus):
         Unambiguous (1 tag)     36,678 (84%)
         Ambiguous (2-7 tags)     7,088 (16%)
       Details:
         2 tags    5,475
         3 tags    1,313 (lower)
         4 tags      250
         5 tags       41
         6 tags        7
         7 tags        2 (bet, open)
       A simple approach that assigns each word its most common tag already reaches 90% accuracy!
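The most-common-tag baseline mentioned on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the toy corpus, the function names, and the `default_tag` fallback for unseen words are all hypothetical and are not taken from the presentation.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Count (word, tag) pairs and keep the most frequent tag of each word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_baseline(words, most_common_tag, default_tag="NN"):
    """Tag each word with its most frequent training tag.
    default_tag is an assumed fallback for words never seen in training."""
    return [most_common_tag.get(w, default_tag) for w in words]

# Hypothetical toy corpus built from the "lower" examples on the previous slide.
toy_corpus = [
    [("Europe", "NNP"), ("proposed", "VBD"), ("lower", "JJR"),
     ("rate", "NN"), ("increases", "NNS")],
    [("lower", "JJR"), ("rate", "NN")],
    [("to", "TO"), ("lower", "VB"), ("rate", "NN")],
]
model = train_baseline(toy_corpus)
predicted = tag_baseline(["lower", "rate", "increases"], model)
```

Because the baseline ignores context, it always tags "lower" the same way, which is exactly the kind of error a context-aware tagger is meant to fix.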
  • 7. Unknown Words
       How can one assign a tag to a given word if that word is unknown to the tagger?
       Unknown words are the hardest problem for POS tagging!
  • 9. Probabilistic model
       We are given a sentence: what is the best sequence of tags that corresponds to the sequence of words?
       Probabilistic view: consider all possible sequences of tags and, out of this universe of sequences, choose the tag sequence that is most probable given the observed sequence of words:
       $$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n) = \operatorname*{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n)$$
  • 10. Probabilistic model: Simplifications
       To simplify, two assumptions are made:
       1 Words are independent of each other, and a word's identity depends only on its tag → lexical probabilities:
       $$P(w_1^n \mid t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)$$
       2 The probability of a tag depends only on its predecessor tag(s) (bigram, trigram, ...) → contextual probabilities:
       $$P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})$$
  • 11. Probabilistic model: Limitations
       With these assumptions, a typical probabilistic model is expressed as:
       $$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n) \approx \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$
       where $\hat{t}_1^n$ is the best estimate of the POS tags for the given sentence $w_1^n = w_1 w_2 \ldots w_n$, and considering that $P(t_1 \mid t_0) = 1$.
       1 It does not model long-distance relationships.
       2 The contextual information takes into account the context on the left, while the context on the right is not considered.
       Both limitations can be overcome using ANN models.
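The argmax over tag sequences in the bigram model is usually computed with the Viterbi algorithm rather than by enumerating every sequence. The following is a minimal sketch of that standard technique, not code from the presentation: the function name, the dictionary layout of the probability tables, and the toy probabilities are all assumptions. It follows the slide's convention that $P(t_1 \mid t_0) = 1$.

```python
import math

def viterbi(words, tags, lexical, contextual):
    """Return the tag sequence maximising prod_i P(w_i|t_i) * P(t_i|t_{i-1}).
    lexical[(word, tag)] and contextual[(prev_tag, tag)] hold probabilities;
    missing entries count as probability 0. P(t_1|t_0) is taken to be 1."""
    def logp(table, key):
        p = table.get(key, 0.0)
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: best log-score of a tag sequence for words[:i+1] ending in t
    best = [{t: logp(lexical, (words[0], t)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + logp(contextual, (p, t)))
            best[i][t] = (best[i - 1][prev] + logp(contextual, (prev, t))
                          + logp(lexical, (words[i], t)))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical toy probabilities, chosen only to illustrate the mechanics.
tags = ["TO", "VB", "JJR", "NNS"]
lexical = {("to", "TO"): 1.0, ("lower", "VB"): 0.3,
           ("lower", "JJR"): 0.7, ("rates", "NNS"): 1.0}
contextual = {("TO", "VB"): 0.8, ("TO", "JJR"): 0.05,
              ("VB", "NNS"): 0.5, ("JJR", "NNS"): 0.5}
best_tags = viterbi(["to", "lower", "rates"], tags, lexical, contextual)
```

In this toy setting the lexical probabilities alone would prefer JJR for "lower", but the left context "to" tips the decision towards VB. Only the left context is used, which is the second limitation listed on the slide.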
  • 13. Basic connectionist model
         Europe  proposed  lower  rate  increases
         NNP     VBD       ?????  NN    NNS
       MLPs as POS tag classifiers.
       MLP input:
         lower: $w_i$, the ambiguous input word, with local coding → projection layer
         NNP, VBD, NN, NNS: $c_i$, the tags of the words surrounding the ambiguous word to be tagged (past and future context), with local coding
       MLP output: the probability of each tag given the input: Pr(JJR|input) = 0.6, Pr(RBR|input) = 0.2, Pr(VB|input) = 0.1, ...
       Therefore, the network learns the following mapping:
       $$F(w_i, c_i, t_i, \Theta) = \Pr\nolimits_{\Theta}(t_i \mid w_i, c_i)$$
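To make the input and output coding of this slide concrete, here is a small sketch that reads local coding as one-hot coding: the context tags are encoded as concatenated one-hot vectors, and a softmax turns raw output scores into a probability distribution over tags. The reduced tagset, the helper names, and the raw scores are hypothetical; the real network, its projection layer, and its weights are not reproduced here.

```python
import math

def one_hot(index, size):
    """Local coding: a vector of zeros with a single 1 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def encode_context(context_tags, tagset):
    """Concatenate the one-hot codes of the context tags c_i."""
    tag_index = {tag: i for i, tag in enumerate(tagset)}
    vec = []
    for tag in context_tags:
        vec.extend(one_hot(tag_index[tag], len(tagset)))
    return vec

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical reduced tagset; the corpus itself uses 47 tags.
tagset = ["JJR", "NN", "NNP", "NNS", "RBR", "VB", "VBD"]
context = encode_context(["NNP", "VBD", "NN", "NNS"], tagset)
# Hypothetical raw scores standing in for the MLP output layer activations.
probs = softmax([2.0, 0.1, 0.0, 0.0, 0.9, 0.2, 0.0])
```

With four context positions and seven tags, the context part of the input has 28 units, exactly four of which are active.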
  • 14. Morphologically extended connectionist model
         Europe   proposed  lower      rate     increases
         NNP-Cap  VBD-NCap  ?????      NN-NCap  NNS-NCap
                            NCap, -er
       MLPs as POS tag classifiers.
       MLP input:
         lower: $w_i$, the ambiguous input word, with local coding → projection layer
         NCap, -er: $m_i$, morphological information related to the ambiguous input word
         NNP-Cap, VBD-NCap, NN-NCap, NNS-NCap: $c_i$, the tags of the words surrounding the ambiguous word to be tagged (past and future context), extended with morphological information, with local coding
       MLP output: the probability of each tag given the input.
       Therefore, the network learns the following mapping:
       $$F(w_i, m_i, c_i, t_i, \Theta) = \Pr\nolimits_{\Theta}(t_i \mid w_i, m_i, c_i)$$
  • 15. And what about unknown words?
       When evaluating the model, some words have never been seen during training; they belong neither to the vocabulary of known ambiguous words nor to the vocabulary of known non-ambiguous words. These "unknown words" are the hardest problem for the network to tag correctly.
       Proposed solution: a combination of two specialized models:
         MLPKnow: the MLP specialized in known ambiguous words
         MLPUnk: the MLP specialized in unknown words
  • 16. MLPKnow for known ambiguous words
       $w_i$: known ambiguous input word, locally coded at the input of the projection layer
       $m_i$: morphological information related to the ambiguous input word
       Context: two labels of past context and one label of future context, extended with morphological information.
       $$F_{Know}(w_i, m_i, c_i, t_i, \Theta_K) = \Pr\nolimits_{\Theta_K}(t_i \mid w_i, m_i, c_i)$$
  • 17. MLPUnk for unknown words
       $m_i$: morphological information related to the unknown input word (the same as for MLPKnow)
       $s_i$: more specific morphological information related to the unknown input word (not used by MLPKnow)
       Context: three labels of past context and one label of future context, extended with morphological information.
       $$F_{Unk}(m_i, s_i, c_i, t_i, \Theta_U) = \Pr\nolimits_{\Theta_U}(t_i \mid m_i, s_i, c_i)$$
       where $s_i$ corresponds to additional morphological information related to the unknown $i$-th input word.
  • 18. $T_{w_i}$ table with the POS tags
         minutes         NNS, NNPS
         magnification   NN
         strikes         NNS, VBZ
         size            VBP, NN
         layoff          NN
         cohens          NNPS
         ...             ...
       $T_{minutes}$ = {NNS, NNPS}: known ambiguous word
       $T_{magnification}$ = {NN}: known non-ambiguous word
  • 19. Final connectionist model
       For each possible known word (ambiguous and non-ambiguous) we have a $T_{w_i}$ table with the POS tags observed in training for word $w_i$:
       $$F(w_i, m_i, s_i, c_i, t_i, \Theta_K, \Theta_U) = \begin{cases} 0 & \text{if } t_i \notin T_{w_i}, \\ 1 & \text{if } T_{w_i} = \{t_i\}, \\ F_{Know}(w_i, m_i, c_i, t_i, \Theta_K) & \text{if } w_i \in \Omega \wedge t_i \in T_{w_i}, \\ F_{Unk}(m_i, s_i, c_i, t_i, \Theta_U) & \text{otherwise,} \end{cases}$$
       where $\Omega$ is the vocabulary of ambiguous words.
       $$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \Pr(t_1^n \mid w_1^n) \approx \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} F(w_i, m_i, s_i, c_i, t_i, \Theta_K, \Theta_U)$$
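The case analysis of the final model can be sketched as a small dispatch function. This is an illustration under simplifying assumptions: `f_know` and `f_unk` are hypothetical callables standing in for the two trained MLPs, and they take only the word and the tag here, whereas the real models also receive the morphological and context inputs.

```python
def combined_score(word, tag, tag_table, ambiguous_vocab, f_know, f_unk):
    """Piecewise score F for one (word, tag) pair.
    tag_table maps each known word to the set of tags seen in training (T_w);
    ambiguous_vocab is the vocabulary of ambiguous words (Omega)."""
    if word in tag_table:
        allowed = tag_table[word]
        if tag not in allowed:
            return 0.0          # tag never observed with this word
        if len(allowed) == 1:
            return 1.0          # known non-ambiguous word
        if word in ambiguous_vocab:
            return f_know(word, tag)
    return f_unk(word, tag)     # every other case, e.g. unknown words

# Hypothetical stand-ins for the two networks and their tables.
tag_table = {"minutes": {"NNS", "NNPS"}, "magnification": {"NN"}}
ambiguous_vocab = {"minutes"}
f_know = lambda word, tag: {"NNS": 0.9, "NNPS": 0.1}[tag]
f_unk = lambda word, tag: 0.25

scores = [
    combined_score("magnification", "NN", tag_table, ambiguous_vocab, f_know, f_unk),
    combined_score("magnification", "VB", tag_table, ambiguous_vocab, f_know, f_unk),
    combined_score("minutes", "NNS", tag_table, ambiguous_vocab, f_know, f_unk),
    combined_score("cohens", "NNPS", tag_table, ambiguous_vocab, f_know, f_unk),
]
```

In this toy setting "cohens" is deliberately left out of `tag_table` so that it exercises the unknown-word branch.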
  • 21. The Penn Treebank Corpus
       This corpus consists of a set of English texts from the Wall Street Journal, distributed in 25 directories of 100 files each, with several sentences per file.
       The total number of words is about one million, of which 49 000 are distinct.
       The whole corpus was labeled with POS and syntactic tags.
       The POS tag labeling uses a set of 45 different categories. Two more tags were added to mark the beginning and the end of a sentence, giving a total of 47 different POS tags.
  • 22. The Penn Treebank Corpus: Partitions
         Dataset    Directory   Num. of sentences   Num. of words   Vocabulary size
         Training   00-18                  38 219         912 344            34 064
         Tuning     19-21                   5 527         131 768            12 389
         Test       22-24                   5 462         129 654            11 548
         Total      00-24                  49 208       1 173 766            38 452
  • 23. The Penn Treebank Corpus: Preprocessing
       Huge corpus with many words in the ambiguous vocabulary.
       Preprocessing to reduce the vocabulary:
         The training set was split into ten random partitions of equal size. Words that appeared in just one partition were considered unknown words.
         POS tags that accounted for less than 1% of a word's tag occurrences were eliminated (tagging errors).
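Both vocabulary-reduction rules can be sketched as follows. The slide does not say whether the ten partitions were drawn over sentences or over running words, nor how the 1% threshold was computed, so this sketch assumes sentence-level partitions and a threshold on the relative frequency of each tag for a given word. All function names are hypothetical.

```python
import random
from collections import defaultdict

def words_in_one_partition(sentences, n_partitions=10, seed=0):
    """Shuffle the sentences, deal them into n near-equal partitions, and
    return the words that occur in exactly one partition."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    partitions_of = defaultdict(set)
    for idx, sentence in enumerate(shuffled):
        for word in sentence:
            partitions_of[word].add(idx % n_partitions)
    return {w for w, parts in partitions_of.items() if len(parts) == 1}

def prune_rare_tags(tag_counts, threshold=0.01):
    """Drop tags that account for less than `threshold` of a word's occurrences."""
    total = sum(tag_counts.values())
    return {t: c for t, c in tag_counts.items() if c / total >= threshold}

# Toy data: three sentences in three partitions, so each sentence is alone.
toy_sentences = [["the", "cat"], ["the", "dog"], ["the", "cat"]]
treated_as_unknown = words_in_one_partition(toy_sentences, n_partitions=3)
kept_tags = prune_rare_tags({"NN": 995, "VB": 5})
```

Treating words confined to a single partition as unknown gives the unknown-word model training examples even though every such word does occur in the training set.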
  • 24. The Penn Treebank Corpus: Morph. information
       Two morphological preprocessing filters:
         Deleting the prefixes from compound words (using a set of the 125 most common English prefixes). In this way, some unknown words were converted to known words.
           Example: pre-, electro-, tele-, ...
         All the cardinal and ordinal numbers (except "one" and "second", which are polysemic) were replaced with the special token *CD*.
           Example: twenty-years-old ⇒ *CD*-years-old, post-1987 ⇒ post-*CD*
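The two filters might look roughly like this. The prefix tuple contains only the three examples from the slide (the full list of 125 prefixes is not given), the number-word set is a tiny hypothetical sample, and splitting tokens on hyphens is an assumption made to reproduce the slide's examples.

```python
import re

PREFIXES = ("pre-", "electro-", "tele-")   # only the examples shown on the slide
NUMBER_WORDS = {"two", "three", "ten", "twenty", "third", "tenth"}  # hypothetical sample
KEPT_AS_WORDS = {"one", "second"}          # polysemic, left untouched

def strip_prefix(token):
    """Remove a leading prefix if the token starts with one and text remains."""
    lowered = token.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix) and len(token) > len(prefix):
            return token[len(prefix):]
    return token

def replace_numbers(token):
    """Replace each hyphen-separated cardinal or ordinal part with *CD*."""
    parts = []
    for part in token.split("-"):
        lowered = part.lower()
        is_number = (lowered in NUMBER_WORDS
                     or re.fullmatch(r"\d+(?:[.,]\d+)*", part) is not None)
        parts.append("*CD*" if is_number and lowered not in KEPT_AS_WORDS else part)
    return "-".join(parts)

examples = [replace_numbers("twenty-years-old"), replace_numbers("post-1987"),
            replace_numbers("second-hand"), strip_prefix("pre-election")]
```

Collapsing every number into one token means the tagger sees a single frequent symbol instead of many rare, mostly unseen number strings.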
  • 25. The Penn Treebank Corpus: Morph. information
       Morphological information added to the MLPs:
         Three input units indicating whether the input word has a capitalized first letter, is all caps, or has a subset of its letters capitalized. This is an important morphological characteristic, and it was also added to the POS tags of the context (both MLPs).
         A unit indicating whether the word contains a dash "-" (both MLPs).
         A unit indicating whether the word contains a point "." (both MLPs).
         Suffix analysis to deal with unknown words (only MLPUnk):
           Compute the probability distribution of tags for suffixes of length less than or equal to 10 ⇒ 709 suffixes found.
           An agglomerative hierarchical clustering process was followed, and an empirical set of clusters was chosen.
           Finally, a set of the 21 most common grammatical suffixes was added.
           MLPUnk needs 209 units to take into account the presence of suffixes in words.
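The five binary units shared by both MLPs can be sketched as below. The slide does not define the three capitalization units precisely, so the reading used here (first letter capitalized, every letter capitalized, some later letter capitalized without the word being all caps) is an assumption, as is the function name. The suffix units of MLPUnk are not covered.

```python
def morph_units(word):
    """Return [first_cap, all_caps, mixed_caps, has_dash, has_point] as 0/1."""
    letters = [ch for ch in word if ch.isalpha()]
    first_cap = word[:1].isupper()
    all_caps = bool(letters) and all(ch.isupper() for ch in letters)
    # Assumed reading of "a subset": a capital after the first character.
    mixed_caps = any(ch.isupper() for ch in word[1:]) and not all_caps
    has_dash = "-" in word
    has_point = "." in word
    return [int(first_cap), int(all_caps), int(mixed_caps),
            int(has_dash), int(has_point)]

units = {w: morph_units(w) for w in ["Europe", "U.S.", "long-term", "lower"]}
```

These features are available for any word, seen or unseen, which is what makes them useful for the unknown-word model.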
  • 26. The Penn Treebank Corpus: after preprocessing
         Dataset    Num. of words   Unambiguous   Ambiguous   Unknown
         Training         912 344       549 272     361 704     1 368
         Tuning           131 768        77 347      51 292     3 129
         Test             129 654        75 758      51 315     2 581
         Total          1 173 766       702 377     464 311     7 078
       Vocabulary in training: 6 239 ambiguous words and 25 798 unambiguous words were obtained.
  • 28. The connectionist POS taggers
       Projection layer.
       Error backpropagation algorithm for training.
       The topology and parameters of the multilayer perceptrons were selected in previous experimentation.
       For the experiments we used a toolkit for pattern recognition tasks developed by our research group.
       MLPKnow was trained with the words of the ambiguous vocabulary.
       MLPUnk was trained with words that appear fewer than 4 times.
         Parameter                 MLPKnow                       MLPUnk
         Input layer size          |T+M|(p + f) + 50 + |M|       |T+M|(p + f) + |M| + |S|
         Output layer size         |T|                           |T|
         Projection layer size     |Ω| → 50                      -
         Hidden layer(s) size      100-75                        175-100
         Hidden layer act. func.   Hyperbolic tangent
         Output layer act. func.   Softmax
         Learning rate             0.005
         Momentum                  0.001
         Weight decay              0.0000001
       The last five rows give a single value shared by both MLPs.
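The two input-layer formulas in the table can be evaluated with a short helper. The values |M| = 5 (three capitalization units, one dash unit, one point unit) and |S| = 209 are read off the morphological-information slide; the size of the morphologically extended tag set |T+M| is not stated in the presentation, so the value 100 below is purely hypothetical, and the function names are illustrative.

```python
def input_size_know(n_ext_tags, past, future, n_morph=5, projection=50):
    """|T+M|(p + f) + 50 + |M|: context codes, projected word, morph units."""
    return n_ext_tags * (past + future) + projection + n_morph

def input_size_unk(n_ext_tags, past, future, n_morph=5, n_suffix=209):
    """|T+M|(p + f) + |M| + |S|: context codes, morph units, suffix units."""
    return n_ext_tags * (past + future) + n_morph + n_suffix

N_EXT_TAGS = 100  # hypothetical |T+M|, not given in the slides
know_units = input_size_know(N_EXT_TAGS, past=2, future=1)
unk_units = input_size_unk(N_EXT_TAGS, past=3, future=1)
```

The context sizes in the example (two past and one future label for MLPKnow, three past and one future for MLPUnk) are the ones given on the slides describing each network.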
  • 29. Performance on the tuning set
       POS tagging error rate for the tuning set, varying the context (p is the past context, and f is the future context).
       MLPUnk error:
                  Future
         Past     1       2       3
         1        12.56   12.46   12.40
         2        12.27   12.08   12.37
         3        12.59   11.95   12.24
         4        12.72   12.34   12.46
       MLPKnow error:
                  Future
         Past     2       3       4       5
         2        6.30    6.26    6.25    6.31
         3        6.28    6.22    6.20    6.31
         4        6.28    6.27    6.28    6.31
  • 30. Test POS tagging performance
       POS tagging error rate for the tuning and test sets for the global system. Comparison of our connectionist system with morphological information versus our previous system without morphological information.
         Partition   With morph. info.   Without morph. info.
         Tuning      3.2%                4.2%
         Test        3.3%                4.3%
  • 32. Conclusions: Comparison with other tagging systems
       POS tagging error rate for the test set. KnownAmb refers to the disambiguation error for known ambiguous words. Unknown refers to the POS tag error for unknown words. Total is the total POS tag error, with ambiguous, non-ambiguous, and unknown words.
         Model          KnownAmb   Unknown   Total
         SVMs           6.1        11.0      2.8
         MT             -          23.5      3.5
         TnT            7.8        14.1      3.5
         NetTagger      -          -         3.8
         HMM Tagger     -          -         5.8
         RANN           -          -         8.0
         Our approach   6.7        10.3      3.3
       Results are comparable with state-of-the-art systems.
  • 33. Conclusions: Future work
       Increase the amount of morphological information.
       Test the models in a graph-based approach.
       Introduce a language model of POS tags to improve the results.
  • 34. Thank you!