The Database of
   Modern Icelandic Inflection
             Kristín Bjarnadóttir
                                

The Árni Magnússon Institute for Icelandic Studies
                                                 
              Reykjavík, Iceland 

                    SaLTMIL-AfLat  
                Istanbul, May 22nd 2012
The Database of Modern Icelandic Inflection (DMII)
     A database storing full inflectional forms
                                             

Topic:

• The aim in creating the DMII 
• A short description of the DMII 
• Why is the DMII not a productive system of rules?

       The necessary information for a rule system was not available 

       A rule system would overgenerate wildly
• The problem of a large tag set: Data scarcity
• The two aspects of the DMII: LT and online


          ‘Modern’ in the title is used in the sense ‘current’.
The Database of Modern Icelandic Inflection
       A database storing full inflectional forms
                                               

               Used in Language Technology.    
           Accessible on-line for the general public.
                                                    


      Headwords/paradigms:              271,400
      Inflectional forms (with tag):  5,800,000
      Unique word forms:              2,800,000
      PoS tag set:                        700 +
      Inflectional classes:               630 +

25% of the inflectional classes for the major word classes describe
the idiosyncratic inflection of single words.
The aim in creating the DMII
                                           

               Showing Icelandic inflection ‘as is’
                                                   

• All inflectional forms, including variants
• No overgeneration
• No overlap between inflectional classes, i.e., a word can only
  belong to 1 inflectional class (including variants)

• An inflectional class is a bundle of inflectional rules for a set
of words.

• Production: A simple matrix of inflectional rules, with flags for
values for inflectional categories: +/-singular, +/-plural, etc.
The Database of Modern Icelandic Inflection

                         Icelandic inflection is rich.

          Inflectional categories for the major word classes
                  and number of inflectional forms:

Nouns:      case (4), number (2), definiteness (2)                               = 16
Adjectives: gender (3), case (4), number (2), definiteness (2), degree (3)       = 120
Verbs:                                                                           = 106
    Finite: voice (2), mood (2), tense (2), person (3), number (2)               = 48
    Nonfinite:                                                                   = 58
            Imperative: voice (2), number (2) (+ 1 truncated form)               = 5
            Infinitive: voice (2)                                                = 2
            Past participle: gender (3), case (4), number (2), definiteness (2)  = 48
            Present participle:                                                  = 1
            Supine: voice (2)                                                    = 2
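
The arithmetic behind these counts can be checked with a short sketch. The category sizes are taken from the slide; the dictionary and function names are illustrative only. Adjectives are left out on purpose: the slide's figure of 120 is lower than the full product of the listed categories (144), so the adjective count is evidently not a plain product, and the slide does not say which combinations are excluded.

```python
from itertools import product
from math import prod

# Category sizes as listed on the slide.
NOUN = {"case": 4, "number": 2, "definiteness": 2}
FINITE_VERB = {"voice": 2, "mood": 2, "tense": 2, "person": 3, "number": 2}
PAST_PARTICIPLE = {"gender": 3, "case": 4, "number": 2, "definiteness": 2}


def slot_count(categories):
    """Number of paradigm slots when every combination of values exists."""
    return prod(categories.values())


# Nonfinite verb forms, summed as on the slide.
NONFINITE = {
    "imperative": 2 * 2 + 1,  # voice x number, plus 1 truncated form
    "infinitive": 2,          # voice
    "past participle": slot_count(PAST_PARTICIPLE),
    "present participle": 1,
    "supine": 2,              # voice
}

noun_slots = slot_count(NOUN)
finite_slots = slot_count(FINITE_VERB)
nonfinite_slots = sum(NONFINITE.values())
verb_slots = finite_slots + nonfinite_slots

# The 16 noun slots, enumerated in the order used for the köttur example.
noun_slot_labels = [
    f"{case}.{number}.{definiteness}"
    for number, case, definiteness in product(
        ["sg", "pl"], ["nom", "acc", "dat", "gen"], ["indef", "def"]
    )
]

print(noun_slots, finite_slots, nonfinite_slots, verb_slots)
```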
DMII: Output for LT projects
          16 inflectional forms of the noun köttur ‘cat’
                                                      

lemma id             infl.form tag               English tag
köttur;416784;kk;alm;köttur;NFET                 nom.sg.indef.
köttur;416784;kk;alm;kötturinn;NFETgr            nom.sg.def.
köttur;416784;kk;alm;kött;ÞFET                   acc.sg.indef.
köttur;416784;kk;alm;köttinn;ÞFETgr              acc.sg.def.
köttur;416784;kk;alm;ketti;ÞGFET                 dat.sg.indef.
köttur;416784;kk;alm;kettinum;ÞGFETgr            dat.sg.def.
köttur;416784;kk;alm;kattar;EFET                 gen.sg.indef.
köttur;416784;kk;alm;kattarins;EFETgr            gen.sg.def.
köttur;416784;kk;alm;kettir;NFFT                 nom.pl.indef.
köttur;416784;kk;alm;kettirnir;NFFTgr            nom.pl.def.
köttur;416784;kk;alm;ketti;ÞFFT                  acc.pl.indef.
köttur;416784;kk;alm;kettina;ÞFFTgr              acc.pl.def.
köttur;416784;kk;alm;köttum;ÞGFFT                dat.pl.indef.
köttur;416784;kk;alm;köttunum;ÞGFFTgr            dat.pl.def.
köttur;416784;kk;alm;katta;EFFT                  gen.pl.indef.
köttur;416784;kk;alm;kattanna;EFFTgr             gen.pl.def.
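
A minimal sketch of reading this semicolon-separated output follows. The slide names only four of the six fields (lemma, id, inflectional form, tag); the names `word_class` and `domain` for the third and fourth fields are assumptions made here for illustration. The tag glossing covers only the noun tags shown above, and the mapping is read off the English column of the slide.

```python
from typing import NamedTuple

# Mapping read off the slide's English column (noun tags only).
CASES = {"NF": "nom", "ÞF": "acc", "ÞGF": "dat", "EF": "gen"}
NUMBERS = {"ET": "sg", "FT": "pl"}


class Record(NamedTuple):
    lemma: str
    lemma_id: int
    word_class: str  # assumed name for the third field (e.g. "kk")
    domain: str      # assumed name for the fourth field (e.g. "alm")
    form: str
    tag: str


def parse_line(line):
    """Split one output line into its six semicolon-separated fields."""
    lemma, lemma_id, word_class, domain, form, tag = line.strip().split(";")
    return Record(lemma, int(lemma_id), word_class, domain, form, tag)


def gloss_noun_tag(tag):
    """Turn a noun tag such as 'ÞGFETgr' into 'dat.sg.def.'."""
    definite = tag.endswith("gr")
    core = tag[:-2] if definite else tag
    case, number = core[:-2], core[-2:]
    definiteness = "def" if definite else "indef"
    return f"{CASES[case]}.{NUMBERS[number]}.{definiteness}."


record = parse_line("köttur;416784;kk;alm;kettinum;ÞGFETgr")
print(record.form, gloss_noun_tag(record.tag))
```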
Creating the DMII
                                 

         Inflectional class for the noun köttur ‘cat’
                                                   
                 kk_sb_X.ur.-ar-ir.-i-inum
    Set of stems: X1=kött, X2=kött, X3=kett, X4=katt

                          Matrix:

   Nom.sg.    X1 +ur        Nom.sg.def.    X1 +urinn
   Acc.sg.    X1 +          Acc.sg.def.    X1 +inn
   Dat.sg.    X3 +i         Dat.sg.def.    X3 +inum
   Gen.sg.    X4 +ar        Gen.sg.def.    X4 +arins
   Nom.pl.    X3 +ir        Nom.pl.def.    X3 +irnir
   Acc.pl.    X3 +i         Acc.pl.def.    X3 +ina
   Dat.pl.    X2 +um        Dat.pl.def.    X2 +unum
   Gen.pl.    X4 +a         Gen.pl.def.    X4 +anna
Creating the DMII
                                  

        Inflectional class for the noun köttur ‘cat’
                                                  
                kk_sb_X.ur.-ar-ir.-i-inum

    Set of stems: X1=kött, X2=kött, X3=kett, X4=katt

                       Paradigm:

   Nom.sg.    kött+ur       Nom.sg.def.    kött+urinn
   Acc.sg.    kött+         Acc.sg.def.    kött+inn
   Dat.sg.    kett+i        Dat.sg.def.    kett+inum
   Gen.sg.    katt+ar       Gen.sg.def.    katt+arins
   Nom.pl.    kett+ir       Nom.pl.def.    kett+irnir
   Acc.pl.    kett+i        Acc.pl.def.    kett+ina
   Dat.pl.    kött+um       Dat.pl.def.    kött+unum
   Gen.pl.    katt+a        Gen.pl.def.    katt+anna
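
The step from matrix to paradigm is mechanical: each slot names a stem variable and an ending, and the form is their concatenation. The sketch below illustrates this with the stems and endings from the two slides. It is not the DMII's actual production code, and the data layout (a list of triples) is an assumption.

```python
# Stems for köttur, as given on the slide.
STEMS = {"X1": "kött", "X2": "kött", "X3": "kett", "X4": "katt"}

# Matrix for the class kk_sb_X.ur.-ar-ir.-i-inum: (slot, stem variable, ending).
MATRIX = [
    ("nom.sg.indef", "X1", "ur"), ("nom.sg.def", "X1", "urinn"),
    ("acc.sg.indef", "X1", ""),   ("acc.sg.def", "X1", "inn"),
    ("dat.sg.indef", "X3", "i"),  ("dat.sg.def", "X3", "inum"),
    ("gen.sg.indef", "X4", "ar"), ("gen.sg.def", "X4", "arins"),
    ("nom.pl.indef", "X3", "ir"), ("nom.pl.def", "X3", "irnir"),
    ("acc.pl.indef", "X3", "i"),  ("acc.pl.def", "X3", "ina"),
    ("dat.pl.indef", "X2", "um"), ("dat.pl.def", "X2", "unum"),
    ("gen.pl.indef", "X4", "a"),  ("gen.pl.def", "X4", "anna"),
]


def generate(stems, matrix):
    """Build the full paradigm by joining each slot's stem and ending."""
    return {slot: stems[stem] + ending for slot, stem, ending in matrix}


paradigm = generate(STEMS, MATRIX)
for slot, form in paradigm.items():
    print(f"{slot:14} {form}")
```

Because the stems are plain variables, the same matrix can be reused for any other noun of the class by supplying a different set of stems.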
The DMII: A multitude of variants
                                            

Inflectional classes:   Over 630, due to inflectional variants

Examples of variants:
The genitive singular ending –ar/–s in masculine nouns:
þröskuldur ‘threshold’: gen. þröskuldar/þröskulds

The dative singular ending –i/–0 in masculine and neuter nouns:
bátur ‘boat’ (masc.):       báti/bát
fennel ‘fennel’ (neut.):    fennel/fenneli
arsenik ‘arsenic’ (neut.):  arseniki/arsenik
ópíum ‘opium’ (neut.):      ópíum/*ópíumi

Names (3 variants per case):
Berglind (fem.): acc.+dat.: Berglind/Berglindi/Berglindu
Why not a productive system of rules?
                                Insufficient data
Lexicographic sources: 
• Extensive vocabulary. 
• Partial information on inflection, i.e., principal parts only:

         köttur (nom.sg.indef.), kattar (gen.sg.indef.), kettir (nom.pl.indef.)
•  The rest of the paradigm is not (necessarily) predictable.

Grammatical surveys: 
• Full paradigms of selected examples.
• A set of generalized inflectional classes, showing exceptions at times (often in
  notes).
• The emphasis is on the core Icelandic vocabulary, to the exclusion of loanwords,
   slang and informal language. 
• Long history of research (1st survey 1651)
• Strong literary tradition; strong public interest, strong tendency to purism.

Generalization, as in the claim: “The dative ending –i is universal in neuter
nouns”.          

         
       
         
A gap in the inflectional descriptions
                                             
          Dative –i/-0 in multisyllabic neuter nouns



    
      Hvönn er svolítið lík fennel.
      ‘Angelica is a little bit like fennel’

      . . . ásamt brytjuðu fenneli.
      ‘. . . with diced fennel’

      . . . byrlað eitur, drepinn með arsenik.
      ‘. . . poisoned with arsenic’

      Hann var myrtur með arseniki.
      ‘He was murdered with arsenic’

      cf.

      Hann var myrtur með ópíum/*ópíumi.
      ‘He was murdered with opium’

          Why not a productive system of rules: (2)
    It would overgenerate. The data has to be collected first.
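
The overgeneration problem can be made concrete with a small sketch. The rule below is a deliberately naive rendering of the generalization “the dative ending –i is universal in neuter nouns”, not a rule from any actual system. The attested forms are the three neuter examples from the slides; the variable names are illustrative.

```python
# Attested dative singular forms of three neuter nouns, from the slides.
ATTESTED_DATIVE = {
    "fennel": {"fennel", "fenneli"},
    "arsenik": {"arsenik", "arseniki"},
    "ópíum": {"ópíum"},
}


def naive_dative(noun):
    """Naive rule: every neuter noun takes the dative ending -i."""
    return noun + "i"


# Forms the rule produces that are not attested (overgeneration).
overgenerated = {
    noun: naive_dative(noun)
    for noun, forms in ATTESTED_DATIVE.items()
    if naive_dative(noun) not in forms
}

# Attested forms the rule fails to produce (missing variants).
missed = {
    noun: forms - {naive_dative(noun)}
    for noun, forms in ATTESTED_DATIVE.items()
}

print("overgenerated:", overgenerated)
print("missed:", missed)
```

Even on three nouns the rule both produces an unattested form and misses the endingless variants, which is why the attested forms have to be collected before any rule can be trusted.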
Research for the DMII
                                          
         The search for possible variant inflectional forms
                                                         

• All available sources are used:
• Archives, digitized text collections, corpora, Google (at a pinch)
• Native speaker intuition (linguists and others, including users online)
• The search is sometimes inconclusive:

   Yggdrasill (masc.) ‘the great tree’ from Norse mythology: 
   o Dative Yggdrasil or Yggdrasli?
   o The regular dative would be Yggdrasli.
   o  No examples of the dative can be found in the Old Icelandic
       sources; modern examples are dubious. 
   o The dative of the neuter noun drasl ‘rubbish’ is drasli.
   o  When referring to a shop in today’s Reykjavík named Yggdrasill speakers
      seem to avoid the dative. (It makes people laugh.)
   o Examples of Yggdrasli appear on the web, mostly in disputes on the
      dative ...
Research for the DMII
                                        
    Ambiguous inflectional forms make searching time-consuming

Inflectional forms in DMII               5,881,374
Unambiguous                              1,850,090   31.5%
Ambiguous within 1 lemma                 3,619,482   61.5%
Ambiguous between lemmas                    63,641    1.1%
Ambiguous within and between lemmas        348,161    5.9%


       Inflectional form: Word form with inflectional tag.

Word forms (unique character strings):        2.8 million
Unambiguous (unique inflectional forms):      1.8
Ambiguous:                                    1.0


       Searching in unannotated text can be extremely slow.
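
A sketch of how inflectional forms could be sorted into these four groups is given below. The definitions are assumptions inferred from the labels on the slide: a form is ambiguous within a lemma if the same string carries more than one tag in that lemma, and ambiguous between lemmas if the same string also occurs under another lemma. The function and label names are illustrative.

```python
from collections import Counter, defaultdict


def classify(records):
    """Count inflectional forms per ambiguity class.

    records: iterable of (lemma_id, word_form, tag) triples.
    """
    by_form = defaultdict(list)
    for lemma_id, form, tag in records:
        by_form[form].append((lemma_id, tag))

    counts = Counter()
    for analyses in by_form.values():
        per_lemma = Counter(lemma_id for lemma_id, _ in analyses)
        between = len(per_lemma) > 1  # same string under several lemmas
        for lemma_id, _ in analyses:
            within = per_lemma[lemma_id] > 1  # several tags in one lemma
            if within and between:
                counts["within and between"] += 1
            elif within:
                counts["within"] += 1
            elif between:
                counts["between"] += 1
            else:
                counts["unambiguous"] += 1
    return counts


# Three of the köttur forms from the LT output: ketti is both
# dat.sg.indef. and acc.pl.indef., so it is ambiguous within the lemma.
sample = [
    (416784, "köttur", "NFET"),
    (416784, "ketti", "ÞGFET"),
    (416784, "ketti", "ÞFFT"),
]
print(classify(sample))
```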
Using the MÍM Corpus (The Tagged Icelandic Corpus)
                                                 
    cf. Sigrún Helgadóttir et al., Poster session here at SaLTMIL


• 25 million running words from 21st Century texts.
• Compatible tags with the DMII. 
• Because of the ambiguity mentioned earlier, MÍM offers the first
  opportunity for systematic research on the inflectional system for
  use in the DMII.

                First stage of comparing the MÍM and the DMII:
                                                             

        
   
        Tokens in MÍM                16,245,429
        Unique tagged forms in MÍM      737,856
        In DMII                         425,238
        Not in DMII                     312,618
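
The comparison amounts to a set operation on (word form, tag) pairs, which works because the two resources use compatible tags. The sketch below shows the idea on toy data; the toy pairs, including the string "xyz", are invented for illustration and are not MÍM or DMII data. The last lines check that the slide's figures are consistent.

```python
def compare(corpus_forms, dmii_forms):
    """Split the unique tagged forms of a corpus by presence in the DMII."""
    corpus = set(corpus_forms)
    dmii = set(dmii_forms)
    return {
        "unique": len(corpus),
        "in_dmii": len(corpus & dmii),
        "not_in_dmii": len(corpus - dmii),
    }


# Toy data for illustration only.
toy_dmii = {("köttur", "NFET"), ("ketti", "ÞGFET"), ("ketti", "ÞFFT")}
toy_corpus = [
    ("köttur", "NFET"),
    ("köttur", "NFET"),  # repeated token, counted once
    ("ketti", "ÞGFET"),
    ("xyz", "NFET"),     # invented string that is not in the toy DMII
]
print(compare(toy_corpus, toy_dmii))

# Consistency check on the slide's figures.
slide = {"unique": 737_856, "in_dmii": 425_238, "not_in_dmii": 312_618}
print(slide["in_dmii"] + slide["not_in_dmii"] == slide["unique"])
```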
Analysis of the word forms from the MÍM Corpus
                   not found in the DMII
                                       

      312,618 tagged word forms (or strings) (out of 737,856)
                                                            

True Icelandic inflectional forms    60%
Miscellaneous strings                40%
      Foreign words                  25%
      Errors                          6%
      Abbreviations & acronyms        1.7%
      Computer strings (URLs, etc.)   0.7%

All word forms showing features of adjustment to the Icelandic
inflectional system are counted as Icelandic inflectional forms.


      Approx. additions to the DMII: 125,000 paradigms

      + uncounted “new” variants.
Data scarcity in a rich morphology

The MÍM Corpus:
      **                    623,000 inflectional forms
The present DMII:
      270,000 paradigms     5,800,000 inflectional forms
DMII with additions from MÍM:
      395,000* paradigms    8,300,000* inflectional forms

      **Figure for lemmas in MÍM not available yet.
      * Estimated figures

• This comes as no surprise to Icelandic lexicographers; we have
  always had to cope with the problem of scarcity.
• A description of Icelandic inflection based solely on the MÍM
  corpus would be very meager.
Conclusion

• The DMII was initially made for two purposes:

      As an LT resource (including lexicography)

      For reference for the general public

• In spite of a long history of research, the available data was
  insufficient. The DMII has therefore become increasingly important
  in research on Icelandic morphology.

• Both corpora and lexicographic data are needed for the DMII.

• As the scope of the DMII expands, the production of a rule system
  becomes more feasible, if one is needed.


      
      
       With a PS on availability ...   
Availability

       The DMII is available for LT projects, free of charge.
            Download: http://bin.arnastofnun.is/gogn/   
             Online version: http://bin.arnastofnun.is

The website is as yet only in Icelandic. Send me an email for an
English version of the conditions on the use of data from the DMII
or any questions: kristinb@hi.is 

The website is being restructured. The new website will contain
extensive comments, explanations of grammatical features, etc. 
Work on an English version of this part of the site is in progress.

The metalanguage of the paradigms themselves will be Icelandic.

     
      Be in touch if you need information.
Thank you for your attention

            Kristín B
         kristinb@hi.is

            SaLTMIL-AfLat  
        Istanbul, May 22nd 2012

 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Último (20)

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

The Database of Modern Icelandic Inflection

  • 1. The Database of Modern Icelandic Inflection. Kristín Bjarnadóttir, The Árni Magnússon Institute for Icelandic Studies, Reykjavík, Iceland. SaLTMIL-AfLat, Istanbul, May 22nd 2012
  • 2. The Database of Modern Icelandic Inflection (DMII): A database storing full inflectional forms
      Topics:
      • The aim in creating the DMII
      • A short description of the DMII
      • Why is the DMII not a productive system of rules?
          The necessary information for a rule system was not available
          A rule system would overgenerate wildly
      • The problem of a large tag set: Data scarcity
      • The two aspects of the DMII: LT and online
      ‘Modern’ in the title is used in the sense ‘current’.
  • 3. The Database of Modern Icelandic Inflection: A database storing full inflectional forms
      Used in Language Technology. Accessible online for the general public.
      Headwords/paradigms:            271,400
      Inflectional forms (with tag):  5,800,000
      Unique word forms:              2,800,000
      PoS tag set:                    700+
      Inflectional classes:           630+
      25% of the inflectional classes for the major word classes describe the idiosyncratic inflection of single words.
  • 4. The aim in creating the DMII: Showing Icelandic inflection ‘as is’
      • All inflectional forms, including variants
      • No overgeneration
      • No overlap between inflectional classes, i.e., a word can belong to only one inflectional class (including variants)
      • An inflectional class is a bundle of inflectional rules for a set of words.
      • Production: A simple matrix of inflectional rules, with flags for the values of inflectional categories: +/-singular, +/-plural, etc.
  • 5. The Database of Modern Icelandic Inflection
      Icelandic inflection is rich. Inflectional categories for the major word classes and number of inflectional forms:
      Nouns: case (4), number (2), definiteness (2) = 16
      Adjectives: gender (3), case (4), number (2), definiteness (2), degree (3) = 120
      Verbs: = 106
        Finite: voice (2), mood (2), tense (2), person (3), number (2) = 48
        Nonfinite: = 58
          Imperative: voice (2), number (2) (+ 1 truncated form) = 5
          Infinitive: voice (2) = 2
          Past participle: gender (3), case (4), number (2), definiteness (2) = 48
          Present participle: = 1
          Supine: voice (2) = 2
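The form counts on the slide above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the DMII itself. The noun and verb totals follow directly from the listed categories. The adjective total of 120 is lower than the plain product of the listed categories (144); the breakdown used here assumes that the comparative degree has a single declension while the positive and superlative each have two (indefinite and definite), which is an assumption added for illustration and is not stated on the slide.

```python
from math import prod

# Nouns: case (4) x number (2) x definiteness (2)
noun_forms = prod([4, 2, 2])

# Finite verbs: voice x mood x tense x person x number
finite = prod([2, 2, 2, 3, 2])

# Nonfinite verbs, summed from the sub-totals listed on the slide
imperative = 2 * 2 + 1            # voice x number, plus 1 truncated form
infinitive = 2                    # voice
past_participle = prod([3, 4, 2, 2])
present_participle = 1
supine = 2                        # voice
nonfinite = (imperative + infinitive + past_participle
             + present_participle + supine)
verb_forms = finite + nonfinite

# Adjectives: the plain product of all listed categories is 144, not 120.
adjective_full_product = prod([3, 4, 2, 2, 3])

# ASSUMPTION (not on the slide): positive and superlative have two
# declensions each, the comparative only one.
cells_per_declension = prod([3, 4, 2])   # gender x case x number
adjective_forms = cells_per_declension * (2 + 1 + 2)

print(noun_forms, adjective_forms, finite, nonfinite, verb_forms)
```

Under that assumption the script reproduces the slide's figures of 16, 120, 48, 58 and 106.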
  • 6.
  • 7. DMII: Output for LT projects. The 16 inflectional forms of the noun köttur ‘cat’
      Columns: lemma, id, infl.form, tag, English tag
      köttur;416784;kk;alm;köttur;NFET          nom.sg.indef.
      köttur;416784;kk;alm;kötturinn;NFETgr     nom.sg.def.
      köttur;416784;kk;alm;kött;ÞFET            acc.sg.indef.
      köttur;416784;kk;alm;köttinn;ÞFETgr       acc.sg.def.
      köttur;416784;kk;alm;ketti;ÞGFET          dat.sg.indef.
      köttur;416784;kk;alm;kettinum;ÞGFETgr     dat.sg.def.
      köttur;416784;kk;alm;kattar;EFET          gen.sg.indef.
      köttur;416784;kk;alm;kattarins;EFETgr     gen.sg.def.
      köttur;416784;kk;alm;kettir;NFFT          nom.pl.indef.
      köttur;416784;kk;alm;kettirnir;NFFTgr     nom.pl.def.
      köttur;416784;kk;alm;ketti;ÞFFT           acc.pl.indef.
      köttur;416784;kk;alm;kettina;ÞFFTgr       acc.pl.def.
      köttur;416784;kk;alm;köttum;ÞGFFT         dat.pl.indef.
      köttur;416784;kk;alm;köttunum;ÞGFFTgr     dat.pl.def.
      köttur;416784;kk;alm;katta;EFFT           gen.pl.indef.
      köttur;416784;kk;alm;kattanna;EFFTgr      gen.pl.def.
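A consumer of the semicolon-separated output shown on the slide above might read it roughly as follows. This is a hedged sketch, not the DMII's own tooling: the field names `word_class` and `domain` for the third and fourth fields (`kk`, `alm`) are assumptions, since the slide labels only lemma, id, inflectional form and tag. The sample is a four-line excerpt of the köttur records. Grouping by word form also shows why lookup is ambiguous: `ketti` is both dative singular and accusative plural.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    lemma: str
    lemma_id: int
    word_class: str  # assumed name for the third field ("kk")
    domain: str      # assumed name for the fourth field ("alm")
    form: str
    tag: str


# Excerpt of the köttur records from the slide
SAMPLE = """\
köttur;416784;kk;alm;köttur;NFET
köttur;416784;kk;alm;ketti;ÞGFET
köttur;416784;kk;alm;ketti;ÞFFT
köttur;416784;kk;alm;kattar;EFET
"""


def parse_line(line: str) -> Entry:
    """Split one six-field record into a typed Entry."""
    lemma, lemma_id, word_class, domain, form, tag = line.strip().split(";")
    return Entry(lemma, int(lemma_id), word_class, domain, form, tag)


def index_by_form(lines) -> dict:
    """Map each word form to every entry (lemma + tag) it can realise."""
    index = defaultdict(list)
    for line in lines:
        if line.strip():
            entry = parse_line(line)
            index[entry.form].append(entry)
    return dict(index)


index = index_by_form(SAMPLE.splitlines())
for form, entries in index.items():
    print(form, [e.tag for e in entries])
```

Keying the index on the word form rather than the lemma is a design choice for analysis (form to readings); a generator would key on the lemma id instead.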
  • 8. Creating the DMII: Inflectional class for the noun köttur ‘cat’
      kk_sb_X.ur.-ar-ir.-i-inum
      Set of stems: X1=kött, X2=kött, X3=kett, X4=katt
      Matrix:
      Nom.sg.  X1 +ur      Nom.sg.def.  X1 +urinn
      Acc.sg.  X1 +        Acc.sg.def.  X1 +inn
      Dat.sg.  X3 +i       Dat.sg.def.  X3 +inum
      Gen.sg.  X4 +ar      Gen.sg.def.  X4 +arins
      Nom.pl.  X3 +ir      Nom.pl.def.  X3 +irnir
      Acc.pl.  X3 +i       Acc.pl.def.  X3 +ina
      Dat.pl.  X2 +um      Dat.pl.def.  X2 +unum
      Gen.pl.  X4 +a       Gen.pl.def.  X4 +anna
  • 9. Creating the DMII: Inflectional class for the noun köttur ‘cat’
      kk_sb_X.ur.-ar-ir.-i-inum
      Set of stems: X1=kött, X2=kött, X3=kett, X4=katt
      Paradigm:
      Nom.sg.  kött+ur     Nom.sg.def.  kött+urinn
      Acc.sg.  kött+       Acc.sg.def.  kött+inn
      Dat.sg.  kett+i      Dat.sg.def.  kett+inum
      Gen.sg.  katt+ar     Gen.sg.def.  katt+arins
      Nom.pl.  kett+ir     Nom.pl.def.  kett+irnir
      Acc.pl.  kett+i      Acc.pl.def.  kett+ina
      Dat.pl.  kött+um     Dat.pl.def.  kött+unum
      Gen.pl.  katt+a      Gen.pl.def.  katt+anna
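The step from the matrix on slide 8 to the paradigm on slide 9 amounts to substituting a stem for each stem slot and appending the suffix. The following is a minimal sketch of that substitution, assuming a simple in-memory representation; the names `STEMS`, `MATRIX` and `build_paradigm` are hypothetical and say nothing about how the DMII is actually implemented.

```python
# Stem set for köttur, as given on the slides
STEMS = {"X1": "kött", "X2": "kött", "X3": "kett", "X4": "katt"}

# Inflectional class matrix: (cell label, stem slot, suffix)
MATRIX = [
    ("Nom.sg.", "X1", "ur"),  ("Nom.sg.def.", "X1", "urinn"),
    ("Acc.sg.", "X1", ""),    ("Acc.sg.def.", "X1", "inn"),
    ("Dat.sg.", "X3", "i"),   ("Dat.sg.def.", "X3", "inum"),
    ("Gen.sg.", "X4", "ar"),  ("Gen.sg.def.", "X4", "arins"),
    ("Nom.pl.", "X3", "ir"),  ("Nom.pl.def.", "X3", "irnir"),
    ("Acc.pl.", "X3", "i"),   ("Acc.pl.def.", "X3", "ina"),
    ("Dat.pl.", "X2", "um"),  ("Dat.pl.def.", "X2", "unum"),
    ("Gen.pl.", "X4", "a"),   ("Gen.pl.def.", "X4", "anna"),
]


def build_paradigm(stems: dict, matrix: list) -> dict:
    """Fill each cell of the matrix with stem + suffix."""
    return {cell: stems[slot] + suffix for cell, slot, suffix in matrix}


paradigm = build_paradigm(STEMS, MATRIX)
for cell, form in paradigm.items():
    print(f"{cell:<12} {form}")
```

Because the matrix is shared by every word in the class, only the stem set changes from lemma to lemma; stem alternations such as kött/kett/katt are stored as data rather than derived by rule.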
  • 10. The DMII: A multitude of variants
      Inflectional classes: Over 630, due to inflectional variants
      Examples of variants:
      The genitive singular ending –ar/–s in masculine nouns:
        þröskuldur ‘threshold’: gen. þröskuldar/þröskulds
      The dative singular ending –i/–0 in masculine and neuter nouns:
        bátur ‘boat’ (masc.): báti/bát
        fennel ‘fennel’ (neut.): fennel/fenneli
        arsenik ‘arsenic’ (neut.): arseniki/arsenik
        ópíum ‘opium’ (neut.): ópíum/*ópíumi
      Name (3 variants per case):
        Berglind (fem.): acc.+dat.: Berglind/Berglindi/Berglindu
  • 11. Why not a productive system of rules? Insufficient data
      Lexicographic sources:
      • Extensive vocabulary.
      • Partial information on inflection, i.e., principal parts only: köttur (nom.sg.indef.), kattar (gen.sg.indef.), kettir (nom.pl.indef.)
      • The rest of the paradigm is not (necessarily) predictable.
      Grammatical surveys:
      • Full paradigms of selected examples.
      • A set of generalized inflectional classes, showing exceptions at times (often in notes).
      • The emphasis is on the core Icelandic vocabulary, to the exclusion of loanwords, slang and informal language.
      • Long history of research (1st survey 1651)
      • Strong literary tradition; strong public interest; strong tendency to purism.
      Generalization, as in the claim: “The dative ending –i is universal in neuter nouns”.
  • 12. A gap in the inflectional descriptions: Dative –i/–0 in multisyllabic neuter nouns
      Hvönn er svolítið lík fennel. ‘Angelica is a little bit like fennel’
      . . . ásamt brytjuðu fenneli. ‘. . . with diced fennel’
      . . . byrlað eitur, drepinn með arsenik. ‘. . . poisoned with arsenic’
      Hann var myrtur með arseniki. ‘He was murdered with arsenic’
      cf. Hann var myrtur með ópíum/*ópíumi. ‘He was murdered with opium’
      Why not a productive system of rules: (2) It would overgenerate. The data has to be collected first.
  • 13. Research for the DMII: The search for possible variant inflectional forms
      • All available sources are used:
        • Archives, digitized text collections, corpora, Google (at a pinch)
        • Native speaker intuition (linguists and others, including users online)
      • The search is sometimes inconclusive: Yggdrasill (masc.) ‘the great tree’ from Norse mythology:
        o Dative Yggdrasil or Yggdrasli?
        o The regular dative would be Yggdrasli.
        o No examples of the dative can be found in the Old Icelandic sources; modern examples are dubious.
        o The dative of the neuter noun drasl ‘rubbish’ is drasli.
        o When referring to a shop in today’s Reykjavík named Yggdrasill, speakers seem to avoid the dative. (It makes people laugh.)
        o Examples of Yggdrasli appear on the web, mostly in disputes on the dative ...
  • 14. Research for the DMII: Ambiguous inflectional forms make searching time-consuming
      Inflectional forms in DMII              5,881,374
      Unambiguous                             1,850,090   31.5%
      Ambiguous within 1 lemma                3,619,482   61.5%
      Ambiguous between lemmas                   63,641    1.1%
      Ambiguous within and between lemmas       348,161    5.9%
      Inflectional form: Word form with inflectional tag.
      Word forms (unique character strings): 2.8 million
        Unambiguous (unique inflectional forms): 1.8 million
        Ambiguous: 1.0 million
      Searching in unannotated text can be extremely slow.
  • 15. Using the MÍM Corpus (The Tagged Icelandic Corpus), cf. Sigrún Helgadóttir et al., Poster session here at SaLTMIL
      • 25 million running words from 21st-century texts.
      • Tags compatible with the DMII.
      • Because of the ambiguity mentioned before, this is the first opportunity for systematic research on the inflectional system for use in the DMII.
      First stage of comparing the MÍM and the DMII:
      Tokens in MÍM                 16,245,429
      Unique tagged forms in MÍM       737,856
        In DMII                        425,238
        Not in DMII                    312,618
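The first-stage comparison on the slide above is, in essence, a set intersection and difference over tagged forms. The sketch below illustrates the idea with toy data; the function name `split_by_coverage` and the sample pairs are hypothetical (the string `xyzzy` is a placeholder, not a corpus token), and the actual comparison procedure is not described in the slides. The last lines only check that the slide's own figures are consistent.

```python
def split_by_coverage(corpus_forms, dmii_forms):
    """Split unique (word form, tag) pairs by whether the DMII has them."""
    corpus_forms = set(corpus_forms)
    dmii_forms = set(dmii_forms)
    found = corpus_forms & dmii_forms
    missing = corpus_forms - dmii_forms
    return found, missing


# Toy data: (word form, tag) pairs. The first two come from the köttur
# paradigm; the third is an invented placeholder string.
dmii = {("köttur", "NFET"), ("ketti", "ÞGFET"), ("kattar", "EFET")}
corpus = [("köttur", "NFET"), ("ketti", "ÞGFET"), ("xyzzy", "NFET"),
          ("köttur", "NFET")]  # repeated tokens collapse to one unique form

found, missing = split_by_coverage(corpus, dmii)
print(len(found), len(missing))

# Consistency check on the figures reported on the slide
unique_in_mim = 737_856
in_dmii = 425_238
not_in_dmii = unique_in_mim - in_dmii
print(not_in_dmii)
```

Comparing on the pair (form, tag) rather than on the bare string matters here: because the two resources use compatible tags, a form can be matched reading by reading, which sidesteps the ambiguity that makes searching unannotated text slow.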
  • 16. Analysis of the word forms from the MÍM Corpus not found in the DMII
      312,618 tagged word forms (or strings) (out of 737,856)
      True Icelandic inflectional forms      60%
      Miscellaneous strings                  40%
        Foreign words                        25%
        Errors                                6%
        Abbreviations & acronyms            1.7%
        Computer strings (URLs, etc.)       0.7%
      All word forms showing features of adjustment to the Icelandic inflectional system are counted as Icelandic inflectional forms.
      Approx. additions to the DMII: 125,000 paradigms + uncounted “new” variants.
  • 17. Data scarcity in a rich morphology
                                      Paradigms    Inflectional forms
      The MÍM Corpus:                 **           623,000
      The present DMII:               270,000      5,800,000
      DMII with additions from MÍM:   395,000*     8,300,000*
      ** Figure for lemmas in MÍM not available yet. * Estimated figures
      • This comes as no surprise to Icelandic lexicographers; we have always had to cope with the problem of scarcity.
      • A description of Icelandic inflection based solely on the MÍM corpus would be very meager.
  • 18. Conclusion
      • The DMII was initially made for two purposes:
          As an LT resource (including lexicography)
          For reference for the general public
      • In spite of a long history of research, the available data was insufficient. The DMII has therefore become increasingly important in research on Icelandic morphology.
      • Both corpora and lexicographic data are needed for the DMII.
      • As the scope of the DMII expands, the production of a rule system becomes more feasible, if one is needed.
      With a PS on availability ...
  • 19. Availability
      The DMII is available for LT projects, free of charge.
      Download: http://bin.arnastofnun.is/gogn/
      Online version: http://bin.arnastofnun.is
      The website is as yet only in Icelandic. Send me an email for an English version of the conditions on the use of data from the DMII, or with any questions: kristinb@hi.is
      The website is being restructured. The new website will contain extensive comments, explanations of grammatical features, etc. Work on an English version of this part of the site is in progress. The metalanguage of the paradigms themselves will be Icelandic. Get in touch if you need information.
  • 20.
  • 21. Thank you for your attention. Kristín B, kristinb@hi.is. SaLTMIL-AfLat, Istanbul, May 22nd 2012