SlideShare una empresa de Scribd logo
1 de 54
Descargar para leer sin conexión
Open-Domain Word-Level Interpretation of
              Norwegian
              Towards a General Encyclopedic Question-Answering System for
              Norwegian


              Martin Thorsen Ranang
              Department of Computer and Information Science (IDI)

              PhD Thesis Presentation, February 5th 2010
www.ntnu.no                        Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
2

    Outline
       Introduction
           Background and Motivation
           A Central Missing Piece
       Mapping Norwegian Words and Phrases to WordNet
         My Approach
         Compound Words
         Algorithms
         Some of the Results
       TUClopedia—Proof of Concept
         System Overview
         Examples



www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
3

    Outline
       Introduction
           Background and Motivation
           A Central Missing Piece
       Mapping Norwegian Words and Phrases to WordNet
         My Approach
         Compound Words
         Algorithms
         Some of the Results
       TUClopedia—Proof of Concept
         System Overview
         Examples



www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
4
    Lack of Advanced Norwegian Language
    Resources
         — Languages shared and spoken by a nation’s citizens are
           woven into their culture, psychology, education and politics
           (Lambert, 1972; Joseph, 2004).
         — Stimulating language usage in the age of globalization.
         — Today’s most advanced language technology:
                     • Mostly only available for the English language.
                     • Example: the question-answering (QA) system
                       Powerset (Converse et al., 2008).1
         — Languages with many speakers, or of great military and/or
           economical significance are prioritized.
              1
                  Powerset, http://www.powerset.com/. Accessed May 22, 2009.



www.ntnu.no                               Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
5
    Lack of Advanced Norwegian Language
    Resources
         — Norwegian Ministry of Culture and Church Affairs (2008,
           pp. 134–137):
                  • important to establish a Norwegian Human Language
                    Technology (HLT) resource collection.2
                  • availability of advanced Norwegian language-technological
                    solutions will strengthen both language and culture.
         — Major consequences within several areas, including: research,
           education, business, and culture.

              2
          The Norwegian Human Language Technology Resource Collection,
       http://www.spraakbanken.uib.no/. Accessed May 22, 2009.



www.ntnu.no                          Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
6

    Goal of the TUClopedia Project
         — To create an open-domain question answering (open-domain
           QA) system that automatically acquires knowledge from, and
           lets users ask questions about, Norwegian encyclopedia
           articles.
              • Store norske leksikon (Henriksen, 2003)
              • The Understanding Computer (TUC) (Amble, 2003)
         — The response from a QA system should ideally be a concise,
           relevant, and correct answer.
         — Not the same as Information Retrieval (IR) systems (like
           Google Web Search) that “only”
              • perform searches in highly indexed material;
              • return lists of relevant documents; and
              • highlight relevant passages within text snippets.



www.ntnu.no                       Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
7

    The Understanding Computer
         — TUC is a Natural Language Understanding (NLU) system
           developed by Tore Amble.
         — Earlier systems based on TUC have been applied to several
           restricted domains. These include
              • BusTUC, an expert system route adviser for the public bus
                transport in Trondheim (Amble, 2000);
              • GeneTUC, understood and extracted information about gene
                and protein interactions from medical articles (Sætre, 2006);
                and
              • ExamTUC, could extract information from an article about the
                kings of Norway and automatically grade written answers to
                examination questions based on the same article (Bruland,
                2002).



www.ntnu.no                       Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
8

    Outline
       Introduction
           Background and Motivation
           A Central Missing Piece
       Mapping Norwegian Words and Phrases to WordNet
         My Approach
         Compound Words
         Algorithms
         Some of the Results
       TUClopedia—Proof of Concept
         System Overview
         Examples



www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
9

    Lexical-Semantic Network
         — Text Interpretation, or NLU, systems depend on the availability
           of several resources:
              • morphological and lexical knowledge (lexicons);
              • syntax knowledge (grammars); and
              • semantic knowledge (ontologies, or semantic networks).
         — TUC uses lexical semantic knowledge during parsing to verify
           that candidate interpretations of the text make sense.
         — The earlier TUC-based systems that were applied to narrow,
           and/or, restricted domains had used manually crafted highly
           specialized semantic networks.
         — To apply TUC to open-domain QA, a much larger
           lexical-semantic resource was needed . . . for Norwegian.



www.ntnu.no                      Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
10

     Norwegian Lexical-Semantic Resources

         — When I began my research, no large-scale lexical-semantic
           resources existed for Norwegian.
         — Two other approaches should be mentioned:
              • “Frå ordbok til ordnett” by Nygaard (2006)
                  — Does not distinguish between different senses of words.
                  — Provides few semantic relations.
              • Semantic mirrors by Dyvik (2004).
                  — Can distinguish between word senses.
                  — Provides a few more semantic relations than Nygaard’s
                    approach.




www.ntnu.no                        Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
11

     Norwegian Lexical-Semantic Resources
                                           n2
                                  n3
                                                n1
                             n4
       T (n) = n · (n − 1)
                   2
                                                     n0     Both of Nygaard’s and Dyvik’s
              = O(n )
                             n5                             approaches generate isolated
                                                n8          resources, with no connection to
                                  n6
                                           n7
                                                            other Natural Language
                                                            Processing (NLP) resources,
                                           n2
                                  n3                        something that makes them
                                                n1
                             n4
                                                            difficult to integrate into other
                                                            projects.
       T (n) = 2n = O(n)               c             n0

                             n5
                                                n8
                                  n6
                                           n7



www.ntnu.no                                     Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
12

     Outline
       Introduction
           Background and Motivation
           A Central Missing Piece
       Mapping Norwegian Words and Phrases to WordNet
         My Approach
         Compound Words
         Algorithms
         Some of the Results
       TUClopedia—Proof of Concept
         System Overview
         Examples



www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
13

     My Approach


         — I have developed and implemented a method for automatically
           mapping Norwegian simplex words and compounds to
           semantic concepts in the Princeton WordNet (Miller, 1995;
           Fellbaum, 1998) that are warranted by the semantics of the
           Norwegian source words.
         — The implementation of the method produces a Norwegian
           lexical semantic resource, named Ordnett.




www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
14
     WordNet Example
     (Hypernyms/Hyponyms)




              Concepts in WordNet are defined by synonym sets (synsets).


www.ntnu.no                        Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
15

     Resources

         1. A (machine readable) dictionary containing translations from
            the source language to the target language;
         2. A dictionary containing translations from the target language
            to the source language; and
         3. The target ontology of the mapping, a lexical semantic
            resource for the target language.
         4. A lexicon containing morphological information about words in
            the source language.
       (Haslerud and Henriksen, 2003; Fellbaum, 1998; Nordgård, 1998)




www.ntnu.no                      Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
16

     Outline
       Introduction
           Background and Motivation
           A Central Missing Piece
       Mapping Norwegian Words and Phrases to WordNet
         My Approach
         Compound Words
         Algorithms
         Some of the Results
       TUClopedia—Proof of Concept
         System Overview
         Examples



www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
17

     Automatic Analysis of Compound Words

         — «slottsvinduer» (“castle windows”):
              •   «slott» (“castle”)
              •   «svin» (“pig”)
              •   «vin» (“wine”)
              •   «vind» (“wind”)
              •   «du» (“you”)
              •   «due» (“dow”)
              •   «duer» (“dows”)
              •   «er» (“is”)
         — Johannessen and Hauglin (1996); Johannessen (2001).




www.ntnu.no                            Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
18

     Automatic Analysis of Compound Words




www.ntnu.no        Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
19

     Results



       Tested on 5 randomly chosen Norwegian newspaper articles (4951
       words, 109 compounds) with a 100 % success rate.




www.ntnu.no                   Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
20

     Outline
       Introduction
           Background and Motivation
           A Central Missing Piece
       Mapping Norwegian Words and Phrases to WordNet
         My Approach
         Compound Words
         Algorithms
         Some of the Results
       TUClopedia—Proof of Concept
         System Overview
         Examples



www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
21
     Informal Description of the Main
     Principle
         — Translating the Norwegian word «skatt», we get, e.g., “tax”,
           “treasure”, and “honey”. Each with multiple senses.
         — “Honey” has at least two senses:
              1. “a sweet yellow liquid produced by bees”
              2. “a beloved person”
         — Only the second sense above is warranted by the Norwegian
           word «skatt».
         — The main principle is to exploit the semantic information
           already in WordNet to remove unwarranted mappings.
         — For example, by inverse translating the complement synonyms
           of “honey”, like “loved one” or “beloved”, we can see look for
           overlaps between the inverse translations and the original
           word.

www.ntnu.no                       Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
22

       Basic Algorithm I

1: procedure M AP -WORD - TO -S ENSE(w, C, D , D )
2:    mappings ← ∅
3:    for all t ∈ T RANSLATE(w, C, D ) do
                                                           T RANSLATE(«brystkasse», n, NorEng) =
4:        T ← S ENSES - OF(t, C)
5:        if |T | = 1 then                                 {“chest”, “rib cage”, “thorax”}
6:             mappings ← mappings ∪ {(w, T [0])}
7:             continue                                      S ENSES - OF(“chest”, n) =
8:        for all ti ∈ T do
                                                              chest_n_1, chest_n_2, chest_n_3
9:             mappings ← mappings ∪M IRROR(w, ti , C, D )
10:    return mappings




 www.ntnu.no                            Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
23

       Basic Algorithm II

1: procedure M IRROR(w, ti , C, D )
2:    mappings ← ∅                                               G ET-S YNSET(chest_n_1) =
3:    S ← G ET-S YNSET(ti )                                      {thorax_n_2, chest_n_1, pectus_n_1}
4:    S ← S − {ti }
5:    for all s ∈ S do
                                                                S ← {thorax_n_2, pectus_n_1}
6:        w ← W ORD - FROM -S ENSE(s)
7:        for all w ∈ T RANSLATE(w , C, D ) do
8:            if w = w then                                        T RANSLATE(“thorax”, n, EngNor ) =
9:                mappings ← mappings ∪ {(w, ti )}                 {«bryst», «brystkasse»,
10:    return mappings                                               «toraks», «kropp», «forkropp»}.




 www.ntnu.no                             Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
24

     Extended Algorithm
         1:   procedure M IRROR(w, ti , C, D )
         2:      mappings ← ∅
         3:      S ← G ET-S YNSET(ti )
         4:      for all S TRATEGY ∈ S YNONYM, H YPERNYM do
         5:          S ← S TRATEGY(S, ti )        The “complement” synset.
         6:          for all s ∈ S do                   Complement senses.
         7:              w ← W ORD - FROM -S ENSE(s)
         8:              for all w ∈ T RANSLATE(w , C, D ) do
         9:                  if w = w then
       10:                       mappings ← mappings ∪ {(w, ti )}
       11:           if mappings = ∅ then
       12:               break
       13:       return mappings


www.ntnu.no                       Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
25

     Typical Size of a Smaller Case




www.ntnu.no          Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
26

     Outline
       Introduction
           Background and Motivation
           A Central Missing Piece
       Mapping Norwegian Words and Phrases to WordNet
         My Approach
         Compound Words
         Algorithms
         Some of the Results
       TUClopedia—Proof of Concept
         System Overview
         Examples



www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
27

     Ordnett—Coverage

       Table: Coverage results of running Verto with the extended dictionaries
       and the S YNONYM and all of the search strategies.

                                         Mapped                 No inverse                  Mappings
       Lexical
       category        Keywords          N         (%)a          N           (%)b           N           Mean
       Noun                45,993     25,879       56.3      15,783         78.5        99,594          2.165
       Adjective            9,526      5,357       56.2       3,167         76.0        35,049          3.679
       Verb                 5,435      4,264       78.5       1,010         86.3        51,010          9.385
       Adverb               1,394        492       35.3         809         89.7         1,610          1.155
         Total             62,348     35,992       57.7      20,769         78.8       187,263          4.096
         a    Percentage of keywords mapped to WordNet senses.
         b    Percentage of cases where no mapping could be found.


www.ntnu.no                               Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
28

     Ordnett—Precision, Recall, F -measure

        Table: Precision, recall, and F0.5 results of running Verto with the original
        dictionaries and the S YNONYM and S IMILAR search strategies.

                                   Mappings
        Lexical category     Correct      Incorrect          Precision           Recall           F0.5
        Noun                      129                 7             0.949         0.222         0.573
        Adjective                 219                10             0.956         0.336         0.698
        Verb                      160                26             0.860         0.161         0.460
        Adverb                    110                10             0.917         0.308         0.657
          Total                   618                53             0.921         0.239         0.587



www.ntnu.no                          Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
29

     Ordnett: Hypernymy




www.ntnu.no        Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
30

     Ordnett: Hypernymy And Meronymy




www.ntnu.no       Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
31

     Ordnett: Meronymy And Entailment




www.ntnu.no        Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
32

     Outline
       Introduction
           Background and Motivation
           A Central Missing Piece
       Mapping Norwegian Words and Phrases to WordNet
         My Approach
         Compound Words
         Algorithms
         Some of the Results
       TUClopedia—Proof of Concept
         System Overview
         Examples



www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
33

     System Overview
              User interface                                                                   Encyclopedia
                                    Question                            Article

                                                                                                                    Articles

                                Translation            Lexical                                 Named-entity        Named-entity
                                framework             analyzer                                  recognizer           extractor




                                                  Tagged                                      Sentence-
                                                   words             Ordnett
                   Bilingual        Lexicon                                                   boundary
                                                                   (Norwegian–                                Named
                   dictionary     (Morphology)                                                 detector
                                                                    WordNet)                                  entities

                                                       Parser                  Grammar
                 Answer                                                      (Syntax rules)
                                                   Parse
                                                    trees

                                                      Semantic                    Semantic
                                                                                                   WordNet     Onomasticon
                                                     interpreter                  network

                                                   Query                                      Semantic resources
                                               expression
                                                     Reasoning
                                                      engine
                                                            Acquired
                                                            knowledge

                                                     Knowledge
                                                       base


www.ntnu.no                                       Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
34

     Outline
       Introduction
           Background and Motivation
           A Central Missing Piece
       Mapping Norwegian Words and Phrases to WordNet
         My Approach
         Compound Words
         Algorithms
         Some of the Results
       TUClopedia—Proof of Concept
         System Overview
         Examples



www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
35

     A+



              A+
              programmeringsspråk for datamaskiner, avledet av APL.




www.ntnu.no                      Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
36

     A+—Word Graph
       txt(w(’A+’, name(’A+’, n, programming_language_n_1)), 0, 1).
       txt(w(’er’, verb(be_v_2, pres, fin)), 1, 2).
       txt(w(’programmeringsspråk’,
             noun(programming_language_n_1, plu, u, n)), 2, 3).
       txt(w(’programmeringsspråk’,
             noun(programming_language_n_1, sin, u, n)), 2, 3).
       txt(w(’for’, prep(’for’)), 3, 4).
       txt(w(’datamaskiner’, noun(computer_n_1, plu, u, n)), 4, 5).
       txt(w(’avledet’, verb(derive_v_3, past, part)), 5, 6).
       txt(w(’avledet’, [’avledet’]), 5, 6).
       txt(w(’av’, prep(’from’)), 6, 7).
       txt(w(’av’, prep(’of’)), 6, 7).
       txt(w(’av’, prep(’off’)), 6, 7).
       txt(w(’APL’, name(apl, n, programming_language_n_1)), 7, 8).
       txt(w(’.’, [’.’]), 8, 9).


www.ntnu.no                      Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
37

     A+—Acquired Knowledge (TQL) I
  % ‘‘A+ programmeringsspråk for datamaskiner, avledet av APL.’’
  % (A+ programming language for computers, derived from APL.)

  isa(programming_language_n_1, real, sk(1))
  isa(computer_n_1, real, sk(2))

  % Programming language, sk(1), is for computer, sk(2).
  event(sk(3))                                                              % A+ is programming language, sk(1).
  agent(sk(1), sk(3))                                                       event(sk(5))
  present(real, sk(3))                                                      agent(’A+’, sk(5))
  action(be_v_4, sk(3))                                                     present(real, sk(5))
  mod(for, sk(2), sk(3))                                                    action(be_v_2, sk(5))
                                                                            patient(sk(1), sk(5))
  % Somebody (pro) derives programming language, sk(1), from ’APL’
  event(sk(4))
  present(real, sk(4))
  action(derive_v_3, sk(4))
  patient(sk(1), sk(4))
  agent(pro, sk(4))
  mod(from, ’APL’, sk(4))




www.ntnu.no                                          Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
38

     Drapet på Kennedy
              Presidenten ble myrdet av en snikskytter 22. november
              1963 under et besøk i Dallas, Texas. Kennedy-mordet ble
              umiddelbart gjenstand for en rekke ulike teorier og
              påstander som har fortsatt å oppta allmennheten etterpå.
              [. . . ]
              [The Kennedy Assassination.
              The President was assassinated November 22, 1963
              during a visit to Dallas, Texas. The Kennedy
              assassination immediately became subject to a range of
              different theories and allegations that subsequently have
              continued to occupy the general public. [. . . ]]




www.ntnu.no                        Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
39

     Acquired Knowledge
              isa(visit_n_5, real, sk(3))
              isa(dallas_n_1, real, sk(4))
                                                              isa(president_n_3, real, sk(1))
              % There is a visit, sk(3), in Dallas, sk(4).
                                                              isa(sniper_n_1, real, sk(2))
              event(sk(5))
              agent(sk(3), sk(5))
                                                              % The President, sk(1), was murdered by a
              present(real, sk(5))
                                                              % sniper, sk(2), during the visit, sk(3),
              action(be_v_4, sk(5))
                                                              % on November 22, 1963.
              mod(in, sk(4), sk(5))
                                                              event(sk(8))
                                                              past(real, sk(8))
              isa(texas_n_1, real, sk(6))
                                                              action(murder_v_1, sk(8))
                                                              patient(sk(1), sk(8))
              % The visit, sk(3), is in Texas, sk(6).
                                                              agent(sk(2), sk(8))
              event(sk(7))
                                                              mod(under, sk(3), sk(8))
              agent(sk(3), sk(7))
                                                              mod(nil, date(1963, 11, 22), sk(8))
              present(real, sk(7))
              action(be_v_4, sk(7))
              mod(in, sk(6), sk(7))



www.ntnu.no                                     Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
40

     Question Answering
       «Hvem myrdet presidenten?» (“Who murdered the president?”)

                                              which(sk(2))
                                              isa(somebody_n_1, real, sk(2)) <=
                                                  isa(person_n_1, real, sk(2)) <=
                                                      isa(expert_n_1, real, sk(2)) <=
              which(A),
                                                          isa(sniper_n_1, real, sk(2)) % A = sk(2).
              isa(somebody_n_1, real, A),
                                              isa(president_n_3, real, sk(1)) % B = sk(1).
              isa(president_n_3, real, B),
                                              agent(sk(2), sk(8)) % C = sk(8).
              agent(A, C),
                                              action(murder_v_1, sk(8))
              action(murder_v_1, C),
                                              past(real, sk(8))
              past(real, C),
                                              event(sk(8))
              event(C),
                                              patient(sk(1), sk(8))
              patient(B, C)
                                              =>

                                              sk(2)




www.ntnu.no                                  Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
41

     References I
       Amble, Tore. 2000. BusTUC - a natural language bus route oracle.
        In Proceedings of the 6th Applied Natural Language Processing
        Conference, 1–6. Seattle, Washington, USA: Association for
        Computational Linguistics.
       ———. 2003. The understanding computer: Natural language
        understanding in practice. Trondheim, Norway: Department of
        Computer and Information Science, Norwegian University of
        Science and Technology. Preliminary version.
       Bruland, Tore. 2002. ExamTUC - a simple examination system in
         natural language. Sivilingeniør’s thesis, Department of Computer
         and Information Science, Norwegian University of Science and
         Technology, Trondheim, Norway.


www.ntnu.no                     Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
42

     References II
       Converse, Tim, Ronald M. Kaplan, Barney Pell, Scott Prevost,
        Lorenzo Thione, and Chad Walters. 2008. Powerset’s natural
        language Wikipedia search engine. In Wikipedia and Artificial
        Intelligence: An Evolving Synergy: Papers from the AAAI
        Workshop, 67. Techical Report WS-08-15, Menlo Park,
        California, USA: The AAAI Press.
       Dyvik, Helge. 2004. Translations as semantic mirrors. From parallel
         corpus to WordNet. Language and Computers 49(1):311–326.
       Fellbaum, Christiane, ed. 1998. WordNet: An electronic lexical
         database. Language, Speech, and Communication, Cambridge,
         Massachusetts, USA: The MIT Press.




www.ntnu.no                     Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
43

     References III
       Haslerud, Vibecke C. D., and Petter Henriksen, eds. 2003. Engelsk
        stor ordbok: Engelsk-norsk / norsk-engelsk. Oslo, Norway:
        Kunnskapsforlaget.
       Henriksen, Petter, ed. 2003. Aschehougs og Gyldendals Store
        Norske Leksikon. Oslo, Norway: Kunnskapsforlaget.
       Johannessen, Janne Bondi. 2001. Sammensatte ord. Norsk
         Lingvistisk Tidsskrift 19:59–91.
       Johannessen, Janne Bondi, and Helge Hauglin. 1996. An
         automatic analysis of Norwegian compounds. In Papers from the
         16th Scandinavian Conference of Linguistics. Turku/Åbo,
         Finland.



www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
44

     References IV
       Joseph, John Earl. 2004. Language and Identity: National, Ethnic,
         Religious. Hampshire, UK: Palgrave Macmillan.
       Lambert, Wallace E. 1972. Language, psychology, and culture.
         Language Science and National Development Series, Stanford,
         California, USA: Stanford University Press.
       Miller, George A. 1995. WordNet: a lexical database for English.
         Communications of the ACM 38(11):39–41.
       Nordgård, Torbjørn. 1998. Norwegian computational lexicon
        (NorKompLeks). In Proceedings of the 11th Nordic Conference
        on Computational Linguistics, ed. Bente Maegaard, 34–44.
        University of Copenhagen, Denmark: Center for Sprogteknologi.




www.ntnu.no                     Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
45

     References V

       Norwegian Ministry of Culture and Church Affairs. 2008. Mål og
        meining: Ein heilskapleg norsk språkpolitikk. Stortingsmelding
        nr. 35 (2007–2008), Norwegian Ministry of Culture and Church
        Affairs, Oslo, Norway.
       Nygaard, Lars. 2006. Frå ordbok til ordnett. Cand.philol.-oppgåve,
         Universitetet i Oslo, Norway.
       Sætre, Rune. 2006. GeneTUC: Natural Language Understanding
        in Medical Text. Ph.D. thesis, Norwegian University of Science
        and Technology, Trondheim, Norway.




www.ntnu.no                     Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
Appendix




www.ntnu.no   Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
47

     Other Non-English Languages


         — Compounding in, for example, German:
              • “usability”:
                  der Gebrauch (usage) + die Tauglichkeit (capability)
                  → die Gebrauch-s-tauglichkeit
              • “horse paddock”:
                  das Pferd + die Koppel → die Pferd-e-Koppel




www.ntnu.no                       Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
48

     Research Questions and Answers

       Q1 Is it possible to develop a method for automatically building a
          broad-domain, general semantic resource for Norwegian that
          is compatible with existing freely available, widely used,
          English semantic resources?
        A1 Yes. Verto, the implementation of the novel method and
           algorithms presented in my thesis, was successfully used to
           automatically create Ordnett, a broad-domain, general
           lexical-semantic resource that maps Norwegian words and
           phrases to concepts in the Princeton WordNet.




www.ntnu.no                     Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
49

     Research Questions and Answers

       Q2 What restrictions apply to the method? What resources are
          prerequisite?
        A2 The method described herein requires (1) a simple bilingual
           dictionary that contains information about the lexical
           category of entries and maps words in the source language to
           words in the target language; and (2) a lexical database that
           contains semantic information about synonymy, hypernymy,
           and similarity between senses in the target language. To
           handle compound words, the method also requires the
           availability of an automatic compound analyzer.




www.ntnu.no                     Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
50

     Research Questions and Answers
       Q3 For each of the lexical categories nouns, verbs, adjectives,
          and adverbs, how large fraction of the words in each class
          does the method work for?
       A3 With the configuration of the mapping framework that provided
          the maximum average precision value of 0.921 (achieved in
          Test Run 4), the number of successfully mapped words per
          lexical category were noun 22,888 (50.3 %); adjective 4,387
          (47.7 %); verb 3,495 (65.2 %); and adverb 186 (31.9 %).
          With the configuration that provided the maximum recall value
          of 0.365 (achieved in Test Run 12), the number of successfully
          mapped words per lexical category were noun 25,879
          (56.3 %); adjective 5,357 (56.2 %); verb 4,264 (78.5 %); and
          adverb 492 (35.3 %). The same configuration also provided
          the maximum F0.5 -score of 0.680.

www.ntnu.no                    Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
51

     Research Questions and Answers

       Q4 How well does the method avoid mapping words in the source
          language to senses in the target language that carry meanings
          that are not covered by the source words?
        A4 Comparison of the results produced by the system presented
           herein with those produced by a human expert showed that
           with a configuration that maximizes the precision value, the
           system reaches a precision score of 0.921.
           This shows that the system is able to rather precisely avoid
           unwarranted mappings.




www.ntnu.no                     Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
52

     Research Questions and Answers


       Q5 Can the method handle (single word) compounds that occur
          frequently in Norwegian, but more seldom in English?
        A5 Yes. The framework that implements the method presented
           herein handles mapping of compounds by applying an
           integrated compound-word analyzer.




www.ntnu.no                   Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
53

     Research Questions and Answers
       Q6 How may the resulting semantic resource be useful for
          different areas of NLP research and development?
       A6 One of the most important benefits of the method is that it
          produces a semantic resource that is linked to the Princeton
          WordNet, and thereby can benefit from other semantic
          resources that extends or complements—but are still
          compatible with—WordNet.
          [. . . ] It should be possible to utilize Ordnett for Norwegian
          (versions of) applications within most areas of NLP where
          ontologies are already used. These areas include, but is not
          necessarily limited to, Word-Sense Disambiguation (WSD),
          Text Summarization, Document Clustering, Information
          Extraction (IE), IR, Text Interpretation, and Natural Language
          Generation (NLG).

www.ntnu.no                     Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
54

     Research Questions and Answers

       Q7 Can the method be applied to other languages, and if so, to
          which ones?
        A7 I see no reason why the method should not be applicable to at
           least other Germanic languages, like the Scandinavian
           languages, German, and Dutch.
           However, if the language makes extensive use of
           compounding, an automatic compound-word analyzer may be
           required.




www.ntnu.no                     Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian

Más contenido relacionado

La actualidad más candente

PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...Rommel Carvalho
 
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsIJCERT JOURNAL
 
The Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingThe Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingWaqas Tariq
 
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...ijnlc
 
Generating Lexical Information for Terminology in a Bioinformatics Ontology
Generating Lexical Information for Terminologyin a Bioinformatics OntologyGenerating Lexical Information for Terminologyin a Bioinformatics Ontology
Generating Lexical Information for Terminology in a Bioinformatics OntologyHammad Afzal
 
NL Context Understanding 23(6)
NL Context Understanding 23(6)NL Context Understanding 23(6)
NL Context Understanding 23(6)IT Industry
 
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityIDES Editor
 
Cognitive plausibility in learning algorithms
Cognitive plausibility in learning algorithmsCognitive plausibility in learning algorithms
Cognitive plausibility in learning algorithmsAndré Karpištšenko
 
MEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical InformaticsMEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical Informaticsbutest
 
Lean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction RevisitedLean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction RevisitedValeria de Paiva
 
Digital Humanities Research, Europeana and The Embedded Semantic Research L...
Digital Humanities Research, Europeana  and The Embedded Semantic Research  L...Digital Humanities Research, Europeana  and The Embedded Semantic Research  L...
Digital Humanities Research, Europeana and The Embedded Semantic Research L...Net7
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyMarina Santini
 
Portuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowPortuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowValeria de Paiva
 
Text Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards ExploitationText Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards Exploitationbutest
 

La actualidad más candente (19)

PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
 
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer Reviews
 
The Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingThe Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language Processing
 
10.1.1.35.8376
10.1.1.35.837610.1.1.35.8376
10.1.1.35.8376
 
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
 
Generating Lexical Information for Terminology in a Bioinformatics Ontology
Generating Lexical Information for Terminologyin a Bioinformatics OntologyGenerating Lexical Information for Terminologyin a Bioinformatics Ontology
Generating Lexical Information for Terminology in a Bioinformatics Ontology
 
ppt
pptppt
ppt
 
NL Context Understanding 23(6)
NL Context Understanding 23(6)NL Context Understanding 23(6)
NL Context Understanding 23(6)
 
1 pos chunker -6-10
1 pos chunker -6-101 pos chunker -6-10
1 pos chunker -6-10
 
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
 
PORTUGUESE NLP FOR FGV?
PORTUGUESE NLP FOR FGV?PORTUGUESE NLP FOR FGV?
PORTUGUESE NLP FOR FGV?
 
Cognitive plausibility in learning algorithms
Cognitive plausibility in learning algorithmsCognitive plausibility in learning algorithms
Cognitive plausibility in learning algorithms
 
MEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical InformaticsMEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical Informatics
 
Lean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction RevisitedLean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction Revisited
 
L1803058388
L1803058388L1803058388
L1803058388
 
Digital Humanities Research, Europeana and The Embedded Semantic Research L...
Digital Humanities Research, Europeana  and The Embedded Semantic Research  L...Digital Humanities Research, Europeana  and The Embedded Semantic Research  L...
Digital Humanities Research, Europeana and The Embedded Semantic Research L...
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language Technology
 
Portuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowPortuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and How
 
Text Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards ExploitationText Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards Exploitation
 

Similar a Ph.D. Defense Presentation of "Open-Domain Word-Level Interpretation of Norwegian: Towards a General Encyclopedic Question-Answering System for Norwegian"

Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...alessio_ferrari
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...Scott Bou
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 
Roeder rocky 2011_46
Roeder rocky 2011_46Roeder rocky 2011_46
Roeder rocky 2011_46Chris Roeder
 
Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...butest
 
NLP applicata a LIS
NLP applicata a LISNLP applicata a LIS
NLP applicata a LISnoemiricci2
 
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...Daniel Vila Suero
 
OpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for allOpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for allAlexandre Rademaker
 
Logics and Ontologies for Portuguese Understanding
Logics and Ontologies for Portuguese UnderstandingLogics and Ontologies for Portuguese Understanding
Logics and Ontologies for Portuguese UnderstandingValeria de Paiva
 
Laurel Stvan dh ant_conc 2/27/13
Laurel Stvan dh ant_conc 2/27/13Laurel Stvan dh ant_conc 2/27/13
Laurel Stvan dh ant_conc 2/27/13Jessica C. Murphy
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...kevig
 
AINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, CoutoAINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, CoutoLidia Pivovarova
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP) ASWINKP11
 
Using Stanza NLP and TensorFlow to create a summary of a book
Using Stanza NLP and TensorFlow to create a summary of a bookUsing Stanza NLP and TensorFlow to create a summary of a book
Using Stanza NLP and TensorFlow to create a summary of a bookOlusola Amusan
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingVeenaSKumar2
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and originShubhankar Mohan
 

Similar a Ph.D. Defense Presentation of "Open-Domain Word-Level Interpretation of Norwegian: Towards a General Encyclopedic Question-Answering System for Norwegian" (20)

Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Roeder rocky 2011_46
Roeder rocky 2011_46Roeder rocky 2011_46
Roeder rocky 2011_46
 
Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...
 
NLP applicata a LIS
NLP applicata a LISNLP applicata a LIS
NLP applicata a LIS
 
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
 
OpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for allOpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for all
 
Logics and Ontologies for Portuguese Understanding
Logics and Ontologies for Portuguese UnderstandingLogics and Ontologies for Portuguese Understanding
Logics and Ontologies for Portuguese Understanding
 
Laurel Stvan dh ant_conc 2/27/13
Laurel Stvan dh ant_conc 2/27/13Laurel Stvan dh ant_conc 2/27/13
Laurel Stvan dh ant_conc 2/27/13
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
 
RusLTC at TSD-2014 (Brno)
RusLTC at TSD-2014 (Brno)RusLTC at TSD-2014 (Brno)
RusLTC at TSD-2014 (Brno)
 
AINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, CoutoAINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, Couto
 
Session5 03.george rehm
Session5 03.george rehmSession5 03.george rehm
Session5 03.george rehm
 
FinalReport
FinalReportFinalReport
FinalReport
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP)
 
Using Stanza NLP and TensorFlow to create a summary of a book
Using Stanza NLP and TensorFlow to create a summary of a bookUsing Stanza NLP and TensorFlow to create a summary of a book
Using Stanza NLP and TensorFlow to create a summary of a book
 
subrat
 subrat subrat
subrat
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
 

Ph.D. Defense Presentation of "Open-Domain Word-Level Interpretation of Norwegian: Towards a General Encyclopedic Question-Answering System for Norwegian"

  • 1. Open-Domain Word-Level Interpretation of Norwegian Towards a General Encyclopedic Question-Answering System for Norwegian Martin Thorsen Ranang Department of Computer and Information Science (IDI) PhD Thesis Presentation, February 5th 2010 www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 2. 2 Outline Introduction Background and Motivation A Central Missing Piece Mapping Norwegian Words and Phrases to WordNet My Approach Compound Words Algorithms Some of the Results TUClopedia—Proof of Concept System Overview Examples www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 3. 3 Outline Introduction Background and Motivation A Central Missing Piece Mapping Norwegian Words and Phrases to WordNet My Approach Compound Words Algorithms Some of the Results TUClopedia—Proof of Concept System Overview Examples www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 4. 4 Lack of Advanced Norwegian Language Resources — Languages shared and spoken by a nation’s citizens are woven into their culture, psychology, education and politics (Lambert, 1972; Joseph, 2004). — Stimulating language usage in the age of globalization. — Today’s most advanced language technology: • Mostly only available for the English language. • Example: the question-answering (QA) system Powerset (Converse et al., 2008).1 — Languages with many speakers, or of great military and/or economical significance are prioritized. 1 Powerset, http://www.powerset.com/. Accessed May 22, 2009. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 5. 5 Lack of Advanced Norwegian Language Resources — Norwegian Ministry of Culture and Church Affairs (2008, pp. 134–137): • important to establish a Norwegian Human Language Technology (HLT) resource collection.2 • availability of advanced Norwegian language-technological solutions will strengthen both language and culture. — Major consequences within several areas, including: research, education, business, and culture. 2 The Norwegian Human Language Technology Resource Collection, http://www.spraakbanken.uib.no/. Accessed May 22, 2009. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 6. 6 Goal of the TUClopedia Project — To create an open-domain question answering (open-domain QA) system that automatically acquires knowledge from, and lets users ask questions about, Norwegian encyclopedia articles. • Store norske leksikon (Henriksen, 2003) • The Understanding Computer (TUC) (Amble, 2003) — The response from a QA system should ideally be a concise, relevant, and correct answer. — Not the same as Information Retrieval (IR) systems (like Google Web Search) that “only” • perform searches in highly indexed material; • return lists of relevant documents; and • highlight relevant passages within text snippets. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 7. 7 The Understanding Computer — TUC is a Natural Language Understanding (NLU) system developed by Tore Amble. — Earlier systems based on TUC have been applied to several restricted domains. These include • BusTUC, an expert system route adviser for the public bus transport in Trondheim (Amble, 2000); • GeneTUC, understood and extracted information about gene and protein interactions from medical articles (Sætre, 2006); and • ExamTUC, could extract information from an article about the kings of Norway and automatically grade written answers to examination questions based on the same article (Bruland, 2002). www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 8. 8 Outline Introduction Background and Motivation A Central Missing Piece Mapping Norwegian Words and Phrases to WordNet My Approach Compound Words Algorithms Some of the Results TUClopedia—Proof of Concept System Overview Examples www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 9. 9 Lexical-Semantic Network — Text Interpretation, or NLU, systems depend on the availability of several resources: • morphological and lexical knowledge (lexicons); • syntax knowledge (grammars); and • semantic knowledge (ontologies, or semantic networks). — TUC uses lexical semantic knowledge during parsing to verify that candidate interpretations of the text make sense. — The earlier TUC-based systems that were applied to narrow, and/or, restricted domains had used manually crafted highly specialized semantic networks. — To apply TUC to open-domain QA, a much larger lexical-semantic resource was needed . . . for Norwegian. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 10. 10 Norwegian Lexical-Semantic Resources — When I began my research, no large-scale lexical-semantic resources existed for Norwegian. — Two other approaches should be mentioned: • “Frå ordbok til ordnett” by Nygaard (2006) — Does not distinguish between different senses of words. — Provides few semantic relations. • Semantic mirrors by Dyvik (2004). — Can distinguish between word senses. — Provides a few more semantic relations than Nygaard’s approach. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 11. 11 Norwegian Lexical-Semantic Resources n2 n3 n1 n4 T (n) = n · (n − 1) 2 n0 Both of Nygaard’s and Dyvik’s = O(n ) n5 approaches generate isolated n8 resources, with no connection to n6 n7 other Natural Language Processing (NLP) resources, n2 n3 something that makes them n1 n4 difficult to integrate into other projects. T (n) = 2n = O(n) c n0 n5 n8 n6 n7 www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 12. 12 Outline Introduction Background and Motivation A Central Missing Piece Mapping Norwegian Words and Phrases to WordNet My Approach Compound Words Algorithms Some of the Results TUClopedia—Proof of Concept System Overview Examples www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 13. 13 My Approach — I have developed and implemented a method for automatically mapping Norwegian simplex words and compounds to semantic concepts in the Princeton WordNet (Miller, 1995; Fellbaum, 1998) that are warranted by the semantics of the Norwegian source words. — The implementation of the method produces a Norwegian lexical semantic resource, named Ordnett. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 14. 14 WordNet Example (Hypernyms/Hyponyms) Concepts in WordNet are defined by synonym sets (synsets). www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 15. 15 Resources 1. A (machine readable) dictionary containing translations from the source language to the target language; 2. A dictionary containing translations from the target language to the source language; and 3. The target ontology of the mapping, a lexical semantic resource for the target language. 4. A lexicon containing morphological information about words in the source language. (Haslerud and Henriksen, 2003; Fellbaum, 1998; Nordgård, 1998) www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 16. 16 Outline Introduction Background and Motivation A Central Missing Piece Mapping Norwegian Words and Phrases to WordNet My Approach Compound Words Algorithms Some of the Results TUClopedia—Proof of Concept System Overview Examples www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 17. 17 Automatic Analysis of Compound Words — «slottsvinduer» (“castle windows”): • «slott» (“castle”) • «svin» (“pig”) • «vin» (“wine”) • «vind» (“wind”) • «du» (“you”) • «due» (“dow”) • «duer» (“dows”) • «er» (“is”) — Johannessen and Hauglin (1996); Johannessen (2001). www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 18. 18 Automatic Analysis of Compound Words www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 19. 19 Results Tested on 5 randomly chosen Norwegian newspaper articles (4951 words, 109 compounds) with a 100 % success rate. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 20. 20 Outline Introduction Background and Motivation A Central Missing Piece Mapping Norwegian Words and Phrases to WordNet My Approach Compound Words Algorithms Some of the Results TUClopedia—Proof of Concept System Overview Examples www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 21. 21 Informal Description of the Main Principle — Translating the Norwegian word «skatt», we get, e.g., “tax”, “treasure”, and “honey”. Each with multiple senses. — “Honey” has at least two senses: 1. “a sweet yellow liquid produced by bees” 2. “a beloved person” — Only the second sense above is warranted by the Norwegian word «skatt». — The main principle is to exploit the semantic information already in WordNet to remove unwarranted mappings. — For example, by inverse translating the complement synonyms of “honey”, like “loved one” or “beloved”, we can see look for overlaps between the inverse translations and the original word. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 22. 22 Basic Algorithm I 1: procedure M AP -WORD - TO -S ENSE(w, C, D , D ) 2: mappings ← ∅ 3: for all t ∈ T RANSLATE(w, C, D ) do T RANSLATE(«brystkasse», n, NorEng) = 4: T ← S ENSES - OF(t, C) 5: if |T | = 1 then {“chest”, “rib cage”, “thorax”} 6: mappings ← mappings ∪ {(w, T [0])} 7: continue S ENSES - OF(“chest”, n) = 8: for all ti ∈ T do chest_n_1, chest_n_2, chest_n_3 9: mappings ← mappings ∪M IRROR(w, ti , C, D ) 10: return mappings www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 23. 23 Basic Algorithm II 1: procedure M IRROR(w, ti , C, D ) 2: mappings ← ∅ G ET-S YNSET(chest_n_1) = 3: S ← G ET-S YNSET(ti ) {thorax_n_2, chest_n_1, pectus_n_1} 4: S ← S − {ti } 5: for all s ∈ S do S ← {thorax_n_2, pectus_n_1} 6: w ← W ORD - FROM -S ENSE(s) 7: for all w ∈ T RANSLATE(w , C, D ) do 8: if w = w then T RANSLATE(“thorax”, n, EngNor ) = 9: mappings ← mappings ∪ {(w, ti )} {«bryst», «brystkasse», 10: return mappings «toraks», «kropp», «forkropp»}. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 24. 24 Extended Algorithm 1: procedure M IRROR(w, ti , C, D ) 2: mappings ← ∅ 3: S ← G ET-S YNSET(ti ) 4: for all S TRATEGY ∈ S YNONYM, H YPERNYM do 5: S ← S TRATEGY(S, ti ) The “complement” synset. 6: for all s ∈ S do Complement senses. 7: w ← W ORD - FROM -S ENSE(s) 8: for all w ∈ T RANSLATE(w , C, D ) do 9: if w = w then 10: mappings ← mappings ∪ {(w, ti )} 11: if mappings = ∅ then 12: break 13: return mappings www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 25. 25 Typical Size of a Smaller Case www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 26. 26 Outline Introduction Background and Motivation A Central Missing Piece Mapping Norwegian Words and Phrases to WordNet My Approach Compound Words Algorithms Some of the Results TUClopedia—Proof of Concept System Overview Examples www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 27. 27 Ordnett—Coverage Table: Coverage results of running Verto with the extended dictionaries and the S YNONYM and all of the search strategies. Mapped No inverse Mappings Lexical category Keywords N (%)a N (%)b N Mean Noun 45,993 25,879 56.3 15,783 78.5 99,594 2.165 Adjective 9,526 5,357 56.2 3,167 76.0 35,049 3.679 Verb 5,435 4,264 78.5 1,010 86.3 51,010 9.385 Adverb 1,394 492 35.3 809 89.7 1,610 1.155 Total 62,348 35,992 57.7 20,769 78.8 187,263 4.096 a Percentage of keywords mapped to WordNet senses. b Percentage of cases where no mapping could be found. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 28. 28 Ordnett—Precision, Recall, F -measure Table: Precision, recall, and F0.5 results of running Verto with the original dictionaries and the S YNONYM and S IMILAR search strategies. Mappings Lexical category Correct Incorrect Precision Recall F0.5 Noun 129 7 0.949 0.222 0.573 Adjective 219 10 0.956 0.336 0.698 Verb 160 26 0.860 0.161 0.460 Adverb 110 10 0.917 0.308 0.657 Total 618 53 0.921 0.239 0.587 www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 29. 29 Ordnett: Hypernymy www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 30. 30 Ordnett: Hypernymy And Meronymy www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 31. 31 Ordnett: Meronymy And Entailment www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 32. 32 Outline Introduction Background and Motivation A Central Missing Piece Mapping Norwegian Words and Phrases to WordNet My Approach Compound Words Algorithms Some of the Results TUClopedia—Proof of Concept System Overview Examples www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 33. 33 System Overview User interface Encyclopedia Question Article Articles Translation Lexical Named-entity Named-entity framework analyzer recognizer extractor Tagged Sentence- words Ordnett Bilingual Lexicon boundary (Norwegian– Named dictionary (Morphology) detector WordNet) entities Parser Grammar Answer (Syntax rules) Parse trees Semantic Semantic WordNet Onomasticon interpreter network Query Semantic resources expression Reasoning engine Acquired knowledge Knowledge base www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 34. 34 Outline Introduction Background and Motivation A Central Missing Piece Mapping Norwegian Words and Phrases to WordNet My Approach Compound Words Algorithms Some of the Results TUClopedia—Proof of Concept System Overview Examples www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 35. 35 A+ A+ programmeringsspråk for datamaskiner, avledet av APL. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 36. 36 A+—Word Graph txt(w(’A+’, name(’A+’, n, programming_language_n_1)), 0, 1). txt(w(’er’, verb(be_v_2, pres, fin)), 1, 2). txt(w(’programmeringsspråk’, noun(programming_language_n_1, plu, u, n)), 2, 3). txt(w(’programmeringsspråk’, noun(programming_language_n_1, sin, u, n)), 2, 3). txt(w(’for’, prep(’for’)), 3, 4). txt(w(’datamaskiner’, noun(computer_n_1, plu, u, n)), 4, 5). txt(w(’avledet’, verb(derive_v_3, past, part)), 5, 6). txt(w(’avledet’, [’avledet’]), 5, 6). txt(w(’av’, prep(’from’)), 6, 7). txt(w(’av’, prep(’of’)), 6, 7). txt(w(’av’, prep(’off’)), 6, 7). txt(w(’APL’, name(apl, n, programming_language_n_1)), 7, 8). txt(w(’.’, [’.’]), 8, 9). www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 37. 37 A+—Acquired Knowledge (TQL) I % ‘‘A+ programmeringsspråk for datamaskiner, avledet av APL.’’ % (A+ programming language for computers, derived from APL.) isa(programming_language_n_1, real, sk(1)) isa(computer_n_1, real, sk(2)) % Programming language, sk(1), is for computer, sk(2). event(sk(3)) % A+ is programming language, sk(1). agent(sk(1), sk(3)) event(sk(5)) present(real, sk(3)) agent(’A+’, sk(5)) action(be_v_4, sk(3)) present(real, sk(5)) mod(for, sk(2), sk(3)) action(be_v_2, sk(5)) patient(sk(1), sk(5)) % Somebody (pro) derives programming language, sk(1), from ’APL’ event(sk(4)) present(real, sk(4)) action(derive_v_3, sk(4)) patient(sk(1), sk(4)) agent(pro, sk(4)) mod(from, ’APL’, sk(4)) www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 38. 38 Drapet på Kennedy Presidenten ble myrdet av en snikskytter 22. november 1963 under et besøk i Dallas, Texas. Kennedy-mordet ble umiddelbart gjenstand for en rekke ulike teorier og påstander som har fortsatt å oppta allmennheten etterpå. [. . . ] [The Kennedy Assassination. The President was assassinated November 22, 1963 during a visit to Dallas, Texas. The Kennedy assassination immediately became subject to a range of different theories and allegations that subsequently have continued to occupy the general public. [. . . ]] www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 39. 39 Acquired Knowledge isa(visit_n_5, real, sk(3)) isa(dallas_n_1, real, sk(4)) isa(president_n_3, real, sk(1)) % There is a visit, sk(3), in Dallas, sk(4). isa(sniper_n_1, real, sk(2)) event(sk(5)) agent(sk(3), sk(5)) % The President, sk(1), was murdered by a present(real, sk(5)) % sniper, sk(2), during the visit, sk(3), action(be_v_4, sk(5)) % on November 22, 1963. mod(in, sk(4), sk(5)) event(sk(8)) past(real, sk(8)) isa(texas_n_1, real, sk(6)) action(murder_v_1, sk(8)) patient(sk(1), sk(8)) % The visit, sk(3), is in Texas, sk(6). agent(sk(2), sk(8)) event(sk(7)) mod(under, sk(3), sk(8)) agent(sk(3), sk(7)) mod(nil, date(1963, 11, 22), sk(8)) present(real, sk(7)) action(be_v_4, sk(7)) mod(in, sk(6), sk(7)) www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 40. 40 Question Answering «Hvem myrdet presidenten?» (“Who murdered the president?”) which(sk(2)) isa(somebody_n_1, real, sk(2)) <= isa(person_n_1, real, sk(2)) <= isa(expert_n_1, real, sk(2)) <= which(A), isa(sniper_n_1, real, sk(2)) % A = sk(2). isa(somebody_n_1, real, A), isa(president_n_3, real, sk(1)) % B = sk(1). isa(president_n_3, real, B), agent(sk(2), sk(8)) % C = sk(8). agent(A, C), action(murder_v_1, sk(8)) action(murder_v_1, C), past(real, sk(8)) past(real, C), event(sk(8)) event(C), patient(sk(1), sk(8)) patient(B, C) => sk(2) www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 41. 41 References I Amble, Tore. 2000. BusTUC - a natural language bus route oracle. In Proceedings of the 6th Applied Natural Language Processing Conference, 1–6. Seattle, Washington, USA: Association for Computational Linguistics. ———. 2003. The understanding computer: Natural language understanding in practice. Trondheim, Norway: Department of Computer and Information Science, Norwegian University of Science and Technology. Preliminary version. Bruland, Tore. 2002. ExamTUC - a simple examination system in natural language. Sivilingeniør’s thesis, Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 42. 42 References II Converse, Tim, Ronald M. Kaplan, Barney Pell, Scott Prevost, Lorenzo Thione, and Chad Walters. 2008. Powerset’s natural language Wikipedia search engine. In Wikipedia and Artificial Intelligence: An Evolving Synergy: Papers from the AAAI Workshop, 67. Techical Report WS-08-15, Menlo Park, California, USA: The AAAI Press. Dyvik, Helge. 2004. Translations as semantic mirrors. From parallel corpus to WordNet. Language and Computers 49(1):311–326. Fellbaum, Christiane, ed. 1998. WordNet: An electronic lexical database. Language, Speech, and Communication, Cambridge, Massachusetts, USA: The MIT Press. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 43. 43 References III Haslerud, Vibecke C. D., and Petter Henriksen, eds. 2003. Engelsk stor ordbok: Engelsk-norsk / norsk-engelsk. Oslo, Norway: Kunnskapsforlaget. Henriksen, Petter, ed. 2003. Aschehougs og Gyldendals Store Norske Leksikon. Oslo, Norway: Kunnskapsforlaget. Johannessen, Janne Bondi. 2001. Sammensatte ord. Norsk Lingvistisk Tidsskrift 19:59–91. Johannessen, Janne Bondi, and Helge Hauglin. 1996. An automatic analysis of Norwegian compounds. In Papers from the 16th Scandinavian Conference of Linguistics. Turku/Åbo, Finland. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 44. 44 References IV Joseph, John Earl. 2004. Language and Identity: National, Ethnic, Religious. Hampshire, UK: Palgrave Macmillan. Lambert, Wallace E. 1972. Language, psychology, and culture. Language Science and National Development Series, Stanford, California, USA: Stanford University Press. Miller, George A. 1995. WordNet: a lexical database for English. Communications of the ACM 38(11):39–41. Nordgård, Torbjørn. 1998. Norwegian computational lexicon (NorKompLeks). In Proceedings of the 11th Nordic Conference on Computational Linguistics, ed. Bente Maegaard, 34–44. University of Copenhagen, Denmark: Center for Sprogteknologi. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 45. 45 References V Norwegian Ministry of Culture and Church Affairs. 2008. Mål og meining: Ein heilskapleg norsk språkpolitikk. Stortingsmelding nr. 35 (2007–2008), Norwegian Ministry of Culture and Church Affairs, Oslo, Norway. Nygaard, Lars. 2006. Frå ordbok til ordnett. Cand.philol.-oppgåve, Universitetet i Oslo, Norway. Sætre, Rune. 2006. GeneTUC: Natural Language Understanding in Medical Text. Ph.D. thesis, Norwegian University of Science and Technology, Trondheim, Norway. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 46. Appendix www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 47. 47 Other Non-English Languages — Compounding in, for example, German: • “usability”: der Gebrauch (usage) + die Tauglichkeit (capability) → die Gebrauch-s-tauglichkeit • “horse paddock”: das Pferd + die Koppel → die Pferd-e-Koppel www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 48. 48 Research Questions and Answers Q1 Is it possible to develop a method for automatically building a broad-domain, general semantic resource for Norwegian that is compatible with existing freely available, widely used, English semantic resources? A1 Yes. Verto, the implementation of the novel method and algorithms presented in my thesis, was successfully used to automatically create Ordnett, a broad-domain, general lexical-semantic resource that maps Norwegian words and phrases to concepts in the Princeton WordNet. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 49. 49 Research Questions and Answers Q2 What restrictions apply to the method? What resources are prerequisite? A2 The method described herein requires (1) a simple bilingual dictionary that contains information about the lexical category of entries and maps words in the source language to words in the target language; and (2) a lexical database that contains semantic information about synonymy, hypernymy, and similarity between senses in the target language. To handle compound words, the method also requires the availability of an automatic compound analyzer. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 50. 50 Research Questions and Answers Q3 For each of the lexical categories nouns, verbs, adjectives, and adverbs, how large fraction of the words in each class does the method work for? A3 With the configuration of the mapping framework that provided the maximum average precision value of 0.921 (achieved in Test Run 4), the number of successfully mapped words per lexical category were noun 22,888 (50.3 %); adjective 4,387 (47.7 %); verb 3,495 (65.2 %); and adverb 186 (31.9 %). With the configuration that provided the maximum recall value of 0.365 (achieved in Test Run 12), the number of successfully mapped words per lexical category were noun 25,879 (56.3 %); adjective 5,357 (56.2 %); verb 4,264 (78.5 %); and adverb 492 (35.3 %). The same configuration also provided the maximum F0.5 -score of 0.680. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 51. 51 Research Questions and Answers Q4 How well does the method avoid mapping words in the source language to senses in the target language that carry meanings that are not covered by the source words? A4 Comparison of the results produced by the system presented herein with those produced by a human expert showed that with a configuration that maximizes the precision value, the system reaches a precision score of 0.921. This shows that the system is able to rather precisely avoid unwarranted mappings. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 52. 52 Research Questions and Answers Q5 Can the method handle (single word) compounds that occur frequently in Norwegian, but more seldom in English? A5 Yes. The framework that implements the method presented herein handles mapping of compounds by applying an integrated compound-word analyzer. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 53. 53 Research Questions and Answers Q6 How may the resulting semantic resource be useful for different areas of NLP research and development? A6 One of the most important benefits of the method is that it produces a semantic resource that is linked to the Princeton WordNet, and thereby can benefit from other semantic resources that extends or complements—but are still compatible with—WordNet. [. . . ] It should be possible to utilize Ordnett for Norwegian (versions of) applications within most areas of NLP where ontologies are already used. These areas include, but is not necessarily limited to, Word-Sense Disambiguation (WSD), Text Summarization, Document Clustering, Information Extraction (IE), IR, Text Interpretation, and Natural Language Generation (NLG). www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian
  • 54. 54 Research Questions and Answers Q7 Can the method be applied to other languages, and if so, to which ones? A7 I see no reason why the method should not be applicable to at least other Germanic languages, like the Scandinavian languages, German, and Dutch. However, if the language makes extensive use of compounding, an automatic compound-word analyzer may be required. www.ntnu.no Martin Thorsen Ranang, Open-Domain Word-Level Interpretation of Norwegian