Human Language Technologies for Ethiopian
Languages: Challenges and Future Directions


         Solomon Teferra Abate, Binyam Ephrem,
      Enchalew Yifru, Kassa Tilahun, Lemlem Hagos,
       Mohammed-hussen Abubeker and Taye Girma


           LIG, Université Joseph Fourier (UJF)
         ITPhD Program, Addis Ababa University
              solomon_teferra_7@yahoo.com


                  AGIS'11 Conference, Addis Ababa
Outline


●   Ethiopian Languages
●   Human Language Technology (HLT)
      –   Role in Development
      –   HLT in the World
●   HLT for Ethiopian Languages
      –   Language and Technology Coverage
      –   Challenges and limitations
      –   Future Directions and Strategies

Ethiopian Languages


●   There are about 90 languages
●   Most belong to the Afro-Asiatic language family
●   Amharic, Afan-Oromo and Tigrigna are the 3 most widely spoken
●   Amharic is the federal working language
      –   Regions have their own working languages
      –   The language policy states that everyone has the right to learn in
           his/her mother tongue
      –   More than 20 languages are media of instruction (MOI) in primary (I&II) school
Human Language Technology

●   Is an interdisciplinary field that encompasses most sub-
    disciplines of linguistics, Computational Linguistics, Natural
    Language Processing, computer science, Artificial Intelligence,
    psychology, philosophy, mathematics and statistics
●   Covers areas like:
      ✔   ASR,
      ✔   TTS,
      ✔   OCR,
      ✔   Stemming,
      ✔   MT,
      ✔   POS tagging,
      ✔   Parsing,
      ✔   Morphological analysis/synthesis,
      ✔   Information Extraction,
      ✔   Text/document categorization,
      ✔   Spelling and Grammar checking,
      ✔   etc.
Human Language Technology - Role

●   Enables ICT products to have knowledge of human language
      ●   Increases the acceptance of the technology and the
            productivity of its users in the information age
●   Helps people collaborate, conduct business, share knowledge
    and participate in social and political debates regardless of
    language barriers or computer skills
●   Relevant for the disadvantaged to have access to information:
      ✔   the illiterate,
      ✔   the rural poor,
      ✔   the physically impaired population

HLT in the World

●   Well developed for only a few of the world's languages, such as English
●   IBM Watson Computer
    ●       Passed its first test by winning a QA competition with a $1 M prize
    ●       Its design goal is an intelligent computer that can
            interact in natural language
               ✔   Understand any question asked in natural speech
               ✔   Answer questions as humans do
    ●       Uses a number of HLT modules such as: ASR, QA, TTS
    ✗       Requires a lot of expensive servers (about a total of $1 billion)
HLT in the World

●   Siri is a simple iPhone-based system that:
      ●   Receives commands in natural speech
             ●   Send messages
             ●   Schedule meetings
             ●   Place phone calls
●   Siri has been claimed to:
      ●   understand what you say
      ●   know what you mean
      ●   speak back in natural speech
HLT in the World: Europe

●   Europe is a continent united into a single multilingual
    economic union with 23 official languages
●   To enable the European languages, the European Union:
      ✔   Invested over €130 M to promote language technologies
            and language resource infrastructures in 2009-2011
      ✔   Allocated €35 M for SME action on Digital Content and
           Languages and €50 M for Language Technologies in its
           Work Program 2011-2012
      ✔   Proposed a simple platform that enables availability of any
            online content and services in all European languages
HLT in the World: South Africa

●   The South African government has identified HLT as a priority area
    to (technologically) enable its 11 official languages
➢   Various R&D projects and initiatives have been funded by
    government through:
      ●   Department of Arts and Culture (DAC),
      ●   Department of Science and Technology (DST), and
      ●   National Research Foundation (NRF)
●   The key challenge is fragmentation of R&D activities in HLT
      ●   Addressed by the South African HLT Audit (SAHLTA)
HLT for Ethiopian Languages


●   Research on HLT for Ethiopian languages started in the 1990s
✔   There are now many (>200) encouraging and valuable works on:
    ➢   ASR,
    ➢   MT,
    ➢   Text-to-speech,
    ➢   OCR,
    ➢   Stemming,
    ➢   Parsing,
    ➢   POS tagging,
    ➢   Spell checking,
    ➢   Thesaurus construction,
    ➢   Text classification,
    ➢   Text categorization,
    ➢   Morphological analysis,
    ➢   Information Extraction
✗   Most of them are based on language resources (LRs) developed for the experiment
HLT for Ethiopian Languages

✗   HLT research covers a limited number of Ethiopian languages
[Figure: "HLT for Ethiopian Languages (Masters theses)". Horizontal axis: Languages (Amharic, Afan Oromo, Tigrigna, Welayta, Ge'ez, Sidama); vertical axis: Research Areas (scale 0 to 25); legend: NLP, Speech Processing, OCR, CSE.]
Challenges and Limitations

●   Challenges that hinder Ethiopian HLT include:
      –   lack of language resources: speech and text corpora
      –   lack of standardized evaluation corpora and platforms
      –   lack of expertise in both language and technology
      –   time shortage
           ●   research is done only for academic achievement in the given time
      –   absence of a national HLT research plan (HLT road-map)
           ●   research is based only on individuals' interest
      –   lack of sustainable and coordinated research funding
Challenges and Limitations

➔   The existing research works have limitations:
     –   use of insufficient and low-quality language resources
          ➢   research results are not conclusive
     –   research results are not well evaluated, analyzed and
           documented
          ➢   their achievements and gaps are vague
     –   research attempts in HLT are fragmented
          ➢   lack of integration, consolidation and continuity
               ●   Tokenizer    POS     Parser      LA       ASR/MT
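
One way to picture the integration that the component chain above calls for is a shared interface, so that the output of one tool can feed the next. The following is only a minimal sketch under assumptions that are not in the slides: the names Stage, tokenize, tag_placeholder and run_pipeline are hypothetical, the tokenizer is a toy whitespace splitter, and the tagger is a stand-in that does no real analysis.

```python
from typing import Callable, List

# Hypothetical shared interface (an assumption, not from the slides):
# every stage after tokenization maps a list of token records to a new list.
Stage = Callable[[List[dict]], List[dict]]


def tokenize(text: str) -> List[dict]:
    """Toy whitespace tokenizer; a real one would be language-specific."""
    return [{"form": tok} for tok in text.split()]


def tag_placeholder(tokens: List[dict]) -> List[dict]:
    """Stand-in for a POS tagger: labels every token with the dummy tag 'X'."""
    return [{**tok, "pos": "X"} for tok in tokens]


def run_pipeline(text: str, stages: List[Stage]) -> List[dict]:
    """Tokenize the text, then pass the tokens through each stage in order."""
    tokens = tokenize(text)
    for stage in stages:
        tokens = stage(tokens)
    return tokens


if __name__ == "__main__":
    print(run_pipeline("selam alem", [tag_placeholder]))
```

Because every stage accepts and returns the same record type, a later component (a parser, for example) could be appended to the list without changing the earlier ones.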
Future Directions and Strategies


●   Is there any other way to escape the cost of the language barrier,
    or to cover it, without HLT in the information age? NO!!!
●   Are we rich enough to continue spending on purely academic
    exercises? NO!!!
      –   6 months of at least 10 research students doing their theses in
            any one of the HLT areas every year, and of their supervisors
      –   3 years of at least 6 PhD research students (admitted every year)
            and of their research supervisors
      –   The time of academic researchers doing research for publication
           purposes (for academic promotion)
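
The student effort listed above can be put into rough person-months. This is a minimal sketch that assumes only the lower bounds stated on the slide (10 research students at 6 months each, 6 PhD students admitted per year at 3 years each). Supervisor and staff time is left out because the slide does not quantify it, and the constant and function names are illustrative.

```python
# Lower-bound figures taken from the slide; supervisor time is not quantified there.
MASTERS_STUDENTS_PER_YEAR = 10
MASTERS_THESIS_MONTHS = 6
PHD_STUDENTS_PER_INTAKE = 6
PHD_DURATION_YEARS = 3


def masters_person_months() -> int:
    """Student effort committed by one year's thesis students."""
    return MASTERS_STUDENTS_PER_YEAR * MASTERS_THESIS_MONTHS


def phd_person_months() -> int:
    """Student effort committed by one yearly PhD intake over its full duration."""
    return PHD_STUDENTS_PER_INTAKE * PHD_DURATION_YEARS * 12


if __name__ == "__main__":
    total = masters_person_months() + phd_person_months()
    print(f"Thesis students: {masters_person_months()} person-months")
    print(f"PhD students: {phd_person_months()} person-months")
    print(f"Total per yearly intake: {total} person-months")
```

Under these assumptions each yearly intake commits at least 60 + 216 = 276 person-months of student time alone.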
Future Directions and Strategies

●   Give emphasis and recognition to R&D activities in HLT
●   Develop a national HLT road-map (HLT Audit)
      –   Shows research priorities
      –   Avoids duplication (even across languages)
      –   Reduces R&D cost
      –   Provides a means of evaluation/assessment
      –   Enforces consolidation, integration and continuity
      –   Inspires researchers and developers
      –   Shows the benefit areas for the HLT industry
Future Directions and Strategies


●   Establish Institutional/National R&D units
      –   Fund, coordinate and evaluate R&D projects
      –   Store, maintain, distribute language resources and R&D
            outputs
      –   Promote the utility of R&D outputs
      –   Coordinate and support private industries
      –   Coordinate the cooperation of the academia and the industry
      –   Promote/attract international investment in HLT industries


Conclusion


●   We have 85 living languages
●   All have speakers who need information and have the right
    to get it in a language and a form they understand
      –   HLT is the way to realize it
●   We need to have a strategy to put it in place
      –   Cooperation across:
           ●   Time: past->present->future
           ●   Language
           ●   Research area
           ●   Sector: academic<->industry

We can make it BY




AGIS'11 Conference, Addis Ababa

Más contenido relacionado

Similar a Human Language Technologies for Ethiopian Languages: Challenges and Future Directions

Policies for OER in regional and minority languages: are regional and minorit...
Policies for OER in regional and minority languages: are regional and minorit...Policies for OER in regional and minority languages: are regional and minorit...
Policies for OER in regional and minority languages: are regional and minorit...LangOER
 
Building Capacities in Human Language Technology for African Languages
Building Capacities in Human Language Technology for African LanguagesBuilding Capacities in Human Language Technology for African Languages
Building Capacities in Human Language Technology for African LanguagesGuy De Pauw
 
Role of Language Engineering to Preserve Endangered Language
Role of Language Engineering to Preserve Endangered Language Role of Language Engineering to Preserve Endangered Language
Role of Language Engineering to Preserve Endangered Language Dr. Amit Kumar Jha
 
NTM%20Project%20-Final%20Presentation%20Revised(2)
NTM%20Project%20-Final%20Presentation%20Revised(2)NTM%20Project%20-Final%20Presentation%20Revised(2)
NTM%20Project%20-Final%20Presentation%20Revised(2)finance14
 
LangOER Conference: Welcome message
LangOER Conference: Welcome message LangOER Conference: Welcome message
LangOER Conference: Welcome message LangOER
 
Human Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual EuropeHuman Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual EuropeGeorg Rehm
 
ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)ICANN
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP) ASWINKP11
 
Celtic language technologies in the digital age
Celtic language technologies in the digital ageCeltic language technologies in the digital age
Celtic language technologies in the digital agetechiaith
 
International Sign as a Conference Language
International Sign as a Conference LanguageInternational Sign as a Conference Language
International Sign as a Conference LanguageMobileDeaf
 
Bridging language acquision and language policy
Bridging language acquision and language policyBridging language acquision and language policy
Bridging language acquision and language policyLangOER
 
Reflections on building a Multi-country AAC Implementation Guide.pptx
Reflections on building a Multi-country AAC Implementation Guide.pptxReflections on building a Multi-country AAC Implementation Guide.pptx
Reflections on building a Multi-country AAC Implementation Guide.pptxE.A. Draffan
 
Sustainability in OER for less used languages
Sustainability in OER for less used languagesSustainability in OER for less used languages
Sustainability in OER for less used languagesLangOER
 
Natural language processing for Albanian: a state-of-the-art survey
Natural language processing for Albanian: a state-of-the-art  surveyNatural language processing for Albanian: a state-of-the-art  survey
Natural language processing for Albanian: a state-of-the-art surveyIJECEIAES
 
OER: insights into a multilingual landscape
OER: insights into a multilingual landscapeOER: insights into a multilingual landscape
OER: insights into a multilingual landscapeLangOER
 
Applied linguístics 1
Applied linguístics 1Applied linguístics 1
Applied linguístics 1Carlos Mayora
 
Huy & robert a call to call
Huy & robert a call to callHuy & robert a call to call
Huy & robert a call to callPhung Huy
 

Similar a Human Language Technologies for Ethiopian Languages: Challenges and Future Directions (20)

Policies for OER in regional and minority languages: are regional and minorit...
Policies for OER in regional and minority languages: are regional and minorit...Policies for OER in regional and minority languages: are regional and minorit...
Policies for OER in regional and minority languages: are regional and minorit...
 
Building Capacities in Human Language Technology for African Languages
Building Capacities in Human Language Technology for African LanguagesBuilding Capacities in Human Language Technology for African Languages
Building Capacities in Human Language Technology for African Languages
 
Role of Language Engineering to Preserve Endangered Language
Role of Language Engineering to Preserve Endangered Language Role of Language Engineering to Preserve Endangered Language
Role of Language Engineering to Preserve Endangered Language
 
NTM%20Project%20-Final%20Presentation%20Revised(2)
NTM%20Project%20-Final%20Presentation%20Revised(2)NTM%20Project%20-Final%20Presentation%20Revised(2)
NTM%20Project%20-Final%20Presentation%20Revised(2)
 
LangOER Conference: Welcome message
LangOER Conference: Welcome message LangOER Conference: Welcome message
LangOER Conference: Welcome message
 
Human Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual EuropeHuman Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual Europe
 
ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP)
 
Celtic language technologies in the digital age
Celtic language technologies in the digital ageCeltic language technologies in the digital age
Celtic language technologies in the digital age
 
Speech-Recognition.pptx
Speech-Recognition.pptxSpeech-Recognition.pptx
Speech-Recognition.pptx
 
Achievement And Lessons Learned By An Loc
Achievement And Lessons Learned By An LocAchievement And Lessons Learned By An Loc
Achievement And Lessons Learned By An Loc
 
International Sign as a Conference Language
International Sign as a Conference LanguageInternational Sign as a Conference Language
International Sign as a Conference Language
 
Bridging language acquision and language policy
Bridging language acquision and language policyBridging language acquision and language policy
Bridging language acquision and language policy
 
Reflections on building a Multi-country AAC Implementation Guide.pptx
Reflections on building a Multi-country AAC Implementation Guide.pptxReflections on building a Multi-country AAC Implementation Guide.pptx
Reflections on building a Multi-country AAC Implementation Guide.pptx
 
Sustainability in OER for less used languages
Sustainability in OER for less used languagesSustainability in OER for less used languages
Sustainability in OER for less used languages
 
How can we profit from multilingualism? Good practices in Europe
How can we profit from multilingualism? Good practices in EuropeHow can we profit from multilingualism? Good practices in Europe
How can we profit from multilingualism? Good practices in Europe
 
Natural language processing for Albanian: a state-of-the-art survey
Natural language processing for Albanian: a state-of-the-art  surveyNatural language processing for Albanian: a state-of-the-art  survey
Natural language processing for Albanian: a state-of-the-art survey
 
OER: insights into a multilingual landscape
OER: insights into a multilingual landscapeOER: insights into a multilingual landscape
OER: insights into a multilingual landscape
 
Applied linguístics 1
Applied linguístics 1Applied linguístics 1
Applied linguístics 1
 
Huy & robert a call to call
Huy & robert a call to callHuy & robert a call to call
Huy & robert a call to call
 

Más de Guy De Pauw

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Guy De Pauw
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Guy De Pauw
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingGuy De Pauw
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageGuy De Pauw
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguageGuy De Pauw
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)Guy De Pauw
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Guy De Pauw
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusGuy De Pauw
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of SantomeGuy De Pauw
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Guy De Pauw
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTGuy De Pauw
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionGuy De Pauw
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingGuy De Pauw
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishGuy De Pauw
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsGuy De Pauw
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersGuy De Pauw
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentGuy De Pauw
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersGuy De Pauw
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemGuy De Pauw
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemGuy De Pauw
 

Más de Guy De Pauw (20)

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech Tagging
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh Language
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik Language
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News Corpus
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of Santome
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFST
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic Inflection
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken Irish
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 years
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound Analysers
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource Development
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá Characters
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation System
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription System
 

Último

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Human Language Technologies for Ethiopian Languages: Challenges and Future Directions

  • 1. Human Language Technologies for Ethiopian Languages: Challenges and Future Directions
      Solomon Teferra Abate, Binyam Ephrem, Enchalew Yifru, Kassa Tilahun, Lemlem Hagos, Mohammed-hussen Abubeker and Taye Girma
      LIG, Université Joseph Fourier (UJF)
      ITPhD Program, Addis Ababa University
      solomon_teferra_7@yahoo.com
      AGIS'11 Conference, Addis Ababa
  • 2. Outline
      ● Ethiopian Languages
      ● Human Language Technology (HLT)
          – Role in Development
          – HLT in the World
      ● HLT for Ethiopian Languages
          – Language and Technology Coverage
          – Challenges and Limitations
          – Future Directions and Strategies
  • 3. Ethiopian Languages
      ● There are about 90 languages
      ● Most belong to the Afro-Asiatic language family
      ● Amharic, Afan-Oromo and Tigrigna are the 3 most spoken
      ● Amharic is the federal working language
          – Regions have their own working languages
          – The language policy states that everyone has the right to in his/her mother tongue
          – More than 20 languages are the medium of instruction (MOI) in primary (I&II) school
  • 4. Human Language Technology
      ● Is an interdisciplinary field that encompasses most sub-disciplines of linguistics, Computational Linguistics, Natural Language Processing, computer science, Artificial Intelligence, psychology, philosophy, mathematics and statistics
      ● Covers areas like:
          ✔ ASR
          ✔ TTS
          ✔ OCR
          ✔ MT
          ✔ Stemming
          ✔ POS tagging
          ✔ Morphological analysis/synthesis
          ✔ Information Extraction
          ✔ Text/document categorization
          ✔ Spelling and Grammar checking
          ✔ Parsing
          ✔ etc.
  • 5. Human Language Technology - Role
      ● Enables ICT products to have knowledge of human language
      ● Increases the acceptance of the technology and the productivity of its users in the information age
      ● Helps people collaborate, conduct business, share knowledge and participate in social and political debates regardless of language barriers or computer skills
      ● Gives the disadvantaged access to information:
          ✔ the illiterate
          ✔ the physically impaired population
          ✔ the rural poor
  • 6. HLT in the World
      ● Well developed for a few languages of the world, like English
      ● IBM Watson Computer
          ● Passed its first test by winning a QA competition with $1 M value
          ● The goal of its design is an intelligent computer that can interact in a natural language
              ✔ Understands any question asked in natural speech
              ✔ Answers questions as humans do
          ● Uses a number of HLT modules such as: ASR, QA, TTS
              ✗ Requires a lot of expensive servers (about a total of $1 billion)
  • 7. HLT in the World
      ● Siri is a simple iPhone-based system that:
          ● Receives commands in natural speech
          ● Sends messages
          ● Schedules meetings
          ● Places phone calls
      ● Siri has been claimed to:
          ● understand what you say
          ● know what you mean
          ● speak back in natural speech
  • 8. HLT in the World: Europe
      ● Europe is a continent united into one multilingual economic bloc with 23 official languages
      ● To enable the European languages, the European Union:
          ✔ Invested over €130 M to promote language technologies and language resource infrastructures in 2009-2011
          ✔ Allocated €35 M for SME action on Digital Content and Languages and €50 M for Language Technologies in its Work Program 2011-2012
          ✔ Proposed a simple platform that makes any online content and services available in all European languages
  • 9. HLT in the World: South Africa
      ● The South African government has identified HLT as a priority area to enable (technologically) its 11 official languages
          ➢ Various R&D projects and initiatives have been funded by government through:
              ● Department of Arts and Culture (DAC),
              ● Department of Science and Technology (DST), and
              ● National Research Foundation (NRF)
      ● The key challenge is fragmentation of R&D activities in HLT
      ● Addressed by the South African HLT Audit (SAHLTA)
  • 10. HLT for Ethiopian Languages
      ● Research on HLT for Ethiopian languages started in the 1990s
          ✔ There are now many (>200) encouraging and valuable works on:
              ➢ Thesaurus construction
              ➢ ASR
              ➢ Stemming
              ➢ Text classification
              ➢ MT
              ➢ Parsing
              ➢ Text categorization
              ➢ Text-to-speech
              ➢ POS tagging
              ➢ Morphological analysis
              ➢ OCR
              ➢ Spell checking
              ➢ Information Extraction
          ✗ Most of them are based on language resources (LRs) developed for the experiment
  • 11. HLT for Ethiopian Languages
      ✗ HLT research covers a limited number of Ethiopian languages
      [Figure: bar chart "HLT for Ethiopian Languages (Masters theses)". Vertical scale: 0 to 25. Research areas: NLP, Speech Processing, OCR, CSE. Languages: Amharic, Afan Oromo, Tigrigna, Welayta, Ge'ez, Sidama.]
  • 12. Challenges and Limitations
      ● Challenges that hinder Ethiopian HLT include:
          – Lack of language resources: speech and text corpora
          – Lack of standardized evaluation corpora and platform
          – Lack of expertise in both language and technology
          – Time shortage
              ● Research is done only for academic achievement in the given time
          – Absence of a national HLT research plan (HLT road-map)
              ● Research is based only on individuals' interests
          – Lack of sustainable and coordinated research funding
  • 13. Challenges and Limitations
      ➔ Existing research works have limitations:
          – Use of insufficient and low-quality language resources
              ➢ Research results are not conclusive
          – Research results are not well evaluated, analyzed and documented
              ➢ Their achievements and gaps are vague
          – Research attempts in HLT are fragmented
              ➢ Lack of integration, consolidation and continuity
      [Diagram: Tokenizer, POS, Parser, LA, ASR/MT]
  • 14. Future Directions and Strategies
      ● Is there any other way to escape the cost of the language barrier, or to cover it, without HLT in the information age? NO!!!
      ● Are we rich enough to continue spending only on academic exercises? NO!!!
          – 6 months of at least 10 research students doing their theses in any one of the HLT areas every year, and of their supervisors
          – 3 years of at least 6 PhD research students (admitted every year) and of their research supervisors
          – The time of academic researchers doing research for publication purposes (for academic promotion)
  • 15. Future Directions and Strategies
      ● Give emphasis and recognition to R&D activities in HLT
      ● Develop a national HLT road-map (HLT Audit)
          – Shows research priorities
          – Avoids duplication (even across languages)
          – Reduces R&D cost
          – Provides a means of evaluation/assessment
          – Enforces consolidation, integration and continuity
          – Inspires researchers and developers
          – Shows the benefit areas for the HLT industry
  • 16. Future Directions and Strategies
      ● Establish Institutional/National R&D units
          – Fund, coordinate and evaluate R&D projects
          – Store, maintain and distribute language resources and R&D outputs
          – Promote the utility of R&D outputs
          – Coordinate and support private industries
          – Coordinate the cooperation of academia and industry
          – Promote/attract international investments in HLT industries
  • 17. Conclusion
      ● We have 85 living languages
      ● All have speakers who need information and have the right to get it in a language and in a way they understand
          – HLT is the way to realize it
      ● We need a strategy to put it in place
          – Cooperation across:
              ● Time: past->present->future
              ● Language
              ● Research area
              ● Sector: academic<->industry
  • 18. We can make it BY