Machine Translation course program

Brief description of the course:
There are two fundamental approaches to machine translation: the rule-based approach (based on formal
models of natural languages, e.g. dependency grammars) and the statistical approach (based on parallel
data). Both approaches have their advantages: the rule-based one is formal and structured, while the
statistical one makes it possible to build and scale a system without studying the properties of a
natural language in depth. On the other hand, both approaches have their problem areas: the rule-based
approach is bound to a given language or language family, while the statistical approach does not allow
control over subtle structures and properties of a natural language, such as the generation of
prepositions. Recently, combining these two approaches has been of special interest to researchers. The
entire machine translation pipeline, from source-language formalization to word reordering on the
target-language side, can be seen as a testing ground for combining rule-based and statistical methods.
This course introduces students to all sub-tasks of building a machine translation system with both
approaches: formalization of natural language, translation dictionaries, phrase translation, machine
translation models, decoding and word reordering. The course also presents formal semantic models of
natural languages and their place in the field. In addition, machine learning methods (such as
structured prediction) are a focus of the course. The course material assumes knowledge of general
higher mathematics and knowledge of, or interest in, natural language processing. There will be some
hands-on and take-away knowledge sessions, which assume familiarity with formats, NLP algorithms and
libraries.


Course topics
    1.    Introduction to MT. Motivation of its existence
    2.    Short history of MT, main phases. ALPAC report
    3.    MT systems triangle. Direct and indirect MT. Examples of MT systems
    4.    Current MT systems existing in the industry, main players
    5.    Existing software packages for natural language processing and building an MT system
    6.    Two fundamental approaches to MT: statistical and rule-based (classical)
    7.    Methods of MT
    8.    Direct MT system, its features, pros and cons.
    9.    Transfer MT system, types of transfer methods, features
    10.   Notion of interlingua. Features of MT based on interlingua, its comparison with transfer
    11.   Statistical MT and its components
    12.   Example based MT systems
    13.   Theory of statistical MT systems. Fundamental equation (Bayes theorem). Notion of statistical language
          model. MT model
    14.   Model of machine translation in statistical MT
    15.   Task of word alignment
    16.   Features of MT systems
    17.   Existing programming components of statistical MT systems
    18.   Evaluation of MT systems: human evaluation and automatic metrics
    19.   BLEU score
    20.   METEOR score
    21.   NIST score
    22.   Round-trip evaluation method
    23.   Hybrid MT systems
    24.   Task of word reordering in a sentence on the target side. Rule-based and statistical approaches
    25.   Computer semantics of a natural language. MT system based on it
    26.   Pragmatics and context analysis on cross-sentence level
    27.   Practical details of software packages: GIZA++, SRILM, Moses
    28.   Method of structured prediction for learning machine translation models
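
As a brief illustration of topic 13, the fundamental equation of statistical MT is usually written in the noisy-channel form below. The notation is an assumption following common usage in paper [1]: f is the source sentence, e a candidate target sentence, P(e) the language model and P(f | e) the translation model.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The denominator P(f) is dropped in the last step because it does not depend on e and so does not change the argmax.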


Seminar topics
   1.   Mathematics of statistical MT, paper [1]
   2.   Hierarchical model of statistical MT, paper [2]
   3.   Phrase-based statistical MT, paper [3]
   4.   Rule-based MT systems, papers [4,5]
   5.   Hybrid MT systems, based on examples, paper [6]
   6.   BLEU score in detail, paper [8]
   7.   Robust large-scale MT systems, based on examples, paper [9]
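
For the BLEU seminar, a minimal Python sketch of the metric may help as a starting point. It is an illustration only: it scores a single sentence with uniform weights and no smoothing, whereas paper [8] defines BLEU over a whole test corpus. The function names (`ngrams`, `modified_precision`, `bleu`) are hypothetical and do not come from any package listed in the course.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (a simplification)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_avg)


if __name__ == "__main__":
    reference = "the cat is on the mat".split()
    candidate = "the the the the the the the".split()
    # Clipping limits the credit for "the" to its count in the reference.
    print(modified_precision(candidate, [reference], 1))
    print(bleu(reference, [reference]))
```

Clipping is what keeps a degenerate candidate that repeats one frequent word from scoring well, and the brevity penalty plays the role of recall by penalizing candidates shorter than the reference.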


Bibliography
[1] Brown P., Della Pietra S., Della Pietra V., Mercer R.: The Mathematics of
Statistical Machine Translation: Parameter Estimation, 1993
[2] Chiang D.: A Hierarchical Phrase-Based Model for Statistical Machine
Translation, 2005
[3] Koehn P., Och F., Marcu D.: Statistical Phrase-Based Machine Translation, 2003
[4] Kaplan R., Netter K., Wedekind J., Zaenen A.: Translation By Structural
Correspondences, 1989
[5] Landsbergen J.: The Rosetta Project, 1989
[6] Groves D., Way A.: Hybrid Example-Based SMT: the Best of Both Worlds?
[7] Athanaselis T., Bakamidis S., Dologlou I.: Words Reordering based on Statistical
Language Model, 2006
[8] Papineni K., Roukos S., Ward T., Zhu W.-J.: BLEU: a Method for Automatic
Evaluation of Machine Translation, 2002
[9] Gough N., Way A.: Robust Large-Scale EBMT with Marker-Based Segmentation,
2004

More Related Content

What's hot

Algoritmos comp2010
Algoritmos comp2010Algoritmos comp2010
Algoritmos comp2010manu051063
 
AUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHS
AUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHSAUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHS
AUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHScsandit
 
A hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing andA hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing andIbrahim Bounhas
 
Filtering out improper user accounts from twitter user accounts for discoveri...
Filtering out improper user accounts from twitter user accounts for discoveri...Filtering out improper user accounts from twitter user accounts for discoveri...
Filtering out improper user accounts from twitter user accounts for discoveri...siramatu-lab
 
HAN_XU_ICDMW2014
HAN_XU_ICDMW2014HAN_XU_ICDMW2014
HAN_XU_ICDMW2014Han Xu, PhD
 
Resume-Luan Sitao
Resume-Luan SitaoResume-Luan Sitao
Resume-Luan SitaoSitao Luan
 
Tag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deliTag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deliVinay Singri
 
Tag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deliTag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deliVinay Singri
 

What's hot (11)

Lect09
Lect09Lect09
Lect09
 
Algoritmos comp2010
Algoritmos comp2010Algoritmos comp2010
Algoritmos comp2010
 
Lect01
Lect01Lect01
Lect01
 
AUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHS
AUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHSAUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHS
AUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHS
 
A hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing andA hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing and
 
Data wrangling week 9
Data wrangling week 9Data wrangling week 9
Data wrangling week 9
 
Filtering out improper user accounts from twitter user accounts for discoveri...
Filtering out improper user accounts from twitter user accounts for discoveri...Filtering out improper user accounts from twitter user accounts for discoveri...
Filtering out improper user accounts from twitter user accounts for discoveri...
 
HAN_XU_ICDMW2014
HAN_XU_ICDMW2014HAN_XU_ICDMW2014
HAN_XU_ICDMW2014
 
Resume-Luan Sitao
Resume-Luan SitaoResume-Luan Sitao
Resume-Luan Sitao
 
Tag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deliTag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deli
 
Tag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deliTag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deli
 

Viewers also liked

Solr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwordsSolr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwordsDmitry Kan
 
Lucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeupLucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeupDmitry Kan
 
Automatic Build Of Semantic Translational Dictionary
Automatic Build Of Semantic Translational DictionaryAutomatic Build Of Semantic Translational Dictionary
Automatic Build Of Semantic Translational DictionaryDmitry Kan
 
Starget sentiment analyzer for English
Starget sentiment analyzer for EnglishStarget sentiment analyzer for English
Starget sentiment analyzer for EnglishDmitry Kan
 
Social spam detection by SemanticAnalyzer Group
Social spam detection by SemanticAnalyzer GroupSocial spam detection by SemanticAnalyzer Group
Social spam detection by SemanticAnalyzer GroupDmitry Kan
 
Semantic feature machine translation system
Semantic feature machine translation systemSemantic feature machine translation system
Semantic feature machine translation systemDmitry Kan
 
Introduction To Machine Translation 1
Introduction To Machine Translation 1Introduction To Machine Translation 1
Introduction To Machine Translation 1Dmitry Kan
 
Linguistic component Sentiment Analyzer for the Russian language
Linguistic component Sentiment Analyzer for the Russian languageLinguistic component Sentiment Analyzer for the Russian language
Linguistic component Sentiment Analyzer for the Russian languageDmitry Kan
 
Linguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian languageLinguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian languageDmitry Kan
 
MTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine TranslationMTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine TranslationDmitry Kan
 
Introduction To Machine Translation
Introduction To Machine TranslationIntroduction To Machine Translation
Introduction To Machine TranslationDmitry Kan
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopDmitry Kan
 
Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011Dmitry Kan
 
Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...Dmitry Kan
 
Linguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian languageLinguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian languageDmitry Kan
 
Rule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slidesRule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slidesDmitry Kan
 
Semantic Analysis: theory, applications and use cases
Semantic Analysis: theory, applications and use casesSemantic Analysis: theory, applications and use cases
Semantic Analysis: theory, applications and use casesDmitry Kan
 
IR: Open source state
IR: Open source stateIR: Open source state
IR: Open source stateDmitry Kan
 

Viewers also liked (18)

Solr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwordsSolr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwords
 
Lucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeupLucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeup
 
Automatic Build Of Semantic Translational Dictionary
Automatic Build Of Semantic Translational DictionaryAutomatic Build Of Semantic Translational Dictionary
Automatic Build Of Semantic Translational Dictionary
 
Starget sentiment analyzer for English
Starget sentiment analyzer for EnglishStarget sentiment analyzer for English
Starget sentiment analyzer for English
 
Social spam detection by SemanticAnalyzer Group
Social spam detection by SemanticAnalyzer GroupSocial spam detection by SemanticAnalyzer Group
Social spam detection by SemanticAnalyzer Group
 
Semantic feature machine translation system
Semantic feature machine translation systemSemantic feature machine translation system
Semantic feature machine translation system
 
Introduction To Machine Translation 1
Introduction To Machine Translation 1Introduction To Machine Translation 1
Introduction To Machine Translation 1
 
Linguistic component Sentiment Analyzer for the Russian language
Linguistic component Sentiment Analyzer for the Russian languageLinguistic component Sentiment Analyzer for the Russian language
Linguistic component Sentiment Analyzer for the Russian language
 
Linguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian languageLinguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian language
 
MTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine TranslationMTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine Translation
 
Introduction To Machine Translation
Introduction To Machine TranslationIntroduction To Machine Translation
Introduction To Machine Translation
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache Hadoop
 
Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011
 
Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...
 
Linguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian languageLinguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian language
 
Rule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slidesRule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slides
 
Semantic Analysis: theory, applications and use cases
Semantic Analysis: theory, applications and use casesSemantic Analysis: theory, applications and use cases
Semantic Analysis: theory, applications and use cases
 
IR: Open source state
IR: Open source stateIR: Open source state
IR: Open source state
 

Similar to Machine Translation Course Program Overview

New Development in MT Technology and Services, by Anthony Wong, CCID TransTech
New Development in MT Technology and Services, by Anthony Wong, CCID TransTechNew Development in MT Technology and Services, by Anthony Wong, CCID TransTech
New Development in MT Technology and Services, by Anthony Wong, CCID TransTechTAUS - The Language Data Network
 
Meta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsMeta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsLifeng (Aaron) Han
 
Integration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translationIntegration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translationChamani Shiranthika
 
A hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerA hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerIAESIJAI
 
French machine reading for question answering
French machine reading for question answeringFrench machine reading for question answering
French machine reading for question answeringAli Kabbadj
 
Computer Educational Theories Technology .pptx
Computer Educational Theories Technology  .pptxComputer Educational Theories Technology  .pptx
Computer Educational Theories Technology .pptxcarlaustria2
 
Error Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation OutputsError Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation OutputsParisa Niksefat
 
58903230-SentiMatrix-Named-Entity-Recognition-for-Romanian-Language
58903230-SentiMatrix-Named-Entity-Recognition-for-Romanian-Language58903230-SentiMatrix-Named-Entity-Recognition-for-Romanian-Language
58903230-SentiMatrix-Named-Entity-Recognition-for-Romanian-LanguageMarius Corici
 
Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Waqas Tariq
 
76 s201906
76 s20190676 s201906
76 s201906IJRAT
 
Development of an intelligent information resource model based on modern na...
  Development of an intelligent information resource model based on modern na...  Development of an intelligent information resource model based on modern na...
Development of an intelligent information resource model based on modern na...IJECEIAES
 
Fundamentals of data structures ellis horowitz & sartaj sahni
Fundamentals of data structures   ellis horowitz & sartaj sahniFundamentals of data structures   ellis horowitz & sartaj sahni
Fundamentals of data structures ellis horowitz & sartaj sahniHitesh Wagle
 
Course Syllabus For Operations Management
Course Syllabus For Operations ManagementCourse Syllabus For Operations Management
Course Syllabus For Operations ManagementYnal Qat
 
FUNDAMENTALS OFDatabase SystemsSEVENTH EDITION
FUNDAMENTALS OFDatabase SystemsSEVENTH EDITIONFUNDAMENTALS OFDatabase SystemsSEVENTH EDITION
FUNDAMENTALS OFDatabase SystemsSEVENTH EDITIONDustiBuckner14
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Quinsulon Israel
 
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Lifeng (Aaron) Han
 
Legal Document
Legal DocumentLegal Document
Legal Documentlegal4
 
Legal Document
Legal DocumentLegal Document
Legal Documentlegal6
 
Legal Document
Legal DocumentLegal Document
Legal Documentlegal5
 

Similar to Machine Translation Course Program Overview (20)

New Development in MT Technology and Services, by Anthony Wong, CCID TransTech
New Development in MT Technology and Services, by Anthony Wong, CCID TransTechNew Development in MT Technology and Services, by Anthony Wong, CCID TransTech
New Development in MT Technology and Services, by Anthony Wong, CCID TransTech
 
Meta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsMeta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methods
 
Integration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translationIntegration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translation
 
A hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerA hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzer
 
French machine reading for question answering
French machine reading for question answeringFrench machine reading for question answering
French machine reading for question answering
 
Computer Educational Theories Technology .pptx
Computer Educational Theories Technology  .pptxComputer Educational Theories Technology  .pptx
Computer Educational Theories Technology .pptx
 
Error Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation OutputsError Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation Outputs
 
58903230-SentiMatrix-Named-Entity-Recognition-for-Romanian-Language
58903230-SentiMatrix-Named-Entity-Recognition-for-Romanian-Language58903230-SentiMatrix-Named-Entity-Recognition-for-Romanian-Language
58903230-SentiMatrix-Named-Entity-Recognition-for-Romanian-Language
 
Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...
 
76 s201906
76 s20190676 s201906
76 s201906
 
Development of an intelligent information resource model based on modern na...
  Development of an intelligent information resource model based on modern na...  Development of an intelligent information resource model based on modern na...
Development of an intelligent information resource model based on modern na...
 
Fundamentals of data structures ellis horowitz & sartaj sahni
Fundamentals of data structures   ellis horowitz & sartaj sahniFundamentals of data structures   ellis horowitz & sartaj sahni
Fundamentals of data structures ellis horowitz & sartaj sahni
 
Course Syllabus For Operations Management
Course Syllabus For Operations ManagementCourse Syllabus For Operations Management
Course Syllabus For Operations Management
 
Semester VI.pdf
Semester VI.pdfSemester VI.pdf
Semester VI.pdf
 
FUNDAMENTALS OFDatabase SystemsSEVENTH EDITION
FUNDAMENTALS OFDatabase SystemsSEVENTH EDITIONFUNDAMENTALS OFDatabase SystemsSEVENTH EDITION
FUNDAMENTALS OFDatabase SystemsSEVENTH EDITION
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
 
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
 
Legal Document
Legal DocumentLegal Document
Legal Document
 
Legal Document
Legal DocumentLegal Document
Legal Document
 
Legal Document
Legal DocumentLegal Document
Legal Document
 

More from Dmitry Kan

London IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use casesLondon IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use casesDmitry Kan
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural searchDmitry Kan
 
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...Dmitry Kan
 
SentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social mediaSentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social mediaDmitry Kan
 
Icsoft 2011 51_cr
Icsoft 2011 51_crIcsoft 2011 51_cr
Icsoft 2011 51_crDmitry Kan
 
Computer Semantics And Machine Translation
Computer Semantics And Machine TranslationComputer Semantics And Machine Translation
Computer Semantics And Machine TranslationDmitry Kan
 

More from Dmitry Kan (6)

London IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use casesLondon IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use cases
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
 
SentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social mediaSentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social media
 
Icsoft 2011 51_cr
Icsoft 2011 51_crIcsoft 2011 51_cr
Icsoft 2011 51_cr
 
Computer Semantics And Machine Translation
Computer Semantics And Machine TranslationComputer Semantics And Machine Translation
Computer Semantics And Machine Translation
 

Recently uploaded

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Machine Translation Course Program Overview

Brief description of the course:
There are two fundamental approaches to machine translation: the rule-based approach (based on formal models of natural languages, e.g. dependency grammars) and the statistical approach (based on parallel data). Each has its advantages: the rule-based approach is formal and structured, while the statistical approach makes it possible to build and scale a system without studying the properties of a natural language in depth. Each also has its problem areas: the rule-based approach is bound to a given language or family of languages, while the statistical approach does not allow control over subtle structures and properties of a natural language, such as the generation of prepositions. Combining these two approaches has recently been of special interest to researchers. The entire machine translation pipeline, from source-language formalization to word reordering on the target-language side, can serve as a testing ground for combining rule-based and statistical methods.

This course introduces students to all sub-tasks of building a machine translation system using both approaches: formalization of natural language, translation dictionaries, phrase translation, machine translation models, decoding and word reordering. The course also presents formal semantic models of natural languages and their place in the topic. Machine learning methods (such as structured prediction) are a further focus of the course. The course material assumes knowledge of general higher mathematics and knowledge of, or interest in, natural language processing. There will be some hands-on sessions with take-away knowledge, which assume familiarity with data formats, NLP algorithms and libraries.

Course topics
    1.    Introduction to MT. Motivation for its existence
    2.    Short history of MT, main phases. ALPAC report
    3.    MT systems triangle. Direct and indirect MT. Examples of MT systems
    4.    Current MT systems in the industry, main players
    5.    Existing software packages for natural language processing and for building an MT system
    6.    Two fundamental approaches to MT: statistical and rule-based (classical)
    7.    Methods of MT
    8.    Direct MT system, its features, pros and cons
    9.    Transfer MT system, types of transfer methods, features
    10.   Notion of interlingua. Features of interlingua-based MT, its comparison with transfer
    11.   Statistical MT and its components
    12.   Example-based MT systems
    13.   Theory of statistical MT systems. Fundamental equation (Bayes' theorem). Notion of a statistical language model. MT model
    14.   Model of machine translation in statistical MT
    15.   Task of word alignment
    16.   Features of MT systems
    17.   Existing software components of statistical MT systems
    18.   Evaluation of MT systems: human evaluation and automatic metrics
    19.   BLEU score
    20.   METEOR score
    21.   NIST score
    22.   Round-trip evaluation method
    23.   Hybrid MT systems
    24.   Task of word reordering in a sentence on the target side. Rule-based and statistical approaches
    25.   Computer semantics of a natural language. MT system based on it
    26.   Pragmatics and context analysis at the cross-sentence level
    27.   Practical details of software packages: GIZA++, SRILM, Moses
    28.   Method of structured prediction for learning machine translation models

Seminar topics
    1.    Mathematics of statistical MT, paper [1]
    2.    Hierarchical model of statistical MT, paper [2]
    3.    Phrase-based statistical MT, paper [3]
    4.    Rule-based MT systems, papers [4, 5]
    5.    Hybrid MT systems, based on examples, paper [6]
    6.    BLEU score in detail, paper [8]
    7.    Robust large-scale MT systems, based on examples, paper [9]

Bibliography
    [1] Brown P., Della Pietra S., Della Pietra V., Mercer R.: The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993
    [2] Chiang D.: A Hierarchical Phrase-Based Model for Statistical Machine Translation, 2005
    [3] Koehn P., Och F., Marcu D.: Statistical Phrase-Based Machine Translation, 2003
    [4] Kaplan R., Netter K., Wedekind J., Zaenen A.: Translation By Structural Correspondences, 1989
    [5] Landsbergen J.: The Rosetta Project, 1989
    [6] Groves D., Way A.: Hybrid Example-Based SMT: the Best of Both Worlds?
    [7] Athanaselis T., Bakamidis S., Dologlou I.: Words Reordering based on Statistical Language Model, 2006
    [8] Papineni K., Roukos S., Ward T., Zhu W.-J.: BLEU: a Method for Automatic Evaluation of Machine Translation, 2002
    [9] Gough N., Way A.: Robust Large-Scale EBMT with Marker-Based Segmentation, 2004