‘Past, Present, and Future’
Machine Translation & Natural Language
Processing for Patent Information
Dr. John Tinsley
CEO, Iconic Translation Machines Ltd.
EPOPIC. Madrid. 10th November 2016
Why listen to me?
BSc in Computational Linguistics
PhD in Machine Translation
Language Technology consultant
Founder of Iconic Translation Machines
Machine Translation is what I do!
The world’s first and only patent specific machine translation platform
Machine Translation: The Basics
Machine Translation = automatic translation
 The use of computers to translate from one language into another
 The use of computers to automate some, or all, of the translation
process
Statistical Machine Translation (SMT)
 An approach to Machine Translation, where translations for an input are
estimated based on previously seen translation examples and associated
(inferred) probabilities.
 e.g. IPTranslator, Google Translate
Other approaches
 Rule-based (or transfer-based): based on linguistic rules
• e.g. Systran; Altavista’s Babelfish
 Example-based: based on translation examples and inferred linguistic
patterns
SMT is now by far the predominant approach*
Bilingual Corpora
A corpus (pl. corpora) is a collection
of texts, in electronic format, in a
single language
 document(s)
 book(s)
A bilingual corpus is a collection of
corresponding texts, in multiple
languages
 a document & its translation
 a book in multiple languages
 European Parliament proceedings
Note: source language = original language or language we’re translating from
target language = language we’re translating into
Aligned Bilingual Corpora
A document-aligned bilingual corpus corresponds on a document level
For translation, we require sentence-aligned bilingual corpora
 The sentence on line 1 in the source language text corresponds
to (i.e. is a translation of) the sentence on line 1 in the target
language text etc.
 Often referred to as parallel aligned corpora
Sentence aligned bilingual parallel corpora
are essential for statistical machine translation
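To make the line-by-line correspondence concrete, here is a minimal Python sketch of pairing two sentence-aligned texts. The function name `pair_sentences` and the sample sentences are illustrative assumptions, not part of any particular toolkit.

```python
def pair_sentences(source_text, target_text):
    """Pair line i of the source text with line i of the target text."""
    source_lines = source_text.splitlines()
    target_lines = target_text.splitlines()
    # A sentence-aligned corpus has the same number of lines on both sides.
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target have different numbers of sentences")
    return list(zip(source_lines, target_lines))


# Hypothetical two-sentence English-Spanish corpus.
english = "I have a cat\nThe cat sleeps"
spanish = "Tengo un gato\nEl gato duerme"

pairs = pair_sentences(english, spanish)
for source_sentence, target_sentence in pairs:
    print(source_sentence, "<->", target_sentence)
```

A corpus that is aligned only at document level would fail the length check here, which is why sentence alignment is usually a separate preparation step.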
Learning from Previous Translations
Suppose we already know
(from a sentence-aligned bilingual
corpus) that:
 “dog” is translated as “perro”
 “I have a cat” is translated as
“Tengo un gato”
We can theoretically translate:
 “I have a dog” → “Tengo un perro”
 Even though we have never seen “I
have a dog” before
Statistical machine translation induces information about unseen input, based on
previously known translations:
 Primarily co-occurrence statistics
 Takes contextual information into account
Statistical Machine Translation
 Example of a small sentence-aligned
bilingual corpus for English-French
Statistical Machine Translation
 We take some new sentence to translate
Statistical Machine Translation
 From the corpus we can infer possible target (French)
translations for various source (English) words
 We can then select the most probable translations
based on simple frequencies (co-occurrence statistics)
Statistical Machine Translation
Given a previously unseen input sentence, and our collated statistics,
we can estimate a translation
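The idea of picking translations from co-occurrence statistics can be sketched in a few lines of Python. This is a toy illustration only: the three-sentence English-French corpus is invented, and the Dice coefficient is used here as a simple stand-in association score. Real SMT systems estimate translation probabilities with much richer models.

```python
from collections import Counter
from itertools import product

# Hypothetical sentence-aligned English-French corpus.
corpus = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the flower", "la fleur"),
]


def train(sentence_pairs):
    """Count in how many sentence pairs each word and each word pair occurs."""
    source_counts, target_counts, pair_counts = Counter(), Counter(), Counter()
    for source, target in sentence_pairs:
        source_words, target_words = set(source.split()), set(target.split())
        source_counts.update(source_words)
        target_counts.update(target_words)
        pair_counts.update(product(source_words, target_words))
    return source_counts, target_counts, pair_counts


def dice(source_word, target_word, model):
    """Association score: 2 * co-occurrences / (source count + target count)."""
    source_counts, target_counts, pair_counts = model
    shared = pair_counts[(source_word, target_word)]
    return 2 * shared / (source_counts[source_word] + target_counts[target_word])


def best_translation(source_word, model):
    source_counts, target_counts, _ = model
    if source_word not in source_counts:
        return source_word  # unknown words are passed through unchanged
    return max(target_counts, key=lambda t: dice(source_word, t, model))


def translate(sentence, model):
    """Word-by-word translation; no reordering is attempted."""
    return " ".join(best_translation(word, model) for word in sentence.split())


model = train(corpus)
print(translate("the blue flower", model))
```

The input "the blue flower" never appears in the corpus, yet every word receives a translation. The output keeps the English word order ("la bleue fleur" rather than "la fleur bleue"), which is one reason practical systems also model word order.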
Advanced MT
All modern approaches are based on building translations for complete
sentences by putting together smaller pieces of translation
Previous example is very simplistic
 In reality SMT systems calculate much more complex statistical models
over millions of sentence pairs for a pair of languages
 Upwards of 2M sentence pairs on average for large-scale systems
Other statistics calculated include
 Word-to-word translation probabilities
 Phrase-to-phrase translation probabilities
 Word order probabilities
 Linguistic information (are the words nouns, verbs?)
 Fluency of the final output
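As one illustration of the "fluency" statistic, the sketch below scores candidate outputs with a tiny bigram language model. The training sentences, the function names, and the choice of add-one smoothing are all assumptions made for this example. Production systems use far larger models and better smoothing.

```python
import math
from collections import Counter

# Hypothetical target-language (English) text used to learn word sequences.
target_sentences = [
    "the big house",
    "the red house",
    "the big red car",
    "a red house",
]


def tokenize(sentence):
    # Sentence boundary markers let the model score first and last words too.
    return ["<s>"] + sentence.split() + ["</s>"]


def train_bigram_model(sentences):
    context_counts, bigram_counts, vocabulary = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocabulary.update(tokens)
        for previous, word in zip(tokens, tokens[1:]):
            context_counts[previous] += 1
            bigram_counts[(previous, word)] += 1
    return context_counts, bigram_counts, vocabulary


def log_probability(sentence, model):
    """Sum of log bigram probabilities with add-one smoothing."""
    context_counts, bigram_counts, vocabulary = model
    tokens = tokenize(sentence)
    total = 0.0
    for previous, word in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(previous, word)] + 1
        denominator = context_counts[previous] + len(vocabulary)
        total += math.log(numerator / denominator)
    return total


language_model = train_bigram_model(target_sentences)
for candidate in ("the big red house", "the big house red"):
    print(candidate, round(log_probability(candidate, language_model), 3))
```

The candidate with the more natural word order receives the higher score, even though both candidates contain exactly the same words.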
Data is Key
For SMT data is key
 Information (word/phrase correspondences and associated statistics) is only based
on what we have seen before in the data
Important that data used to train SMT systems is:
 Of sufficient size
 avoid sparseness/skewed statistics
 Representative and relevant
 contains the right type of language
 High-quality
 absence of misspellings,
incorrect alignments etc.
 Proofed by human
translators
training data
Why is MT Difficult?
A word or a phrase can have more than one meaning (ambiguity – lexical or
structural)
 e.g. “bank”, “dive”, “I saw the man with the telescope”
People use language creatively
 New words are cropping up all the time
Linguistic differences between languages
 e.g. structure of Irish sentences vs. structure of English sentences:
 “Tá (Is) ocras (hunger) orm (on me)” <-> “I am hungry”
There can be more than one way to express the same meaning.
 “New York”, “The Big Apple”, “NYC”
Why is MT Difficult?
 Israeli officials are responsible for airport security.
 Israel is in charge of the security at this airport.
 The security work for this airport is the responsibility of the Israel government.
 Israeli side was in charge of the security of this airport.
 Israel is responsible for the airport’s security.
 Israel is responsible for safety work at this airport.
 Israel presides over the security of the airport.
 Israel took charge of the airport security.
 The safety of this airport is taken charge of by Israel.
 This airport’s security is the responsibility of the Israeli security officials.
No single solution for all languages
Number agreement: the house / the houses vs. la maison / les maisons
Gender agreement: the house / the cheese vs. la maison / le fromage
English - Spanish
English - French
No single solution for all languages
English - German
English - Chinese
种水果的农民
The farmer who grows fruit
[Lit: “grow fruit (particle) farmer”]
Not all languages are created equal
French German Turkish Finnish
Spanish Chinese Korean Hungarian
Portuguese Japanese Thai Basque
The Challenge of Patents
L is an organic group selected from -CH2-
(OCH2CH2)n-, -CO-NR'-, with R'=H or
C1-C4 alkyl group; n=0-8; Y=F, CF3 …
maximum stress of 1.2 to 3.5 N/mm<2>
and a maximum elongation of 700 to
1,300% at 0[deg.] C.
Long Sentences
Technical constructions
Largest single document: 249,322 words
Longest Sentence: 1,417 words
The Challenge of Patents
Very long sentences as standard
Grammatically incomplete using
nominal and telegraphic style (!)
Passive forms are frequent
Frequent use of subordinate clauses,
participles, implicit constructs
Inconsistent and incorrect spelling
High use of neologisms
Instances of synonymy and polysemy
Spurious use of punctuation
Authoring guide
for “to be
translated” text
Patents break
almost all of the
rules!
Evaluating Machine Translation Quality
Automatic Evaluation
Judge the quality of an MT system by comparing its output against a
human-produced “reference” translation
 Pros: Quick, cheap, consistent
 Cons: Inflexible, cannot be used on ‘new’ input
Human Evaluation
 Pros: Reliable, flexible, multi-faceted (fluency, error analyses,
benchmarking)
 Cons: Slow, expensive, subjective
Task-Based Evaluation
 Fluency vs. Adequacy
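Automatic evaluation against a reference can be illustrated with a clipped n-gram precision, in the spirit of overlap-based metrics such as BLEU. This is a simplified sketch and not a full metric: the function names and the example sentences are assumptions for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hypothesis, reference, n):
    """Share of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the reference.
    """
    hypothesis_counts = Counter(ngrams(hypothesis.lower().split(), n))
    reference_counts = Counter(ngrams(reference.lower().split(), n))
    total = sum(hypothesis_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, reference_counts[gram])
                  for gram, count in hypothesis_counts.items())
    return matched / total


reference = "The big red house"
for output in ("The big blue house", "The big house red"):
    print(output,
          clipped_precision(output, reference, 1),
          clipped_precision(output, reference, 2))
```

Single-word overlap rewards the output that has the right words in the wrong order, while two-word overlap penalises both outputs. The score also depends entirely on having a reference translation, which is why this kind of evaluation cannot be applied to new input.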
Evaluating Machine Translation Quality
Task Based Evaluation
 Standalone evaluation of MT systems is necessary to get a sense of the
overall quality of a system
 To determine the ultimate usability of an MT system, extrinsic task-based
evaluation is required
 Why? Fluency vs. Adequacy
Fluency: how fluent and grammatically correct the translation
output is
Adequacy: how accurately the translation conveys the meaning of the
source
Source: La gran casa roja
Output 1: The big blue house (fluent, but not adequate)
Output 2: The big house red (adequate, but not fluent)
Task-Based Evaluation
Practical uses of Machine Translation
Understand its limitations and you’ll understand
its capabilities!
No
 Translate a patent for filing
 Translate literature for
publication
 Translate marketing
materials
 Anything mission critical
without review
Yes
 Productivity tool for
professional translation
 Understand foreign patents
 Localisation processes and
“controlled” content
 High volume, e.g. eDiscovery
Use cases in practice
Product descriptions
to open new markets
MT for post-editing
productivity across
industries
Developer, and user
for web content
Tens of thousands of
people using online
tools daily
Current Hot Topics
Neural Networks
 Using artificial intelligence and deep learning to develop a
completely new way of doing machine translation!
Quality Estimation
 Functionality through which machine translation can “self-assess”
the quality of the translations it produces.
Online Adaptive Translation
 Machine translations that can automatically learn and improve
based on feedback, particularly from revisions.
Use-case specific MT
 Just like patent MT, but for countless other areas.
About Iconic
We are a Machine Translation and Natural
Language Processing software and
services provider, delivering expert
solutions with Subject Matter Expertise
Iconic Ensemble Architecture…
…enhanced with Neural MT
Speed, Cost, and Quality
What is the difference between machine translation and manual translation when
translating a 10-page patent document from Chinese into English?
Machine Translation is not
designed to replace
professional translation but
there are many cases
where costly and time-
consuming manual
translation is simply not
necessary.
- Data confidentiality
- File formats
- Potential for customisation,
enhancements, and
improvement for specific
domains
More than just translation
DATA PROCESSING
E.G. OPTICAL CHARACTER
RECOGNITION, DIGITISATION
DATABASE BUILDING
E.G. COMBINING THE ABOVE, WITH
TRANSLATION, FOR EXPORT
DATA UNDERSTANDING
E.G. SUMMARISATION, CONCEPT &
KEY TERM IDENTIFICATION
INFORMATION EXTRACTION
E.G. CITATION ANALYSIS, CROSS-
LINGUAL SEARCH
Record Extraction
Extraction algorithms work on cleaned
OCR output, using patterns, keywords,
and formatting information.
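A pattern-based extraction step might look like the following Python sketch. The regular expression, the sample text, and the three office codes are hypothetical and chosen only for illustration. Real publication number formats vary widely, and this is not a description of any specific production system.

```python
import re

# Hypothetical pattern: office code, 6 to 10 digits, then a kind code such as B1.
CITATION_PATTERN = re.compile(r"\b(EP|US|WO)\s?(\d{6,10})\s?([AB]\d)\b")

# Invented example of cleaned OCR output.
ocr_text = (
    "See also EP 1234567 B1 and US 7654321 B2; "
    "compare WO 2016012345 A1."
)

# Each match is returned as (office, number, kind code).
citations = CITATION_PATTERN.findall(ocr_text)
for office, number, kind in citations:
    print(office, number, kind)
```

Capturing the office, number, and kind code as separate groups also gives a starting point for fielding the extracted records.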
Citation Analysis
Assessment of record and reference patterns
Application for record extraction
Tracking variations across years
Application for bibliographic data fielding
Reference extraction + fielding
.com
Visit
and use the promo code epo2016 to get 20
free pages of translation
Thank You!
john@iptranslator.com
@IconicTrans

Más contenido relacionado

La actualidad más candente

Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...SDL
 
Introduction To Translation Technologies
Introduction To Translation TechnologiesIntroduction To Translation Technologies
Introduction To Translation Technologiesxenotext
 
Statistical machine translation for indian language copy
Statistical machine translation for indian language   copyStatistical machine translation for indian language   copy
Statistical machine translation for indian language copyNakul Sharma
 
What machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happyWhat machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happyIconic Translation Machines
 
2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?
2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?
2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?tauyou
 
NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk Vijay Ganti
 
2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT Introduction2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT IntroductionRIILP
 
MT and Translator's Tools
MT and Translator's ToolsMT and Translator's Tools
MT and Translator's ToolsJim O'Regan
 
MT and Post Editing in master's level translation education
MT and Post Editing in master's level translation education MT and Post Editing in master's level translation education
MT and Post Editing in master's level translation education Jakub Absolon
 
Techniques in translation, computer assisted, machine translation, subtitling...
Techniques in translation, computer assisted, machine translation, subtitling...Techniques in translation, computer assisted, machine translation, subtitling...
Techniques in translation, computer assisted, machine translation, subtitling...Moses Altovar
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4DigiGurukul
 
The Role Of Translators In MT: EU 2010
The Role Of Translators In MT:  EU 2010The Role Of Translators In MT:  EU 2010
The Role Of Translators In MT: EU 2010LoriThicke
 
"Automatic speech recognition for mobile applications in Yandex" — Fran Campi...
"Automatic speech recognition for mobile applications in Yandex" — Fran Campi..."Automatic speech recognition for mobile applications in Yandex" — Fran Campi...
"Automatic speech recognition for mobile applications in Yandex" — Fran Campi...Yandex
 
Language translator
Language translatorLanguage translator
Language translatorSumitSumit26
 
EXTENDING A MODEL FOR ONTOLOGY-BASED ARABIC-ENGLISH MACHINE TRANSLATION (NAN)
EXTENDING A MODEL FOR ONTOLOGY-BASED ARABIC-ENGLISH MACHINE TRANSLATION (NAN) EXTENDING A MODEL FOR ONTOLOGY-BASED ARABIC-ENGLISH MACHINE TRANSLATION (NAN)
EXTENDING A MODEL FOR ONTOLOGY-BASED ARABIC-ENGLISH MACHINE TRANSLATION (NAN) ijaia
 

La actualidad más candente (20)

Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...
 
Machine Translation
Machine TranslationMachine Translation
Machine Translation
 
Introduction To Translation Technologies
Introduction To Translation TechnologiesIntroduction To Translation Technologies
Introduction To Translation Technologies
 
Statistical machine translation for indian language copy
Statistical machine translation for indian language   copyStatistical machine translation for indian language   copy
Statistical machine translation for indian language copy
 
What machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happyWhat machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happy
 
Moses
MosesMoses
Moses
 
Techniques in Translation
Techniques in TranslationTechniques in Translation
Techniques in Translation
 
2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?
2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?
2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?
 
Why Ruby
Why RubyWhy Ruby
Why Ruby
 
NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk
 
2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT Introduction2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT Introduction
 
MT and Translator's Tools
MT and Translator's ToolsMT and Translator's Tools
MT and Translator's Tools
 
MT and Post Editing in master's level translation education
MT and Post Editing in master's level translation education MT and Post Editing in master's level translation education
MT and Post Editing in master's level translation education
 
Chatbot ppt
Chatbot pptChatbot ppt
Chatbot ppt
 
Techniques in translation, computer assisted, machine translation, subtitling...
Techniques in translation, computer assisted, machine translation, subtitling...Techniques in translation, computer assisted, machine translation, subtitling...
Techniques in translation, computer assisted, machine translation, subtitling...
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
 
The Role Of Translators In MT: EU 2010
The Role Of Translators In MT:  EU 2010The Role Of Translators In MT:  EU 2010
The Role Of Translators In MT: EU 2010
 
"Automatic speech recognition for mobile applications in Yandex" — Fran Campi...
"Automatic speech recognition for mobile applications in Yandex" — Fran Campi..."Automatic speech recognition for mobile applications in Yandex" — Fran Campi...
"Automatic speech recognition for mobile applications in Yandex" — Fran Campi...
 
Language translator
Language translatorLanguage translator
Language translator
 
EXTENDING A MODEL FOR ONTOLOGY-BASED ARABIC-ENGLISH MACHINE TRANSLATION (NAN)
EXTENDING A MODEL FOR ONTOLOGY-BASED ARABIC-ENGLISH MACHINE TRANSLATION (NAN) EXTENDING A MODEL FOR ONTOLOGY-BASED ARABIC-ENGLISH MACHINE TRANSLATION (NAN)
EXTENDING A MODEL FOR ONTOLOGY-BASED ARABIC-ENGLISH MACHINE TRANSLATION (NAN)
 

Destacado

What? Why? How? Factors that impact the success of commercial MT projects
What? Why? How? Factors that impact the success of commercial MT projectsWhat? Why? How? Factors that impact the success of commercial MT projects
What? Why? How? Factors that impact the success of commercial MT projectsIconic Translation Machines
 
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWSSeeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWSIconic Translation Machines
 
Finding the right medical device information in embase 11 2016
Finding the right medical device information in embase 11 2016Finding the right medical device information in embase 11 2016
Finding the right medical device information in embase 11 2016Ann-Marie Roche
 
Drug analytics based on triple linking v1.0
Drug analytics based on triple linking v1.0Drug analytics based on triple linking v1.0
Drug analytics based on triple linking v1.0Ann-Marie Roche
 
Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)Ben Gardner
 
Substance searching in Reaxys - Webinar - 24 March 2015
Substance searching in Reaxys - Webinar - 24 March 2015Substance searching in Reaxys - Webinar - 24 March 2015
Substance searching in Reaxys - Webinar - 24 March 2015Ann-Marie Roche
 
Literature monitoring for pv what are we doing at galderma elsevier webinar
Literature monitoring for pv   what are we doing at galderma elsevier webinarLiterature monitoring for pv   what are we doing at galderma elsevier webinar
Literature monitoring for pv what are we doing at galderma elsevier webinarAnn-Marie Roche
 
Elastic search & patent information @ mtc
Elastic search & patent information @ mtcElastic search & patent information @ mtc
Elastic search & patent information @ mtcArne Krueger
 
Medical device reporting 27 sep2016
Medical device reporting 27 sep2016Medical device reporting 27 sep2016
Medical device reporting 27 sep2016Ann-Marie Roche
 

Destacado (10)

What? Why? How? Factors that impact the success of commercial MT projects
What? Why? How? Factors that impact the success of commercial MT projectsWhat? Why? How? Factors that impact the success of commercial MT projects
What? Why? How? Factors that impact the success of commercial MT projects
 
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWSSeeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS
 
Finding the right medical device information in embase 11 2016
Finding the right medical device information in embase 11 2016Finding the right medical device information in embase 11 2016
Finding the right medical device information in embase 11 2016
 
Eac webinar 09.21.2016
Eac webinar 09.21.2016Eac webinar 09.21.2016
Eac webinar 09.21.2016
 
Drug analytics based on triple linking v1.0
Drug analytics based on triple linking v1.0Drug analytics based on triple linking v1.0
Drug analytics based on triple linking v1.0
 
Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)
 
Substance searching in Reaxys - Webinar - 24 March 2015
Substance searching in Reaxys - Webinar - 24 March 2015Substance searching in Reaxys - Webinar - 24 March 2015
Substance searching in Reaxys - Webinar - 24 March 2015
 
Literature monitoring for pv what are we doing at galderma elsevier webinar
Literature monitoring for pv   what are we doing at galderma elsevier webinarLiterature monitoring for pv   what are we doing at galderma elsevier webinar
Literature monitoring for pv what are we doing at galderma elsevier webinar
 
Elastic search & patent information @ mtc
Elastic search & patent information @ mtcElastic search & patent information @ mtc
Elastic search & patent information @ mtc
 
Medical device reporting 27 sep2016
Medical device reporting 27 sep2016Medical device reporting 27 sep2016
Medical device reporting 27 sep2016
 

Similar a Past, Present, and Future: Machine Translation & Natural Language Processing for Patent Information

Language Grid
Language GridLanguage Grid
Language Gridlindh
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to HindiRajat Jain
 
Machine translation ppt by shantanu arora
Machine translation ppt by shantanu aroraMachine translation ppt by shantanu arora
Machine translation ppt by shantanu aroraVaishnaviKhandelwal6
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language ProcessingMichel Bruley
 
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...Iconic Translation Machines
 
AI, don't f$# up my name.pdf
AI, don't f$# up my name.pdfAI, don't f$# up my name.pdf
AI, don't f$# up my name.pdfMarcis Pinnis
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguisticsAdnanBaloch15
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsOVHcloud
 
Good Applications of Bad Machine Translation
Good Applications of Bad Machine TranslationGood Applications of Bad Machine Translation
Good Applications of Bad Machine Translationbdonaldson
 
Translation j 2009-cocci (1)
Translation j 2009-cocci (1)Translation j 2009-cocci (1)
Translation j 2009-cocci (1)FabiolaPanetti
 
Multi lingual corpus for machine aided translation
Multi lingual corpus for machine aided translationMulti lingual corpus for machine aided translation
Multi lingual corpus for machine aided translationAashna Phanda
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMassimo Schenone
 
A Short Introduction To Text-To-Speech Synthesis
A Short Introduction To Text-To-Speech SynthesisA Short Introduction To Text-To-Speech Synthesis
A Short Introduction To Text-To-Speech SynthesisCynthia King
 

Similar a Past, Present, and Future: Machine Translation & Natural Language Processing for Patent Information (20)

Language Grid
Language GridLanguage Grid
Language Grid
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
machine transaltion
machine transaltionmachine transaltion
machine transaltion
 
Machine translation
Machine translationMachine translation
Machine translation
 
visH (fin).pptx
visH (fin).pptxvisH (fin).pptx
visH (fin).pptx
 
Machine translation ppt by shantanu arora
Machine translation ppt by shantanu aroraMachine translation ppt by shantanu arora
Machine translation ppt by shantanu arora
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
Translationusing moses1
Translationusing moses1Translationusing moses1
Translationusing moses1
 
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...
 
E-Translation
E-TranslationE-Translation
E-Translation
 
Cyflwyniad Bloc
Cyflwyniad BlocCyflwyniad Bloc
Cyflwyniad Bloc
 
AI, don't f$# up my name.pdf
AI, don't f$# up my name.pdfAI, don't f$# up my name.pdf
AI, don't f$# up my name.pdf
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
An Application for Performing Real Time Speech Translation in Mobile Environment
An Application for Performing Real Time Speech Translation in Mobile EnvironmentAn Application for Performing Real Time Speech Translation in Mobile Environment
An Application for Performing Real Time Speech Translation in Mobile Environment
 
Good Applications of Bad Machine Translation
Good Applications of Bad Machine TranslationGood Applications of Bad Machine Translation
Good Applications of Bad Machine Translation
 
Translation j 2009-cocci (1)
Translation j 2009-cocci (1)Translation j 2009-cocci (1)
Translation j 2009-cocci (1)
 
Multi lingual corpus for machine aided translation
Multi lingual corpus for machine aided translationMulti lingual corpus for machine aided translation
Multi lingual corpus for machine aided translation
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
A Short Introduction To Text-To-Speech Synthesis
A Short Introduction To Text-To-Speech SynthesisA Short Introduction To Text-To-Speech Synthesis
A Short Introduction To Text-To-Speech Synthesis
 

Más de Iconic Translation Machines

The growing role of translation technology in e-discovery, litigation, digita...
The growing role of translation technology in e-discovery, litigation, digita...The growing role of translation technology in e-discovery, litigation, digita...
The growing role of translation technology in e-discovery, litigation, digita...Iconic Translation Machines
 
Making the Old New Again - Modern Technical Provides Access to Historical Che...
Making the Old New Again - Modern Technical Provides Access to Historical Che...Making the Old New Again - Modern Technical Provides Access to Historical Che...
Making the Old New Again - Modern Technical Provides Access to Historical Che...Iconic Translation Machines
 
Improving Translator Productivity with MT: A Patent Translation Case Study
Improving Translator Productivity with MT: A Patent Translation Case StudyImproving Translator Productivity with MT: A Patent Translation Case Study
Improving Translator Productivity with MT: A Patent Translation Case StudyIconic Translation Machines
 
From the Lab to the Market: Commercialising MT Research
From the Lab to the Market: Commercialising MT ResearchFrom the Lab to the Market: Commercialising MT Research
From the Lab to the Market: Commercialising MT ResearchIconic Translation Machines
 
Beyond Data: Delivering Machine Translation with Subject Matter Expertise
Beyond Data: Delivering Machine Translation with Subject Matter ExpertiseBeyond Data: Delivering Machine Translation with Subject Matter Expertise
Beyond Data: Delivering Machine Translation with Subject Matter ExpertiseIconic Translation Machines
 

Más de Iconic Translation Machines (8)

The growing role of translation technology in e-discovery, litigation, digita...
The growing role of translation technology in e-discovery, litigation, digita...The growing role of translation technology in e-discovery, litigation, digita...
The growing role of translation technology in e-discovery, litigation, digita...
 
Making the Old New Again - Modern Technical Provides Access to Historical Che...
Making the Old New Again - Modern Technical Provides Access to Historical Che...Making the Old New Again - Modern Technical Provides Access to Historical Che...
Making the Old New Again - Modern Technical Provides Access to Historical Che...
 
Machine Translation: The Neural Frontier
Machine Translation: The Neural FrontierMachine Translation: The Neural Frontier
Machine Translation: The Neural Frontier
 
Innovative Business and Pricing Models: for MT
Innovative Business and Pricing Models: for MTInnovative Business and Pricing Models: for MT
Innovative Business and Pricing Models: for MT
 
Improving Translator Productivity with MT: A Patent Translation Case Study
Improving Translator Productivity with MT: A Patent Translation Case StudyImproving Translator Productivity with MT: A Patent Translation Case Study
Improving Translator Productivity with MT: A Patent Translation Case Study
 
MT Evaluation: Seeing the Wood for the Trees
MT Evaluation: Seeing the Wood for the TreesMT Evaluation: Seeing the Wood for the Trees
MT Evaluation: Seeing the Wood for the Trees
 
From the Lab to the Market: Commercialising MT Research
From the Lab to the Market: Commercialising MT ResearchFrom the Lab to the Market: Commercialising MT Research
From the Lab to the Market: Commercialising MT Research
 
Beyond Data: Delivering Machine Translation with Subject Matter Expertise
Beyond Data: Delivering Machine Translation with Subject Matter ExpertiseBeyond Data: Delivering Machine Translation with Subject Matter Expertise
Beyond Data: Delivering Machine Translation with Subject Matter Expertise
 


Past, Present, and Future: Machine Translation & Natural Language Processing for Patent Information

  • 1. ‘Past, Present, and Future’: Machine Translation & Natural Language Processing for Patent Information. Dr. John Tinsley, CEO, Iconic Translation Machines Ltd. EPOPIC, Madrid, 10th November 2016.
  • 2. Why listen to me? BSc in Computational Linguistics; PhD in Machine Translation; Language Technology consultant; Founder of Iconic Translation Machines. Machine Translation is what I do! The world’s first and only patent-specific machine translation platform.
  • 3. Machine Translation: The Basics. Machine Translation = automatic translation: the use of computers to translate from one language into another; the use of computers to automate some, or all, of the translation process. Statistical Machine Translation (SMT): an approach to Machine Translation where translations for an input are estimated based on previously seen translation examples and associated (inferred) probabilities, e.g. IPTranslator, Google Translate. Other approaches: rule-based (or transfer-based), based on linguistic rules, e.g. Systran, Altavista’s Babelfish; example-based, based on translation examples and inferred linguistic patterns. SMT is now by far the predominant approach*
  • 4. Bilingual Corpora. A corpus (pl. corpora) is a collection of texts, in electronic format, in a single language: document(s), book(s). A bilingual corpus is a collection of corresponding texts, in multiple languages: a document & its translation; a book in multiple languages; European Parliament proceedings. Note: source language = original language, or the language we’re translating from; target language = the language we’re translating into.
  • 5. Aligned Bilingual Corpora. A document-aligned bilingual corpus corresponds on a document level. For translation, we require sentence-aligned bilingual corpora: the sentence on line 1 in the source language text corresponds to (i.e. is a translation of) the sentence on line 1 in the target language text, etc. These are often referred to as parallel aligned corpora. Sentence-aligned bilingual parallel corpora are essential for statistical machine translation.
  • 6. Learning from Previous Translations. Suppose we already know (from a sentence-aligned bilingual corpus) that “dog” is translated as “perro” and “I have a cat” is translated as “Tengo un gato”. We can theoretically translate “I have a dog” → “Tengo un perro”, even though we have never seen “I have a dog” before. Statistical machine translation induces information about unseen input, based on previously known translations: primarily co-occurrence statistics; takes contextual information into account.
  • 7. Statistical Machine Translation: example of a small sentence-aligned bilingual corpus for English-French.
  • 8. Statistical Machine Translation: we take some new sentence to translate.
  • 9. Statistical Machine Translation: from the corpus we can infer possible target (French) translations for various source (English) words. We can then select the most probable translations based on simple frequencies (co-occurrence statistics).
  • 10. Statistical Machine Translation: given a previously unseen input sentence, and our collated statistics, we can estimate a translation.
  • 11. Advanced MT. All modern approaches are based on building translations for complete sentences by putting together smaller pieces of translation. The previous example is very simplistic: in reality, SMT systems calculate much more complex statistical models over millions of sentence pairs for a pair of languages (upwards of 2M sentence pairs on average for large-scale systems). Other statistics calculated include: word-to-word translation probabilities; phrase-to-phrase translation probabilities; word order probabilities; linguistic information (are the words nouns, verbs?); fluency of the final output.
  • 12. Data is Key. For SMT, data is key: information (word/phrase correspondences and associated statistics) is only based on what we have seen before in the data. It is important that the data used to train SMT systems (the training data) is: of sufficient size → avoids sparseness/skewed statistics; representative and relevant → contains the right type of language; high-quality → absence of misspellings, incorrect alignments, etc.; proofed by human translators.
  • 13. Why is MT Difficult? A word or a phrase can have more than one meaning (ambiguity, lexical or structural), e.g. “bank”, “dive”, “I saw the man with the telescope”. People use language creatively: new words are cropping up all the time. Linguistic differences between languages, e.g. structure of Irish sentences vs. structure of English sentences: “Tá (Is) ocras (hunger) orm (on me)” <-> “I am hungry”. There can be more than one way to express the same meaning: “New York”, “The Big Apple”, “NYC”.
  • 14. Why is MT Difficult? Israeli officials are responsible for airport security. Israel is in charge of the security at this airport. The security work for this airport is the responsibility of the Israel government. Israeli side was in charge of the security of this airport. Israel is responsible for the airport’s security. Israel is responsible for safety work at this airport. Israel presides over the security of the airport. Israel took charge of the airport security. The safety of this airport is taken charge of by Israel. This airport’s security is the responsibility of the Israeli security officials.
  • 15. No single solution for all languages. Number agreement: the house / the houses vs. la maison / les maisons. Gender agreement: the house / the cheese vs. la maison / le fromage. English - Spanish; English - French.
  • 16. No single solution for all languages English - German English - Chinese 种水果的农民 The farmer who grows fruit [Lit: “grow fruit (particle) farmer”]
  • 17. Not all languages are created equal French German Turkish Finnish Spanish Chinese Korean Hungarian Portuguese Japanese Thai Basque
  • 18. The Challenge of Patents L is an organic group selected from -CH2- (OCH2CH2)n-, -CO-NR'-, with R'=H or C1-C4 alkyl group; n=0-8; Y=F, CF3 … maximum stress of 1.2 to 3.5 N/mm<2> and a maximum elongation of 700 to 1,300% at 0[deg.] C. Long Sentences Technical constructions Largest single document: 249,322 words Longest Sentence: 1,417 words
  • 19. The Challenge of Patents. Very long sentences as standard; grammatically incomplete, using nominal and telegraphic style (!); passive forms are frequent; frequent use of subordinate clauses, participles, implicit constructs; inconsistent and incorrect spelling; high use of neologisms; instances of synonymy and polysemy; spurious use of punctuation. Authoring guide for “to be translated” text: patents break almost all of the rules!
  • 20. Evaluating Machine Translation Quality. Automatic Evaluation: judge the quality of an MT system by comparing its output against a human-produced “reference” translation. Pros: quick, cheap, consistent. Cons: inflexible, cannot be used on ‘new’ input. Human Evaluation: Pros: reliable, flexible, multi-faceted (fluency, error analyses, benchmarking). Cons: slow, expensive, subjective. Fluency vs. Adequacy. Task-Based Evaluation.
  • 21. Evaluating Machine Translation Quality: Task-Based Evaluation. Standalone evaluation of MT systems is necessary to get a sense of the overall quality of a system. To determine the ultimate usability of an MT system, intrinsic task-based evaluation is required. Why? Fluency vs. Adequacy. Fluency: how fluent and grammatically correct the translation output is. Adequacy: how accurately the translation conveys the meaning of the source. Source: “La gran casa roja”. Output 1: “The big blue house”. Output 2: “The big house red”.
  • 22. Practical uses of Machine Translation. Understand its limitations and you’ll understand its capabilities! No: translate a patent for filing; translate literature for publication; translate marketing materials; anything mission critical without review. Yes: productivity tool for professional translation; understand foreign patents; localisation processes and “controlled” content; high volume, e.g. eDiscovery.
  • 23. Use cases in practice Product descriptions to open new markets MT for post-editing productivity across industries Developer, and user for web content Tens of thousands of people using online tools daily
  • 24. Current Hot Topics. Neural Networks: using artificial intelligence and deep learning to develop a completely new way of doing machine translation! Quality Estimation: functionality through which machine translation can “self-assess” the quality of the translations it produces. Online Adaptive Translation: machine translations that can automatically learn and improve based on feedback, particularly from revisions. Use-case specific MT: just like patent MT, but for countless other areas.
  • 25. About Iconic We are a Machine Translation and Natural Language Processing software and services provider, delivering expert solutions with Subject Matter Expertise
  • 28.
  • 29.
  • 30. Speed, Cost, and Quality. What is the difference between machine translation and manual translation when translating a 10-page patent document from Chinese into English? Machine Translation is not designed to replace professional translation, but there are many cases where costly and time-consuming manual translation is simply not necessary.
  • 31. - Data confidentiality - File formats - Potential for customisation, enhancements, and improvement for specific domains
  • 32. More than just translation. Data processing, e.g. optical character recognition, digitisation. Information extraction, e.g. citation analysis, cross-lingual search. Data understanding, e.g. summarisation, concept & key term identification. Database building, e.g. combining the above, with translation, for export.
  • 33. Record Extraction. Extraction algorithms work on cleaned OCR output, using patterns, keywords, and formatting information.
  • 34. Citation Analysis. Assessment of record and reference patterns; application for record extraction. Tracking variations across years; application for bibliographic data fielding.
  • 36. .com Visit and use the promo code epo2016 to get 20 free pages of translation
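
The co-occurrence idea from slides 6 to 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-sentence English-Spanish corpus is invented for the example (the corpus shown on the slides is not reproduced in this transcript), the names `train` and `best_translation` are hypothetical, and the Dice coefficient is swapped in for the "simple frequencies" mentioned on slide 9 so that very common words do not win every comparison.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus (not the one shown on the slides):
# sentence-aligned English-Spanish pairs, lowercased and pre-tokenised.
CORPUS = [
    ("the dog", "el perro"),
    ("the cat", "el gato"),
    ("a cat", "un gato"),
    ("i have a cat", "tengo un gato"),
    ("i have the book", "tengo el libro"),
]


def train(corpus):
    """Count how often each word, and each source/target word pair,
    occurs across the aligned sentence pairs."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    return src_count, tgt_count, pair_count


def best_translation(word, model):
    """Return the target word with the highest Dice score for `word`,
    or None if the word never appeared in the training data."""
    src_count, tgt_count, pair_count = model
    scores = {
        t: 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
        if s == word
    }
    return max(scores, key=scores.get) if scores else None


MODEL = train(CORPUS)
# "i have a dog" never appears in CORPUS, yet each word can be looked up.
print([best_translation(w, MODEL) for w in "i have a dog".split()])
```

In this toy setup both "i" and "have" map to "tengo", so gluing the word-level guesses together repeats it. That is one reason the phrase-level, word-order, and fluency statistics listed on slide 11 are needed on top of word-to-word counts.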

Editor's notes

  1. Second point is important. It has different uses and usability. The concept of FAHQMT (fully automatic high-quality MT) is no more. Focus is now on HAMT (human-aided MT) and PEMT (post-edited MT). The problem with rule-based systems is that they didn’t scale: you need bilingual experts for each language pair. SMT is the predominant approach.
  2. Starting point for all systems is data. The most important aspect is the quality of the data…
  3. They are essential and the quality is crucial. The translations must be accurate and the alignment must be correct, otherwise we infer the wrong things and introduce “noise” into our systems.
  4. How do we use these corpora? It’s all about learning and remembering things we’ve seen before, the same way you might go about translating something
  5. Ok, so the translation isn’t exactly right here. It should be “Je parle à la fille”, but we haven’t seen enough examples (don’t have enough data) for reliable estimates; we’re just going on the counts of the words.
  6. How likely a word is to translate to another word, as you have seen. How likely the different phrases are to translate as one another. What’s the likelihood a certain word will have a different position in the target sentence? Sometimes we take into account linguistic information about the words: is it a verb, then it should go here; articles should precede nouns, etc. Look at models of the target language and see if what we have produced makes sense (can these words go together in this order?).
  7. Google Translate aims to be a general system, but what happens when you’re translating a sports website? Quality issues can be caused by the fact that there’s a lot of other data in their models than sports news. Similarly, if I have a translation system for car manuals, it won’t be any good at translating sports websites. This is reflected in our systems at IPTranslator too, where all of our models are built using patents which have been filed in multiple languages to ensure we get the style correct (patents are a bigger fish than this though).
  8. The simple answer is that language is complex! Which is what makes it difficult to learn but also so interesting at the same time! Who has the telescope, him or me? New words, especially in patents. And new usage of words. The verb “to tweet” didn’t exist so long ago…
  9. The last piece in the puzzle is understanding the languages you’re developing MT systems for. And that’s not understanding them in isolation – that’s understanding, for each language pair, what the differences are between them, e.g. many of the things we need to look out for when developing English-Spanish translation engines we don’t need to do for French-Spanish translation
  10. With certain language pairs, things get more complex. The processes that we need to develop are harder to develop, less studied, and require smarter people! For Chinese, we need to identify these DE constructions so we know to move the head noun. There’s no tense: going into English, how do we know what tense? There’s no article! We have to generate it! The DE particle has many translations: which one? FIRST THINGS FIRST, which ones are the words!? We need to segment the Chinese! ONLY WITH THESE SKILLS CAN YOU EXPLOIT THE TECHNOLOGY TO ITS FULLEST. AND WHAT DO WE GET IN DOING THIS? MT WITH SUBJECT MATTER EXPERTISE
  11. **EFFECT ON FEASIBILITY** Basically, some languages are easier for MT than others. General rule: the closer two languages are to one another in terms of word order and grammatical structure, the easier. Here are some rules of thumb (with English).
  12. But of course it’s not just that easy. Patents, for example, have a range of highly complex linguistic characteristics that make this challenging, both for PROFESSIONAL translators as well as for Translation Software. Let’s look, for example, at this patent: what’s highlighted in blue is a SINGLE sentence (which is an individual legal claim). Additionally, we have to deal with complex technical constructions such as chemical formulae, alphanumeric sequences, even genomic and amino acid sequences. And then we have patents which introduce a whole new level of complexity on top of the language issues… Patents are hard to read, never mind translate, never mind try to teach a computer how to translate them!
  13. Sometimes it’s hard to tell whether the translation is bad or that’s simply how the original patent was written
  14. Commercial machine translation is plagued with misleading marketing with unrealistic claims and promises. Need to manage expectations. When I say NO, I mean no in a fully-automatic manner with no human intervention. Filing: not when meaning is CRUCIAL. Publication: no, there will be errors. Marketing: no, not with subtleties, idioms, etc.
  15. MT solutions and services provider, specializing in providing customised solutions with subject matter expertise for specific technical sectors, such as Patents/IP, life sciences, and financial. We are the MT partner of choice for some of the world’s largest translation companies, information providers, and government and enterprise organisations. For Translation Companies: We help translation companies to translate more content, more accurately for faster project turnaround, resulting in significant cost savings and increased revenue. For Enterprise Clients: We help enterprises to translate more content in less time, resulting in faster products to market and enhanced global reach. For Information Providers: We help information providers to translate knowledge, literature and documentary information faster and more accurately, resulting in broader knowledge offerings and faster time to market.
  16. THERE’S VALUE TO BE ADDED, HOW CAN WE HARNESS? We literally already have the perfect environment to allow NMT to be another string in the bow and let us use the most appropriate MT for the job WHETHER IT BE NEURAL FOR KOREAN, FOR CHAT TEXT, OR WHATEVER THE CASE MAY BE
  17. It’s not a one-size-fits-all solution and who knows when it will be, but we have developed a framework that allows us to leverage its strength on a case-by-case basis to deliver the best possible translation for a given task. Over time we fully expect the “brain to grow” and become the best MT on offer for various language pairs and content types, and when it is, WE’RE PERFECTLY POSITIONED FROM A TECHNOLOGY AND EXPERTISE PERSPECTIVE to capitalise on this wave.
  18. We’ve launched a new product this year which is essentially repurposing the technology that we have and focusing on very particular use cases… Firstly, let’s just look at the stark motivation for using MT for patent information in the first place…
  19. The “standard” solution to the problem of foreign language documents is translation. But translation is costly, not that quick, and often it is complete overkill for what is required!! This is where MT comes in as a much more cost-effective, rapid solution that allows you to make a QUICK determination as to whether something is relevant or not before you invest in a professional translation. And, while we all know that MT isn’t perfect, the reality of the situation is that the quality is often “good enough” or fit for the purpose of making this determination. SO IT’S A NO-BRAINER
  20. So going back to IPTranslator, the elephant in the room for us for a long time has been Google Translate. The first question we always get asked is “is it better than Google Translate?” The answer is yes, the majority of the time for most of the languages that we cover. However, is that increase in quality enough to justify the cost of our service over Google, which is a free service? It’s hard to beat free! The reality is now, the “fit for purpose / good enough” quality level is something that Google can achieve often, especially since it started working with the EPO. So where does IPTranslator fit? Confidential data; file formats incl. pdf; potential for customisation, enhancements, and improvement for specific domains.
  21. Not just for patents, but for journals and other non-patent literature
  22. Why was it challenging? Exceptions to patterns, OCR errors, lack of formatting information.
  23. The record extraction example is from Pattern B. The bib data example is from Pattern 5.
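
As a rough illustration of the pattern-based record extraction described on slide 33 and in the last notes above, the sketch below pulls fields out of one line of cleaned OCR text. Everything specific here is an assumption made for the example: the line layout, the field names, the sample record, and the regular expression are hypothetical, and the talk does not show what "Pattern B" or "Pattern 5" actually look like.

```python
import re

# Hypothetical record layout, invented for illustration:
#   "<publication number> (<kind code>) <YYYY-MM-DD> <title>"
RECORD = re.compile(
    r"^(?P<number>[A-Z]{2}\s?\d{5,10})\s+"   # e.g. "EP 1234567"
    r"\((?P<kind>[A-Z]\d?)\)\s+"             # e.g. "(A1)"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"        # e.g. "2016-11-10"
    r"(?P<title>.+)$"                        # rest of the line
)


def extract_record(line):
    """Return the fields of one record as a dict, or None when the
    line does not follow the assumed layout."""
    match = RECORD.match(line.strip())
    return match.groupdict() if match else None


# Made-up sample lines; the second contains a typical OCR confusion
# (lowercase "l" in place of the digit "1").
clean = "EP 1234567 (A1) 2016-11-10 Method for aligning bilingual corpora"
noisy = "EP l234567 (A1) 2016-11-10 Method for aligning bilingual corpora"

print(extract_record(clean))
print(extract_record(noisy))
```

The second line is rejected outright, which hints at why exceptions to patterns and OCR errors made the task challenging: a strict pattern silently drops records unless the OCR output is cleaned first or the pattern is deliberately loosened.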