DISINTERMEDIATION, AUTOMATION, CENTRALISATION, MACHINE TRANSLATION: THE FUTURE OF LSCS
Manuel Herranz, CEO
A few words about… Manuel Herranz
Majored in mechanical engineering & languages (UK)
Joined Giddings & Lewis - Ford, Valencia / Chihuahua, 1993-1996
Rolls Royce Marine & Industrial, Spain / Argentina, 1997-98 and 2000
Joined B.I Corporation Japan, 1996-2004
Friendly buy-out 2005: Pangeanic
What we do as an industry
Fight in a segmented market
Enable international business
Help people / organizations to communicate
Innovate
Differentiate? At what speed? Really?
How is your target market set up?
More importantly…
Are you ahead of the game?
95% of LSCs have no iSEO strategy (beyond having a bilingual website) because translation is expensive. Do you invest in the product you sell?
80% have no national SEO strategy.
50% apply / adopt MT (only 25% have MT embedded in their systems / custom engines).
10% have a centralized TM system to leverage past content. Most operate with hundreds of TMs on a server.
Business model in 5 years?
Disintermediation – these companies re-invented business models
Or offer added value vs LSCs becoming language recruitment agencies / HR specialists?
Success: basic business proposition + value
Industry revenues more than doubled from ~$19B in 2005 to ~$40B in 2016. [Common Sense Advisory data]
Translation buyers worry about ever-growing content volumes and more language pairs – but with stable or shrinking budgets.
Management expects that Amazon, Microsoft or Google Translate will take care of “language problems” one day: less complexity, lower cost.
US Census data show that the number of translation workers has doubled since 2008. [Slator, May 24, 2017]
Automation: project management value replaced by bots?
The number of working translators on LinkedIn has increased by 50+% since 2010. [LinkedIn data]
Market forces squeeze mid-sized companies from both ends: large companies can offer economies of scale; small ones are specialists, niche or local.
Business model in 5, 4, 3 years…?
Disintermediation / Direct clients
Satisfying the 5Bn searches/day and increasing demand for cheaper language services
TM+MT leveraging, CAT agnostic, inexpensive tools
When Google was founded in September 1998, it was serving ten thousand search queries per day (by the end of 2006 that same amount would be served in a single second)
Affordable
Efficiency areas and growth hacks
Centralised TMs for advanced leveraging (ActivaTM)
The web as a sales tool (SEO, SEM), online ordering system (Cor).
Machine Translation: SMT -> NMT … some astounding results.
Winning iADAATPA: largest EC MT infrastructure project linking MT vendors to Public Administrations.
The Pangeanic Experience
The Database
Elasticsearch-based
All language assets in one database, irrespective of the tool that created them
Deep learning for tag handling
CAT-tool agnostic (solves interoperability issues)
Automatic fuzzy match repair
Matrix (triangulate to create new language pairs)
Statistics on all segment units, words, domains
Remote access, API
Pre-filter prior to MT (TM+MT)
More powerful (strict) fuzzy matching than traditional CAT tools
Saved +14% in fuzzy matches
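The "pre-filter prior to MT" and "triangulate" ideas above can be sketched as follows. This is a minimal illustration only: the sample segments, the 0.75 threshold, the function names and the use of Python's `difflib.SequenceMatcher` as a similarity score are all assumptions, not the matching logic of the database described on the slide.

```python
from difflib import SequenceMatcher

# Hypothetical translation memories keyed on the same English pivot segments.
TM_EN_ES = [
    ("Remove any unused configuration files.", "Elimine los archivos de configuración no utilizados."),
    ("This chapter describes the parameters.", "Este capítulo describe los parámetros."),
]
TM_EN_FR = [
    ("This chapter describes the parameters.", "Ce chapitre décrit les paramètres."),
]

def fuzzy_score(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for a real fuzzy-match metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(segment, tm):
    """Return (score, source, target) for the closest TM entry, or None."""
    if not tm:
        return None
    return max((fuzzy_score(segment, src), src, tgt) for src, tgt in tm)

def route(segment, tm, threshold=0.75):
    """Pre-filter: reuse the TM target above the threshold, else send to MT."""
    match = best_match(segment, tm)
    if match is not None and match[0] >= threshold:
        return ("TM", match[2])
    return ("MT", None)

def triangulate(tm_pivot_a, tm_pivot_b):
    """Build A->B pairs from two TMs that share the same pivot-language source."""
    b_by_pivot = dict(tm_pivot_b)
    return [(a, b_by_pivot[pivot]) for pivot, a in tm_pivot_a if pivot in b_by_pivot]
```

A segment that already exists in the memory is routed to the TM, anything too different goes to the MT engine, and `triangulate(TM_EN_ES, TM_EN_FR)` yields a Spanish-French pair that neither memory contained directly.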
The Web
National research project with EU funding
Full platform
Use by Pangeanic, LSPs, 3rd parties
Eases estimation and automates workflows in any translation format (doc or web)
CMS agnostic – extracts text and converts to XLIFF (doc or web)
Translate sections of a website only (batches)
Detect new content or content that has been eliminated to update language versions
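Detecting new or eliminated content between two versions of a page can be sketched by fingerprinting segments. This is an assumed approach for illustration; the slide does not say how the platform does it, and `fingerprint` and `diff_segments` are hypothetical names.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Stable hash of a whitespace-normalised segment."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def diff_segments(previous, current):
    """Return (new, removed) segments between two extractions of a page."""
    prev = {fingerprint(s): s for s in previous}
    curr = {fingerprint(s): s for s in current}
    new = [s for h, s in curr.items() if h not in prev]
    removed = [s for h, s in prev.items() if h not in curr]
    return new, removed

# Hypothetical page content from two crawls.
old_page = ["Welcome", "Our services", "Contact us"]
new_page = ["Welcome", "Our   services", "Pricing"]
new_segments, removed_segments = diff_segments(old_page, new_page)
```

Only `new_segments` would need translating, and `removed_segments` would be dropped from the other language versions; segments that differ only in whitespace are treated as unchanged.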
Neural Machine Translation - background
Artificial Neural Networks for SMT
History of ANN-based Machine Translation and Language Modelling for SMT:
1997 [Castano & Casacuberta 97] (JAUME I & U.Politécnica): Machine translation using neural networks and finite-state models (PangeaMT: https://www.prhlt.upv.es/wp/research-areas/mt-showcase)
2007 [Schwenk & Costa-jussa 07]: Smooth bilingual n-gram translation.
2012 [Le & Allauzen 12, Schwenk 12]: Continuous space translation models with neural networks.
2014 [Devlin & Zbib 14]: Fast and robust neural networks for SMT
Conventional SMT
Use of statistics has been controversial in computational linguistics:
Chomsky 1969: “... the notion ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.”
Considered to be true by most experts in (rule-based) natural language processing and artificial intelligence
History of Statistical Approach to MT
1989-94: IBM’s pioneering work
Since 1996: only a few teams favored SMT (U.Politécnica Valencia, RWTH Aachen, HKUST, CMU)
2006/2007: Google Translate
2006-2012: Euromatrix
2009: PangeaMT
2016: First trials in NMT
2017: European Commission: iADAATPA project
iADAATPA Platform (architecture diagram; cloud vs on-premise)
Diagram labels: CMS 1, CMS 2, CKAN, widget, browser, eTranslation
AT systems: Tilde MT, Pangeanic, KantanMT, Prompsit
Platform components: Priority, Admin, Lang / Q router, User management / Profile
Documents (proprietary formats): Conversion
Connections: e-SENS AS4 profile compliant
- Requests: supporting both synchronous and asynchronous requests
- Many iADAATPA deployments are possible.
- A global instance register is kept by the commercial partner.
- Send translation request
- Receive webhook
- Ask for “request done”
- AT systems receive webhook
- Ask for “content request”
BACKOFFICE:
- Global (instance management)
- Individual (for each instance)
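The asynchronous flow listed above (send a translation request, receive a webhook, then ask for the finished request) can be sketched in-process as follows. This is a toy model under stated assumptions: the `Platform` class, its method names and the callback-as-webhook shortcut are hypothetical and do not describe the real platform API, which would use HTTP calls.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

@dataclass
class Platform:
    """Toy translation hub; every name here is an illustrative assumption."""
    engine: Callable[[str], str]
    _ids: "itertools.count" = field(default_factory=lambda: itertools.count(1))
    _pending: Dict[int, Tuple[str, Callable[[int], None]]] = field(default_factory=dict)
    _done: Dict[int, str] = field(default_factory=dict)

    def translate_sync(self, text: str) -> str:
        """Synchronous path: the caller waits for the engine's answer."""
        return self.engine(text)

    def send_request(self, text: str, webhook: Callable[[int], None]) -> int:
        """Asynchronous path: queue the job and return an id immediately."""
        request_id = next(self._ids)
        self._pending[request_id] = (text, webhook)
        return request_id

    def process_pending(self) -> None:
        """Run queued jobs, then notify each requester through its webhook."""
        for request_id, (text, webhook) in list(self._pending.items()):
            self._done[request_id] = self.engine(text)
            del self._pending[request_id]
            webhook(request_id)

    def fetch_done(self, request_id: int) -> Optional[str]:
        """The 'ask for request done' step: result, or None if not ready."""
        return self._done.get(request_id)

notified = []                                   # ids delivered by the webhook
hub = Platform(engine=str.upper)                # str.upper stands in for an MT engine
rid = hub.send_request("text in", notified.append)
before = hub.fetch_done(rid)                    # nothing ready yet
hub.process_pending()
after = hub.fetch_done(rid)                     # result available after the webhook
```

The design point is that the requester never blocks: it keeps the returned id, waits for the notification, and only then fetches the result.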
Neural Machine Translation – Will it work?
Text out
Text in
This is more realistic: MT in the wild, wild, wild world
Quite an accurate workflow when integrating MT at a company
MT engine
Neural Machine Translation – Is there a future for translation services?
Machine translation will displace only those humans who translate like machines.
(The remaining) translators will focus on tasks that require intelligence.
- Arle Lommel, 2012
Neural Machine Translation
Tests in F/I/G/S, RU, PT point to a very strong preference for NMT fluency (bit.ly/neural-machine-translation-pangeanic).
On average, from a set of 250 sentences, around 85%-92% were rated good or very good (A or B). ES/PT/IT results are similar to FR.
Evaluation: Translation companies and professional freelance translators
EN-DE set of 250 sentences
Rating   NMT         SMT
A        132  53%    34   14%
B        98   39%    95   38%
C        14   6%     97   39%
D        6    2%     24   10%
Total    250         250

EN-FR set of 250 sentences
Rating   NMT         SMT
A        150  60%    39   16%
B        76   30%    126  50%
C        21   8%     71   28%
D        3    1%     14   6%
Total    250         250

EN-RU set of 250 sentences
Rating   NMT         SMT
A        128  51%    39   16%
B        84   34%    43   17%
C        22   9%     60   24%
D        16   6%     108  43%
Total    250         250
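The "good or very good (A or B)" share can be recomputed from the counts in the three tables. The snippet below is only a worked check of that arithmetic; the dictionary layout and the `good_share` name are assumptions for illustration.

```python
# Counts per rating [A, B, C, D], copied from the evaluation tables.
COUNTS = {
    "EN-DE": {"NMT": [132, 98, 14, 6], "SMT": [34, 95, 97, 24]},
    "EN-FR": {"NMT": [150, 76, 21, 3], "SMT": [39, 126, 71, 14]},
    "EN-RU": {"NMT": [128, 84, 22, 16], "SMT": [39, 43, 60, 108]},
}

def good_share(counts):
    """Percentage of sentences rated A or B, rounded to a whole number."""
    return round(100 * (counts[0] + counts[1]) / sum(counts))

# Map each language pair to its (NMT, SMT) share of A+B ratings.
SHARES = {
    pair: (good_share(systems["NMT"]), good_share(systems["SMT"]))
    for pair, systems in COUNTS.items()
}
```

For NMT this gives 92% (EN-DE), 90% (EN-FR) and 85% (EN-RU), which matches the 85%-92% range quoted on the slide.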
Neural Machine Translation
Class “CAT” is selected as it got the highest value
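Picking the class with the highest value is an argmax over the network's output scores. A minimal sketch, assuming made-up scores and two extra hypothetical classes ("DOG", "BIRD") that are not on the slide:

```python
import math

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(labels, scores):
    """Return the label with the highest probability, and that probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]

LABELS = ["DOG", "CAT", "BIRD"]             # hypothetical classes
SCORES = [1.0, 3.0, 0.5]                    # hypothetical raw network outputs
chosen, confidence = predict(LABELS, SCORES)
```

Here `chosen` is "CAT" because its score dominates the other two after normalisation.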
Feed Forward Neural Machine Translation
Training set
Test set with reference translation: out of the training set we take 2,000 sentences to try the system with in-domain text (typical sentences the system may encounter in the future), for example:
Remove any protocol configuration files that are not used for the specified protocol .
These tables are sometimes referred to as " no sync " tables .
This chapter will describe many of those pages and parameters .
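Holding out 2,000 sentences for testing can be sketched as a shuffled split. The corpus below is synthetic and the function name, seed and pair layout are assumptions for illustration.

```python
import random

def split_corpus(pairs, test_size=2000, seed=13):
    """Shuffle sentence pairs and hold out `test_size` of them for testing."""
    if test_size >= len(pairs):
        raise ValueError("test_size must be smaller than the corpus")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    return shuffled[test_size:], shuffled[:test_size]

# Synthetic stand-in for an in-domain parallel corpus of (source, reference) pairs.
corpus = [(f"source sentence {i}", f"reference translation {i}") for i in range(10000)]
train_set, test_set = split_corpus(corpus)
```

The held-out pairs are never used for training, so the reference translations in `test_set` give an honest view of how the system handles unseen in-domain text.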
Feed Forward Neural Machine Translation
Input query -> Output
Label (data we already know)
Error function (detects “wrong match”)
Update function, i.e. the “learning process”: new weights (W) + new bias (B)
And after many, many training sessions detecting patterns, trial and error and feedback loops…
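The loop of output, error and update can be illustrated with a single sigmoid unit learning the logical OR function. This is a deliberately tiny stand-in, not an MT model; the data, learning rate and epoch count are assumptions chosen so the example converges.

```python
import math

def forward(weights, bias, x):
    """Compute the unit's output for one input query."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=2000, lr=0.5):
    """Repeat: compare output with the label, then update W and B."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in samples:
            output = forward(weights, bias, x)
            error = output - label          # error function: how wrong is the output?
            # Update function: gradient step for a sigmoid unit with cross-entropy loss.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Labelled data we already know: the truth table of logical OR.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train(OR_DATA)
predictions = [round(forward(weights, bias, x)) for x, _ in OR_DATA]
```

After training, the rounded outputs reproduce the labels, which is the point where the slide's "no update" state is reached for this toy problem.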
Feed Forward Neural Machine Translation
Input query -> Output
Label (data we already know)
Error function (detects “wrong match”)
No update! (“learning process” completed)
Now we have a system! Input queries -> output labels
80%-85% accuracy!
Recurrent NMT + Attention Models
Attention models tell the system which encoder states to look at
Example: “a good and sound agreement” -> “un buen y sólido acuerdo”
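How attention decides "which encoder states to look at" can be sketched with dot-product scoring. Note the swap: recurrent NMT systems of that period typically used a learned additive scoring function, and the plain dot product here is a simpler substitute. The two-dimensional vectors are invented for illustration.

```python
import math

def attention(decoder_state, encoder_states):
    """Dot-product attention: weights over encoder states plus a context vector."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    peak = max(scores)                                  # stabilise the softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(decoder_state)
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context

SOURCE = ["a", "good", "and", "sound", "agreement"]
# Hypothetical encoder states, one toy vector per source word.
ENCODER_STATES = [(0.1, 0.0), (0.9, 0.1), (0.0, 0.1), (0.7, 0.6), (0.1, 1.0)]
# Hypothetical decoder state while producing the word "acuerdo".
DECODER_STATE = (0.0, 2.0)

weights, context = attention(DECODER_STATE, ENCODER_STATES)
focus = SOURCE[max(range(len(weights)), key=weights.__getitem__)]
```

With these toy vectors the largest weight falls on "agreement", so the context vector passed to the decoder is dominated by that encoder state.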
Neural Machine Translation – Rates
30% price drop
40%-75% salary increase for post-editors
Open Questions
Are you working in the same way as 5 years ago? Do you think you will be working in the same way in 2023?
Will translation companies keep providing translation services only?
New business models: offer translation order automation (management systems), disintermediation, raw MT services?
Will large translation companies consolidate and dominate globally? Can new players emerge with the right tools, selling globally?
Is translation company-to-translation company selling a viable model?
Thank you!
Manuel Herranz
m.herranz@pangeanic.com / m.herranz@pangeamt.com
Twitter: manuelhrrnz

Pangeanic presentation at Elia Together Athens - Manuel Herranz
