Build Your Own Statistical Machine Translation Engines
Ruben de la Fuente
About Me

• 4-year degree in translation
• Worked as translator for 10+ years
• Working full time in MT for the past
  year
Agenda


•   Quick comparison with RbMT
•   Fundamentals of SMT
•   Requirements and preparation
•   Using DoMY
Disclaimer



• I’m not saying SMT is better
• I’m not saying SMT is right for you
Statistical Machine Translation

Computer learns to translate through
statistical analysis of alignment in
bilingual corpora
Rule-based Machine Translation

User Dictionaries + Grammar and
translation rules
SMT: Pros and Cons
Pros              Cons

Quick to build    Unpredictable
Cheap             Quick improvements not easy
Fluent
Features of an SMT system

• Translation Model: table containing
  source and target phrases, together
  with a probability score (accuracy)
• Language Model: list of sequences of
  n-words in target language together
  with a probability score (fluency)
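
To make the two models concrete, here is a minimal sketch. It assumes a toy three-sentence target corpus and a hand-written phrase table; all data, probabilities, and function names are hypothetical illustrations, not output from Moses or DoMY. The language model is a simple bigram model estimated by relative frequency.

```python
from collections import Counter

# Toy target-language corpus (hypothetical), already tokenized and lowercased.
corpus = [
    "click the send button",
    "click the file menu",
    "open the file",
]

def train_bigram_lm(sentences):
    """Estimate P(word | previous word) by relative frequency."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, word)] += 1
    return {pair: count / contexts[pair[0]] for pair, count in bigrams.items()}

lm = train_bigram_lm(corpus)

# Toy translation model (hypothetical scores): source phrase -> {target phrase: probability}.
phrase_table = {
    "haga clic en": {"click": 0.8, "press": 0.2},
    "enviar": {"send": 0.9, "submit": 0.1},
}

def best_translation(source_phrase):
    """Pick the highest-scoring target phrase for a source phrase."""
    options = phrase_table[source_phrase]
    return max(options, key=options.get)
```

A real decoder combines both scores when ranking candidate translations; this sketch only shows what each table stores.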
Language and Translation Models
• LM (fluency)     • TM (accuracy)
Tokenization and recasing
Tokenization: breaking up text into meaningful units (tokens)
Lowercasing: lowercase all words

File > file
file? > file ?
file. > file .
File! > file !
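
The two steps on this slide can be sketched in a few lines. This is a simplified illustration that assumes a regex-based split is good enough; production tokenizers also handle abbreviations, URLs, and language-specific rules. The function name is hypothetical.

```python
import re

def tokenize_and_lowercase(text):
    """Split punctuation from words, then lowercase every token."""
    # \w+ matches a run of letters/digits; [^\w\s] matches a single punctuation mark.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return " ".join(token.lower() for token in tokens)
```

With this function, "File!" becomes "file !", so "File", "file" and "file!" all count toward the same token.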
Requirements: Computing


• 4 GB RAM PC needed
• Ubuntu 10.04 64-bit OS
• Virtual Machine OK
Requirements: size

MS Translator Hub recommends at least 10k segments
I have gotten good results with 100-200k segments
That is roughly a corpus of over 1 million words
Publicly Available Corpora

• Opus (ECB, EMA, OpenOffice)
• Acquis Communautaire
• Europarl
• Hansard
• Multilingual websites: Bitextor
Bitextor is Cunning

www.mywebsite.com/en/overview.html
www.mywebsite.com/es/overview.html
<title>My source text</title>
<title>My target text</title>
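
The URL pattern above suggests one way parallel pages can be matched: two addresses that differ only in their language code are likely translations of each other. The sketch below illustrates that idea only. It is not Bitextor's actual algorithm, and the function name and parameters are hypothetical.

```python
import re

def pair_by_language_code(urls, source_lang="en", target_lang="es"):
    """Pair URLs that are identical once the language path segment is masked."""
    def mask(url, lang):
        # Replace the first "/<lang>/" path segment with a neutral placeholder.
        return re.sub(rf"/{re.escape(lang)}/", "/{lang}/", url, count=1)

    sources = {mask(u, source_lang): u for u in urls if f"/{source_lang}/" in u}
    targets = {mask(u, target_lang): u for u in urls if f"/{target_lang}/" in u}
    return [(sources[key], targets[key]) for key in sorted(sources) if key in targets]
```

Pages without a counterpart in the other language are simply left out of the result.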
Requirements: relevance


Data needs to be in-domain
Requirements: quality

Garbage in, garbage out
Diagnose your TMs with automated QA
checks (e.g. glossary adherence, length)
CheckMate: General
CheckMate: Length
CheckMate: Terminology
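
For readers who want to see what length and terminology checks boil down to, here is a rough sketch. It does not reproduce CheckMate's logic; the 2.0 ratio threshold, the substring matching, and the function names are assumptions chosen for illustration.

```python
def length_ratio_ok(source, target, max_ratio=2.0):
    """Return False for pairs whose character lengths differ too much."""
    if not source or not target:
        return False
    longer = max(len(source), len(target))
    shorter = min(len(source), len(target))
    return longer / shorter <= max_ratio

def glossary_violations(source, target, glossary):
    """List glossary terms present in the source whose required translation is missing."""
    source_lower, target_lower = source.lower(), target.lower()
    return [
        term
        for term, required in glossary.items()
        if term.lower() in source_lower and required.lower() not in target_lower
    ]
```

Segments that fail either check are candidates for review or removal before training.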
Remove Repetitions
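
A minimal sketch of removing repetitions, assuming the corpus is held in memory as a list of (source, target) pairs and that only exact duplicates (after trimming whitespace) should go. The function name is hypothetical.

```python
def remove_repetitions(segment_pairs):
    """Keep the first occurrence of each (source, target) pair, preserving order."""
    seen = set()
    unique = []
    for source, target in segment_pairs:
        key = (source.strip(), target.strip())
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```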
Remove Markup

Markup brings noise to the learning
process
Click <strong>Send</strong>
Haga clic en <strong>Enviar</strong>
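
Stripping inline tags like the ones above can be sketched with a regular expression. This is a simplification that assumes well-formed tags and no literal "<" or ">" in the text; the function name is hypothetical.

```python
import re

TAG_PATTERN = re.compile(r"<[^>]+>")

def strip_markup(segment):
    """Remove inline tags and collapse any extra whitespace left behind."""
    without_tags = TAG_PATTERN.sub("", segment)
    return " ".join(without_tags.split())
```

Applied to the example pair, this yields "Click Send" and "Haga clic en Enviar". Note that tags are deleted rather than replaced by a space, so a tag used as a separator between two words would join them.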
Do-Moses-Yourself (DoMY)

Moses: state-of-the-art extensively used
open source SMT toolkit
DoMY: extension of Moses making
installation and configuration easier
Online SMT Portals
Portals: letsmt.eu, smartmate.co
Cons: NDA-compliance, availability, speed
DoMY (Basics)

Graphs: import-tmx, clean-LM/TM, build
LM/TM, train, translate.
Ini files: configuration (language pairs,
paths for input and output).
Folder structure: always include
superdomain, domain and subdomain
Folder structure
corpus           graphs
Run from terminal
Edit ini            Command line
Running from GUI
Graphs
Graph        Function                                         Input            Output
Import-tmx   Extract data from tmx files                      Raw              Corpora/sa
Clean-tm     Clean data                                       Corpora/sa       Corpora/ready
Build-lm     Prepares training set for LM                     Corpora/ready    builds
Build-tm     Prepares training set for TM                     Corpora/ready    builds
Train        Trains MT engine                                 Builds           engines
Translate    Translates input files and produces tmx output   Translations/in  Translations/out
Tips for settings

LM: 7-gram
TM: 9-gram
Aligner: Berkeley for distant languages
Troubleshoot

Error message in terminal
Log file in graph folder
DoMT QA
Is Your Engine Good?

A set is excluded from training to be used
for evaluation (598 segments)
From a BLEU score of 0.5 (on a 0-1 scale), the engine is likely to
perform well
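
To show what a BLEU score on a 0-1 scale measures, here is a simplified corpus-level sketch with one reference per segment, n-grams up to length 4, and the standard brevity penalty. It assumes whitespace-tokenized input and applies no smoothing, so it is an illustration and not a replacement for an established scoring tool. Function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-1 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp_tokens, n), ngrams(ref_tokens, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

The hypotheses would be the engine's output for the held-out segments and the references their human translations.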
Keep Improving

Retrain the engine periodically as more
translation data becomes available
Gather feedback on what needs to be
improved
Statistical PE

• Keep a corpus of raw vs. PE
• Treat them as separate language pairs
• Run them through DoMY
• Create raw vs. PE engine
• 2 engines: source > target, raw > PE
Questions?
Speak now…
Or reach me at:
www.facebook.com/xlation
www.wordbonds.es
@rubendelafuente
http://www.linkedin.com/in/rubendelafuente

Editor's notes

  1. Why? SMT is based on probability, calculated as the number of occurrences of a given token divided by the total number of tokens. Case and punctuation can disrupt the calculation.
  2. To get good results with SMT, you need at least around 10,000 segments
  3. Using Olifant from the Okapi Framework
  4. Clean data: remove empty sentences and sentences that are too long or too short
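
The cleaning step in note 4 can be sketched as a simple filter. The token limits below (1 and 80) are assumed values for illustration, not settings taken from DoMY or Moses, and the function name is hypothetical.

```python
def clean_corpus(segment_pairs, min_tokens=1, max_tokens=80):
    """Drop pairs where either side is empty, too short, or too long."""
    kept = []
    for source, target in segment_pairs:
        lengths = (len(source.split()), len(target.split()))
        if all(min_tokens <= n <= max_tokens for n in lengths):
            kept.append((source, target))
    return kept
```

Both sides of a pair are checked, so a segment is removed if either its source or its target falls outside the limits.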