On Adaptive COMPUTER-ASSISTED TRANSLATION
今後の課題 (future work)
拋磚引玉 (throwing a brick to reveal jade)
1
八楽
8 million spirits
joy
2
Outline
Full of trivial (embarrassing?) points
4
“A plague of statistics has descended on our houses.”
– Ed Hovy
5
e.g. 11,001 New Features for Statistical Machine Translation……
“Essentially, all models are wrong, but some are useful.”
– George E. P. Box
6
What went wrong?
7
First Brick in the Wall
• Via Negativa
• False positive/negative
• Error propagation
• Unknown unknown
9
Funny Autocomplete
“autocomplete is not a function” is currently the top Google autocomplete suggestion for “autocomplete is”.
10
Autocomplete is NOT a function
• Neither is auto-suggestion
• They are many-to-many relations with scores.
• Recognize this?
11
Many-to-many Scoring
• Map by prefix, rank by popularity
• Google search box autocomplete
• Map by occurrence, rank by similarity
• Search (information retrieval)
• Map by information, rank by knowledge
• Translation
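The first pairing above (map by prefix, rank by popularity) can be sketched in a few lines. This is only an illustration of a scored many-to-many relation, not a description of how any real search box works; the `autocomplete` function and the `query_log` data are hypothetical.

```python
from collections import Counter


def autocomplete(prefix, query_log, k=3):
    """Map by prefix, rank by popularity: return up to k (query, count) pairs."""
    # Map: keep every logged query that starts with the prefix.
    counts = Counter(q for q in query_log if q.startswith(prefix))
    # Rank: the most frequently logged queries come first.
    return counts.most_common(k)


# Hypothetical query log; a real one would be far larger and noisier.
query_log = (
    ["autocomplete is not a function"] * 3
    + ["autocomplete is off"] * 2
    + ["auto parts"]
)
suggestions = autocomplete("autocomplete is", query_log, k=2)
```

One prefix maps to many scored suggestions, and one query is reachable from many prefixes, which is why the result is a relation with scores rather than a function.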
12
Information?
• Surface patterns and……
• Imaginations
• Quantum information theory
• Tensor (Network Algorithm)
• Quantum Physics and Linguistics
• Frobenius (diagrammatic) algebras (for semantics)
13
Knowledge?
Black swan……
OK, too philosophical now.
14
Popularity & Similarity
• Popularity: famous or infamous?
• Consensus: social choice?
• Similarity
• Distance: rational choice?
15
Prefix, Occurrence
• Surface pattern
• Regular
• Context-free
• Context-sensitive
• Recursively blahblah……
16
Map & Rank
• Regular expression
• Edit distance
17
Regular expression
• [a-z]+
• Colours of cats and dogs.
• [^o]{2}
• Colours of cats and dogs.
• cat|dog
• Colours of cats and dogs.
• Colou?rs?
• Colours of cats and dogs.
• Colors of cats and dogs.
• Color of a cat.
• <[A-Za-z][A-Za-z]*>
• <html>Colours of cats and dogs.</html>
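The patterns on this slide can be tried out with Python's standard `re` module. This is a small sketch; the variable names are illustrative, and only the patterns and sample sentences come from the slide.

```python
import re

text = "Colours of cats and dogs."

# [a-z]+ : runs of lowercase letters (the capital "C" is skipped).
lowercase_runs = re.findall(r"[a-z]+", text)

# [^o]{2} : any two consecutive characters, neither of which is "o".
pairs_without_o = re.findall(r"[^o]{2}", text)

# cat|dog : either alternative.
animals = re.findall(r"cat|dog", text)

# Colou?rs? : optional "u" and optional plural "s".
samples = ["Colours of cats and dogs.", "Colors of cats and dogs.", "Color of a cat."]
spellings = [re.search(r"Colou?rs?", s).group() for s in samples]

# <[A-Za-z][A-Za-z]*> : an opening tag made of letters only; "</html>" does not match.
tags = re.findall(r"<[A-Za-z][A-Za-z]*>", "<html>Colours of cats and dogs.</html>")

print(lowercase_runs, animals, spellings, tags)
```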
18
Edit Distance
• Colors
• Delete s
• Color
• Insert u
• Colour
• Replace C with c
• colour
• Distance from Colors to colour: 3
(or 4 if the cost of replacing is 2)
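The distances quoted above can be checked with a standard dynamic-programming edit distance. This is a minimal sketch; the function name and the `sub_cost` parameter are illustrative choices, not something named on the slide.

```python
def edit_distance(source, target, sub_cost=1):
    """Edit distance with unit insert/delete cost and a configurable replace cost."""
    # prev[j] holds the distance between the processed prefix of source and target[:j].
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]  # turning source[:i] into the empty string takes i deletions
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,                              # delete s
                curr[j - 1] + 1,                          # insert t
                prev[j - 1] + (0 if s == t else sub_cost),  # keep or replace
            ))
        prev = curr
    return prev[-1]


unit_cost = edit_distance("Colors", "colour")               # replace costs 1
double_cost = edit_distance("Colors", "colour", sub_cost=2)  # replace costs 2
```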
19
“What if I wanted to map 1,1, one, and ONE?”
– One may ask
20
Normalization
• time flies like an arrow. fruit flies like bananas.
• Case restoration
• Time flies like an arrow. Fruit flies like bananas.
• Sentence segmentation
• time flies like an arrow.
• fruit flies like bananas.
• Word normalization: stemming or lemmatization?
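The first two normalization steps can be sketched naively as follows. Both helpers are hypothetical and deliberately simplistic: splitting on sentence-final punctuation breaks on abbreviations, and real case restoration also has to handle proper nouns.

```python
import re


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def restore_case(sentence):
    """Naive case restoration: capitalise only the first character."""
    return sentence[:1].upper() + sentence[1:]


raw = "time flies like an arrow. fruit flies like bananas."
sentences = split_sentences(raw)
restored = [restore_case(s) for s in sentences]
```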
21
Stemming
• Porter Stemmer (mainly suffix stripping)
• flies → fli
• bananas → banana
• How about “flies → fly”?
• Lemmatization
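The two stems above come out of the very first rule group of the Porter algorithm (step 1a). The sketch below implements only that step, so it is not a full Porter stemmer; the function name is illustrative.

```python
def porter_step_1a(word):
    """Porter step 1a only: SSES -> SS, IES -> I, SS -> SS, S -> (nothing)."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # flies -> fli
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # bananas -> banana
    return word


stems = [porter_step_1a(w) for w in ("flies", "bananas")]
```

Because the rules only look at the suffix, "flies" can never become "fly"; that needs vocabulary knowledge, which is where lemmatization comes in.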
22
Lemmatization
• flies → fly
• better → good
• meeting
• meet?
• axes
• axe?
• axis?
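The ambiguity above can be made concrete with a toy dictionary lemmatizer. The `LEMMAS` table and the part-of-speech labels are hypothetical and cover only the words on this slide; a real lemmatizer relies on a full lexicon plus context.

```python
# Hypothetical lemma table: word -> part of speech -> candidate lemmas.
LEMMAS = {
    "flies": {"NOUN": {"fly"}, "VERB": {"fly"}},
    "better": {"ADJ": {"good"}},
    "meeting": {"NOUN": {"meeting"}, "VERB": {"meet"}},
    "axes": {"NOUN": {"axe", "axis"}, "VERB": {"axe"}},
}


def lemma_candidates(word, pos=None):
    """Return every lemma the table allows; unknown words map to themselves."""
    by_pos = LEMMAS.get(word.lower())
    if by_pos is None:
        return {word}
    if pos is not None:
        return set(by_pos.get(pos, set()))
    # Without a part-of-speech tag, all readings remain possible.
    return set().union(*by_pos.values())


axes_lemmas = lemma_candidates("axes")
meeting_as_verb = lemma_candidates("meeting", "VERB")
```

Even with a part-of-speech tag, "axes" as a noun still has two candidates, so lemmatization is again a relation rather than a function.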
23
Stemming or lemmatization, which is better?
“Battlestar Galactica is frakking wierd.”
24
Are we doing good?
Evaluate it!
25
Confidence Score
• Confidence interval? Confidence level?
• Not really
• But it can be
• Just a buzz word from speech recognition
• Shannon’s game
• Hidden-Markov models
• Generative
• The Italian who went to Malta
• Can be any reasonable score
• Mostly probability
26
Calculate Sentence Similarity
States: Confident, Trusted, Doubted
Transitions: [partial match], [exact match], [no match]
Sentence: “w1 w2 w3 w4.”
a / b < threshold, since b is higher, when
a = prob. of (
  #2(w1 w2 w3 w4)
  #1(w1 w2 w3) #1(w2 w3 w4)
  #1(w1 w2) #1(w2 w3) #1(w3 w4)
  #2(w1 w3) #2(w2 w4)
  #3(w1 w4));
b = avg. prob. of all known exact matches;
where #n: any other (n - 1) words in-between.
27
Evaluate Pair: {Source,Target} Confidence
[Flowchart]
States: Confident, Trusted, Doubted
Source branches: [Trusted Source], [Confident Source], [Doubted Source]
Triple: {Source, Target, Back}
Target branches: [Trusted Target], [Not Doubted Target]
Evaluate Back Confidence: [Doubted Back]
28
What went wrong?
29
Summarization
• Extraction
• Classification
• Discriminative
• Abstraction
• Aggregation
• Generative
30
The name of the rose
Sounds depressing? Let’s try it anyway……
31
How about voting?
Consensus and prediction: non-linear programming
32
Sentiment Analysis
• Classification
• Polarity
• やばい (yabai: “awful” or “awesome”, depending on context)
• Subjectivity
• In my opinion……
• Emotion
33
Semantics?
• Classification vs.
• Ranking (as we’ve seen so far)
• Clustering
• Regression
• ……
34
Even Intractable
• Minimum Feedback Arc Set
• NP-complete, APX-hard
• Bipartite Tournament
• Hypergraph Grammar
• Synchronous Grammar
• Arrow’s Impossibility Theorem
• Social Choice
• Voting System
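To illustrate why consensus ranking becomes intractable, here is a brute-force sketch of Kemeny-style rank aggregation, a problem closely related to minimum feedback arc set on tournaments. It is an assumption that this is the kind of voting meant here; the function names and the sample ballots are hypothetical. The search tries every permutation, so it is only usable for a handful of candidates.

```python
from itertools import combinations, permutations


def disagreements(order, ranking):
    """Count candidate pairs that `ranking` places in the opposite order to `order`."""
    position = {candidate: i for i, candidate in enumerate(ranking)}
    # combinations() yields pairs (a, b) with a placed before b in `order`.
    return sum(1 for a, b in combinations(order, 2) if position[a] > position[b])


def consensus_ranking(rankings):
    """Return the ordering with the fewest pairwise disagreements over all ballots."""
    candidates = rankings[0]
    return min(
        permutations(candidates),  # n! orderings: exponential blow-up
        key=lambda order: sum(disagreements(order, r) for r in rankings),
    )


# Three hypothetical ballots ranking three translation candidates.
ballots = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
winner = consensus_ranking(ballots)
```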
35
“Prediction is very difficult, especially about the future.”
– disputed
36
There are two kinds of… PAIN. The sort of pain that makes you strong, or useless pain. The sort of pain that's only suffering. I have no patience for useless things.
37
What might make me stronger……
(See also http://www.no-free-lunch.org)
38
Website Translation
250~ S&B sites / 3 months:
~50% are compatible, 2 have paid
39
Different Story
NY-based, IT capable

(See also https://dakwak.com)
40
HTML Side-effect
<span class="notranslate">Hello, WorldJumper!</span>
<!-- Are you talking to me? -->
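A website translator has to decide which parts of the markup to send for translation. The sketch below uses Python's standard `html.parser` to collect text while skipping `notranslate` elements and comments. The class name and the sample markup are hypothetical, and the depth counter is a simplification: a void element such as `<br>` inside a skipped span would throw it off.

```python
from html.parser import HTMLParser


class TranslatableText(HTMLParser):
    """Collect text segments, skipping anything inside class="notranslate"."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a notranslate element
        self.segments = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "notranslate" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.segments.append(data.strip())

    # Comments are ignored: HTMLParser.handle_comment does nothing by default.


parser = TranslatableText()
parser.feed('<p>Welcome to <span class="notranslate">WorldJumper</span>!'
            '<!-- Are you talking to me? --></p>')
parser.close()
segments = parser.segments
```

Note the side effect: the sentence is split around the protected span, so the translator sees "Welcome to" and "!" as separate segments.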
41
I want more info
Less is more.
42
[[[坂西優]]]です。
[[[坂西優=Suguru Sakanishi]]]です。
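The triple-bracket markers above can be picked out with a regular expression. This sketch assumes that `[[[X]]]` protects X and `[[[X=Y]]]` additionally supplies a rendering Y; the slide does not spell out these semantics, and the function name is hypothetical.

```python
import re

# [[[source]]] or [[[source=target]]]; neither part may contain "]" and the
# source part may not contain "=".
ANNOTATION = re.compile(r"\[\[\[([^\]=]+)(?:=([^\]]+))?\]\]\]")


def extract_annotations(text):
    """Return (source, target) pairs; target is None when no rendering is given."""
    return [(m.group(1), m.group(2)) for m in ANNOTATION.finditer(text)]


plain = extract_annotations("[[[坂西優]]]です。")
with_target = extract_annotations("[[[坂西優=Suguru Sakanishi]]]です。")
```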
43
More Anomalies
• 【米】
• 飛来物
• 菜の花
• 桃白白
• 白立斌
• Oh, I also want [[[this part to be a partially matched TM]]] pre-edited for MT, please?
44
Read my lips
It’s not only about sound
45
Transliteration is not……
• Romanization
• Transcription
46
Transliteration
• Alignment
• Alignment
• Alignment
• (And better be more than bilingual)
47
[Cropped paper excerpt, partly illegible: alignments are not always one-to-one. Several letters can combine to produce a single phoneme, and a single letter can sometimes produce two phonemes. The example alignment shown reads “A BE RT”.]
<!-- 白立斌 -->
Hey! How about my privacy?
48
Overwriting
• <!-- John Doe #1 -->
49
Overwriting Side-effect
八楽の○○と申します。 (“I am ○○ from 八楽.”)
50
Slot Machine
Email Template?

Rule-based Machine Translation?
51
Multi-armed Bandit
Reinforcement Learning
52
Reinforcement
• Explore vs. Exploit
• Interactive
• Online
• Free Lunches
• Second moments and higher of algorithms' generalisation error
• Coevolution
• Confidence intervals can give a priori distinctions between algorithms
• People respond to incentives
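The explore-versus-exploit trade-off can be sketched with an epsilon-greedy bandit. This is one common strategy, chosen here for brevity and not named on the slides; the arm names (competing suggestion sources) are hypothetical.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy multi-armed bandit with incrementally updated mean rewards."""

    def __init__(self, arms, epsilon=0.1, rng=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        # Running mean: new = old + (reward - old) / n.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Hypothetical arms: which source of suggestions to show the translator.
bandit = EpsilonGreedy(["template", "rule_based_mt", "statistical_mt"], epsilon=0.0)
bandit.update("rule_based_mt", 1.0)   # suggestion accepted
bandit.update("template", 0.0)        # suggestion rejected
bandit.update("rule_based_mt", 0.0)   # suggestion rejected
choice = bandit.select()
```

With `epsilon=0.0` the bandit only exploits; a small positive value lets it keep testing arms whose estimates may be out of date, which is what makes the setup interactive and online.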
53
Translate X for Y
• {restaurant AD, coupon}
• {game, credit}
• {subtitle, DRM-free video}
• {Heart Sūtra, inner peace}
• {inside news, outside support}
• Taiwanese protesters
• {anything, incentives}
• See also: Unbabel, Duolingo
54
New Types of Assistance
for Translators
by Philipp Koehn
(http://www.mastar.jp/wfdtr/shiryou2013/Philipp%20Koehn.pdf via http://www.mastar.jp/wfdtr/index-e.html)
55
Paraphrasing
Monolingual translation
56
Wrap up
• Where’s my pony semantics?
• Adaptation
• Chinese restaurant process
• Indian buffet process
• 信 (adequate)、達 (fluent)
• 雅 (elegant)?貼 (pertinent)?
• Bilingual might be insufficient: 全日空 → ANA
• Pony: you can’t always get what you want
• Extrinsic evaluation
• Embrace and enjoy changes
57
<(_ _)>
(translate me)
58

Future work on adaptive computer-assisted translation (拋磚引玉; throwing brick to reveal jade) for TAUS Tokyo 2014

Editor's notes

  1. Discount, Rule overlap, Bad single-level rewrites, Node count, Insertion, Soft syntactic constraints, Structural distortion, Word context
  2. Well, thanks (?) to Chomsky