Computer Aided Translation
                          Philipp Koehn

                           10 June 2010




Philipp Koehn          Computer Aided Translation   10 June 2010
1
                           Overview


• Volunteer Translation Projects

• Machine Translation

• Human Translation

• Assistance to Human Translators

• User Study 1

• User Study 2

3
                Crowd Sourcing vs. Volunteers
• Successful volunteer collaborative translation projects
  (initiated by growing communities of self-interested participants)
   – DE-News
   – Chinese translations of Guardian etc.
   – dotSUB

• Successful crowd sourcing translation projects
  (initiated by an organization with a translation need)
   – Google localization
   – TED translations




4
                                  DE-News
• Project
   – transcription of German radio headline news
   – translation into English
   – about 5-10 stories per day, 1993-2003, http://www.germnews.de/

• Motivation
   –   initially, Germans abroad wanted to stay informed about events in Germany
   –   also read by non-German speakers interested in Germany
   –   no shortage of translators (mostly Germans), but a shortage of news gatherers
   –   mostly altruistic: interest in practicing language skills?

• Used for statistical machine translation:
  1-million-word parallel corpus collected in 2002

5
                Chinese Translations of the Guardian
• Project
   –   largest open translation community in China, launched in 2006
   –   90,000 contributors, 5,000 "community translators", 30,000 translations
   –   motivation: make English content available to Chinese readers
   –   http://www.yeeyan.org/

• Guardian translation project
   – official collaboration with the British newspaper The Guardian
   – Dec 2009: translation of Guardian articles "closed down by the Chinese
     authorities"




6
                                     dotSUB
• Project
   –   subtitling and translation platform, launched in 2007
   –   "upload your video, add subtitles, translate subtitles"
   –   easy user interface, open to anybody
   –   service used by TED talks for their translations

• Content
   – guides to Wikis, RSS, Twitter, ...
   – documentaries
   – political opinion pieces




7
                           Overview


• Volunteer Translation Projects

• Machine Translation

• Human Translation

• Assistance to Human Translators

• User Study 1

• User Study 2

8
                Statistical Machine Translation
• Learning from data (sentence-aligned translated texts)

  – example: German "Bank" appears in four aligned sentence pairs, translated
    as "bank" three times and as "bench" once
  – relative frequencies: p(bank|Bank) = 0.75, p(bench|Bank) = 0.25



• New machine translation systems can be built automatically
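A minimal Python sketch of the relative-frequency estimate behind the "Bank" example, assuming the word alignments have already been found. Real systems first learn these alignments from the sentence-aligned texts (for example with the IBM models trained by EM), which is not shown here; the function name and the toy alignment data are illustrative assumptions.

```python
from collections import Counter

def estimate_translation_probs(aligned_pairs):
    """Relative-frequency estimate p(e|f) = count(f, e) / count(f)."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(f for f, _ in aligned_pairs)
    return {(f, e): n / source_counts[f] for (f, e), n in pair_counts.items()}

# Hypothetical word alignments mirroring the slide: "Bank" aligned four times.
pairs = [("Bank", "bank"), ("Bank", "bench"), ("Bank", "bank"), ("Bank", "bank")]
probs = estimate_translation_probs(pairs)
print(probs[("Bank", "bank")], probs[("Bank", "bench")])  # 0.75 0.25
```

Because the estimate is just counting, a new system can be built automatically whenever new parallel text is available.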

9
                    Phrase-Based Translation




• Foreign input is segmented into phrases
   – any sequence of words, not necessarily linguistically motivated

• Each phrase is translated into English

• Phrases are reordered

10
                                      Translation Options
           er                  geht                   ja                   nicht                  nach            hause
           he                    is                   yes                     not                after             house
            it                  are                    is                   do not                 to              home
           , it                goes              , of course               does not           according to        chamber
          , he                  go                      ,                   is not                 in             at home
                       it is                                       not                                      home
                  he will be                                     is not                                 under house
                    it goes                                     does not                                return home
                   he goes                                       do not                                    do not
                                           is                                             to
                                          are                                         following
                                      is after all                                    not after
                                         does                                           not to
                                                        not
                                                      is not
                                                     are not
                                                     is not a




• Many translation options to choose from


12
                Decoding Process: Find Best Path
[Figure: the decoder searches through the translation options for "er geht ja nicht nach hause", expanding partial translations such as "he", "are", "it", "goes", "does not", "yes", "go", "to" and "home" into the best path]
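The search can be sketched, under strong simplifications, as a monotone dynamic program over phrase segmentations: no reordering, no language model, and a hypothetical phrase table whose entries loosely follow the translation options above, with invented probabilities. This is not the Moses decoder; it only illustrates scoring segmentations and keeping the best partial translation for each covered prefix.

```python
import math

# Hypothetical toy phrase table: source phrase -> [(translation, probability)].
# Entries loosely follow the translation options slide; probabilities are invented.
PHRASE_TABLE = {
    ("er",): [("he", 0.6), ("it", 0.3)],
    ("geht",): [("goes", 0.5), ("go", 0.2)],
    ("er", "geht"): [("he goes", 0.4)],
    ("ja",): [("yes", 0.4), ("of course", 0.3)],
    ("nicht",): [("not", 0.6)],
    ("ja", "nicht"): [("does not", 0.5)],
    ("nach",): [("to", 0.5)],
    ("hause",): [("house", 0.5)],
    ("nach", "hause"): [("home", 0.7)],
}

def monotone_decode(words, table, max_phrase_len=3):
    """Best left-to-right segmentation and translation (no reordering, no LM).

    best[i] holds (log score, backpointer) for the best translation of words[:i].
    """
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_phrase_len), end):
            prev_score = best[start][0]
            if prev_score == -math.inf:
                continue  # words[:start] cannot be covered
            for translation, prob in table.get(tuple(words[start:end]), []):
                score = prev_score + math.log(prob)
                if score > best[end][0]:
                    best[end] = (score, (start, translation))
    if best[n][0] == -math.inf:
        raise ValueError("no segmentation covers the whole input")
    # Follow backpointers from the end to recover the phrase translations.
    output, i = [], n
    while i > 0:
        start, translation = best[i][1]
        output.append(translation)
        i = start
    return " ".join(reversed(output))

print(monotone_decode("er geht ja nicht nach hause".split(), PHRASE_TABLE))
# he goes does not home
```

"he goes does not home" is the best monotone path in this toy table but is not fluent English. Producing "he does not go home" requires moving "go" after "does not" (reordering) and a language model that prefers the fluent word order, both of which a full phrase-based decoder adds.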
14
                   Why Machine Translation?
Assimilation: reader initiates translation, wants to know content
    • user is tolerant of inferior quality
    • focus of the majority of research (GALE program, etc.)

Communication: participants don't speak the same language, rely on translation
    • users can ask questions when something is unclear
    • chat room translations, hand-held devices
    • often combined with speech recognition (IWSLT campaign)

Dissemination: publisher wants to make content available in other languages (OUR FOCUS)
    • high demands for quality
    • currently almost exclusively done by human translators

15
                Goal: Helping Human Translators



                If you can’t beat them, join them.



• How can machine translation help human translators?

• First question: What do translators do?


16
                           Overview


• Volunteer Translation Projects

• Machine Translation

• Human Translation

• Assistance to Human Translators

• User Study 1

• User Study 2

17
                                      Setup
• 10 students at the University of Edinburgh
   – half native French speakers
   – half native English speakers with advanced French

• Each student translated
   –   news stories
   –   French-English
   –   about 40 sentences
   –   easy task: familiar content, no specialized terminology

• All keystrokes were logged



18
                               Keystroke Log
Input: Au premier semestre, l’avionneur a livré 97 avions.
Output: The manufacturer has delivered 97 planes during the first half.

[Figure: keystroke log for this sentence (37.5 sec, 3.4 sec/word); black: keystroke, purple: deletion, grey: cursor move; height: length of sentence]


19
                                  Analysis
• We can observe
   – slow typing
   – fast typing
   – pauses

• Pauses
   –   beginning pause: reading the input sentence
   –   final pause: reviewing the translation
   –   short pauses (2-6 seconds): hesitation
   –   medium pauses (6-60 seconds): problem solving
   –   big pauses (>60 seconds): serious problem
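A small sketch of how these pause categories could be computed from a keystroke log. The log format (a sorted list of keystroke timestamps plus the times when the sentence was shown and submitted), the function names, and counting gaps under 2 seconds as typing time are assumptions; only the 2/6/60-second thresholds come from this slide, and which side of each boundary a gap falls on is also assumed.

```python
from collections import defaultdict

def categorize_gap(gap):
    """Map an inter-keystroke gap in seconds to a category (boundaries assumed)."""
    if gap < 2:
        return "keystroke"  # treated as ordinary typing time
    if gap < 6:
        return "short"      # hesitation
    if gap < 60:
        return "medium"     # problem solving
    return "big"            # serious problem

def time_per_activity(keystroke_times, shown_at, submitted_at):
    """Split the time spent on one sentence into activity categories.
    Assumes at least one keystroke and timestamps sorted in increasing order."""
    totals = defaultdict(float)
    totals["initial"] = keystroke_times[0] - shown_at      # reading the input sentence
    totals["final"] = submitted_at - keystroke_times[-1]   # reviewing the translation
    for prev, cur in zip(keystroke_times, keystroke_times[1:]):
        gap = cur - prev
        totals[categorize_gap(gap)] += gap
    return dict(totals)

# Hypothetical log: sentence shown at t=0 s, submitted at t=30 s.
times = [4.0, 4.5, 5.0, 8.0, 8.25, 20.0, 20.5]
print(time_per_activity(times, shown_at=0.0, submitted_at=30.0))
# {'initial': 4.0, 'final': 9.5, 'keystroke': 1.75, 'short': 3.0, 'medium': 11.75}
```

Dividing each total by the number of input words gives per-word figures of the kind reported in the following table.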



23
                         Time Spent on Activities
                       (average time per input word)

                          -------------- Pauses --------------
          User   total    initial  final  short  medium  big    keystroke
          L1a    3.3s     0.1s     0.1s   0.2s   1.0s    0.1s   1.8s
          L1b    7.7s     1.3s     0.1s   0.3s   1.8s    1.9s   2.3s
          L1c    3.9s     0.2s     0.2s   0.3s   0.7s    -      2.5s
          L1d    2.8s     0.2s     0.0s   0.2s   0.4s    0.1s   1.8s
          L1e    5.2s     0.3s     0.0s   0.3s   1.9s    0.5s   2.2s
          L2a    5.7s     0.5s     0.1s   0.3s   1.8s    0.7s   2.2s
          L2b    3.2s     0.1s     0.1s   0.2s   0.4s    0.1s   2.2s
          L2c    5.8s     0.3s     0.2s   0.5s   1.5s    0.3s   3.1s
          L2d    3.4s     0.7s     0.1s   0.3s   0.6s    -      1.8s
          L2e    2.8s     0.3s     0.2s   0.2s   0.3s    0.1s   1.9s
                     L1 = native French, L2 = native English

• Initial and final pauses: not much time
• Medium and big pauses: large differences between users
• Keystroke time: similar across users
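The column averages can be recomputed from the table with a few lines of Python (with "-" entered as 0.0). The mean total of about 4.4 seconds per input word agrees with the unassisted speed reported in the evaluation, and keystroke time accounts for roughly half of it. The data structure and function name are only an illustration.

```python
# Per-user times from the table (seconds per input word); "-" entered as 0.0.
COLUMNS = ("total", "initial", "final", "short", "medium", "big", "keystroke")
TABLE = {
    "L1a": (3.3, 0.1, 0.1, 0.2, 1.0, 0.1, 1.8),
    "L1b": (7.7, 1.3, 0.1, 0.3, 1.8, 1.9, 2.3),
    "L1c": (3.9, 0.2, 0.2, 0.3, 0.7, 0.0, 2.5),
    "L1d": (2.8, 0.2, 0.0, 0.2, 0.4, 0.1, 1.8),
    "L1e": (5.2, 0.3, 0.0, 0.3, 1.9, 0.5, 2.2),
    "L2a": (5.7, 0.5, 0.1, 0.3, 1.8, 0.7, 2.2),
    "L2b": (3.2, 0.1, 0.1, 0.2, 0.4, 0.1, 2.2),
    "L2c": (5.8, 0.3, 0.2, 0.5, 1.5, 0.3, 3.1),
    "L2d": (3.4, 0.7, 0.1, 0.3, 0.6, 0.0, 1.8),
    "L2e": (2.8, 0.3, 0.2, 0.2, 0.3, 0.1, 1.9),
}

def column_means(table):
    """Average each column over all users."""
    n = len(table)
    return {col: sum(row[i] for row in table.values()) / n
            for i, col in enumerate(COLUMNS)}

means = column_means(TABLE)
share = means["keystroke"] / means["total"]
print(f"total {means['total']:.2f}s, keystroke {means['keystroke']:.2f}s ({share:.0%})")
# total 4.38s, keystroke 2.18s (50%)
```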

24
                           Overview


• Volunteer Translation Projects

• Machine Translation

• Human Translation

• Assistance to Human Translators

• User Study 1

• User Study 2

25
              Related Work: Tools Used by Translators
• Translators often use standard text editors and additional tools

• Bilingual dictionary

• Spell checker, grammar checker

• Monolingual concordancer

• Terminology database

• Web search to establish and verify the meaning of terms
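A monolingual concordancer shows a word in its surrounding context across a corpus. The sketch below is a minimal keyword-in-context (KWIC) display over a hypothetical list of sentences; the function name, the context width, and the corpus are assumptions, and real concordancers index much larger corpora.

```python
import re

def kwic(sentences, keyword, width=20):
    """Keyword-in-context lines for every whole-word occurrence of `keyword`
    (case-insensitive), with `width` characters of context on each side."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for sentence in sentences:
        for m in pattern.finditer(sentence):
            left = sentence[max(0, m.start() - width):m.start()]
            right = sentence[m.end():m.end() + width]
            # Right-align the left context so the keyword lines up in one column.
            lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

corpus = [  # hypothetical target-language corpus
    "The manufacturer has delivered 97 planes.",
    "Planes were delivered late because of the strike.",
    "Deliveries resumed in March.",
]
for line in kwic(corpus, "delivered"):
    print(line)
```

Aligning the keyword in one column lets the translator scan typical neighbors quickly; "Deliveries" is not matched because the pattern requires the whole word.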




26
                             Translation Memory
• Source:

                     This feature is available for free in the QX 3400.

• Fuzzy match in translation memory:

                     This feature is available for free in the QX 3200.
                Diese Funktion ist kostenlos im Modell QX 3200 verfügbar.

• Translator inspects the fuzzy match and uses it in her translation.
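One way to compute a fuzzy match score is word-level edit distance normalized by sentence length. Commercial translation memory tools use their own (often proprietary) scoring, so treat this formula as an illustrative assumption; the second memory entry and all function names are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two token lists (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete wa
                           cur[j - 1] + 1,             # insert wb
                           prev[j - 1] + (wa != wb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def fuzzy_match_score(source, tm_source):
    """1.0 means identical; lower values mean more words differ."""
    a, b = source.split(), tm_source.split()
    return 1.0 - word_edit_distance(a, b) / max(len(a), len(b), 1)

def best_fuzzy_match(source, memory):
    """Return (score, tm_source, tm_target) for the most similar stored segment."""
    return max((fuzzy_match_score(source, src), src, tgt) for src, tgt in memory)

memory = [
    ("This feature is available for free in the QX 3200.",
     "Diese Funktion ist kostenlos im Modell QX 3200 verfügbar."),
    ("The device ships with a charger.",  # hypothetical entry
     "Das Gerät wird mit einem Ladegerät geliefert."),
]
score, src, tgt = best_fuzzy_match("This feature is available for free in the QX 3400.", memory)
print(f"{score:.0%} match: {tgt}")
# 90% match: Diese Funktion ist kostenlos im Modell QX 3200 verfügbar.
```

Only one of the ten tokens differs ("3400." vs. "3200."), so the translator mainly needs to change the model number in the retrieved German sentence.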




27
                     Bilingual Concordancer




                show translations in context (www.linguee.com)

28
                     Our Types of Assistance
• Sentence completion
   – tool suggests how to complete the translation
   – one phrase at a time

• Translation options
   – most likely translations for each word and phrase
   – ordered and color-highlighted by probability

• Postediting machine translation
   – start with machine translation output
   – user edits, tool shows changes


29
                           Technical Notes
• Online at http://www.caitra.org/

• User uploads source text, translates one sentence at a time

• Implementation
   – AJAX Web 2.0 using Ruby on Rails, MySQL
   – Back end: Moses machine translation system




30
                Predicting Sentence Completion




• Tool suggests how to continue the translation (in red)

• User can accept the suggestion (by pressing Tab) or type in her own translation

• Same idea as TransType, with minor modifications
   – show only short text chunks, not full sentence completion
   – show only one suggestion, not alternatives


31
                          How does it work?
• Uses search graph of SMT decoding

• Matches the partial user translation against the search graph, optimizing for
   1. minimal string edit distance between a path in the graph and the user translation
   2. best full path probability, including the best completion to the end

• Technical notes
   – search graph is pre-computed and stored in a database
   – matching is done server-side and typically takes less than 1 second
   – completion path is returned to the client (web browser)
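A simplified Python sketch of the two matching criteria. Instead of working on the decoder's search graph, it assumes the graph has been unrolled into a hypothetical list of complete paths with log-probabilities, which is only feasible for tiny examples. The paths, scores, function names, and the two-word chunk length (mirroring "show only short text chunks") are all assumptions.

```python
def prefix_distances(user, path):
    """d[k] = word edit distance between all user words and path[:k], for each k."""
    prev = list(range(len(path) + 1))
    for i, u in enumerate(user, 1):
        cur = [i]
        for j, p in enumerate(path, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (u != p)))
        prev = cur
    return prev

def suggest_completion(user_text, paths, chunk=2):
    """Pick the path whose best prefix is closest to the user's text
    (ties broken by path log-probability) and return its next `chunk` words."""
    user = user_text.split()
    best_key, best_suggestion = None, ""
    for words, logprob in paths:
        dists = prefix_distances(user, words)
        # Best match point in this path; on ties prefer the longer prefix.
        end = min(range(len(dists)), key=lambda k: (dists[k], -k))
        key = (dists[end], -logprob)  # 1. edit distance, 2. path probability
        if best_key is None or key < best_key:
            best_key, best_suggestion = key, " ".join(words[end:end + chunk])
    return best_suggestion

# Hypothetical complete paths from a search graph, with log-probabilities.
PATHS = [
    ("he does not go home".split(), -2.0),
    ("it does not go home".split(), -2.5),
    ("he is not going home".split(), -3.0),
]
print(suggest_completion("he does", PATHS))  # not go
print(suggest_completion("he is", PATHS))    # not going
print(suggest_completion("he dose", PATHS))  # not go (typo tolerated)
```

Because matching uses edit distance rather than exact prefix comparison, a typo or a word that the search graph does not contain still leads to a usable suggestion.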




32
                        Translation Options




• For each word and phrase: suggested translations

• Ranked (and color-highlighted) by probability

• User may click on a suggestion, which is appended to the text box

33
           Translation Options - How does it work?
• Uses phrase translation table of SMT system

• Translation score: future cost estimate, combining
   –   conditional probabilities $\phi(\bar{e}|\bar{f})$, $\phi(\bar{f}|\bar{e})$
   –   lexical probabilities $\mathrm{lex}(\bar{e}|\bar{f})$, $\mathrm{lex}(\bar{f}|\bar{e})$
   –   word count feature
   –   language model estimate

• Shorter and longer phrases are ranked against each other by including the outside future cost estimate
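The scoring and ranking can be sketched as a log-linear model (weighted log probabilities plus a weighted word count), with an outside future cost that estimates how expensive the rest of the sentence is when a given option is chosen. The weights, feature names, toy probabilities, and the exact way the outside cost is combined are assumptions for illustration, not the tool's actual values.

```python
import math

# Hypothetical feature weights; real systems tune these on held-out data.
WEIGHTS = {"phi_ef": 0.2, "phi_fe": 0.2, "lex_ef": 0.1, "lex_fe": 0.1,
           "lm": 0.5, "word_count": -0.3}

def option_score(probs, n_words, weights=WEIGHTS):
    """Log-linear score: weighted log probabilities plus weighted word count."""
    score = sum(weights[name] * math.log(p) for name, p in probs.items())
    return score + weights["word_count"] * n_words

def future_cost_table(n, options):
    """fc[(i, j)]: best estimated score for source words i..j-1, either from a
    single option covering exactly that span or by splitting it in two."""
    fc = {(i, i): 0.0 for i in range(n + 1)}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = max((s for span, _, s in options if span == (i, j)), default=-math.inf)
            for k in range(i + 1, j):
                best = max(best, fc[(i, k)] + fc[(k, j)])
            fc[(i, j)] = best
    return fc

def rank_options(n, options):
    """Rank options of any length by outside cost left + own score + outside cost right."""
    fc = future_cost_table(n, options)
    scored = [(fc[(0, a)] + s + fc[(b, n)], e) for (a, b), e, s in options]
    return [e for _, e in sorted(scored, reverse=True)]

def toy_probs(p):
    # Hypothetical: same probability for all four translation features and the LM.
    return {name: p for name in ("phi_ef", "phi_fe", "lex_ef", "lex_fe", "lm")}

# Options for the source "nach hause" (word 0 = nach, word 1 = hause).
options = [((0, 1), "to", option_score(toy_probs(0.6), 1)),
           ((1, 2), "house", option_score(toy_probs(0.3), 1)),
           ((0, 2), "home", option_score(toy_probs(0.5), 1))]
print([e for _, e, _ in sorted(options, key=lambda o: o[2], reverse=True)])
# ['to', 'home', 'house']   (own score only)
print(rank_options(2, options)[0])
# home   (with outside future cost)
```

Without the outside cost, "to" outranks "home" because its own score is higher; once the estimated cost of translating the remaining word "hause" is added, the longer phrase "home" ranks first.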




35
                Postediting Machine Translation




• Text box is initially filled with the machine translation

• User edits the translation

• String edit distance to the machine translation is shown (blue background)
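A sketch of such a change display, using Python's standard difflib on word tokens. The MT output and the postedited translation are taken from the example quality judgments in the evaluation. difflib uses its own matching heuristic rather than a guaranteed minimal string edit distance, and the bracket markup stands in for the blue background; the function name is hypothetical.

```python
import difflib

def mark_edits(mt_output, user_text):
    """Show the user's translation with changes relative to the MT output:
    [-...-] = MT words removed, {+...+} = words the user typed."""
    mt, user = mt_output.split(), user_text.split()
    matcher = difflib.SequenceMatcher(a=mt, b=user, autojunk=False)
    marked = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            marked.extend(user[j1:j2])
            continue
        if i2 > i1:  # MT words deleted or replaced
            marked.append("[-" + " ".join(mt[i1:i2]) + "-]")
        if j2 > j1:  # words inserted or substituted by the user
            marked.append("{+" + " ".join(user[j1:j2]) + "+}")
    return " ".join(marked)

mt = "Without dismantle, it has been concise and accurate."
post = "Nothing daunted, he has been concise and accurate."
print(mark_edits(mt, post))
# [-Without dismantle, it-] {+Nothing daunted, he+} has been concise and accurate.
```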


36
                           Overview


• Volunteer Translation Projects

• Machine Translation

• Human Translation

• Assistance to Human Translators

• User Study 1

• User Study 2

37
                                 Evaluation
• Recall the setup
   – 10 students, half native French, half native English
   – each student translated French-English news stories
   – about 40 sentences for each condition of assistance

• Five different conditions
   –   unassisted
   –   prediction (sentence completion)
   –   options
   –   predictions and options
   –   post-editing



38
                                   Quality
• We want translators to be faster, but not worse

• Assessment of translation quality
   – show translations to bilingual judges, with source
   – judgment: fully correct? yes/no
         Indicate whether each user’s input represents a fully fluent and
         meaning-equivalent translation of the source. The source is shown
         with context, the actual sentence is bold.

• Average score: 50% correct, lower than expected
   – judges seemed to be too harsh
   – when shown several translations, judges tended to mark about half of them as bad

39
                   Example of Quality Judgments

 Src.   Sans se démonter, il s’est montré concis et précis.
 MT     Without dismantle, it has been concise and accurate.
 1/3    Without fail, he has been concise and accurate.              (Prediction+Options, L2a)
 4/0    Without getting flustered, he showed himself to be concise and precise. (Unassisted, L2b)
 4/0    Without falling apart, he has shown himself to be concise and accurate. (Postedit, L2c)
 1/3    Unswayable, he has shown himself to be concise and to the point.          (Options, L2d)
 0/4    Without showing off, he showed himself to be concise and precise.        (Prediction, L2e)
 1/3    Without dismantling himself, he presented himself consistent and precise. (Prediction+Options, L1a)
 2/2    He showed himself concise and precise.                                 (Unassisted, L1b)
 3/1    Nothing daunted, he has been concise and accurate.                        (Postedit, L1c)
 3/1    Without losing face, he remained focused and specific.                     (Options, L1d)
 3/1    Without becoming flustered, he showed himself concise and precise. (Prediction, L1e)




40
                         Faster and Better

                Assistance             Speed              Quality
                Unassisted             4.4s/word          47% correct
                Postedit               2.7s (-1.7s)       55% (+8%)
                Options                3.7s (-0.7s)       51% (+4%)
                Prediction             3.2s (-1.2s)       54% (+7%)
                Prediction+Options     3.3s (-1.1s)       53% (+6%)




41
                              Faster and Better, Mostly
       User     Unassisted         Postedit           Options           Prediction      Prediction+Options
       L1a      3.3sec/word     1.2s    -2.2s      2.3s    -1.0s      1.1s    -2.2s       2.4s    -0.9s
                23% correct    39%      +16%)     45%      +22%      30%      +7%)       44%      +21%
       L1b      7.7sec/word     4.5s    -3.2s)     4.5s    -3.3s      2.7s    -5.1s       4.8s    -3.0s
                35% correct    48%      +13%      55%      +20%      61%      +26%       41%      +6%
       L1c      3.9sec/word     1.9s    -2.0s      3.8s    -0.1s      3.1s    -0.8s       2.5s    -1.4s
                50% correct    61%      +11%      54%      +4%       64%      +14%       61%      +11%
       L1d      2.8sec/word     2.0s    -0.7s      2.9s    (+0.1s)    2.4s    (-0.4s)     1.8s    -1.0s
                38% correct    46%      +8%        59%     (+21%)     37%     (-1%)      45%      +7%
       L1e      5.2sec/word     3.9s    -1.3s      4.9s    (-0.2s)    3.5s    -1.7s       4.6s    (-0.5s)
                58% correct    64%      +6%        56%     (-2%)     62%      +4%         56%     (-2%)
       L2a      5.7sec/word     1.8s    -3.9s      2.5s    -3.2s      2.7s    -3.0s       2.8s    -2.9s
                16% correct    50%      +34%      34%      +18%      40%      +24%       50%      +34%
       L2b      3.2sec/word     2.8s    (-0.4s)    3.5s    +0.3s      6.0s    +2.8s       4.6s    +1.4s
                64% correct     56%     (-8%)      60%     -4%        61%     -3%         57%     -7%
       L2c      5.8sec/word     2.9s    -3.0s      4.6s    (-1.2s)    4.1s    -1.7s       2.7s    -3.1s
                52% correct    53%      +1%        37%     (-15%)    59%      +7%        53%      +1%
       L2d      3.4sec/word     3.1s    (-0.3s)    4.3s    (+0.9s)    3.8s    (+0.4s)     3.7s    (+0.3s)
                49% correct     49%     (+0%)      51%     (+2%)      53%     (+4%)       58%     (+9%)
       L2e      2.8sec/word     2.6s    -0.2s      3.5s    +0.7s      2.8s    (-0.0s)     3.0s    +0.2s
                68% correct    79%      +11%       59%     -9%        64%     (-4%)       66%     -2%
       avg.     4.4sec/word     2.7s    -1.7s      3.7s    -0.7s      3.2s    -1.2s       3.3s    -1.1s
                47% correct    55%      +8%       51%      +4%       54%      +7%        53%      +6%



42
                Slow Users 1: Faster and Better
[Figure: time per input word (1-8 seconds) vs. quality (10-60% correct) for these users, unassisted and with each type of assistance]

• Unassisted
   – more than 5 seconds per input word
   – very bad (35%, 16%)

• With assistance
   – much faster and better
   – reaching roughly average performance



43
                            Slow Users 2: Only Faster
[Figure: time per input word (1-8 seconds) vs. quality (30-60% correct) for these users, unassisted and with each type of assistance]

• Unassisted
   – more than 5 seconds per input word
   – average quality

• With assistance
   – faster, but not better




44
                                   Fast Users

  [Figure: seconds per input word (1s-4s) vs. percentage of sentences judged correct (10%-80%), one marker per condition]


• Unassisted
  – fast: 3-4 seconds per input word
  – L1a is very bad (23%), L1c is average (50%)
• With assistance
  – faster and better
  – L1a closer to average (30-45%), L1c becomes very good (54-61%)

45
                                 Refuseniks

  [Figure: seconds per input word (1s-4s) vs. percentage of sentences judged correct (10%-80%), one marker per condition]


• Used the assistance sparingly or not at all, and generally saw no gains
• The two best translators are in this group
• Postediting
  – mixed on quality (2 better, 1 worse, 1 same), but all faster
  – best translator (L2e, 68%) becomes much better (record 79%)


46
                          Learning Curve
                users become better over time with assistance




48
                                User Feedback


• Q: In which of the five conditions did you think you were most accurate?
   –   prediction+options: 5 users
   –   options: 2 users
   –   prediction: 1 user
   –   postediting: 1 user

• Q: Rank the different types of assistance on a scale from 1 to 5, where 1
  indicates not at all and 5 indicates very helpful.
   –   prediction+options: 4.6
   –   prediction: 3.9
   –   options: 3.7
   –   postediting: 2.9

• Note: these answers do not match the empirical results

49
                                 Summary
• Assistance made translators faster
   – average speed improved from 4.4s/word to 2.7-3.7s/word
   – fewer big pauses
   – less typing effort in post-editing

• Assistance made translators better
   – average share of sentences judged correct rose from 47% to 51-55%
   – even good translators got better with post-editing

• Some good translators ignored the assistance

• Translators were fastest and (barely) best with post-editing, but did not like it


50
                           Overview


• Volunteer Translation Projects

• Machine Translation

• Human Translation

• Assistance to Human Translators

• User Study 1

• User Study 2

51
                                Experiment
• Monolingual translators
   – 10 students/staff at the University of Edinburgh
   – none knew Arabic or Chinese
   – saw a full story at a time and could correct earlier sentences

• Bilingual translators
   – 3 of the 4 reference translations in the NIST test set

• Remaining reference translation used as ground truth




52
                                         Stories
 Story      Headline                                                         Sent.      Words
 1: chi     White House Pushes for Nuclear Inspectors to Be Sent as Soon       6         207
            as Possible to Monitor North Korea’s Closure of Its Nuclear
            Reactors
 2: chi     Torrential Rains Hit Western India, 43 People Dead                10           204
 3: chi     Research Shows a Link between Arrhythmia and Two Forms             7           247
            of Genetic Variation
 4: chi     Veteran US Goalkeeper Keller May Retire after America’s Cup       10           367
 5: ara     Britain: Arrests in Several Cities and Explosion of Suspicious     7           224
            Car
 6: ara     Ban Ki-Moon Withdraws His Report on the Sahara after              8            310
            Controversy Surrounding Its Content
 7: ara     Pakistani Opposition Leaders Call on Musharraf to Resign.         11           312
 8: ara     Al-Maliki: Iraqi Forces Are Capable of Taking Over the             8           255
            Security Dossier Any Time They Want


53
                               Results: Arabic

  [Figure: percentage of sentences judged as correct (0-60%) for monolingual translators mono1-mono10]


54
                                         Results: Arabic

  [Figure: percentage of sentences judged as correct (0-80%) for Arabic, bilingual translators bi1-bi3 and monolingual translators mono1-mono10]

                                 compared to bilingual translators


55
                                        Results: Arabic

  [Figure: percentage of sentences judged as correct (0-80%) for bilingual translators bi1-bi3 and monolingual translators mono1-mono10]

                     best monolinguals as good as worst bilingual


56
                           Results: Arabic and Chinese

  [Figure: percentage of sentences judged as correct (0-80%) for Arabic and Chinese, bilingual translators bi1-bi3 and monolingual translators mono1-mono10]

                           mostly worse performance for Chinese


57
                                               Results per Story

  [Figure: percentage of sentences judged as correct (0-80%) per story, Bilingual vs. Mono Post-Edit; stories: Chinese Politics, Chinese Weather, Chinese Science, Chinese Sports, Arabic Terror, Arabic Diplomacy, Arabic Politics, Arabic Politics]

                                  performance differs widely per story

58
                                               Results per Story

  [Figure: percentage of sentences judged as correct (0-80%) per story, Bilingual vs. Mono Post-Edit; stories: Chinese Politics, Chinese Weather, Chinese Science, Chinese Sports, Arabic Terror, Arabic Diplomacy, Arabic Politics, Arabic Politics]

                        one story: monolinguals as good as bilinguals

59
                     Offering more assistance
• Progress in computer aided translation

• Interactive machine translation (TransType)
   – show prediction of sentence completion
   – recompute when user types own translation

• Alternative translations [Koehn and Haddow, 2009]
   – display translation options from translation model
   – ranked by translation score
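A minimal sketch of the sentence-completion idea, assuming the system already holds an n-best list of full-sentence translations ranked best first. Interactive systems such as TransType search the decoder's search space again for the best completion of the typed prefix; matching against a fixed n-best list is a simplification, and the names `complete` and `accept_next_word` (including the one-word-per-accept behaviour) are hypothetical.

```python
from typing import List, Optional


def complete(prefix: str, nbest: List[str]) -> Optional[str]:
    """Return the rest of the best-ranked hypothesis that extends `prefix`.

    `nbest` is assumed to be sorted best first. Returns None when no
    hypothesis is consistent with what the user has typed so far.
    """
    for hypothesis in nbest:
        if hypothesis.startswith(prefix):
            return hypothesis[len(prefix):]
    return None


def accept_next_word(prefix: str, nbest: List[str]) -> str:
    """Hypothetical 'accept' action: extend the prefix by one predicted word."""
    rest = complete(prefix, nbest)
    if not rest:
        return prefix  # no matching hypothesis, or the sentence is already complete
    stripped = rest.lstrip(" ")
    leading_spaces = rest[: len(rest) - len(stripped)]
    next_word = stripped.split(" ", 1)[0]
    return prefix + leading_spaces + next_word


nbest = ["the house is small", "the home is small", "this house is small"]
print(complete("the hom", nbest))
print(accept_next_word("the ho", nbest))
```

Calling `complete` again after every keystroke with the longer prefix plays the role of the recomputation step; when it returns None, a real system would have to fall back to a fresh search constrained by the user's own translation.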




60
                        Translation Options




                up to 10 translations for each word / phrase
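The option display can be sketched as a lookup that returns, for every source span, at most ten candidates ordered by score. The phrase table below is a hypothetical toy dictionary with illustrative scores, the maximum phrase length of 3 is an assumption, and `translation_options` / `options_for_sentence` are illustrative names rather than the interface of Caitra or any decoder.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical toy phrase table: source phrase -> (target phrase, score) pairs.
PhraseTable = Dict[str, List[Tuple[str, float]]]


def translation_options(table: PhraseTable, source: str,
                        limit: int = 10) -> List[Tuple[str, float]]:
    """Return at most `limit` translations of `source`, highest score first."""
    return heapq.nlargest(limit, table.get(source, []), key=lambda opt: opt[1])


def options_for_sentence(table: PhraseTable, words: List[str], max_len: int = 3,
                         limit: int = 10) -> Dict[Tuple[int, int], List[Tuple[str, float]]]:
    """Collect ranked options for every span of up to `max_len` source words.

    Keys are (start, end) word indices with `end` exclusive; spans without
    an entry in the table are left out.
    """
    result = {}
    for start in range(len(words)):
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            options = translation_options(table, " ".join(words[start:end]), limit)
            if options:
                result[(start, end)] = options
    return result


table: PhraseTable = {
    "das": [("that", 0.2), ("the", 0.8)],
    "haus": [("home", 0.2), ("house", 0.7), ("building", 0.1)],
    "das haus": [("the house", 0.9)],
}
for span, options in options_for_sentence(table, ["das", "haus"]).items():
    print(span, options)
```

`heapq.nlargest` avoids sorting the full candidate list, which matters when a phrase table holds many more than ten entries per source phrase.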



61
                Translation Options




62
                                         Results with Options

  [Figure: percentage of sentences judged as correct (0-80%) per story, Bilingual vs. Mono Post-Edit vs. Mono Options; stories: Chinese Politics, Chinese Weather, Chinese Science, Chinese Sports, Arabic Terror, Arabic Diplomacy, Arabic Politics, Arabic Politics]

                        no big difference; significantly better only once

63
                           Error Analysis
                         (a) Critical Judges

• Reference
       Torrential Rains Hit Western India, 43 People Dead

• Bilingual translator
       Heavy Rains Plague Western India Leaving 43 Dead




64
                      Error Analysis
        (b) Mistakes by the professional translators

• Reference
       Over just two days on the 29th and 30th, rainfall in Mumbai reached
       243 mm.

• Bilingual translator
       The rainfall in Mumbai had reached 243 cm over the two days of the
       29th and 30th alone.




65
                     Error Analysis
        (b2) Domain knowledge vs. language skills

• Bilingual translator
       With Munchen-Gladbach falling to the German Bundesliga 2, ...

• Monolingual translator
       The Mönchengladbach team fell into the second German league, ...




66
                      Error Analysis
        (c) Bad English by monolingual translators



• Monolingual translator
       The western region of india heavy rain killed 43 people.




67
                        Error Analysis
            (d) Mistranslated / untranslated name
• Reference
       Johndroe said that the two leaders ...

• Machine translation
       Strong zhuo, pointing out that the two presidents ...

• Monolingual translator
       Qiang Zhuo pointed out that the two presidents ...



68
                       Error Analysis
           (e) Wrong relationship between entities
• Machine translation
       The colombian team for the match, and it is very likely that the united
       states and kai in the americas cup final performance.

• Monolingual translator 6
       The Colombian team and the United States are very likely to end up
       in the Americas Cup as the final performance.

• Monolingual translator 8
       The next match against Colombia is likely to be the United States’
       and Keller’s final performance in the current Copa America.

69
                        Error Analysis
            (f) Badly muddled machine translation
• Reference
       In the current America’s cup, he has, just as before, been given an
       important job to do by head coach Bradley, but he clearly cannot win
       the match singlehanded. The US team, made up of “young guards,”...

• Machine translation
       He is still being head coach bradley appointed to important, it’s even
       a fist ”, four young guards at the beginning of the ”, the united states
       is...



70
                              Conclusions
• Main findings
   – monolingual translators may be as good as bilinguals
   – widely different performance by translator / story
   – named entity translation critically important

• Various human factors important
   – domain knowledge
   – language skills
   – effort




71
                                  Outlook
• More assistance
   – named entity transliteration
   – word-level confidence measures
   – show syntactic structure [Albrecht et al., 2009]




72
                     Try it at home!



                http://www.caitra.org/


                       questions?



73
                       Further Analysis


• How does the assistance change translator behaviour?

• How do translators utilize assistance?

• How is the translation produced?




74
                                   Keystroke Log




                      black: keystroke, purple: deletion, grey: cursor move
                                red: accepted sentence completion
                               orange: click on translation option


  Analysis: segment the log into periods of activity (typing, tabbing, clicking) and pauses;
                the one second before and after each keystroke counts as part of the typing interval
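The segmentation can be sketched as follows. Only the one-second window around each keystroke comes from the slide; the event format, applying the same window to clicks and tab presses, the rule that clips overlapping activities of different kinds, and the pause thresholds (short under 2 s, big from 6 s on) are assumptions for illustration, not the study's settings. Dividing by the number of source words gives per-word figures in the style of the activity tables.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    start: float  # seconds from the start of the sentence session
    end: float
    kind: str     # "key", "click", "tab" or "pause"

    @property
    def duration(self) -> float:
        return self.end - self.start


def segment_log(events: List[Tuple[float, str]], session_start: float,
                session_end: float, margin: float = 1.0) -> List[Segment]:
    """Split a session into activity intervals and pauses.

    Each (time, kind) event covers [time - margin, time + margin]; overlapping
    windows of the same kind are merged. A window that overlaps an activity of
    a different kind is clipped to start where that activity ends (a
    simplifying assumption). Uncovered time becomes a pause.
    """
    activities: List[Segment] = []
    for time, kind in sorted(events):
        start = max(time - margin, session_start)
        end = min(time + margin, session_end)
        if activities:
            prev = activities[-1]
            if prev.kind == kind and start <= prev.end:
                prev.end = max(prev.end, end)  # extend the running interval
                continue
            start = max(start, prev.end)  # clip overlap with another activity
        if end > start:
            activities.append(Segment(start, end, kind))

    # Fill the gaps before, between and after activities with pauses.
    segments: List[Segment] = []
    cursor = session_start
    for activity in activities:
        if activity.start > cursor:
            segments.append(Segment(cursor, activity.start, "pause"))
        segments.append(activity)
        cursor = activity.end
    if session_end > cursor:
        segments.append(Segment(cursor, session_end, "pause"))
    return segments


def summarize(segments: List[Segment], source_words: int,
              short_max: float = 2.0, big_min: float = 6.0) -> Dict[str, float]:
    """Seconds per source word per category, named like the table columns.

    The pause thresholds short_max and big_min are hypothetical placeholders.
    """
    totals: Dict[str, float] = {}
    last = len(segments) - 1
    for i, seg in enumerate(segments):
        if seg.kind != "pause":
            label = seg.kind
        elif i == 0:
            label = "init-p"
        elif i == last:
            label = "end-p"
        elif seg.duration < short_max:
            label = "short-p"
        elif seg.duration < big_min:
            label = "mid-p"
        else:
            label = "big-p"
        totals[label] = totals.get(label, 0.0) + seg.duration
    return {label: secs / source_words for label, secs in totals.items()}


events = [(3.0, "key"), (3.5, "key"), (10.0, "click"), (20.0, "key")]
segments = segment_log(events, session_start=0.0, session_end=25.0)
print(summarize(segments, source_words=2))
```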

78
                Activities: Native French User L1b
 User: L1b            total   init-p   end-p     short-p       mid-p   big-p   key    click    tab
 Unassisted           7.7s     1.3s     0.1s      0.3s         1.8s    1.9s    2.3s     -        -
 Postedit             4.5s     1.5s     0.4s      0.1s         1.0s    0.4s    1.1s     -        -
 Options              4.5s     0.6s     0.1s      0.4s         0.9s    0.7s    1.5s   0.4s       -
 Prediction           2.7s     0.3s     0.3s      0.2s         0.7s    0.1s    0.6s     -      0.4s
 Prediction+Options   4.8s     0.6s     0.4s      0.4s         1.3s    0.5s    0.9s   0.5s     0.2s


  • Less pausing
  • Especially less time in big pauses
  • Slightly less time spent on typing




82
                Activities: Native English User L2e

 User: L2e            total   init-p    end-p    short-p       mid-p   big-p   key    click    tab
 Unassisted           2.8s     0.3s      0.2s     0.2s         0.3s    0.1s    1.9s     -        -
 Postedit             2.6s     0.4s      0.3s     0.2s         1.0s    0.1s    0.7s     -        -
 Options              3.5s     0.1s      0.3s     0.4s         0.6s    0.2s    1.7s   0.1s       -
 Prediction           2.8s     0.1s      0.3s     0.3s         0.3s      -     1.4s     -      0.3s
 Prediction+Options   3.0s     0.1s      0.3s     0.2s         0.5s      -     1.9s     -        -

  • Postediting: less typing (-1.2s), more medium pauses (+0.7s)
  • Prediction+options: uses neither form of assistance, little overall change
  • Little time spent on assistance




84
                Origin of Characters: Native French L1b


                User: L1b              key        click    tab    mt
                Postedit               18%          -        -   81%
                Options                59%        40%        -     -
                Prediction             14%          -      85%     -
                Prediction+Options     21%        44%      33%     -

                                  A large share of the translation
                                    comes from the assistance




86
                Origin of Characters: Native English L2e


                User: L2e               key       click    tab    mt
                Postedit               20%          -        -   79%
                Options                77%        22%        -     -
                Prediction             61%          -      38%     -
                Prediction+Options     100%         -        -     -

                                     Although hardly any time is
                                     spent on the assistance, a fair
                                     share of the characters is
                                           produced by it
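The character-origin percentages can be computed by replaying the edit log on a buffer in which every character remembers how it entered the text. This is a minimal sketch: the log entry format (position, inserted text, number of deleted characters, origin) is a hypothetical simplification of a keystroke log, and for post-editing the buffer is assumed to start out as the machine translation, tagged `mt`.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical log entry: (position, inserted text, number of characters deleted,
# origin), where origin is e.g. "key", "click" or "tab".
Op = Tuple[int, str, int, str]


def replay(ops: List[Op], initial: str = "",
           initial_origin: str = "mt") -> List[Tuple[str, str]]:
    """Apply the edits in order; every character keeps the origin it was inserted with."""
    buffer = [(ch, initial_origin) for ch in initial]
    for position, text, n_deleted, origin in ops:
        del buffer[position:position + n_deleted]  # deleted characters lose their tags
        buffer[position:position] = [(ch, origin) for ch in text]
    return buffer


def origin_shares(buffer: List[Tuple[str, str]]) -> Dict[str, int]:
    """Percentage of final characters per origin, rounded to whole percent."""
    counts = Counter(origin for _, origin in buffer)
    total = sum(counts.values())
    if not total:
        return {}
    return {origin: round(100 * n / total) for origin, n in counts.items()}


# Post-editing: start from the MT output, delete "the " and type "a ".
final = replay([(0, "", 4, "key"), (0, "a ", 0, "key")], initial="the house")
print("".join(ch for ch, _ in final), origin_shares(final))
```

Because only surviving characters are counted, text that was accepted from the assistance and later deleted does not contribute to the final shares.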



87
                Pauses: French-Native User L1b




88
                Pauses: English-Native User L2e




89
                     Outlook: More analysis
• What do translators think about when they are pausing?

• What are the hard problems?
   – unknown words
   – words without direct translation
   – syntactic re-arrangement

• What do translators change in post-editing?




90
                   Outlook: More experiments
• Different types of users
   – experienced professional translators
   – volunteer / amateur
   – little or no knowledge of the source language

• Different types of language pairs
   – target-side morphology: a problem
   – large-scale reordering: maybe a problem

• Different types of translation tasks
   – familiar content for translator?
   – very similar to previously translated text?

91
                    Interactive Post-Editing?
• word alignment to source

• confidence estimation to flag likely faulty parts

• integration with translation memory
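One building block for integrating a translation memory is fuzzy-match retrieval, as in the QX 3400 / QX 3200 example on the earlier Translation Memory slide. The Python sketch below scores each stored source sentence by word-level edit distance, normalized by sentence length, and returns the best match at or above a cutoff. The whitespace tokenization, the 0.7 threshold and the two-entry memory are illustrative assumptions; CAITRA and commercial tools may score matches differently.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        cur = [i]
        for j, word_b in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                        # delete word_a
                           cur[j - 1] + 1,                     # insert word_b
                           prev[j - 1] + (word_a != word_b)))  # match or substitute
        prev = cur
    return prev[-1]


def fuzzy_score(source: str, tm_source: str) -> float:
    """1.0 for an exact match, lower as more words must be edited."""
    s, t = source.split(), tm_source.split()
    return 1.0 - edit_distance(s, t) / max(len(s), len(t), 1)


def best_match(source: str, memory: list[tuple[str, str]], threshold: float = 0.7):
    """Best (score, tm_source, tm_target) at or above threshold, or None."""
    best = None
    for tm_source, tm_target in memory:
        score = fuzzy_score(source, tm_source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, tm_source, tm_target)
    return best


# Hypothetical memory: (source, translation) pairs.
memory = [
    ("This feature is available for free in the QX 3200.",
     "Diese Funktion ist kostenlos im Modell QX 3200 verfügbar."),
    ("The printer is out of paper.", "Der Drucker hat kein Papier."),
]
match = best_match("This feature is available for free in the QX 3400.", memory)
print(match[0], match[2])  # score 0.9: one of ten words differs
```

Returning None below the cutoff means the translator is only shown matches that are plausibly worth editing; in an interactive post-editing tool the retrieved target could be displayed next to the machine translation.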




92
                Pauses: Unassisted




93
                Pauses: Options




94
        Pauses: Prediction of sentence completion




95
                Pauses: Postediting






Computer Aided Translation

  • 1. Computer Aided Translation Philipp Koehn 10 June 2010 Philipp Koehn Computer Aided Translation 10 June 2010
  • 2. 1 Overview • Volunteer Translation Projects • Machine Translation • Human Translation • Assistance to Human Translators • User Study 1 • User Study 2 Philipp Koehn Computer Aided Translation 10 June 2010
  • 3. 2 Overview • Volunteer Translation Projects • Machine Translation • Human Translation • Assistance to Human Translators • User Study 1 • User Study 2 Philipp Koehn Computer Aided Translation 10 June 2010
  • 4. 3 Crowd Sourcing vs. Volunteers • Successful volunteer collaboration translation projects (initiated by a growing communities of self-interested participants) – DE-News – Chinese translations of Guardian etc. – dotSUB
  • 5. 3 Crowd Sourcing vs. Volunteers • Successful volunteer collaboration translation projects (initiated by a growing communities of self-interested participants) – DE-News – Chinese translations of Guardian etc. – dotSUB • Successful crowd sourcing translation projects (initiated by an organization with a translation need) – Google localization – TED translations Philipp Koehn Computer Aided Translation 10 June 2010
  • 6. 4 DE-News • Project – transcription of German radio headline news – translation into English – about 5-10 stories per day, 1993-2003, http://www.germnews.de/ • Motivation – initially Germans abroad wanted to stay informed about events in Germany – also non-German speakers who were interested in Germany – no lack of translators (mostly Germans), but of news gatherers – mostly altruistic: interested in practicing language skills? • used for statistical machine translation: 1 million word parallel corpus collected in 2002 Philipp Koehn Computer Aided Translation 10 June 2010
  • 7. 5 Chinese Translations of Guardian • Project – largest open translation community in China, launched in 2006 – 90,000 contributors, 5,000 ”community translators”, 30,000 translations – motivation: make English content available to Chinese readers – http://www.yeeyan.org/ • Guardian translation project – official collaboration with British Guardian news paper – Dec 2009: translation of Guardian articles ”closed down by the Chinese authorities” Philipp Koehn Computer Aided Translation 10 June 2010
  • 8. 6 dotSUB • Project – subtitling and translation platform, launched in 2007 – ”upload your video, add sub titles, translate subtitles” – easy user interface, open to anybody – service used by TED talks for their translations • Content – guides to Wikis, RSS, Twitter, ... – documentations – political opinion pieces Philipp Koehn Computer Aided Translation 10 June 2010
  • 9. 7 Overview • Volunteer Translation Projects • Machine Translation • Human Translation • Assistance to Human Translators • User Study 1 • User Study 2 Philipp Koehn Computer Aided Translation 10 June 2010
  • 10. 8 Statistical Machine Translation • Learning from data (sentence-aligned translated texts) German documents English documents ....Bank.................. ....bank.................. ..........Bank............ ..........bench.......... ........Bank.............. ........bank.............. .................Bank..... .................bank..... p(bank|Bank) = 0.75, p(bench|Bank) = 0.25 • New machine translation systems can be built automatically Philipp Koehn Computer Aided Translation 10 June 2010
  • 11. 9 Phrase-Based Translation • Foreign input is segmented in phrases – any sequence of words, not necessarily linguistically motivated • Each phrase is translated into English • Phrases are reordered Philipp Koehn Computer Aided Translation 10 June 2010
  • 12. 10 Translation Options er geht ja nicht nach hause he is yes not after house it are is do not to home , it goes , of course does not according to chamber , he go , is not in at home it is not home he will be is not under house it goes does not return home he goes do not do not is to are following is after all not after does not to not is not are not is not a • Many translation options to choose from Philipp Koehn Computer Aided Translation 10 June 2010
  • 13. 11 Translation Options er geht ja nicht nach hause he is yes not after house it are is do not to home , it goes , of course does not according to chamber , he go is not in at home it is not home he will be is not under house it goes does not return home he goes do not do not is to are following is after all not after does not to not is not are not is not a • Many translation options to choose from Philipp Koehn Computer Aided Translation 10 June 2010
  • 14. 12 Decoding Process: Find Best Path er geht ja nicht nach hause yes he goes home are does not go home it to Philipp Koehn Computer Aided Translation 10 June 2010
  • 15. 13 Why Machine Translation? Assimilation — reader initiates translation, wants to know content • user is tolerant of inferior quality • focus of majority of research (GALE program, etc.)
  • 16. 13 Why Machine Translation? Assimilation — reader initiates translation, wants to know content • user is tolerant of inferior quality • focus of majority of research (GALE program, etc.) Communication — participants don’t speak same language, rely on translation • users can ask questions, when something is unclear • chat room translations, hand-held devices • often combined with speech recognition, IWSLT campaign
  • 17. 13 Why Machine Translation? Assimilation — reader initiates translation, wants to know content • user is tolerant of inferior quality • focus of majority of research (GALE program, etc.) Communication — participants don’t speak same language, rely on translation • users can ask questions, when something is unclear • chat room translations, hand-held devices • often combined with speech recognition, IWSLT campaign Dissemination — publisher wants to make content available in other languages • high demands for quality • currently almost exclusively done by human translators Philipp Koehn Computer Aided Translation 10 June 2010
  • 18. 14 Why Machine Translation? Assimilation — reader initiates translation, wants to know content • user is tolerant of inferior quality • focus of majority of research (GALE program, etc.) Communication — participants don’t speak same language, rely on translation • users can ask questions, when something is unclear • chat room translations, hand-held devices • often combined with speech recognition, IWSLT campaign Dissemination — publisher wants to make content available in other languages • high demands for quality OUR • currently almost exclusively done by human translators FOCUS Philipp Koehn Computer Aided Translation 10 June 2010
  • 19. 15 Goal: Helping Human Translators If you can’t beat them, join them.
  • 20. 15 Goal: Helping Human Translators If you can’t beat them, join them. • How can machine translation help human translators?
  • 21. 15 Goal: Helping Human Translators If you can’t beat them, join them. • How can machine translation help human translators? • First question: What do translators do? Philipp Koehn Computer Aided Translation 10 June 2010
  • 22. 16 Overview • Volunteer Translation Projects • Machine Translation • Human Translation • Assistance to Human Translators • User Study 1 • User Study 2 Philipp Koehn Computer Aided Translation 10 June 2010
  • 23. 17 Setup • 10 students at the University of Edinburgh – half native French speakers – half native English speakers with advanced French
  • 24. 17 Setup • 10 students at the University of Edinburgh – half native French speakers – half native English speakers with advanced French • Each student translated – news stories – French-English – about 40 sentences – easy task: familiar content, no specialized terminology • Keystroke log Philipp Koehn Computer Aided Translation 10 June 2010
  • 25. 18 Keystroke Log Input: Au premier semestre, l’avionneur a livr 97 avions. Output: The manufacturer has delivered 97 planes during the first half. (37.5 sec, 3.4 sec/word) black: keystroke, purple: deletion, grey: cursor move height: length of sentence Philipp Koehn Computer Aided Translation 10 June 2010
  • 26. 19 Analysis • We can observe – slow typing
  • 27. 19 Analysis • We can observe – slow typing – fast typing
  • 28. 19 Analysis • We can observe – slow typing – fast typing – pauses
  • 29. 19 Analysis • We can observe – slow typing – fast typing – pauses • Pauses – beginning pause: reading the input sentence – final pause: reviewing the translation
  • 30. 19 Analysis • We can observe – slow typing – fast typing – pauses • Pauses – beginning pause: reading the input sentence – final pause: reviewing the translation – short pauses (2-6 seconds): hesitation – medium pauses (6-60 seconds): problem solving – big pauses (>60 seconds): serious problem Philipp Koehn Computer Aided Translation 10 June 2010
  • 31. 20 Time Spent on Activities Pauses User total initial final short medium big keystroke L1a 3.3s 0.1s 0.1s 0.2s 1.0s 0.1s 1.8s L1b 7.7s 1.3s 0.1s 0.3s 1.8s 1.9s 2.3s L1c 3.9s 0.2s 0.2s 0.3s 0.7s - 2.5s L1d 2.8s 0.2s 0.0s 0.2s 0.4s 0.1s 1.8s L1e 5.2s 0.3s 0.0s 0.3s 1.9s 0.5s 2.2s L2a 5.7s 0.5s 0.1s 0.3s 1.8s 0.7s 2.2s L2b 3.2s 0.1s 0.1s 0.2s 0.4s 0.1s 2.2s L2c 5.8s 0.3s 0.2s 0.5s 1.5s 0.3s 3.1s L2d 3.4s 0.7s 0.1s 0.3s 0.6s - 1.8s L2e 2.8s 0.3s 0.2s 0.2s 0.3s 0.1s 1.9s L1 = native French, L2 = native English average time per input word Philipp Koehn Computer Aided Translation 10 June 2010
  • 32. 21 Time Spent on Activities not much time Pauses User total initial final short medium big keystroke L1a 3.3s 0.1s 0.1s 0.2s 1.0s 0.1s 1.8s L1b 7.7s 1.3s 0.1s 0.3s 1.8s 1.9s 2.3s L1c 3.9s 0.2s 0.2s 0.3s 0.7s - 2.5s L1d 2.8s 0.2s 0.0s 0.2s 0.4s 0.1s 1.8s L1e 5.2s 0.3s 0.0s 0.3s 1.9s 0.5s 2.2s L2a 5.7s 0.5s 0.1s 0.3s 1.8s 0.7s 2.2s L2b 3.2s 0.1s 0.1s 0.2s 0.4s 0.1s 2.2s L2c 5.8s 0.3s 0.2s 0.5s 1.5s 0.3s 3.1s L2d 3.4s 0.7s 0.1s 0.3s 0.6s - 1.8s L2e 2.8s 0.3s 0.2s 0.2s 0.3s 0.1s 1.9s L1 = native French, L2 = native English average time per input word Philipp Koehn Computer Aided Translation 10 June 2010
  • 33. 22 Time Spent on Activities not much time Pauses similar User total initial final short medium big keystroke L1a 3.3s 0.1s 0.1s 0.2s 1.0s 0.1s 1.8s L1b 7.7s 1.3s 0.1s 0.3s 1.8s 1.9s 2.3s L1c 3.9s 0.2s 0.2s 0.3s 0.7s - 2.5s L1d 2.8s 0.2s 0.0s 0.2s 0.4s 0.1s 1.8s L1e 5.2s 0.3s 0.0s 0.3s 1.9s 0.5s 2.2s L2a 5.7s 0.5s 0.1s 0.3s 1.8s 0.7s 2.2s L2b 3.2s 0.1s 0.1s 0.2s 0.4s 0.1s 2.2s L2c 5.8s 0.3s 0.2s 0.5s 1.5s 0.3s 3.1s L2d 3.4s 0.7s 0.1s 0.3s 0.6s - 1.8s L2e 2.8s 0.3s 0.2s 0.2s 0.3s 0.1s 1.9s L1 = native French, L2 = native English average time per input word Philipp Koehn Computer Aided Translation 10 June 2010
  • 34. 23 Time Spent on Activities not much time Pauses differences similar User total initial final short medium big keystroke L1a 3.3s 0.1s 0.1s 0.2s 1.0s 0.1s 1.8s L1b 7.7s 1.3s 0.1s 0.3s 1.8s 1.9s 2.3s L1c 3.9s 0.2s 0.2s 0.3s 0.7s - 2.5s L1d 2.8s 0.2s 0.0s 0.2s 0.4s 0.1s 1.8s L1e 5.2s 0.3s 0.0s 0.3s 1.9s 0.5s 2.2s L2a 5.7s 0.5s 0.1s 0.3s 1.8s 0.7s 2.2s L2b 3.2s 0.1s 0.1s 0.2s 0.4s 0.1s 2.2s L2c 5.8s 0.3s 0.2s 0.5s 1.5s 0.3s 3.1s L2d 3.4s 0.7s 0.1s 0.3s 0.6s - 1.8s L2e 2.8s 0.3s 0.2s 0.2s 0.3s 0.1s 1.9s L1 = native French, L2 = native English average time per input word Philipp Koehn Computer Aided Translation 10 June 2010
  • 35. 24 Overview • Volunteer Translation Projects • Machine Translation • Human Translation • Assistance to Human Translators • User Study 1 • User Study 2 Philipp Koehn Computer Aided Translation 10 June 2010
  • 36. Related Work: Tools used by Translators25 • Translators often use standard text editors and additional tools • Bilingual dictionary • Spell checker, grammar checker • Monolingual concordancer • Terminology database • Web search to establish and verify meaning of terms Philipp Koehn Computer Aided Translation 10 June 2010
  • 37. 26 Translation Memory • Source: This feature is available for free in the QX 3400. • Fuzzy match in translation memory: This feature is available for free in the QX 3200. Diese Funktion ist kostenlos im Modell QX 3200 verf¨gbar. u • Translator inspects the fuzzy match and uses it in her translation. Philipp Koehn Computer Aided Translation 10 June 2010
  • 38. 27 Bilingual Concordancer show translations in context (www.linguee.com) Philipp Koehn Computer Aided Translation 10 June 2010
  • 41. 28 Our Types of Assistance
    • Sentence completion
      – tool suggests how to complete the translation
      – one phrase at a time
    • Translation options
      – most likely translations for each word and phrase
      – ordered and color-highlighted by probability
    • Postediting machine translation
      – start with machine translation output
      – user edits, tool shows changes
  • 42. 29 Technical Notes
    • Online at http://www.caitra.org/
    • User uploads source text, translates one sentence at a time
    • Implementation
      – AJAX Web 2.0 using Ruby on Rails, MySQL
      – back end: Moses machine translation system
  • 45. 30 Predicting Sentence Completion
    • Tool makes a suggestion how to continue (in red)
    • User can accept it (by pressing tab), or type in her own translation
    • Same idea as TransType, with minor modifications
      – show only short text chunks, not full sentence completion
      – show only one suggestion, not alternatives
  • 48. 31 How does it work?
    • Uses search graph of SMT decoding
    • Matches partial user translation against search graph, by optimizing
      1. minimal string edit distance between path in graph and user translation
      2. best full path probability, including best completion to end
    • Technical notes
      – search graph is pre-computed and stored in database
      – matching is done server-side, typically takes less than 1 second
      – completion path is returned to client (web browser)
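    A brute-force sketch of the two matching criteria, assuming a toy search graph held in a Python dict (node to a list of (next node, phrase words, log probability) edges) and word-level edit distance. The graph, its probabilities and all function names are hypothetical. Enumerating every path only works for tiny graphs; the actual tool matches against a pre-computed graph stored in a database. As an extra tie-breaker not mentioned on the slide, the sketch prefers consuming more of the path.

```python
from typing import Dict, List, Tuple

# Hypothetical toy search graph: node -> [(next node, phrase words, log prob)].
# Node 3 has no outgoing edges and marks the end of a full translation.
Edge = Tuple[int, List[str], float]
GRAPH: Dict[int, List[Edge]] = {
    0: [(1, ["the", "house"], -1.0), (2, ["the", "home"], -1.5)],
    1: [(3, ["is", "small"], -0.5)],
    2: [(3, ["is", "little"], -0.7)],
    3: [],
}

def full_paths(graph: Dict[int, List[Edge]], node: int = 0) -> List[Tuple[List[str], float]]:
    """Enumerate (words, total log prob) for every path from node to an end node."""
    if not graph[node]:
        return [([], 0.0)]
    paths = []
    for nxt, words, logp in graph[node]:
        for rest, rest_logp in full_paths(graph, nxt):
            paths.append((words + rest, logp + rest_logp))
    return paths

def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def complete(graph: Dict[int, List[Edge]], user_words: List[str]) -> List[str]:
    """Return the suggested continuation of the user's partial translation."""
    best_key, best_rest = None, []
    for words, logp in full_paths(graph):
        for k in range(len(words) + 1):
            # 1. edit distance to the path prefix, 2. full path probability,
            # 3. (extra tie-breaker) prefer consuming more of the path
            key = (edit_distance(user_words, words[:k]), -logp, -k)
            if best_key is None or key < best_key:
                best_key, best_rest = key, words[k:]
    return best_rest

print(complete(GRAPH, ["the", "hose"]))  # ['is', 'small']
```

    Because of the edit distance criterion, a typo such as "hose" still aligns with the path "the house ...". Since the tool shows only short chunks, a front end would display just the first few words of the returned completion.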
  • 49. 32 Translation Options
    • For each word and phrase: suggested translations
    • Ranked (and color-highlighted) by probability
    • User may click on a suggestion → appended to text box
  • 52. 33 Translation Options: How does it work?
    • Uses phrase translation table of SMT system
    • Translation score: future cost estimate
      – conditional probabilities $\phi(\bar{e}|\bar{f})$, $\phi(\bar{f}|\bar{e})$
      – lexical probabilities $\mathrm{lex}(\bar{e}|\bar{f})$, $\mathrm{lex}(\bar{f}|\bar{e})$
      – word count feature
      – language model estimate
    • Ranking of shorter vs. longer phrases by including outside future cost estimate
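    The slide lists the features but not how they are combined. This sketch assumes a log-linear weighted sum, as used in phrase-based systems such as Moses. The weights are hypothetical. The outside future cost is simplified to the sum of hypothetical best per-word scores for the source words an option leaves uncovered; a real decoder estimates it over spans.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Option:
    start: int          # first source word covered
    end: int            # one past the last source word covered
    target: str         # English phrase shown to the user
    p_e_f: float        # phi(e|f)
    p_f_e: float        # phi(f|e)
    lex_e_f: float      # lex(e|f)
    lex_f_e: float      # lex(f|e)
    lm_logprob: float   # language model estimate (log probability)

# Hypothetical feature weights; a real system tunes these.
W_TM, W_WORDS, W_LM = 0.2, -0.3, 0.5

def option_score(o: Option) -> float:
    """Log-linear score of one option (higher is better)."""
    tm = sum(math.log(p) for p in (o.p_e_f, o.p_f_e, o.lex_e_f, o.lex_f_e))
    return W_TM * tm + W_WORDS * len(o.target.split()) + W_LM * o.lm_logprob

def rank_options(options: List[Option], word_cost: List[float]) -> List[Option]:
    """Rank options covering spans of different length on a common scale by
    adding the (hypothetical) best scores of source words outside each span."""
    def total(o: Option) -> float:
        outside = sum(c for i, c in enumerate(word_cost) if not o.start <= i < o.end)
        return option_score(o) + outside
    return sorted(options, key=total, reverse=True)

short = Option(0, 1, "house", 0.5, 0.5, 0.5, 0.5, -2.0)
longer = Option(0, 2, "the small house", 0.4, 0.4, 0.4, 0.4, -4.0)
ranked = rank_options([short, longer], word_cost=[-1.0, -3.0])
print([o.target for o in ranked])  # ['the small house', 'house']
```

    On its own score the one-word option wins, because it multiplies fewer probabilities. After adding the estimated cost of the source word it leaves untranslated, the longer phrase ranks first.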
  • 54. 35 Postediting Machine Translation
    • Text box is initially filled with machine translation
    • User edits translation
    • String edit distance to machine translation is shown (blue background)
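    The slide does not describe how the changes are computed. One plausible way to find which words to shade is a word-level alignment of the edited text against the machine translation. This sketch uses Python's standard `difflib.SequenceMatcher.get_opcodes()` as a stand-in. The function name is hypothetical, and the example pair is taken from the quality judgments later in this talk.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

def highlight_edits(mt: str, edited: str) -> List[Tuple[str, bool]]:
    """Tag each word of the user's edited text with True if it was inserted or
    replaced relative to the machine translation, False if kept unchanged."""
    mt_words, user_words = mt.split(), edited.split()
    tagged: List[Tuple[str, bool]] = []
    matcher = SequenceMatcher(None, mt_words, user_words)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'delete' spans contribute no user-side words (j1 == j2)
        changed = tag != "equal"
        tagged.extend((word, changed) for word in user_words[j1:j2])
    return tagged

mt = "Without dismantle, it has been concise and accurate."
post = "Nothing daunted, he has been concise and accurate."
changed = [w for w, c in highlight_edits(mt, post) if c]
print(changed)  # ['Nothing', 'daunted,', 'he']
```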
  • 55. 36 Overview
    • Volunteer Translation Projects
    • Machine Translation
    • Human Translation
    • Assistance to Human Translators
    • User Study 1
    • User Study 2
  • 57. 37 Evaluation
    • Recall setup
      – 10 students, half native French, half native English
      – each student translated French-English news stories
      – about 40 sentences for each condition of assistance
    • Five different conditions
      – unassisted
      – prediction (sentence completion)
      – options
      – predictions and options
      – post-editing
  • 59. 38 Quality
    • We want faster translators, but not worse ones
    • Assessment of translation quality
      – show translations to bilingual judges, with source
      – judgment: fully correct? yes/no
      Instructions to judges: "Indicate whether each user's input represents a fully fluent and meaning-equivalent translation of the source. The source is shown with context, the actual sentence is bold."
    • Average score: 50% correct, lower than expected
      – judges seemed to be too harsh
      – when given several translations, tendency to judge half as bad
  • 60. 39 Example of Quality Judgments
    Src.  Sans se démonter, il s’est montré concis et précis.
    MT    Without dismantle, it has been concise and accurate.
    1/3   Without fail, he has been concise and accurate. (Prediction+Options, L2a)
    4/0   Without getting flustered, he showed himself to be concise and precise. (Unassisted, L2b)
    4/0   Without falling apart, he has shown himself to be concise and accurate. (Postedit, L2c)
    1/3   Unswayable, he has shown himself to be concise and to the point. (Options, L2d)
    0/4   Without showing off, he showed himself to be concise and precise. (Prediction, L2e)
    1/3   Without dismantling himself, he presented himself consistent and precise. (Prediction+Options, L1a)
    2/2   He showed himself concise and precise. (Unassisted, L1b)
    3/1   Nothing daunted, he has been concise and accurate. (Postedit, L1c)
    3/1   Without losing face, he remained focused and specific. (Options, L1d)
    3/1   Without becoming flustered, he showed himself concise and precise. (Prediction, L1e)
  • 61. 40 Faster and Better
    Assistance          Speed              Quality
    Unassisted          4.4s/word          47% correct
    Postedit            2.7s (-1.7s)       55% (+8%)
    Options             3.7s (-0.7s)       51% (+4%)
    Prediction          3.2s (-1.2s)       54% (+7%)
    Prediction+Options  3.3s (-1.1s)       53% (+6%)
  • 62. 41 Faster and Better, Mostly
    Speed in seconds per input word and percentage judged correct; changes relative to unassisted
    User  Unassisted    Postedit      Options       Prediction    Prediction+Options
    L1a   3.3s/word     1.2s -2.2s    2.3s -1.0s    1.1s -2.2s    2.4s -0.9s
          23% correct   39% +16%      45% +22%      30% +7%       44% +21%
    L1b   7.7s/word     4.5s -3.2s    4.5s -3.3s    2.7s -5.1s    4.8s -3.0s
          35% correct   48% +13%      55% +20%      61% +26%      41% +6%
    L1c   3.9s/word     1.9s -2.0s    3.8s -0.1s    3.1s -0.8s    2.5s -1.4s
          50% correct   61% +11%      54% +4%       64% +14%      61% +11%
    L1d   2.8s/word     2.0s -0.7s    2.9s (+0.1s)  2.4s (-0.4s)  1.8s -1.0s
          38% correct   46% +8%       59% (+21%)    37% (-1%)     45% +7%
    L1e   5.2s/word     3.9s -1.3s    4.9s (-0.2s)  3.5s -1.7s    4.6s (-0.5s)
          58% correct   64% +6%       56% (-2%)     62% +4%       56% (-2%)
    L2a   5.7s/word     1.8s -3.9s    2.5s -3.2s    2.7s -3.0s    2.8s -2.9s
          16% correct   50% +34%      34% +18%      40% +24%      50% +34%
    L2b   3.2s/word     2.8s (-0.4s)  3.5s +0.3s    6.0s +2.8s    4.6s +1.4s
          64% correct   56% (-8%)     60% -4%       61% -3%       57% -7%
    L2c   5.8s/word     2.9s -3.0s    4.6s (-1.2s)  4.1s -1.7s    2.7s -3.1s
          52% correct   53% +1%       37% (-15%)    59% +7%       53% +1%
    L2d   3.4s/word     3.1s (-0.3s)  4.3s (+0.9s)  3.8s (+0.4s)  3.7s (+0.3s)
          49% correct   49% (+0%)     51% (+2%)     53% (+4%)     58% (+9%)
    L2e   2.8s/word     2.6s -0.2s    3.5s +0.7s    2.8s (-0.0s)  3.0s +0.2s
          68% correct   79% +11%      59% -9%       64% (-4%)     66% -2%
    avg.  4.4s/word     2.7s -1.7s    3.7s -0.7s    3.2s -1.2s    3.3s -1.1s
          47% correct   55% +8%       51% +4%       54% +7%       53% +6%
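    The avg. speed row can be reproduced as an unweighted mean over the ten users. This sketch copies the per-user speeds from the table above and recomputes each condition's average and its change against unassisted translation. It covers speed only. The function name is illustrative.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Seconds per input word for users L1a..L1e, L2a..L2e (copied from the table).
SPEED: Dict[str, List[float]] = {
    "Unassisted":         [3.3, 7.7, 3.9, 2.8, 5.2, 5.7, 3.2, 5.8, 3.4, 2.8],
    "Postedit":           [1.2, 4.5, 1.9, 2.0, 3.9, 1.8, 2.8, 2.9, 3.1, 2.6],
    "Options":            [2.3, 4.5, 3.8, 2.9, 4.9, 2.5, 3.5, 4.6, 4.3, 3.5],
    "Prediction":         [1.1, 2.7, 3.1, 2.4, 3.5, 2.7, 6.0, 4.1, 3.8, 2.8],
    "Prediction+Options": [2.4, 4.8, 2.5, 1.8, 4.6, 2.8, 4.6, 2.7, 3.7, 3.0],
}

def averages(speed: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each condition to (mean s/word, change vs. unassisted), rounded to 0.1 s."""
    baseline = mean(speed["Unassisted"])
    return {cond: (round(mean(v), 1), round(mean(v) - baseline, 1))
            for cond, v in speed.items()}

for cond, (avg, delta) in averages(SPEED).items():
    print(f"{cond:<20}{avg:.1f}s/word ({delta:+.1f}s)")
```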
  • 63. 42 Slow Users 1: Faster and Better
    [Scatter plot: seconds per input word (1s to 8s) vs. percentage judged correct (10% to 60%)]
    • Unassisted
      – more than 5 seconds per input word
      – very bad (35%, 16%)
    • With assistance
      – much faster and better
      – reaching roughly average performance
  • 64. 43 Slow Users 2: Only Faster
    [Scatter plot: seconds per input word (1s to 8s) vs. percentage judged correct (30% to 60%)]
    • Unassisted
      – more than 5 seconds per input word
      – average quality
    • With assistance
      – faster, but not better
  • 65. 44 Fast Users
    [Scatter plot: seconds per input word (1s to 4s) vs. percentage judged correct (10% to 80%)]
    • Unassisted
      – fast: 3-4 seconds per input word
      – L1a is very bad (23%), L1c is average (50%)
    • With assistance
      – faster and better
      – L1a closer to average (30-45%), L1c becomes very good (54-61%)
  • 66. 45 Refuseniks
    [Scatter plot: seconds per input word (1s to 4s) vs. percentage judged correct (10% to 80%)]
    • Use the assistance sparingly or not at all, and see generally no gains
    • The two best translators are in this group
    • Postediting
      – mixed on quality (2 better, 1 worse, 1 same), but all faster
      – best translator (L2e, 68%) becomes much better (record 79%)
  • 67. 46 Learning Curve
    Users become better over time with assistance
  • 71. 48 User Feedback
    • Q: In which of the five conditions did you think you were most accurate?
      – prediction+options: 5 users
      – options: 2 users
      – prediction: 1 user
      – postediting: 1 user
    • Q: Rank the different types of assistance on a scale from 1 to 5, where 1 indicates not at all and 5 indicates very helpful.
      – prediction+options: 4.6
      – prediction: 3.9
      – options: 3.7
      – postediting: 2.9
    • Note: these answers do not match the empirical results
  • 75. 49 Summary
    • Assistance made translators faster
      – average speed improvement from 4.4s/word to 2.7-3.7s/word
      – reduction of big pauses
      – reduction of typing effort in post-editing
    • Assistance made translators better
      – average judgment increased from 47% to 51-55% with help
      – even good translators get better with postediting
    • Some good translators ignored the assistance
    • Translators were fastest and (barely) best with postediting, but did not like it
  • 76. 50 Overview
    • Volunteer Translation Projects
    • Machine Translation
    • Human Translation
    • Assistance to Human Translators
    • User Study 1
    • User Study 2
  • 78. 51 Experiment
    • Monolingual translators
      – 10 students/staff at the University of Edinburgh
      – none knew Arabic or Chinese
      – had access to full stories at a time, could correct prior sentences
    • Bilingual translators
      – 3 of the 4 reference translations in the NIST test set
    • Remaining reference translation used as ground truth
  • 79. 52 Stories
    Story  Lang  Sent.  Words  Headline
    1      chi   6      207    White House Pushes for Nuclear Inspectors to Be Sent as Soon as Possible to Monitor North Korea’s Closure of Its Nuclear Reactors
    2      chi   10     204    Torrential Rains Hit Western India, 43 People Dead
    3      chi   7      247    Research Shows a Link between Arrhythmia and Two Forms of Genetic Variation
    4      chi   10     367    Veteran US Goalkeeper Keller May Retire after America’s Cup
    5      ara   7      224    Britain: Arrests in Several Cities and Explosion of Suspicious Car
    6      ara   8      310    Ban Ki-Moon Withdraws His Report on the Sahara after Controversy Surrounding Its Content
    7      ara   11     312    Pakistani Opposition Leaders Call on Musharraf to Resign.
    8      ara   8      255    Al-Maliki: Iraqi Forces Are Capable of Taking Over the Security Dossier Any Time They Want
  • 80. 53 Results: Arabic
    [Bar chart: monolingual translators mono1 to mono10, y-axis 0 to 60%]
    Percentage of sentences judged as correct
  • 81. 54 Results: Arabic
    [Bar chart: bilingual translators bi1 to bi3 and monolingual translators mono1 to mono10, y-axis 0 to 80%]
    Compared to bilingual translators
  • 82. 55 Results: Arabic
    [Bar chart: bilingual translators bi1 to bi3 and monolingual translators mono1 to mono10, y-axis 0 to 80%]
    Best monolinguals as good as worst bilingual
  • 83. 56 Results: Arabic and Chinese
    [Bar chart: Arabic and Chinese results for bi1 to bi3 and mono1 to mono10, y-axis 0 to 80%]
    Mostly worse performance for Chinese
  • 84. 57 Results per Story
    [Bar chart, y-axis 0 to 80%: Bilingual vs. Mono Post-Edit for the stories Chinese Weather, Chinese Sports, Arabic Diplomacy, Arabic Politics, Chinese Politics, Chinese Science, Arabic Terror, Arabic Politics]
    Performance differs widely per story
  • 85. 58 Results per Story
    [Bar chart, y-axis 0 to 80%: Bilingual vs. Mono Post-Edit for the stories Chinese Weather, Chinese Sports, Arabic Diplomacy, Arabic Politics, Chinese Politics, Chinese Science, Arabic Terror, Arabic Politics]
    One story: monolinguals as good as bilinguals
  • 88. 59 Offering more assistance
    • Progress in computer aided translation
    • Interactive machine translation (TransType)
      – show prediction of sentence completion
      – recompute when user types own translation
    • Alternative translations [Koehn and Haddow, 2009]
      – display translation options from translation model
      – ranked by translation score
  • 89. 60 Translation Options
    Up to 10 translations for each word / phrase
  • 90. 61 Translation Options
  • 91. 62 Results with Options
    [Bar chart, y-axis 0 to 80%: Bilingual, Mono Post-Edit and Mono Options for the stories Chinese Weather, Chinese Sports, Arabic Diplomacy, Arabic Politics, Chinese Politics, Chinese Science, Arabic Terror, Arabic Politics]
    No big difference; once significantly better
  • 92. 63 Error Analysis
    (a) Critical judges
    • Reference: Torrential Rains Hit Western India, 43 People Dead
    • Bilingual translator: Heavy Rains Plague Western India Leaving 43 Dead
  • 93. 64 Error Analysis
    (b) Mistakes by the professional translators
    • Reference: Over just two days on the 29th and 30th, rainfall in Mumbai reached 243 mm.
    • Bilingual translator: The rainfall in Mumbai had reached 243 cm over the two days of the 29th and 30th alone.
  • 94. 65 Error Analysis
    (b2) Domain knowledge vs. language skills
    • Bilingual translator: With Munchen-Gladbach falling to the German Bundesliga 2, ...
    • Monolingual translator: The Mönchengladbach team fell into the second German league, ...
  • 95. 66 Error Analysis
    (c) Bad English by monolingual translators
    • Monolingual translator: The western region of india heavy rain killed 43 people.
  • 96. 67 Error Analysis
    (d) Mistranslated / untranslated name
    • Reference: Johndroe said that the two leaders ...
    • Machine translation: Strong zhuo, pointing out that the two presidents ...
    • Monolingual translator: Qiang Zhuo pointed out that the two presidents ...
  • 97. 68 Error Analysis
    (e) Wrong relationship between entities
    • Machine translation: The colombian team for the match, and it is very likely that the united states and kai in the americas cup final performance.
    • Monolingual translator 6: The Colombian team and the United States are very likely to end up in the Americas Cup as the final performance.
    • Monolingual translator 8: The next match against Colombia is likely to be the United States’ and Keller’s final performance in the current Copa America.
  • 98. 69 Error Analysis
    (f) Badly muddled machine translation
    • Reference: In the current America’s cup, he has, just as before, been given an important job to do by head coach Bradley, but he clearly cannot win the match singlehanded. The US team, made up of “young guards,” ...
    • Machine translation: He is still being head coach bradley appointed to important, it’s even a fist ”, four young guards at the beginning of the ”, the united states is...
  • 104. 70 Conclusions
    • Main findings
      – monolingual translators may be as good as bilinguals
      – widely different performance by translator / story
      – named entity translation critically important
    • Various human factors important
      – domain knowledge
      – language skills
      – effort
  • 106. 71 Outlook
    • More assistance
      – named entity transliteration
      – word-level confidence measures
      – show syntactic structure [Albrecht et al., 2009]
  • 107. 72 Try it at home!
    http://www.caitra.org/
    Questions?
  • 108. 73 Further Analysis
    • How does the assistance change translator behaviour?
    • How do translators utilize assistance?
    • How is the translation produced?
  • 109. 74 Keystroke Log
    Color key: black = keystroke, purple = deletion, grey = cursor move, red = sentence completion accepted, orange = click on translation option
    • Analysis: segment the log into periods of activity (typing, tabbing, clicking) and pauses
    • One second before and after a keystroke counts as part of the typing interval
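    A minimal sketch of the segmentation rule for keystrokes only. It assumes the log has been reduced to keystroke timestamps in seconds plus the session's start and end. Tab and click events are left out, as is the short/medium/big binning of pauses used in the activity tables, because their exact definitions are not given here. Function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]

def typing_intervals(key_times: List[float], start: float, end: float,
                     margin: float = 1.0) -> List[Interval]:
    """Cover each keystroke with [t - margin, t + margin], clipped to the
    session, and merge overlapping windows into typing intervals."""
    merged: List[Interval] = []
    for t in sorted(key_times):
        lo, hi = max(start, t - margin), min(end, t + margin)
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def segment(key_times: List[float], start: float, end: float,
            margin: float = 1.0) -> Dict[str, object]:
    """Split one sentence's session into typing time and pauses."""
    typing = typing_intervals(key_times, start, end, margin)
    if not typing:
        return {"typing": 0.0, "initial": end - start, "end": 0.0, "gaps": []}
    gaps = [b[0] - a[1] for a, b in zip(typing, typing[1:])]
    return {
        "typing": sum(hi - lo for lo, hi in typing),
        "initial": typing[0][0] - start,   # pause before the first typing interval
        "end": end - typing[-1][1],        # pause after the last typing interval
        "gaps": gaps,                      # pauses between typing intervals
    }

print(segment([2.0, 2.5, 10.0], start=0.0, end=15.0))
# {'typing': 4.5, 'initial': 1.0, 'end': 4.0, 'gaps': [5.5]}
```

    The two keystrokes half a second apart fall into one merged typing interval. Typing time plus all pauses adds up to the 15-second session.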
  • 113. 78 Activities: Native French User L1b
    Average time per input word; -p columns are pauses (initial, end, short, mid, big)
    Condition           total   init-p  end-p   short-p mid-p   big-p   key     click   tab
    Unassisted          7.7s    1.3s    0.1s    0.3s    1.8s    1.9s    2.3s    -       -
    Postedit            4.5s    1.5s    0.4s    0.1s    1.0s    0.4s    1.1s    -       -
    Options             4.5s    0.6s    0.1s    0.4s    0.9s    0.7s    1.5s    0.4s    -
    Prediction          2.7s    0.3s    0.3s    0.2s    0.7s    0.1s    0.6s    -       0.4s
    Prediction+Options  4.8s    0.6s    0.4s    0.4s    1.3s    0.5s    0.9s    0.5s    0.2s
    • Slightly less time spent on typing
    • Less pausing
    • Especially less time in big pauses
  • 117. 82 Activities: Native English User L2e
    Average time per input word; -p columns are pauses (initial, end, short, mid, big)
    Condition           total   init-p  end-p   short-p mid-p   big-p   key     click   tab
    Unassisted          2.8s    0.3s    0.2s    0.2s    0.3s    0.1s    1.9s    -       -
    Postedit            2.6s    0.4s    0.3s    0.2s    1.0s    0.1s    0.7s    -       -
    Options             3.5s    0.1s    0.3s    0.4s    0.6s    0.2s    1.7s    0.1s    -
    Prediction          2.8s    0.1s    0.3s    0.3s    0.3s    -       1.4s    -       0.3s
    Prediction+Options  3.0s    0.1s    0.3s    0.2s    0.5s    -       1.9s    -       -
    • Little time spent on assistance
    • Prediction+Options: uses neither assistance, little overall change
    • Postediting: less typing (-1.2s), more medium pauses (+0.7s)
  • 119. 84 Origin of Characters: Native French L1b
    Share of characters in the final translation (key: typed; click: clicked translation option; tab: accepted sentence completion; mt: kept from machine translation)
    Condition           key   click  tab   mt
    Postedit            18%   -      -     81%
    Options             59%   40%    -     -
    Prediction          14%   -      85%   -
    Prediction+Options  21%   44%    33%   -
    Translation comes to a large degree from the assistance
  • 121. 86 Origin of Characters: Native English L2e
    Share of characters in the final translation (key: typed; click: clicked translation option; tab: accepted sentence completion; mt: kept from machine translation)
    Condition           key   click  tab   mt
    Postedit            20%   -      -     79%
    Options             77%   22%    -     -
    Prediction          61%   -      38%   -
    Prediction+Options  100%  -      -     -
    Although hardly any time was spent on the assistance, a fair share of the characters was produced by it
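    One way to compute such a breakdown is sketched below under simplifying assumptions. The log is reduced to (origin, text) events that append at the end of the text box. Deletions are backspaces from the end. Each surviving character keeps the origin of the event that produced it. The origin labels follow the table columns; the event format and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

def character_origins(events: List[Tuple[str, str]]) -> Dict[str, int]:
    """Percentage of final characters per origin ('key', 'click', 'tab', 'mt').
    An event with origin 'backspace' deletes the last character; any other
    event appends its text, tagging every character with that origin."""
    buffer: List[str] = []  # one origin tag per character currently in the text box
    for origin, text in events:
        if origin == "backspace":
            if buffer:
                buffer.pop()
        else:
            buffer.extend(origin for _ in text)
    total = len(buffer)
    if total == 0:
        return {}
    return {o: round(100 * n / total) for o, n in Counter(buffer).items()}

# Postediting: start from MT, delete its last word, type a replacement.
events = [("mt", "the house is small")] + [("backspace", "")] * 5 + [("key", "tiny")]
print(character_origins(events))  # {'mt': 76, 'key': 24}
```

    Because each share is rounded separately, rows need not sum to exactly 100%, as in the 18% + 81% Postedit row for L1b.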
  • 122. 87 Pauses: French-Native User L1b
  • 123. 88 Pauses: English-Native User L2e
  • 126. 89 Outlook: More analysis
    • What do translators think about when they are pausing?
    • What are the hard problems?
      – unknown words
      – words without direct translation
      – syntactic re-arrangement
    • What do translators change in post-editing?
  • 130. 90 Outlook: More experiments
    • Different types of users
      – experienced professional translators
      – volunteer / amateur
      – no/little knowledge of source language
    • Different types of language pairs
      – target-side morphology a problem
      – large-scale reordering maybe a problem
    • Different types of translation tasks
      – familiar content for translator?
      – very similar to previously translated text?
  • 131. 91 Interactive Post-Editing?
    • word alignment to source
    • confidence estimation of likely faulty parts
    • integration with translation memory
  • 132. 92 Pauses: Unassisted
  • 133. 93 Pauses: Options
  • 134. 94 Pauses: Prediction of sentence completion
  • 135. 95 Pauses: Postediting