Overview of Multidimensional Quality Metrics (QTLaunchPad)
1. Translation Quality Assessment: Five Easy Steps
Using Multidimensional Quality Metrics to improve quality assessment and management
Prepared by the QTLaunchPad project (info@qt21.eu)
Version 1.0 (26 April 2013)
2. Who does this apply to?
Requesters of translation services looking for relevant quality metrics
Language Service Providers (LSPs) delivering translation services to their clients
The following materials will apply to negotiation between requesters and providers
This description does not apply to individual translators (although they may want to be aware of the contents)
4. Basic questions about your project
E.g.,
What languages are you working in?
What is your subject field?
What sort of project is it (e.g., user interface, documentation, advertising)?
What technology are you using (MT, CAT, etc.)?
What register and style are you using?
6. Based on your specifications…
The MQM recommendation tool will:
suggest a pre-defined metric used for similar projects, or
recommend a custom metric that applies to your project
You are free to modify the metric as needed
Create a metrics specification file that:
defines the issues to be examined
provides weights (descriptions of how important the issues are)
The metrics specification file can be used by any MQM-compliant tool (a sketch of such a file appears below)
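As a rough illustration only (the actual MQM file format is defined by the project; the field names here are invented for the sketch), such a specification might carry information like this, written as a Python literal:

# Hypothetical metrics specification: issue types to examine, plus weights.
# Issue names come from this deck; the numeric weights anticipate the
# worked example later on (1.0 for high priority, 0.2 for low priority).
METRIC_SPEC = {
    "name": "example-metric",
    "issues": {
        "orthography":    {"dimension": "fluency",  "weight": 1.0},
        "grammar":        {"dimension": "fluency",  "weight": 1.0},
        "mistranslation": {"dimension": "accuracy", "weight": 1.0},
        "omission":       {"dimension": "accuracy", "weight": 0.2},
        "untranslated":   {"dimension": "accuracy", "weight": 1.0},
    },
}

An MQM-compliant tool would read such a file to know which issues to look for and how heavily each one counts against the score.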
8. Three options:
1. Sampling: Examine a portion of the text to determine whether to pass or fail the entire text. Sampling can use quality estimation for better results.
2. Full error analysis: Review the entire text (needed for critical legal or safety texts).
3. Rubric: Rate the text on a numerical scale (suitable for quick assessment of suitability).
9. Automated Metrics
If sampling is used, MQM’s quality estimation tools will help focus sampling on those parts of the text that need attention
Automatic metrics can be used in some cases where human evaluation is too expensive or time-consuming
11. Evaluation…
Can be conducted by the requester or the LSP, in accordance with the agreement between the parties
Follows the method chosen in Step 3 (evaluation method)
Issues must match the metric chosen in Step 2: issues not found in the metric should not be counted as errors
12. MQM provides capabilities
For human evaluation:
Inline markup provides an audit trail:
allows independent verification of errors
helps ensure that issues are corrected
Full reporting functions:
see what types of errors are reported
understand where errors come from
For automatic evaluation:
Integrated use of existing quality metrics to help provide evaluation
13. translate5
These capabilities are being integrated into an open-source editing tool, translate5 (http://www.translate5.net)
All results are free to implement in additional tools (both open source and proprietary)
Parties interested in development should contact info@qt21.eu
14. The source matters
Full MQM evaluation includes the source
Source quality evaluation can help identify reasons for problems and resolve them
Translators can be rewarded for addressing source deficiencies (scores over 100% are possible!)
16. Scoring Formula
Score(Q) = 100% × (1 − penalty points for Q / sample word count)
(Q = whatever set of issues is being counted within the bigger formula)
Provides consistency with the LISA QA Model scoring method
Can be customized to support other legacy systems
Can be applied to individual parts of the overall formula: i.e., fluency, accuracy, grammar, etc. subscores can be derived
Weights (not shown) can be used to adjust the importance of various issue types
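A minimal sketch of this calculation in Python, assuming the severity multipliers implied by the worked example on slide 27 (minor = 1, major = 5, critical = 10 penalty points); the function and variable names are illustrative:

# Severity multipliers inferred from the worked example on slide 27
# (minor = 1, major = 5, critical = 10 penalty points).
SEVERITY_POINTS = {"minor": 1, "major": 5, "critical": 10}

def issue_score(counts, weight, word_count):
    """Score one issue type against an evaluated sample.

    counts: issues found per severity, e.g. {"minor": 8, "major": 2, "critical": 1}
    weight: importance of the issue type (slide 27 uses 1.0 for high, 0.2 for low)
    word_count: size of the evaluated sample in words
    Returns (adjusted penalty, score as a fraction of 1.0).
    """
    penalty = sum(SEVERITY_POINTS[sev] * n for sev, n in counts.items())
    adjusted = penalty * weight
    return adjusted, 1.0 - adjusted / word_count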
17. Scores help guide decisions
Scores are given on a 100% basis
Scores can be broken down into more fine-grained reports
E.g., a score of 96% could have 100% accuracy but 92% fluency
Helps target actions for quality control
19. 1. Specifications
Parameter | Value
Language/Locale | Source: English; Target: Japanese
Subject field/domain | Medical
Text type | Narrative
Audience | Educated readers with an interest in medicine
Purpose | Education about a new procedure for managing diabetes
Register | Moderately formal
Style | No specified style; match source if possible
Content correspondence | Literal translation
Output modality | Subtitles (speech to text)
File format | Time-coded XML for dotSub
Production technology | Human translation
20. 2. Recommended Metric
Issue type | Weight (high, medium, low) | Notes
Fluency:
Orthography | High |
Grammar | High |
Accuracy:
Mistranslation | High |
Omission | Low | Due to the nature of captions, some information loss is expected; captions should be 60% of spoken dialogue
Untranslated | High |
Legal requirements | High | Must make sure that legal claims are admissible under Japanese law
23. Quality Formula (1)
TQ = (Atr + At − As) + (Ft − Fs), with respect to specifications
TQ = translation quality
Atr = accuracy (transfer)
At = accuracy for the target text
As = accuracy for the source text
Ft = fluency score for the target text
Fs = fluency score for the source text
24. Quality Formula (2)
TQ = (Atr + At − As) + (Ft − Fs), with respect to specifications
Definition: A quality translation demonstrates required accuracy and fluency for the audience and purpose and complies with all other negotiated specifications, taking into account end-user needs.
(The gold-colored portion of the original slide corresponds to the dimensions, i.e., the specifications.)
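For illustration, take hypothetical component scores Atr = 98%, At = 99%, As = 97%, Ft = 99%, Fs = 95% (a translator who repaired some accuracy and fluency problems in a deficient source):

TQ = (98 + 99 − 97) + (99 − 95) = 104%

This is how scores over 100% arise (see slides 14 and 30): the target is credited for being more accurate and fluent than the source.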
25. 3. Evaluation method
In this example, portions of the text are marketing: sampling is an acceptable evaluation method for these parts
Other portions contain legal and regulatory claims: full error analysis is required for those portions
Inline markup can be used via the MQM namespace (because the text is in XML) to ensure corrections are made
26. 4. Evaluation
• Evaluation includes subsegment markup with issues in the metric
• Issues are stored in the MQM namespace to allow audit and revision
• Users can select three severity levels:
• critical: the issue renders the text unusable
• major: the issue leaves the text usable, but is an obstacle to understanding
• minor: the issue does not impact usability of the text
[screenshot: translate5.net showing the MQM markup tool]
27. 5. Scoring
Issue type | Weight | Minor | Major | Critical | Penalty | Adjusted | Total
Fluency:
Orthography | 1.0 | 8 | 2 | 1 | 28 | 28 | 97.2%
Grammar | 1.0 | 6 | 2 | 0 | 16 | 16 | 98.4%
Subtotal (fluency) | | | | | | 44 | 95.6%
Accuracy:
Mistranslation | 1.0 | 4 | 0 | 0 | 4 | 4 | 99.6%
Omission | 0.2 | 12 | 4 | 1 | 42 | 8.4 | 99.2%
Untranslated | 1.0 | 1 | 0 | 0 | 1 | 1 | 99.9%
Legal requirements | 1.0 | 0 | 0 | 1 | 10 | 10 | 99.0%
Subtotal (accuracy) | | | | | | 23.4 | 97.7%
Total | | | | | | 67.4 | 93.3%
Assumes a 1000-word sample
Because Omission is considered a low priority in this case, it is given a low weight
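As a check, the scoring sketch from slide 16 reproduces the Omission row above:

# Omission: 12 minor, 4 major, 1 critical; low-priority weight 0.2
adjusted, score = issue_score({"minor": 12, "major": 4, "critical": 1},
                              weight=0.2, word_count=1000)
print(round(adjusted, 1), round(score * 100, 1))  # prints: 8.4 99.2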
28. 5. Scoring
Without weighting of Omission, the score would be 89.9%
We can see that the translator has more problems with fluency than with accuracy
30. 5. Scoring (including source)
In many cases, some problems in a translation are not caused by the translator.
In this case, the translator fixed problems in the source, resulting in better fluency quality in the target. The translator should be recognized for this work.
31. For more information
Please visit http://www.qt21.eu/launchpad/
Write to info@qt21.eu