Best Practices for Collaborative Translation
1. Wanted: Best Practices for Collaborative Translation
Alain Désilets
National Research Council of Canada
alain.desilets@nrc-cnrc.gc.ca
With support from:
2. A (Very) Brief History of Collaborative Translation
Circa 2005: “Wikis, what's that?”
Circa 2006: “I know about Wikipedia, but I hear it’s garbage because
anyone can write anything on it.”
Circa 2007: “You know, I have been to Wikipedia a couple of times and
was pleasantly surprised by the quality of what I found there.”
Circa 2008: “Actually, this wiki stuff is really interesting. Now, I routinely
use Wikipedia in my work, and although I am cautious with it, I find it
useful.
I have a sense that this wiki/collaborative stuff will have a wider impact
for translation, but I’m not quite sure how.”
3. A (Very) Brief History of Collaborative Translation (2)
Circa 2009: “Whoa! Translation Crowdsourcing is going to put me out of a
job!”
Circa 2010: “Well, I guess that was a storm in a teacup. Crowdsourcing
will be used in some specific and limited contexts, but it won't take
over. Maybe this is more an opportunity for us than a threat…”
Circa 2011: “Hmm… getting this collaborative translation stuff to work
right is hard and confusing.”
4. Talk Outline
The different “flavors” of Collaborative Translation
Common issues and Challenges in Collaborative
Translation
Capturing Collaborative Translation best-practices
in the form of Design Patterns
7. Definition
Collaborative Translation is the use
of any open online collaboration
technology or process, in order to
help with translation tasks, or tasks
related to translation (ex:
terminology).
8. Available in the following flavors…
• Translation crowdsourcing
• Collaborative terminology resources
• Translation memory sharing
• Online marketplaces for translators
• Agile translation teamware
• Post-editing by the crowd
9. Translation Crowdsourcing
Mechanical-Turk-like systems to support the translation of content
by large crowds of mostly amateurs, through an open-call
process.
By far the most talked about collaborative translation approach
• Software user interface (Facebook, Adobe, Symantec, Firefox)
• Technical documentation (Adobe, Symantec)
• Transcripts of videos of an “inspirational” nature (TED Talks, Adobe TV)
• Humanitarian aid content (Translators without Borders, Kiva.org, Haiti
Earthquake Mission)
• Large scale collection of linguistic data for research purposes or machine
translation training (NAACL Workshop on Crowdsourcing).
10. Collaborative terminology resources
Wikipedia-like platforms for the creation and maintenance of large
terminology resources by a crowd of translators, terminologists,
domain experts, and even general members of the public.
• Wikipedia
• Wiktionary
• ProZ’s Kudoz forum
• Urban Dictionary
• TermWiki.com
• TikiWiki
• Reverso dictionary
11. Translation memory sharing
Platforms for large scale pooling and sharing of multilingual
parallel corpora between organizations and individuals.
• TAUS Data Association
• MyMemory
• Google Translator Toolkit
• WeBiText
Often, collaboration is “implicit”, for example, in the case of
WeBiText.
12. Online marketplaces for translators
eBay-like disintermediated environments for connecting customers
and translators directly, with minimal intervention by a middle
man.
• ProZ.com
• TranslatorsCafe
• Translated.net
Collaborative aspects come from things like “open call sourcing”
and reputation management based on community assessment.
13. Agile translation teamware
Wiki-like systems and processes that allow multidisciplinary teams
of professionals (translators, terminologists, domain experts,
revisers, managers) to collaborate on large translation projects,
using an agile, grassroots, parallelized process instead of the
more top-down, assembly-line approach found in most
translation workflow systems.
No specific software or site, but many case studies describe how
to implement this approach using general-purpose collaboration
tools like wikis and Basecamp.
• Beninatto & De Palma, 2008
• Calvert, 2008
• Yahaya, 2008
Some translation workflow systems are starting to market themselves
as being “collaborative”.
14. Post-editing by the crowd
Systems allowing a large crowd of mostly amateurs to correct the
output of machine translation systems, often with the aim of
improving the system’s accuracy.
• Asia Online’s Wikipedia translation project
• Google Translate allows anonymous users to correct the
outputs produced by the system
• Likewise for Microsoft’s Bing Translator
15. Is this REALLY New?
Weren’t Terminology Databases, Translation Memories and
Translation Workflow Systems already collaborative?
• Yes, but…
• … Collaborative Translation is about using these kinds of
groupware technologies in the context of much larger groups or
communities, where people have fewer reasons to trust each
other a priori.
It’s one thing to open yourself to collaboration with colleagues and
customers.
It’s quite another thing to open yourself to the whole world.
17. This is NOT easy
Choosing a flavor and tailoring it to your needs is still something of
a black art, guided by trial and error.
There are lots of important and poorly understood issues that
arise, many of which are common to most flavors:
• Alignment with business goals
• Quality control
• Crowd motivation
• Proper role of professionals
18. Alignment with business goals
Why are you doing this in the first place? Which flavor can deliver
what you want?
The actual benefit you get from a given flavor is not necessarily
what you think!
Translation crowdsourcing
• Reduce cost? – Yes, but not the biggest benefit.
• Decrease lead time? – Definitely.
• Translation more in tune with target audience’s idiosyncrasies?
– Also.
• Most importantly: Increase brand loyalty by engaging end-users
as co-creators of products, instead of passive consumers.
19. Quality Control
How to control quality when you open yourself to contributions
from a potentially large group of “outsiders”?
Many ways:
• Screen contributors before letting them in (ex: Translators
Without Borders, Kiva.org).
• Have members of the community vote on the quality of each
other’s work (ex: Facebook, Translated.net).
• Have in-house professionals revise the work done by the
community (ex: Facebook).
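As a rough illustration of how community voting and in-house revision could be combined, the Python sketch below picks the candidate translation with the highest net vote score and flags it for professional revision when it has attracted few votes or a non-positive score. The names `Candidate` and `pick_translation`, and the `min_votes` threshold, are hypothetical. They are assumptions made for this example and do not describe how Facebook or Translated.net actually work.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One community-submitted translation of a source string (hypothetical model)."""
    text: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self) -> int:
        # Net score: every vote counts equally in this sketch.
        return self.upvotes - self.downvotes

    @property
    def total_votes(self) -> int:
        return self.upvotes + self.downvotes


def pick_translation(candidates, min_votes=5):
    """Return (best_candidate, needs_professional_review).

    The best candidate is the one with the highest net score; on a tie,
    the earliest submitted candidate wins. min_votes is an assumed threshold.
    """
    if not candidates:
        raise ValueError("no candidate translations submitted")
    best = max(candidates, key=lambda c: c.score)
    # Escalate to an in-house reviser when the crowd signal is weak.
    needs_review = best.total_votes < min_votes or best.score <= 0
    return best, needs_review


candidates = [
    Candidate("Ajouter un ami", upvotes=12, downvotes=1),
    Candidate("Ajouter ami", upvotes=3, downvotes=4),
]
best, needs_review = pick_translation(candidates)
print(best.text, needs_review)
```

In this arrangement professionals only see the strings where the community signal is weak, which is one possible way of combining the second and third approaches listed above.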
20. Quality Control (2)
Do not assume that quality of community-produced content will be
lower.
For instance, published comparisons have found that Wikipedia measures
up to professionally produced encyclopedias like Britannica (English)
and Brockhaus (German).
Quality issues tend to iron themselves out provided that you attract
a sufficiently large number of the right people.
Wisdom-of-crowds effects work surprisingly well when the
following conditions are met:
• Diversity
• Independence
• Aggregation
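The aggregation condition can be illustrated with a minimal Python sketch. It assumes each reviewer independently gives a translation a 1-to-5 quality rating, and it combines the ratings with the median so that a single outlier has little effect. The function name `aggregate_ratings`, the rating scale, and the choice of the median are assumptions made for this example. Diversity and independence depend on who rates and how, so code alone cannot enforce them.

```python
from statistics import median


def aggregate_ratings(ratings_by_reviewer):
    """Combine independent 1-5 quality ratings into a single score.

    ratings_by_reviewer maps a reviewer id to that reviewer's rating,
    so each reviewer contributes exactly one judgment.
    """
    if not ratings_by_reviewer:
        raise ValueError("need at least one rating")
    return median(ratings_by_reviewer.values())


# Four reviewers broadly agree; one gives a very low rating.
ratings = {"ana": 4, "bo": 5, "chen": 4, "dee": 1, "eli": 4}
print(aggregate_ratings(ratings))  # 4
```

Keying the ratings by reviewer means that one person cannot count twice, which is a small step toward keeping the judgments independent.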
21. Crowd motivation
If you are to attract and retain enough of the “right people” you
need to understand why they might contribute.
• Mandated by management (ex: Agile Translation teamware)
• Emotional bond with the content (ex: Facebook, and surprisingly, Adobe)
• Prestige of the content (ex: TEDTalks)
• Wanting to do good (ex: Translators Without Borders, Kiva, Haiti Earthquake
Mission, Data collection for scientific research)
• Pride in one’s native language (ex: Data collection for R&D in MT for
low-density languages)
• Trying to perfect second language skills
• Trying to make a go of a professional translation career (ex: Kiva.org)
• And in some cases, $$$
– Will this be the dominant scenario?
– How to set compensation high enough to attract good contributors, but not so
high that it interferes with more intrinsic motivations, or attracts people out to
game the system?
22. Role of professionals
Some flavors of CT are designed specifically for professionals (ex: Agile
translation teamware, Online translator marketplaces).
But some (e.g. Translation crowdsourcing), tend to de-emphasize their role.
When should professionals be involved, and what should be their role?
• Revise work done by amateurs?
– Focus on more challenging aspects of translation like terminology,
style, fluidity?
• Manage and coach the crowd?
• Focus on more mission-critical and hard to translate content?
Translation Crowdsourcing may actually increase the size of the pie, by
making it possible to tackle content and/or small languages that would
otherwise not have been dealt with at all.
24. Wanted: Best-Practices
Collaborative Translation presents practitioners with a varied and
complex envelope of approaches and technologies.
Selecting a flavor and tuning it to meet your needs is complex.
We need some sort of concise, easy-to-consult repository of best
practices for that field.
We propose a way to create such a repository collaboratively, as a
community, in the form of a design pattern language.
26. About Design Patterns
A format for describing a common
solution to a common problem in
a given field
Originally used in Architecture, but
since then adopted in other fields
such as Software Engineering,
Education, etc.
27. Design Patterns Example
Publish Contributions Rapidly
Context
This pattern is useful for motivating contributors in any collaborative
translation context, but it is particularly useful in translation crowdsourcing
scenarios.
Problem
Contributors are often motivated by a desire to have a positive impact on the
community they are participating in. However, they cannot achieve this
sense of being useful, if their contributions do not become available to the
rest of the community in a reasonable amount of time.
Solution
Therefore, minimize the delay between the moment when a member of the
community contributes to the site, and the moment where it becomes
publicly available to the rest of the community. Ideally, the contribution
should become visible to the rest of the community as soon as the user
clicks on the Save button.
28. Design Patterns Example (2)
Related patterns
– Point System is another way for contributors to get a sense of how useful
they have been to the community.
– Campaign Progress Gauge is another practice which allows members of the
community to see the positive impact of their actions. The main difference is
that it operates more at a community/project level rather than at an
individual/contribution level.
Real-life examples
– At Facebook, translations become available in a matter of hours.
– In the context of software localization by the crowd, Adobe makes a conscious
effort to wrap the community's translations into every new release of the
product.
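The pattern format used in the example above (name, Context, Problem, Solution, Related patterns, Real-life examples) can be captured in a small data structure. The Python sketch below is only an illustration: the class `DesignPattern` and its `render` method are hypothetical, the field values are abbreviated from the example, and nothing here describes how any actual pattern repository stores its entries.

```python
from dataclasses import dataclass, field


@dataclass
class DesignPattern:
    """One pattern entry; fields mirror the headings of the example above."""
    name: str
    context: str
    problem: str
    solution: str
    related_patterns: list = field(default_factory=list)
    real_life_examples: list = field(default_factory=list)

    def render(self) -> str:
        """Render the pattern as plain text, one section per heading."""
        parts = [
            self.name,
            f"Context\n{self.context}",
            f"Problem\n{self.problem}",
            f"Solution\n{self.solution}",
        ]
        # Optional sections are only printed when they have entries.
        if self.related_patterns:
            items = "\n".join(f"- {p}" for p in self.related_patterns)
            parts.append(f"Related patterns\n{items}")
        if self.real_life_examples:
            items = "\n".join(f"- {e}" for e in self.real_life_examples)
            parts.append(f"Real-life examples\n{items}")
        return "\n\n".join(parts)


pattern = DesignPattern(
    name="Publish Contributions Rapidly",
    context="Motivating contributors, particularly in translation crowdsourcing.",
    problem="Contributors cannot feel useful if their work stays invisible for long.",
    solution="Minimize the delay between a contribution and its publication.",
    related_patterns=["Point System", "Campaign Progress Gauge"],
    real_life_examples=["Facebook", "Adobe"],
)
print(pattern.render())
```

Keeping related patterns as a list of names is what lets individual entries link to each other and form a pattern language rather than a flat list.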
29. TAUS Roundtable on Collaborative Translation
Wiki “Barn Raising” workshop held on October 12th, 2011 at
Localization World in Santa Clara.
12 practitioners
• One third with hands-on experience of CT (NRC, Adobe,
Symantec, Kiva, World Wide Lexicon)
• Two thirds with no experience, but a strong interest in trying it
(In Every Language, MemSource, Firma 8, SPIL Games)
Talks by the experienced users about what worked and didn’t.
Followed by brainstorming of what the recurring best-practices
seem to be.
30. The Best-Practices
End result:
=> 50+ best practices organized into 6 themes
Planning and Scoping
Translation as User Engagement, Align Stakeholder Expectations, Early and
Continuous Clarification of Translator Expectations, Backup Plan, Project
Check Points, Appoint Initial Community Manager, Clear Objectives, Identify
Compatible Content
Community Motivation
Campaign Progress Gauge, Contributor Recognition, Leader Board, Official
Certificate, Point System, Offer Double Points, Hand-Out Unique Branded
Products, Contributor of the Month, Grant Special Access Rights, Playful
Casual Translation, Campaign, Publish Contributions Rapidly, Playful
Competition Among Contributors
31. The Best-Practices (2)
Quality
Content-Specific Testing, Entry Exam, Peer Review, Automatic Reputation
Management, Random Spot-Checking, Revision Crowdsourcing, Users as
Translators, Voting, Transparent Quality Level, Publish then Revise
Contributor Career Path
Flexible Contributor Career Path, Lurker to Contributor Transition, Anonymous
Translation, Find the Leaders, Support Variable Levels of Involvement,
Community Manager, Content Prioritizer
Right Sizing
Appropriate Chunk Size, Community-Appropriate Project Size, Break Up
Crowd Into Teams, Require Minimal Involvement Level, Keep the Crowd
Small, Volunteer Team Leaders
32. The Best-Practices (3)
Tools and Processes
Hint at Content Priority, First In, First Out, Task Self Selection, Layered
Fallbacks, Official Linguistic Resources, Automatic Suggestions, Provide
Context, In-Place Translation, Community Forum, Analytics for Content
Prioritization, Simplicity First, Good Examples of Contributions, Encourage
Self, Set Deadlines
33. Some Observations
The bulk of practices relate to Translation Crowdsourcing.
=> We need to spend more time capturing practices for other
flavors of Collaborative Translation
The bulk of the practices so far are not specific to translation.
• They would be useful in the context of crowdsourcing efforts in
any domain.
• Maybe all we need is to codify and/or learn about the best
practices for crowdsourcing in general?
The more similar two organizations are, the more similar their
practices will be (ex: Kiva and TWB, versus Kiva and Adobe).
35. Conclusion
Collaborative Translation presents practitioners with a very large
and varied set of tools and processes.
Choosing a particular flavor of CT and tailoring it to meet one’s
needs can be a daunting task.
We need a concise, easy-to-consult, modular compendium of
current best practices in that area.
We have started building such a compendium in the form of a wiki
site (www.collaborative-translation-patterns.com) which
captures best practices in the form of design patterns.
We invite everyone in this room to contribute to it if they can.