Language International 8.3 (1996)
1. Dear Editor
The article by Dr William J. Niven on machine-friendliness and reader-friendliness in machine translation
in the 7/6 (December, 1995) issue of Language International has prompted me to offer Dr Niven a few
remarks.
Dr Niven is perfectly right in maintaining that highly formatted text is used in technical writing to
induce the reader to read more slowly or draw the reader's attention to some part of the text. As a
general rule, however, good formatting serves MT purposes as well as reading and human translation,
while bad formatting produces poor results even from the most brilliant of human translators. Page and
text layout could and should be modified when changes in the layout serve translation,
understood here as a particular kind of documentation localization. With the principle of real
localization in mind, reader-friendly could well amount to machine-friendly and vice versa, since
ease of reading for a human being could also mean ease of reading for a machine, and a document
that is easy for a machine to read is certainly easy for a human being to read as well.
From this perspective it is not always true that the tabular format, for example, provides more logical
structuring and clearer distribution of information. Technical writers use tables very cautiously as they are
difficult to read even for a well-educated human being. As for MT's own difficulties in reading
formatted text, the problem is not confined to MT, which may partly explain why
technical writers are increasingly shifting to SGML. SGML and mark-up languages in general have the
unique capability of describing information in a software-independent format, and all major word
processors are now capable of formatting text using mark-up techniques, thus making it easier to separate
text from formatting codes. In addition, the algorithm for isolating and stripping tags should be quite
simple, so format decoding should not be a titanic endeavor. Analysis and pre-analysis could be greatly
improved by effective use of spelling, syntax and grammar checkers, which technical writers in fact
rarely use, while controlled language could be used to simplify writing and to standardize punctuation, which
can vary greatly from one culture to another.
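To illustrate how simple tag isolation and shedding can be, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method of any particular MT system: the function names shed_tags and restore_tags are hypothetical, and the pattern assumes SGML-style tags with no ">" inside attribute values.

```python
import re

# Matches an SGML-style tag such as <para>, </para> or <emph type="bold">.
# Assumption: no '>' characters occur inside attribute values.
TAG_PATTERN = re.compile(r"<[^<>]+>")


def shed_tags(marked_up):
    """Return (plain_text, tags), where tags is a list of (offset, tag) pairs.

    Each offset is the position in plain_text at which the tag originally sat.
    """
    plain_parts = []
    tags = []
    plain_length = 0
    last_end = 0
    for match in TAG_PATTERN.finditer(marked_up):
        text_run = marked_up[last_end:match.start()]
        plain_parts.append(text_run)
        plain_length += len(text_run)
        tags.append((plain_length, match.group()))
        last_end = match.end()
    plain_parts.append(marked_up[last_end:])
    return "".join(plain_parts), tags


def restore_tags(plain_text, tags):
    """Reinsert the tags recorded by shed_tags at their original offsets."""
    pieces = []
    last_offset = 0
    for offset, tag in tags:
        pieces.append(plain_text[last_offset:offset])
        pieces.append(tag)
        last_offset = offset
    pieces.append(plain_text[last_offset:])
    return "".join(pieces)
```

The recorded offsets are valid only for the untouched source text. Once the text has been translated, the tags would have to be realigned with the target sentence, which is the harder part and is not attempted here.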
Technical writers, however, are rarely aware of what writing for machine translation or with
translatability in mind means. Unfortunately, technical developers do write their own manuals: what
technical writers receive to work on is mostly a finished document, which can be "refined" but not
"rewritten", as is often necessary. Despite their lack of time and inclination, developers go well beyond
technical specification and do not trust technical writers' ability. This attitude comes from the popular
belief that an arts degree implies an inadequate grasp of technical concepts, which in fact are
often technicalities.
Technicalities are, for example, the ambiguities in the reading and interpretation of "fault", "defect",
"error", and "mistake", and controlled language could serve to settle the question. So could knowledge-based
MT systems, should they make wider use of thesauri and use descriptors as pointers to subject-area
flags. The use of synonyms, however, should not be a problem when using a controlled language in which
one term points to one meaning and terms can be organized in a well-defined structure.
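The "one term, one meaning" principle can be sketched as a simple term-base check. The following Python fragment is illustrative only: the term base is hypothetical, and the choice of "fault" and "error" as the approved terms is an assumption made for the example, not a recommendation about how these four words should actually be distinguished.

```python
import re

# Hypothetical term base: each deprecated variant points to the single
# approved term. The pairings below are illustrative assumptions.
APPROVED_TERM_FOR = {
    "defect": "fault",
    "mistake": "error",
}


def find_deprecated_terms(sentence):
    """Return (variant, approved_term) pairs for deprecated words in sentence."""
    findings = []
    for word in re.findall(r"[A-Za-z]+", sentence):
        approved = APPROVED_TERM_FOR.get(word.lower())
        if approved is not None:
            findings.append((word, approved))
    return findings
```

A writer's checker built along these lines would flag "defect" in a sentence and propose the approved term, leaving the decision to the writer. A real controlled language would also need definitions and subject-area information for each term, which a flat dictionary does not capture.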
When technical writers are given the proper space to work, they often have to cope with the typical
European attitude towards technical documentation: do not offend the reader (the user, the buyer, the
client) with overly simple prose.
As to the passive and nominalization, these are simple expedients for solving the common problem of how to
address the reader: German and Italian both have a courtesy form, but addressing the reader directly is still
considered inelegant and clumsy. In Italian, the infinitive is now widely used as a form of indirect
imperative, while nominalization is used when the agent is evident or made clear elsewhere.
Ultimately, a translator is simply a technical writer, and an MT system is a translator too. Modern CAT
systems also produce over-literal translations: no machine is capable of converting one culture into
another. This is not a syntax problem: writing is the process of reproducing human thought in a readable
manner.
The spread of English as the technical language has made too many bad translations acceptable and
accepted. Quality in technical translation is no longer a matter of linguistic properties but of usability and suitability.
And this is as true for human translation as for machine translation.
The question of machine-friendliness versus reader-friendliness is misleading. Translators do not like
post-editing because they feel overridden by their clients on behalf of a machine. Technical
writers cannot write with translatability in mind because they are not taught to, nor are translators and MT
developers taught to cope with the subtleties of technical writing. But who should teach a technical
writer to write with translatability in mind? And who should teach technical writing to and for translators
to allow them to better understand the technical documents they are called on to process? MT developers sit
in an ivory tower, in a world apart: their work is not for mere translators. MT is still an academic pursuit, and this
goes a long way towards explaining the success of poor but cheap MT systems for the PC. They can handle
formatting codes, and even if the output translation still reflects the original syntax, even if it is coarse,
garbled and flawed, it can tell the user whether the document is worth a costly human translation.