Jun 5, 2017

Technology for Legal Translation

Last April I was a guest at the Buenos Aires University Facultad de Derecho, where I had an opportunity to meet students and staff from the law school's integrated degree program for certified public translators and to speak about my use of various technologies to assist my work in legal translation. This post is based loosely on that presentation and a subsequent workshop at the Universidade de Évora.

Useful ideas seldom develop in isolation, and to the extent that I can claim good practice in the use of assistive technologies for my translation work in legal and other domains it is largely the product of my interactions with many colleagues over the past seventeen years of commercial translation activity. These fine people have served as mentors, giving me my first exposure to the concepts of platform interoperability for translation tools, and as inspirations by sharing the many challenges they face in their work and clearly articulating the desired outcomes they hoped to achieve as professionals. They have also generously and frequently shared with me the solutions that they have found and have often unselfishly shared their ideas on how and why we should do better in our daily practice. And I am grateful that I can continue to learn with them, work better, and help others to do so as well.

A variety of tools for information management and transformation can benefit the work of a legal translator in areas which include but are not limited to:
  • corpus utilization,
  • text conversion,
  • terminology management,
  • diverse information retrieval,
  • assisted drafting,
  • dictated speech to text,
  • quality assurance,
  • version control and comparison, and
  • source and target text review.
Though not exhaustive, the list above can provide a fairly comprehensive basis for education of future colleagues and continued professional development for those already active as legal translators. But with any of the technologies discussed below, it is important to remember that the driving force is not the hardware and software we use in technical devices but rather the human mind and its understanding of subject matter and the needs of the particular task or work process in the legal domain. No matter how great our experience, there is always something more and useful to be learned, and often the best way to do this is to discuss the challenges of technology and workflow with others and keep an open mind for new approaches with promise.

Reference texts of many kinds are important in legal translation work (and in other types of translation too, of course). These may be monolingual or multilingual texts, and they provide a wealth of information on subject matter, terminology and typical usage in particular contexts. These collections of text – or corpora – are most useful when the information found in them can be read in context rather than isolation. Translation memories – used by many in our work – are also corpora of a kind, but they are seriously flawed in their usual implementations, because only short segments of text are displayed in a bilingual format, and the meaning and context of these retrieved snippets are too often obscure.

An excerpt from a parallel corpus showing a treaty text in English, Portuguese and Spanish

The best corpus tools for translation work allow concordance searches in multiple selected corpora and provide access to the full context of the information found. Currently, the best example of integrated document context with information searches in a translation environment tool is found in the LiveDocs module of Kilgray's memoQ.

A memoQ concordance search with a link to an "aligned" translation
A past translation and its preview stored in a memoQ LiveDocs corpus, accessed via concordance search
A memoQ LiveDocs corpus has all the advantages of the familiar "translation memory" but can include other information, such as previews of the translated work as well. It is always clear in which document the information "hit" was found, and corpora can also include any number of monolingual documents in source and target languages, something which is not possible with a traditional translation memory.

In many cases, however, much context can be restored to a traditional translation memory by transforming it into a "document" in a LiveDocs corpus. This is because in most cases, substantial portions of the translation memory will have its individual segment records stored in document order; if the content is exported as a TMX file or tab-delimited text file and then imported as a bilingual document in a LiveDocs corpus, the result will be almost as if the original translations had been aligned and saved, and from a concordance hit one can open the bilingual content directly and read the parts before and after the text found in the concordance search.

Legal translation can involve text conversion in a broad sense in many ways. Legal translators must often deal with hardcopy or faxed material or scanned files created from these. Often documents to translate and reference documents are provided in portable document format (PDF), in which finding and editing information can be difficult. Using special software, these texts can be converted into documents which can be edited, and portions can be copied, pasted and overwritten easily, or they can be imported in translation assistance platforms such as SDL Trados Studio, Wordfast or memoQ. (Some of these environments include integrated facilities for converting PDF texts, but the results are seldom as suitable for work as PDF or scanned files converted with optical character recognition software such as ABBYY FineReader or OmniPage.)

Software tools like ABBYY FineReader can also convert "dead" scanned text images into searchable documents. This will even work with bad contrast or color images in the background, making it easier, for example, to look for information in mountains of scanned documents used in legal discovery. Text-on-image files like the example shown above completely preserve the layout and image context of the text to be read in the best way. I first discovered and used this option while writing a report for a client in which I had to reference sections of a very long, scanned policy document from the European Parliament. It was driving me crazy to page through the scanned document to find information I wanted to cite but where I had failed to make notes during my first reading. Converting that scanned policy to a searchable PDF made it easy to find what I needed in seconds and accurately cite its page number, etc. Where there is text on pictures, difficult contrast and other features this is often far better for reference purposes than converting to an MS Word document, for example, where the layouts are likely to become garbled.

Software tools for translation can also make text in many other original formats accessible to translators in an ergonomically simpler form, also ensuring, where necessary, that no text is overlooked because of a complicated layout or because it is in an easily overlooked footnote or margin note. Text import filters in translation environments make it easy to read and translate the words in a uniform working environment, with many reference tools and other help available, and then render the translated text back into its original format or some more useful bilingual format.

An excerpt of translated patent claims exported as a bilingual table for review

Technology also offers many possibilities for identifying, recording and controlling relevant terminology in legal translation work.

Large quantities of text can be analyzed quickly to find the most frequent special vocabulary likely to be relevant to the translation work and save these in project glossaries, often enabling that work to be organized better with much of the clarification of terms taking place prior to translation.  This is particularly valuable in large projects where it may be advisable to ensure that a team of translators all use the same terms in the target language to avoid possible confusion and misunderstanding.

Glossaries created in translation assistance tools can provide terminology hints during work and even save keystrokes when linked to predictive, "intelligent" writing features.

Integrated quality checking features in translation environments enable possible deviations of terminology or other issues to be identified and corrected quickly.

Technical features in working software for translation allow not only desirable terms to be identified and elaborated; they also enable undesired terms to be recorded and avoided. Barred terms can be marked as such while translating or automatically identified in a quality check.

A patent glossary exported from memoQ and then made into a PDF dictionary via SDL Trados MultiTerm
Technical tools enable terminology to be shared in many different ways. Glossaries in appropriate formats can be moved easily between different environments to share them with others on a team which uses diverse technologies; they can also be output as spreadsheets, web pages or even formatted dictionaries (as shown in the example above). This can help to ensure consistency over time in the terms used by translators and attorneys involved in a particular case.

There are also many different ways that terminology can be shared dynamically in a team. Various terminology servers available usually suffer from being restricted to particular platforms, but freely available tools like Google Sheets coupled with web look-up interfaces and linked spreadsheets customized for importing into particular environments can be set up quickly and easily, with access restricted to a selected team.

The links in the screenshot above show a simple example using some data from SAP. There is a master spreadsheet where the data is maintained and several "slavesheets" designed for simple importing into particular translation environment tools. Forms can also be used for simplified data entry and maintenance.

If Google Sheets do not meet the confidentiality requirements of a particular situation, similar solutions can be designed using intranets, extranets, VPNs, etc.

Technical tools for translators can help to locate information in a great variety of environments and media in ways that usually integrate smoothly with their workflow. Some available tools enable glossaries and bilingual corpora to be accessed in any application, including word processors, presentation software and web pages.

Corpus information in translation memories, memoQ LiveDocs or external sources can be looked up automatically or in concordance searches based on whole or partial content matches or specified search terms, and then useful parts can be inserted into the target text to assist translation. In some cases, differences between a current source text and archived information is highlighted to assist in identifying and incorporating changes.

Structured information such as dates, currency expressions, legal citations and bibliographical references can also be prepared for simple keystroke insertion in the translated text or automated quality checking. This can save many frustrating hours of typing and copy revision. In this regard, memoQ currently offers the best options for translation with its "auto-translation" rulesets, but many tools offer rules-based QA facilities for checking structured information.

Voice recognition technologies offer ergonomically superior options for transcription in many languages and can often enable heavy translation workloads with short deadlines to be handled with greater ease, maintaining or even improving text quality. Experienced translators with good subject matter knowledge and voice recognition software skills can typically produce more finished text in a day than the best post-editing operations for machine pseudo-translation, with the exception that the text produced by human voice transcription is actually usable in most situations, while the "gloss" added to machine "translations" is at best lipstick on a pig.

Reviewing a text for errors is hard work, and a pressing deadline to file a brief doesn't make the job easier. Technical tools for translation enable tens of thousands of words of text to be scanned for particular errors in seconds or minutes, ensuring that dates and references are correct and consistent, that correct terminology has been used, et cetera.

The best tools even offer sophisticated tools for tracking changes, differences in source and target text versions, even historical revisions to a translation at the sentence level. And tools like SDL Trados Studio or memoQ enable a translation and its reference corpora to be updated quickly and easily by importing a modified (monolingual) target text.

When time is short and new versions of a source text may follow in quick succession, technology offers possibilities to identify differences quickly, automatically process the parts which remain unchanged and keep everything on track and on schedule.

For all its myriad features, good translation technology cannot replace human knowledge of language and subject matter. Those claiming the contrary are either ignorant or often have a Trumpian disregard for the truth and common sense and are all too eager to relieve their victims of the burdens of excess cash without giving the expected value in exchange.

Technologies which do not assist translation experts to work more efficiently or with less stress in the wide range of challenges found in legal translation work are largely useless. This really does include machine pseudo-translation (MpT). The best “parts” of that swindle are essentially the corpus matching for translation memory archives and corpora found in CAT tools like memoQ or SDL Trados Studio, and what is added is often incorrect and dangerously liable to lead to errors and misinterpretations. There are also documented, damaging effects on one’s use of language when exposed to machine pseudo-translation for extended periods.

Legal translation professionals today can benefit in many ways from technology to work better and faster, but the basis for this remains what it was ten, twenty, forty or a hundred years ago: language skill and an understanding of the law and legal procedure. And a good, sound, well-rested mind.


Further references

Speech recognition 

Dragon NaturallySpeaking: https://www.nuance.com/dragon.html
Tiago Neto on applications: https://tiagoneto.com/tag/speech-recognition
Translation Tribulations – free mobile for many languages: http://www.translationtribulations.com/2015/04/free-good-quality-speech-recognition.html
Circuit Magazine - The Speech Recognition Revolution: http://www.circuitmagazine.org/chroniques-128/des-techniques
The Chronicle - Speech Recognition to Go: http://www.atanet.org/chronicle-online/highlights/speech-recognition-to-go/
The Chronicle - Speech Recognition Is in Your Back Pocket (or Wherever You Keep Your Mobile Phone): http://www.atanet.org/chronicle-online/none/speech-recognition-is-in-your-back-pocket-or-wherever-you-keep-your-mobile-phone/

Document indexing, search tools and techniques

Archivarius 3000: http://www.likasoft.com/document-search/
Copernic Desktop Search: https://www.copernic.com/en/products/desktop-search/
AntConc concordance: http://www.laurenceanthony.net/software/antconc/
Multiple, separate concordances with memoQ: http://www.translationtribulations.com/2014/01/multiple-separate-concordances-with.html
memoQ TM Search Tool: http://www.translationtribulations.com/2014/01/the-memoq-tm-search-tool.html
memoQ web search for images: http://www.translationtribulations.com/2016/12/getting-picture-with-automated-web.html
Upgrading translation memories for document context: http://www.translationtribulations.com/2015/08/upgrading-translation-memories-for.html
Free shareable, searchable glossaries with Google Sheets: http://www.translationtribulations.com/2016/12/free-shareable-searchable-glossaries.html

Auto-translation rules for formatted text (dates, citations, etc.)

Translation Tribulations, various articles on specifications, dealing with abbreviations & more:
Marek Pawelec, regular expressions in memoQ: http://wasaty.pl/blog/2012/05/17/regular-expressions-in-memoq/

Authoring original texts in CAT tools

Translation Tribulations: http://www.translationtribulations.com/2015/02/cat-tools-re-imagined-approach-to.html

Autocorrection for typing in memoQ

Translation Tribulations: http://www.translationtribulations.com/2014/01/memoq-autocorrect-update-ms-word-export.html


  1. Nice clean uncluttered slide graphics

    1. Thanks, Kirti - the PPTX template is borrowed from Kilgray, who kindly provided it for a talk at memoQfest years ago. As for the content, I despise slides where you can read the whole text of the talk from the projected image - assuming you can squint hard enough to see all the often too tiny text. I usually use just a few images or words as reminders, but that does have the disadvantage that if I want to hand out the slides to students or others later I have to write extensive notes for them to make sense to the reader.

  2. The tips mentioned here for legal translation is very useful. Being a professional tarnslator it has been always difficult for me to carry out legal document translation. The above mentioned methods and tools can be beneficial for me.

  3. Thank you for those information. I also found those type of information free auto repair manuals pdf of Porsche from some other online websites.


Notice to spammers: your locations are being traced and fed to the recreational target list for my new line of chemical weapon drones :-)