Jan 13, 2014

Locking out other languages in memoQ source texts

One of the interesting and useful results of Kilgray introducing document language recognition features in memoQ 2013 R2 is the ability to identify and exclude segments in other languages. I see such segments from time to time in German patent dispute documents which quote English patent texts extensively, or in texts where new source language material has been added to an existing translation. In the past, I prepared such texts for translation by hiding the text which was already in the target language or was in a language I cannot translate (such as French), or I locked it manually, which can be time-consuming in a long text. Now the task of preparing such texts for translation is a little easier.


The screenshot above shows a patchwork document with German and English. The hundreds of segments in this job were a wild mix of the two languages with unfortunately few coherent blocks of the source language (German). To save time in preparation, I selected the option in the Operations menu to lock the segments:



The result of the locking procedure looked like this:


Most of the English segments were copied source to target and locked. The differentiation of languages is performed using statistics and is rather good but not perfect. In slightly under 400 segments, there were 5 or 6 that were not correctly identified and locked. Several of these were in the bibliography and consisted of a long string of names and one or two short English words or abbreviations. I saw no false positives (source language misidentified and locked), though I did hear a report of some from another translator working from Dutch to English with a very large mixed document. Discussions with Kilgray Support revealed that a "failure rate" of about 1-2% may be experienced for this feature, which matches the 5 or 6 misses I saw in just under 400 segments.
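To illustrate why name-heavy bibliography segments are hard for a statistical approach, here is a minimal Python sketch of a stopword-counting language guesser. This is not memoQ's algorithm, which is not described here; the word lists, the function name guess_language and the min_hits threshold are all assumptions made up for the example.

```python
import re

# Tiny illustrative stopword lists (assumed for this sketch only);
# a real language identifier would use far richer statistics.
GERMAN = {"der", "die", "das", "und", "ist", "nicht", "ein", "eine",
          "mit", "von", "zu", "den", "dem", "auf", "wird"}
ENGLISH = {"the", "and", "is", "not", "a", "an", "with", "of",
           "to", "for", "on", "that", "which", "are"}


def guess_language(segment, min_hits=2):
    """Return 'de', 'en' or 'unknown' based on stopword counts.

    Hypothetical helper: segments with fewer than min_hits stopword
    matches, or with a tie, are reported as 'unknown'.
    """
    # Split into runs of letters (Unicode-aware, so umlauts are kept).
    words = re.findall(r"[^\W\d_]+", segment.lower())
    de_hits = sum(word in GERMAN for word in words)
    en_hits = sum(word in ENGLISH for word in words)
    if max(de_hits, en_hits) < min_hits or de_hits == en_hits:
        return "unknown"
    return "de" if de_hits > en_hits else "en"


if __name__ == "__main__":
    samples = [
        "Die Erfindung betrifft ein Verfahren zur Herstellung von Folien.",
        "The invention relates to a method for the production of films.",
        "Smith J, Müller K, Schmidt H et al.",
    ]
    for sample in samples:
        print(guess_language(sample), "|", sample)
```

A segment made up mostly of names gives such a counter almost nothing to work with, which is consistent with the misses I saw in the bibliography.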

So what good is it? A lot, really. It enabled me to do a quick estimate of effort and separate the two languages so I could make a reasonable assessment of the separate efforts for proofreading the English and translating the German. Obviously, if I were a project manager preparing a file for somebody else to translate, I would need to check the segments manually to correct any errors of identification. But this feature would still often save me a great deal of time in preparing the file, and manual checking is important to do anyway to ensure that there are no segmentation problems which may cause difficulties in translation.

Do you work with mixed language documents where this feature might be relevant? If you do, have you tried this yet? What has your experience been with your language pair(s)?

2 comments:

  1. It is perhaps worth pointing out that although this is a version 6.8 (memoQ 2013 R2) feature, it can be used to prepare translations for earlier versions of memoQ. I have done limited tests going back to version 6.0 using MQXLZ bilingual imports, and translations can be prepared for earlier versions of memoQ using XLIFF (MQXLIFF with the extensions renamed to XLF) or bilingual RTF files. If there are any concerns about the stability of the process, it can be validated by a "round trip test" in which files imported to the older version of memoQ are pseudotranslated or simply copied source to target, then are re-exported and returned to the original system. With the MQXLZ files, minor version increments which occur on the other system (from target file exports, snapshots or bilingual exports) will be retained when the translation file is returned.

  2. I don't use MemoQ regularly, but this seems like a very nice feature.
    Luckily for me I'm working in a language that uses a different alphabet than the Latin one, so usually I'm able to run a simple regex filter to separate the two. Things get a little more complicated when a single segment contains both English and Hebrew (in my case) letters, and I'm wondering how MemoQ would handle it.

    Overall, a very nice feature that can help save some time and effort

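The regex approach mentioned in the second comment can be sketched in a few lines of Python. This is a standalone illustration, not memoQ filter syntax; the function name script_of and the category labels are assumptions, and the Hebrew range used is the Unicode Hebrew block (U+0590 to U+05FF).

```python
import re

# Unicode Hebrew block and basic (unaccented) Latin letters.
HEBREW = re.compile(r"[\u0590-\u05FF]")
LATIN = re.compile(r"[A-Za-z]")


def script_of(segment):
    """Classify a segment as 'hebrew', 'latin', 'mixed' or 'other'.

    Hypothetical helper for illustration: 'mixed' flags the awkward
    case where both scripts occur in one segment.
    """
    has_hebrew = bool(HEBREW.search(segment))
    has_latin = bool(LATIN.search(segment))
    if has_hebrew and has_latin:
        return "mixed"
    if has_hebrew:
        return "hebrew"
    if has_latin:
        return "latin"
    return "other"


if __name__ == "__main__":
    for text in ["שלום עולם", "Hello world", "שלום world", "123"]:
        print(script_of(text))
```

Segments reported as mixed are the ones that would still need a human decision; a script test alone cannot say which language such a segment "belongs" to.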
