Dec 21, 2011

Presegmented "classic" Trados files

Given that many outsourcing translators, agencies and companies still use older versions of Trados but often want to work with qualified translators without tripping over tool issues, this is still a current topic despite the new SDL Trados tools having been on the market for several years. And my old published procedures on these matters are either no longer publicly available or are somewhat in need of updating.

Before I began blogging in 2008, I wrote a number of procedures to help my partner, colleagues and clients understand the best ways of handling "Trados jobs" with other translation environment tools. When translating a TTX file with Déjà Vu, memoQ and many other applications, it is often best practice to "presegment" the file using a demo or licensed version of Trados 2007 or earlier. In fact, if this is done on the client's system, it avoids many of the little incompatibility quirks that can occur if, for example, the translator uses a different build of Trados.

What does "presegment" actually mean? It is a particular method of pretranslation in which the source text is copied to the target segment wherever the translation memory offers no match. If performed with an empty TM, the target segments are initially identical to the source segments. If this procedure is followed, applications such as Déjà Vu and memoQ are fully and reliably compatible with the Trados versions predating Trados Studio 2009 that clients use. For newer versions of Trados, the best procedure involves working with the SDLXLIFF files from Studio.

If a freelance translator does not own a copy of SDL Trados 2007 or an earlier version used by an agency or direct client, this is the procedure to share with a request for presegmentation. While some clients might expect the translator to do such work using his or her own copy of Trados, I have experienced enough trouble with complex files over the years when different builds of the same version of Trados are used that I consider this to be the safest procedure to follow - safer even than having the translator do the work in Trados in many cases.
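For readers who think in code, the presegmentation logic described above can be sketched roughly as follows. This is only an illustration of the idea, not how Trados Workbench works internally: the function names, the 0.75 threshold and the use of Python's difflib as a stand-in for fuzzy matching are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def best_match(source, tm):
    """Return (target, score) for the closest TM entry, or (None, 0.0) if nothing matches.

    `tm` is a plain dict of source -> target strings (a hypothetical stand-in for a real TM).
    """
    best_target, best_score = None, 0.0
    for tm_source, tm_target in tm.items():
        # difflib similarity stands in for a real fuzzy match score (assumption)
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_target, best_score = tm_target, score
    return best_target, best_score


def presegment(segments, tm, threshold=0.75):
    """Pair every source segment with a target: the TM match, or a copy of the source."""
    result = []
    for source in segments:
        target, score = best_match(source, tm)
        if target is None or score < threshold:
            # No usable match: copy the source text to the target segment
            result.append((source, source))
        else:
            result.append((source, target))
    return result
```

Called with an empty TM, for example presegment(["Hello world.", "Goodbye."], {}), every target comes back identical to its source, which is the "full copy" case described above.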

Step 1: Prepare the source files
Before creating a TTX file and presegmenting it for translation in DVX or creating a presegmented RTF, DOC or DOCX file compatible with the Trados Workbench or Wordfast Classic macros in Microsoft Word, it is a very good idea to take a look at the file and clean up any "garbage" such as optional hyphens, unwanted carriage returns or breaks, inappropriate tabbing in the middle of sentences, etc. Also, if the file has been produced by incompetent OCR processes, there may be a host of subtle font changes or spacing between letters, etc. that will create a horrible mess of tags when you try to work with most translation environment tools. Dave Turner's CodeZapper macros are a big help in such cases, and other techniques may include copying and pasting to and from WordPad or even converting to naked text in Notepad and reapplying any desired formatting. This will ensure that your work will not be burdened by superfluous tags and that the uncleaned file after the translation will have good quality segmentation.
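If the text has already been reduced to plain text (the Notepad route mentioned above), some of this cleanup can be scripted. The following is a minimal sketch under that assumption; the function name and the particular rules (which line breaks count as "unwanted", for instance) are my own choices for illustration and will need adjusting for real files. It does nothing for formatting-related tag problems in Word files, where CodeZapper remains the better choice.

```python
import re

SOFT_HYPHEN = "\u00ad"  # the "optional hyphen" character


def clean_source_text(text):
    """Remove common "garbage" from plain text before segmentation."""
    # Drop optional (soft) hyphens
    text = text.replace(SOFT_HYPHEN, "")
    # Normalise Windows and old Mac line endings to a plain newline
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A tab between two words in a line becomes a single space; leading tabs are kept
    text = re.sub(r"(?<=\S)[ \t]*\t[ \t]*(?=\S)", " ", text)
    # A line break between a lowercase letter (or comma) and a lowercase letter
    # is treated as an unwanted mid-sentence break and replaced by a space
    text = re.sub(r"(?<=[a-z,])\n(?=[a-z])", " ", text)
    return text
```

The line-break rule is deliberately conservative: breaks after a full stop are left alone so that real paragraph boundaries survive.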

Step 2: Segment the source files
If the source files are of types which Trados handles only via the TagEditor interface, then they may be pretranslated directly by Trados Workbench to produce presegmented TTX files. If they are RTF or Microsoft Word files, on the other hand, and a TTX file is desired, you must first launch TagEditor, open the files in that environment and save them to create the TTX files, which are then pretranslated using Trados Workbench. If a presegmented RTF or Microsoft Word file is desired (for subsequent review using the word processor, for example), then the files can be processed directly with Trados Workbench.

Important Trados settings:
  • In Trados Workbench, select the menu option Options > Translation Memory Options… and make sure that the checkbox option Copy source on no match is marked. 

  • In the dialog for the menu option Tools > Translate, mark the options to Segment unknown sentences and Update document.

After the settings for Trados Workbench are configured correctly, select the files you wish to translate in the dialog for the Workbench menu option Tools > Translate and pretranslate them by clicking the Translate button. This will create the "presegmented" files for import into DVX, memoQ, etc. If the job involves a lot of terminology in a MultiTerm database, which cannot be made available for the translation in the other environment (perhaps due to password protection or no suitable MultiTerm installation on the other computer), you might want to consider selecting the Workbench option to insert the terms.

Note: to get a full source-to-target copy, use an empty Trados Workbench TM. However, if an original customer TM is used for this step, you will often get better "leverage" (higher match rates) than if you work only with a TMX export of the TM in the other environment. If I am supplied with a TWB TM, I usually presegment with it first, then export it to TMX and bring it into memoQ or DVX for concordancing purposes. However, in some cases, such as with the use of memoQ's "TM-driven segmentation", you might get better matches in the other environment (not Trados).

Whoever performs the presegmentation might want to inspect the segmented files in TagEditor or MS Word to ensure that the segmentation does not require adjustment. Segments can typically be joined in other environments such as memoQ in order to have sensible TM entries in that environment or to deal with structural issues in the language, but this will not prevent useless segments in the content delivered for Trados. The best way to deal with that is to fix the segments in Trados itself. Otherwise, I often provide a TMX export from memoQ to improve the quality of the Trados TM.

Step 3: Import the segmented source files into the other environment
The procedure for this varies depending on your translation environment tool. Usually the file type will be recognized and the appropriate filter offered. In some cases, the correct filter type must be specified (such as in memoQ, where a presegmented bilingual RTF/DOC must be imported using the "Add document as..." function and specifying "Bilingual DOC/RTF filter" instead of the default "Microsoft Word filter").

Some tools, like memoQ, offer the possibility of importing content which Trados ignores, such as numbers and dates. This is extremely useful when number and date formats differ between the languages involved. It saves tedious post-editing in Word or TagEditor and also enables a correct word count to be made.

A few words about output from the other (non-Trados) environment
If you import a TTX to Déjà Vu, memoQ, etc., what you will get when you export the result is a translated TTX file, which must then be cleaned using Trados under the usual conditions. Exporting a presegmented RTF or Microsoft Word file from DVX gives you the translated, presegmented file. The ordinary export from memoQ will clean that file and give you a deliverable target file. To get the bilingual format for review, etc. you will have to use the option to export a bilingual file.

Other environments such as memoQ or Déjà Vu may also offer useful features like the export of bilingual, commented tables for feedback. This saves time in communicating issues such as source file problems, terminology questions, etc. and is infinitely superior to the awful Excel feedback sheets that some translation agencies try to impose on their partners.

Editing translations performed with Trados
A translation performed using the Trados Workbench macros in Microsoft Word or using TagEditor can be easily reviewed in many other environments such as Déjà Vu or memoQ. In fact, I find that the QA tools and general working environment with this approach are far superior to working in TagEditor or Word, for example. Tag checks can be performed easily, compliance with standard terminology can be verified, content can be filtered for more efficient updates and more.

Editing translations performed with more recent versions of Trados (SDL Trados Studio 2009 and 2011) is also straightforward, as these SDLXLIFF files are XLIFF files which can be reviewed in any XLIFF-compatible tool.
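Because these files are XLIFF, even a short script can pull out the source and target text for a quick look. The sketch below reads generic XLIFF 1.2 trans-unit elements with Python's standard library; the sample document and the function name are made up for illustration, and real SDLXLIFF files contain additional vendor-specific markup that this sketch simply ignores.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Tiny hand-written sample (hypothetical content, not a real Studio export)
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.docx" source-language="de-DE" target-language="en-US" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Guten Tag.</source><target>Good day.</target></trans-unit>
      <trans-unit id="2"><source>Seite <g id="1">3</g></source></trans-unit>
    </body>
  </file>
</xliff>"""


def read_trans_units(xliff_text):
    """Return a list of (id, source text, target text) with inline tags stripped."""
    root = ET.fromstring(xliff_text)
    units = []
    for tu in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = tu.find("x:source", XLIFF_NS)
        target = tu.find("x:target", XLIFF_NS)
        units.append((
            tu.get("id"),
            "".join(source.itertext()) if source is not None else "",
            # An untranslated unit may have no target element at all
            "".join(target.itertext()) if target is not None else "",
        ))
    return units
```

Running read_trans_units(SAMPLE) gives one tuple per trans-unit, with an empty string as the target of the untranslated second unit.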


  1. Super! This will be printed and stuck on my cork board until I can do this in my sleep (ideally by tomorrow!)

    Thanks, Kevin!

  2. Great posting!
    One caveat though: when translating a bilingual Word that contains variables in MemoQ, be sure to check the output when you export as the variables might be broken (happened to me).
    Also a note regarding DejaVu X2: when you import a bilingual Word file, you usually have to change the filter to "Workbench" as DVX2 will assume it is a Word doc.
    Last but not least: I have not used the "copy source to target" option so far, and have not run into any issue (as long as "segment unknown sentences" is checked). Have you?

  3. Hi Kevin,

    Just wondering how you handle splitting and/or joining segments when you are forced to deliver 'uncleaned' Word files to the client. I did all of the pre-segmenting stuff etc. and imported them into memoQ, and am now puzzling over how to approach this. I was considering asking my client, but I suspect that would be a waste of my time.


  4. @Michael: What's to puzzle over here? I don't give such things a bit of thought any more. I split and join as I like in memoQ so that I have a high quality TM entry, and the client gets whatever comes out in the process. I used to think that it was important to correct the segmentation until (1) I found that some clients actually got irritated by this because it messes up their alleged "leveraging" and (2) quite a number of mindless Trados-using drones with TWB or TagEditor don't know how to correct segmentation anyway and will inevitably deliver crap to clean into the TM.

    If you want to be a nice guy, offer a TMX export as a little extra with optimized segmentation. You might want to do a search & replace to delete the "squiggly tags" if you want to be extra nice.

