Translation Tribulations: TagEditor

Jul 6, 2012

The new Trados TTX preview in memoQ 6

When I heard some weeks ago that memoQ 6 would offer a preview of TTX files, I was curious what this would be and how it was possible. In my mind, I had a picture of the view of the original format, and I couldn't see how this could be reproduced from a Trados TTX. Of course it can't.

I made a little file with two sentences, one footnote and one comment in Microsoft Word.

Then I made a TTX from that using SDL Trados 2007 and imported it into memoQ:

The memoQ tags can, of course, be displayed in a shorter form using toolbar options; I'm just showing the full tag text here for demonstration purposes. The preview of the TTX is shown below the translation grid. It is, of course, a preview of the tags one would see in the Trados TagEditor.

At first I thought this was rather useless, but then I realized it had value after all:

The original tags are easier to understand than the tags in the memoQ grid, and the color differentiation between inline tags (green) and other tags (grey) is helpful.
Unsegmented content is shown, giving you an indication of important content that may require attention like numbers or numeric dates. (I tested this with a different file.)

In the case of that second point, I would re-import the document and select the option to include the unsegmented content. Then use the X-translate function available with version management to pick up where you left off without losing much time. I always leave versioning active in my memoQ projects to allow for possibilities like this. It is also helpful that the new version 6 offers all the features of versioning previously available only in the Project Manager edition of memoQ, which makes comparisons and reporting easier.

Jun 16, 2012

memoQuickie: footnote, cross-reference & index entry segmentation in Microsoft Word files

If you have a Microsoft Word DOC file or RTF to translate, it is important to be aware of the different behaviors of the memoQ import filter options you can use. If there are footnotes, cross-references or index entries, it is far better to use the option to import the DOC or RTF file as DOCX.

The DOC file shown below has a footnote, a cross-reference and an index entry:

Adding it to a memoQ project with the default filter for Microsoft Word in memoQ 5

gives the following segmentation result:

Importing the same document with the DOCX option of the filter

yields much cleaner segmentation and better tags to work with:

Compare what some other programs do with this file:

Wordfast Pro

DVX2 (DOC)

DVX2 (DOCX)

TagEditor salad (partial)

SDL Trados Studio 2009 segmentation

SDL Trados Studio 2011

There is room for improvement with most tools.

Jan 2, 2012

ODT files in translation environment tools

After an interesting afternoon with a friend who was a bit frustrated with the behavior of her translation assistance technology with an ODT (Open Office text) source file, I decided to have a look at how a variety of common tools handle this format. I created a small test file which contained some of the troublesome elements and saved it as *.odt for testing. The test file looked like this:

The ordered list was created using the numbering feature.

When the file was imported to OmegaT, the segmentation looked as follows:

Fairly clean, though the segmentation is a bit off due to the encoding of the space after the end of the sentence in the second block of text. Nine segments where there should have been ten.

With memoQ, the result was:

Altogether there were a dozen segments after import. The part with the hyperlink was segmented incorrectly into three parts instead of one. However, memoQ did handle the space tag after "tool." correctly and started a new segment at "Here". One can, of course, use the segment joining function

to correct the segmentation until Kilgray gets around to fixing the segmentation on the hyperlink tag:

Update 9 January 2012: The developers at Kilgray have informed me now that this quirk in the ODT filter has been corrected and will be included in the next build released.

When I tried to test my SDL Trados Studio 2009 license, at first it refused to join the party:

Never a dull moment with SDL as we all know. Of course SDL Trados 2007 was in fact installed, but when I upgraded to Studio 2009, of course it trashed my 2007 installation, and I had been too irritated to do anything about it for over half a year since I don't use Trados for anything more than file preparation and compatibility testing anymore, and I was still able to do that for my projects with the damaged installation. However, when I discovered that the ODT file caused TagEditor to run and hide without even saying goodbye, I sighed deeply and wasted half an hour reinstalling SDL Trados 2007. At least I didn't have to go through that insane check-in/check-out license procedure online. I trusted in God and my Windows Registry entries, and the location of my license file was remembered, so all was well.

The second attempt at SDL Trados Studio 2009 was much better:

Same segmentation problem as OmegaT, and examining the tags reveals where the issue might be addressed in a tweak of the filter.

I haven't got the latest upgrade, but someone was kind enough to run my test file through SDL Trados Studio 2011, which appears to offer the best results for filtering ODT (the settings were slightly different, with the URL included, but that is also possible with some other tools):

SDL Trados TagEditor also worked after re-installation. The results were:

Oh dear. Well, it works, but if I still used TagEditor, I would run, not walk, to the much cleaner interface of OmegaT for this sort of thing if I didn't have the good sense to upgrade to Studio or something else commercial. Note the same segmentation issue and the need for filter modification.

Victor Dewsbery was kind enough to import my test file to the original Atril DVX and the newer DVX2 and send me the results:

DVX import of the test file

DVX2 import of the test file.

I also tried to test SDLX, Wordfast Pro and Wordfast Anywhere. The first two tools don't support ODT. Wordfast Anywhere claims to, but it went nowhere, with the following status message displayed in my browser for about half an hour before I gave up and went to lunch:

Of course I canceled. I had a blog post to write and a New Year to get on with. Anyone who wants to try the test file in another tool (to compare apples with apples) can get it here.

Dec 27, 2011

Clean up the tag mess with CodeZapper for all CAT tools

Readers of this blog probably know by now that I am a Dave Turner fan. His CodeZapper macros have probably saved me hundreds of hours of wasted time over the years (not an exaggeration), and I think there are a lot of other translators and project managers with similar experiences. It doesn't solve every problem with superfluous tags, but it solves a lot of them, and Mr. Turner works steadily at improving the tool. I blogged the release of the latest version not long ago; it is now available directly from him for a modest fee of 20 euros (see the link to the release announcement for a contact link). That means it pays for itself in far less than an hour of saved time.

Over the past few days I have been updating some training documentation and running a lot of tests on tagged files as part of this. During this work, I have been struck time and again by the differences in the tags "found" by different tools working with the same file. Sometimes one tool looks better than another, but the patterns are not always consistent. What is most consistent is the ability of CodeZapper to clean up the files in various versions of Microsoft Word and make the tag structures appear a little more uniform.

Here's an example of the same DOCX file "unzapped" in several tools:

Import into memoQ 5, as-is, no tag clean-up. Previous versions of the same file showed more tags in places.

SDL Trados Studio 2009 before tag clean-up.

TagEditor in SDL Trados 2007 before tag clean-up

Initially, OmegaT would not import that particular DOCX without a tag cleanup. I reported the problem to the developers, who upgraded the filter to handle a previously unfamiliar character in internal paths of the ZIP file (DOCX is actually just a renamed ZIP package like many other file types). See http://tech.groups.yahoo.com/group/OmegaT/message/23931 for information on the new release. Opening, editing and re-saving the troublesome file enabled it to be imported after all without the latest version bugfix, so users should keep that trick in mind if a similar problem is encountered. I've had to do similar things in the past with other tools, so this is probably a good general tip regardless of what tool you use. When I downloaded and tested the latest standard release of OmegaT (2.3.0_4), the tag structure looked fine - no zapping of the DOCX was necessary in this case.
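Because a DOCX is a ZIP package, its internal paths can be inspected with any ZIP library. The following Python sketch lists the member names of a package and flags those containing characters outside plain ASCII. Treating non-ASCII as the "unfamiliar character" is purely an assumption for illustration, since the exact character that tripped the OmegaT filter is not stated here, and the function name is hypothetical.

```python
import io
import zipfile


def suspicious_member_names(docx_file):
    """Return internal ZIP paths that contain non-ASCII characters.

    docx_file may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        return [name for name in package.namelist() if not name.isascii()]


if __name__ == "__main__":
    # Build a tiny stand-in package in memory instead of reading a real DOCX.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<doc/>")
        package.writestr("word/média/image1.png", b"")
    buffer.seek(0)
    print(suspicious_member_names(buffer))
```

A check like this only tells you where to look; re-saving the file in the word processor, as described above, remains the practical workaround.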

After treatment with CodeZapper, the file looked the same in memoQ (where the extra tags weren't present in the first place, though one can't count on things always being this way). The view in Trados Studio and TagEditor improved significantly, though there were still more tags, and OmegaT accepted the DOCX after tag cleaning.

SDL Trados Studio 2009 import of the DOCX file after tag cleanup with CodeZapper

SDL Trados 2007 TagEditor import of the DOCX file after tag cleanup with CodeZapper

OmegaT import of the DOCX file after tag cleanup with CodeZapper (OmegaT 2.3.0_3)

It is important to consider that superfluous tags mean wasted work time with formatting and QA corrections, perhaps even a higher risk of file failure (such as the inability to import the file at all into one tool). This is why for some time now, I and others have advocated modifying the costing of volume-based translation work to include the number of tags. This requires, of course, that you have access to a counting tool which reports the number of tags (SDL Trados Studio does this, Atril's Déjà Vu has long offered this feature, and memoQ even allows you to assign a word or character "weight" to tags for counting purposes). This is the only fair way I know of to account for the extra work (besides time-based charges). Consider that everyone is affected: translators, reviewers and project managers! I've had to talk more than one of the last group through "tag rescue" techniques after hours.
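The arithmetic behind a tag "weight" is simple enough to sketch in a few lines of Python. The function name and the default weight of a quarter word per tag are hypothetical; the actual weight is something to agree on with the client, and this is not a reproduction of how memoQ or any other tool computes its statistics.

```python
def weighted_volume(word_count, tag_count, tag_weight=0.25):
    """Return a billable volume in which every tag counts as tag_weight words."""
    if word_count < 0 or tag_count < 0 or tag_weight < 0:
        raise ValueError("counts and weight must be non-negative")
    return word_count + tag_count * tag_weight


if __name__ == "__main__":
    # Hypothetical figures: 1000 words and 200 tags at a quarter word per tag.
    print(weighted_volume(1000, 200))
```

With these made-up figures, 200 tags add the equivalent of 50 words to the count, which makes the extra effort visible in the quotation.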

Perhaps it is worth considering as well that cleaner tagging will also improve "leverage" (match quality) in translation memories. So if a tool does offer cleaner tag structures (for a variety of source formats) consistently, working with that tool efficiently to manage projects will save time and money as well on top of the time and money saved with the use of CodeZapper macros in MS Word files.

Dec 25, 2011

SDLXLIFF files in TagEditor, OmegaT and memoQ

As SDL Trados Studio gains acceptance, SDL's own flavor of XLIFF is encountered with increasing frequency by translators using other tools. I decided to test three of these to see how they fared: TagEditor (for "backward compatibility" with Trados users who haven't upgraded), the Open Source tool OmegaT and memoQ.

A simple DOCX test file was created, which looked like this:

It was opened in SDL Trados Studio 2009 and saved as an SDLXLIFF file, which was subsequently imported into each of the other three translation environment tools.

TagEditor test
Using the default XLIFF INI supplied with SDL Trados 2007, I obtained results which looked as follows:

Some ugly tag salad there and exposed, vulnerable information from the header. Using the adapted INI file I made for memoQ XLF files, things improved a bit:

Still not very pretty, but it works, and it works better than a memoQ XLIFF currently does in TagEditor. No breaking of tags.

Translated and brought back into SDL Trados Studio, the translation grid looked like this with everything in good order:

The target DOCX file with the translation saved nicely and was perfect.

In real life, however, it may be necessary to adapt the INI file in TagEditor more extensively for good results. The German consultancy Loctimize has compiled some good instructions for doing so in which the entire workflow is also described nicely (in German). So far I haven't run across similar instructions in English.

OmegaT test
Initially things looked much better with the SDLXLIFF file imported to OmegaT:

A great start, much cleaner-looking than TagEditor! But when the translation was re-imported to SDL Trados Studio, a small problem was apparent:

One of the tags in the second segment was dropped. In a similar test with an XLIFF from memoQ, the version of OmegaT I tested (version 2.3.0, update 3) appeared to trash even more tags, and the target file was completely reformatted! In fact, it even trashed tags on the source side in the memoQ file! Thus I was deeply concerned about the XLIFF filter in OmegaT. However, as astute observers have noted, I probably deleted the missing tag when editing in OmegaT, and a subsequent successful re-test of the workflow confirmed this. But the problem with the XLF file from memoQ was frighteningly repeatable. Careful, systematic testing revealed, however, that the roundtrip of a bilingual XLF file from memoQ back into memoQ failed. Either there is a problem with the version I have installed (5.0.56) or the installation is corrupted. The matter is being pursued with Kilgray support. The target file from the SDLXLIFF translated with OmegaT was fine.

memoQ test
I have translated many SDLXLIFF files in memoQ and seldom encountered a problem of any kind. The file from SDL Trados Studio looks as follows in the memoQ environment:

Please note: with memoQ I can use an XLIFF which has not had the source copied to the target or one which has been pretranslated. That is not really the case for the other two environments tested, because with both TagEditor and OmegaT the source must be copied to the target or you have nothing to translate. You might say that memoQ offers "real" XLIFF editing for translation.

The SDLXLIFF file translated in memoQ reimported beautifully to SDL Trados Studio 2009 and saved to a target file (DOCX) from there with no problems.

Trados TagEditor: Optimal translation of memoQ bilinguals

With the growing number of translation agencies, direct clients and outsourcing translators adopting Kilgray's memoQ as a working platform for managing translation project content, it is particularly important for these new memoQ users and their partners to understand the best approaches to working together with persons who use other tools. One tool which is still commonly found is SDL Trados TagEditor. Compared to the other "classic" Trados tool, the Workbench macros for Microsoft Word, TagEditor has the advantage of enabling many different file formats to be processed while protecting their formatting elements (also known as "tags").

SDL Trados TagEditor can work with two types of "bilingual" files prepared in memoQ: XLIFF (*.xlf) files and bilingual RTF tables. Each approach will be presented here along with some suggestions for best practice.

XLIFF files
TagEditor comes with a default INI file for translating XLIFF, typically found at the path C:\ProgramData\SDL International\Filters\XLIFF.ini. This INI enables the contents of the target segments from the memoQ XLF file to be translated as the source in TagEditor. Thus for this approach to work, the source must be copied completely to the target in memoQ before the bilingual XLIFF is created using the Export bilingual function of the Translations page. This makes pretranslation undesirable in most cases, because the source text for matches will not be accessible and the translator will end up with a very screwy TM. Data for the TM should be supplied to the translator as TMX; be aware that match rates for the segments in TagEditor will differ significantly in some cases.
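To make clear what "source copied to the target" means at the file level, here is a minimal Python sketch that fills every empty or missing target element of a generic XLIFF 1.2 document with a copy of its source, inline elements included. It assumes the standard XLIFF 1.2 namespace and ignores any memoQ-specific data in a real XLF file, so it illustrates the principle and is not a substitute for doing the copy in memoQ itself. The function name is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def copy_source_to_empty_targets(xliff_text):
    """Return XLIFF text in which every empty or missing target mirrors its source."""
    ET.register_namespace("", XLIFF_NS)  # keep the default namespace on output
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find(f"{{{XLIFF_NS}}}source")
        if source is None:
            continue
        target = unit.find(f"{{{XLIFF_NS}}}target")
        has_content = target is not None and (
            (target.text or "").strip() or len(target) > 0
        )
        if has_content:
            continue  # leave existing translations alone
        # Copy the source with its inline elements and rename it to target.
        new_target = copy.deepcopy(source)
        new_target.tag = f"{{{XLIFF_NS}}}target"
        new_target.attrib.clear()
        if target is not None:
            position = list(unit).index(target)
            unit.remove(target)
            unit.insert(position, new_target)
        else:
            unit.insert(list(unit).index(source) + 1, new_target)
    return ET.tostring(root, encoding="unicode")
```

Targets that already contain text are left untouched here, which is exactly the pretranslated situation the paragraph above warns about for the TagEditor workflow.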

The memoQ XLIFF files will have a lot of "junk" at the top of the file when viewed in TagEditor:

Skip the content between the mqfilterinformation tags and do not change it in any way. Place the cursor below that to start working. If you prefer not to see that information at all, use the XLIFF INI for TagEditor which I modified for use with memoQ XLF files. Then the XLIFF will look a bit cleaner with the header information filtered out:

Astute observers may have noticed, however, that all is not really well with the tag structures in the views above. I think there is a problem with the way that memoQ is generating the XLIFF files, with some tag structures being replaced by entities. (You see this if you open the XLIFF from memoQ in a text editor.) This causes consistent problems like the following in TagEditor:
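A quick way to look for this kind of thing without reading the raw file is to parse the XLIFF and search the segment text for strings that look like tags: markup written as entities (such as &lt;b&gt;) turns into literal angle brackets in the text once the XML is parsed. The Python sketch below does that. The pattern for "tag-like" text and the function name are my own assumptions, and a hit only means the segment deserves a closer look, not that the file is defective.

```python
import re
import xml.etree.ElementTree as ET

# After XML parsing, markup that was written as entities (&lt;b&gt;) shows up
# as literal angle brackets inside the text content.
TAG_LIKE = re.compile(r"</?[A-Za-z][^<>]*>")


def units_with_escaped_markup(xliff_text):
    """Return ids of trans-units whose source or target text contains tag-like strings."""
    root = ET.fromstring(xliff_text)
    flagged = []
    for unit in root.iter():
        if not unit.tag.endswith("trans-unit"):
            continue
        for child in unit:
            if child.tag.endswith(("source", "target")):
                text = "".join(child.itertext())
                if TAG_LIKE.search(text):
                    flagged.append(unit.get("id"))
                    break  # one hit per unit is enough
    return flagged
```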

This will require a lot of tag fixing. Thus I really can't recommend the XLIFF method at this point, not for my simple little test file in any case. The methods using the bilingual RTF tables with memoQ tag protection are safer and the structures that result are much simpler.

But if you do use this method, when the translation is complete, clean the TTX file using Trados Workbench or use the menu option File > Save Target As... in TagEditor to create an XLIFF file to return with the translated content. If the content inside the mqfilterinformation tags has not been segmented, an accurate count of the words translated will be shown in Trados Workbench upon cleaning the TTX (as accurate as that tool is given its limitations with numbers, dates, etc.).

Bilingual RTF tables
These are created in memoQ using the Two-column RTF option of the Export bilingual function. Technically speaking, the files have more than two columns (source and target, index numbers and possibly columns for a second target text, comments and status). Good practice for working with these files in TagEditor and many other tools also requires the source to be copied to the target column. This can be done in memoQ or later in a word processor. The table might look like this, for example:

For best results in TagEditor, it is important that this file be generated with the "mqInternal" style selected for tag formatting. The dark red color imparted to the tags with this option means that proofreading in a word processor is easier, and it also enables the text of the tags to be selected and hidden using a search and replace function. If the RTF file is then saved as a Microsoft Word file, the memoQ tags in the table will then be protected in TagEditor!

If the "full text" option for tags is selected, this makes little or no difference in the TagEditor view.

Here's a quick look at what the protected memoQ tags look like in TagEditor and what can happen without protection:

One possible workflow for memoQ RTF tables in SDL Trados TagEditor consists of the following steps:

1. Copy the source text to the target in memoQ.
2. Export a bilingual "two-column" RTF file with the mqInternal style option selected for the tags.
3. Re-save the RTF as a DOC or DOCX file! This is necessary so that TagEditor will use the right filter.
4. Select and hide all the text in the file.
5. Select only the text to translate in the target column and unhide it.
6. Using search and replace, hide all the dark red text. The settings for the dialog are shown below and are set using the Font... option (marked with a red arrow in the screenshot) in the Format dropdown menu of the Replace dialog.

   The font color to hide will be found under More Colors... in the font colors of the font properties dialog:

7. Launch TagEditor and open the Microsoft Word file with your content to translate. All the hidden text will be protected in tags. Translate the accessible text.
8. Create a target MS Word file from your TTX as described above for the XLIFF files translated in TagEditor.
9. Open the target file and unhide all the text.
10. (Optional) When reviewing the text in the word processor, comments may be added if there is a comments column. These will be imported back into memoQ and can serve as valuable feedback.
11. Re-save the target file as an RTF.
12. Re-import the RTF with the translated table into memoQ. The target text will be updated to include the translation.
13. A QA check for tags, terminology, etc. should be performed in memoQ before exporting the final file for delivery. If an external reviewer is used, another bilingual file in an appropriate format can be generated in memoQ for that work.

Steps 4 to 6 can be performed using a macro for convenience.

The procedure described above can, of course, be abbreviated considerably by simply copying the source text cells into a new Microsoft Word document, doing the search and replace to hide the dark red text for the tags, then processing the file in TagEditor. After translating, unhide the text in your working file, then paste the cells over the target cells in the RTF file.

Here's a look at the test file translated in TagEditor (with a comment added as shown by the dark speech balloon icon) after it was re-imported to memoQ:

And here's the translated file itself:

Dec 21, 2011

Presegmented "classic" Trados files

Given that many outsourcing translators, agencies and companies still use older versions of Trados but often want to work with qualified translators without tripping over tool issues, this is still a current topic despite the new SDL Trados tools having been on the market for several years. And my old published procedures on these matters are either no longer publicly available or are somewhat in need of updating.

Before I began blogging in 2008, I wrote a number of procedures to help my partner, colleagues and clients understand the best procedures for handling "Trados jobs" with other translation environment tools. When translating a TTX file with Déjà Vu, memoQ and many other applications, it is often best practice to "presegment" the file using a demo or licensed version of Trados 2007 or earlier. In fact, if this is done on the client's system, many little quirks of incompatibility that can be experienced if the translator used a different build of Trados (for example) can be avoided.

What does "presegment" actually mean? It is a particular method of pretranslation in which for segments where the translation memory offers no match, the source text is copied to the target segment. If performed with an empty TM, the target segments are initially identical to the source segments. If this procedure is followed, full, reliable compatibility is achieved between applications such as Déjà Vu and memoQ for clients using Trados versions predating Trados Studio 2009. For newer versions of Trados, the best procedure involves working with the SDLXLIFF files from Studio. If a freelance translator does not own a copy of SDL Trados 2007 or an earlier version used by an agency or direct client, this is the procedure to share with a request for presegmentation. While some clients might expect the translator to do such work using his or her own copy of Trados, I have experienced enough trouble with complex files over the years when different builds of the same version of Trados are used that I consider this to be the safest procedure to follow - safer even than having the translator do the work in Trados in many cases.
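Stripped of everything tool-specific, the "copy source on no match" idea described above comes down to the following Python sketch. It uses an exact-match dictionary as a stand-in for the translation memory, which is an assumption made for brevity: Trados Workbench works with fuzzy matching and writes its own bilingual file formats, none of which is modeled here. The function name is hypothetical.

```python
def presegment(source_segments, translation_memory):
    """Pair each source segment with a TM hit or, failing that, with itself."""
    return [
        (segment, translation_memory.get(segment, segment))
        for segment in source_segments
    ]


if __name__ == "__main__":
    segments = ["Guten Morgen.", "Das ist ein Test."]
    # With an empty TM, every target is identical to its source.
    print(presegment(segments, {}))
    # With a TM entry, the match is used and the rest is copied.
    print(presegment(segments, {"Guten Morgen.": "Good morning."}))
```

The result is a file in which every segment already has a target, which is what the other tools need in order to offer something to translate.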

Step 1: Prepare the source files
Before creating a TTX file and presegmenting it for translation in DVX or creating a presegmented RTF, DOC or DOCX file compatible with the Trados Workbench or Wordfast Classic macros in Microsoft Word, it is a very good idea to take a look at the file and clean up any "garbage" such as optional hyphens, unwanted carriage returns or breaks, inappropriate tabbing in the middle of sentences, etc. Also, if the file has been produced by incompetent OCR processes, there may be a host of subtle font changes or spacing between letters, etc. that will create a horrible mess of tags when you try to work with most translation environment tools. Dave Turner's CodeZapper macros are a big help in such cases, and other techniques may include copying and pasting to and from WordPad or even converting to naked text in Notepad and reapplying any desired formatting. This will ensure that your work will not be burdened by superfluous tags and that the uncleaned file after the translation will have good quality segmentation.

Step 2: Segment the source files
If the source files are of types which Trados handles only via the TagEditor interface, then they may be pretranslated directly by Trados Workbench to produce presegmented TTX files. If they are RTF or Microsoft Word files, on the other hand, and a TTX file is desired, you must first launch TagEditor, open the files in that environment and then save them to create the TTX files, which are then subsequently pre-translated using Trados Workbench. If a presegmented RTF or Microsoft Word file is desired (for subsequent review using the word processor, for example), then the files can be processed directly with Trados Workbench.

Important Trados settings:

In Trados Workbench, select the menu option Options > Translation Memory Options… and make sure that the checkbox option Copy source on no match is marked.

In the dialog for the menu option Tools > Translate, mark the options to Segment unknown sentences and Update document.

After the settings for Trados Workbench are configured correctly, select the files you wish to translate in the dialog for the Workbench menu option Tools > Translate and pretranslate them by clicking the Translate button. This will create the "presegmented" files for import into DVX, memoQ, etc. If the job involves a lot of terminology in a MultiTerm database, which cannot be made available for the translation in the other environment (perhaps due to password protection or no suitable MultiTerm installation on the other computer), you might want to consider selecting the Workbench option to insert the terms.

Note: to get a full source-to-target copy, use an empty Trados Workbench TM. However, if an original customer TM is used for this step you will often get better "leverage" (higher match rates) than if you work only with a TMX export of the TM to the other environment. If I am supplied with a TWB TM, I usually presegment with it first, then export it to TMX and bring it into memoQ or DVX for concordancing purposes. However, in some cases, such as with the use of memoQ's "TM-driven segmentation", you might get better matches in the other environment (not Trados).

The one performing the presegmentation might want to inspect the segmented files in TagEditor or MS Word to ensure that the segmentation does not require adjustment. Segments can typically be joined in other environments such as memoQ in order to have sensible TM entries in that environment or deal with structural issues in the language, but this will not avoid useless segments in the content for Trados. The best way to deal with that is by fixing segments there. Otherwise, I often provide a TMX export from memoQ to improve the quality of the Trados TM.

Step 3: Import the segmented source files into the other environment
The procedure for this varies depending on your translation environment tool. Usually the file type will be recognized and the appropriate filter offered. In some cases, the correct filter type must be specified (such as in memoQ, where a presegmented bilingual RTF/DOC must be imported using the "Add document as..." function and specifying "Bilingual DOC/RTF filter" instead of the default "Microsoft Word filter").

Some tools, like memoQ, offer the possibility of importing content which Trados ignores, such as numbers and dates. This is extremely useful when number and date formats differ between the languages involved. It saves tedious post-editing in Word or TagEditor and also enables a correct word count to be made.

A few words about output from the other (non-Trados) environment
If you import a TTX to Déjà Vu, memoQ, etc., what you will get when you export the result is a translated TTX file, which must then be cleaned using Trados under the usual conditions. Exporting a presegmented RTF or Microsoft Word file from DVX gives you the translated, presegmented file. The ordinary export from memoQ will clean that file and give you a deliverable target file. To get the bilingual format for review, etc. you will have to use the option to export a bilingual file.

Other environments such as memoQ or Déjà Vu may also offer useful features like the export of bilingual, commented tables for feedback. This saves time in communicating issues such as source file problems, terminology questions, etc. and is infinitely superior to the awful Excel feedback sheets that some translation agencies try to impose on their partners.

Editing translations performed with Trados
A translation performed using the Trados Workbench macros in Microsoft Word or using TagEditor can be easily reviewed in many other environments such as Déjà Vu or memoQ. In fact, I find that the QA tools and general working environment with this approach are far superior to working in TagEditor or Word, for example. Tag checks can be performed easily, compliance with standard terminology can be verified, content can be filtered for more efficient updates and more.

Editing translations performed with more recent versions of Trados (SDL Trados Studio 2009 and 2011) is also straightforward, as these SDLXLIFF files are XLIFF files which can be reviewed in any XLIFF-compatible tool.
