Translation Tribulations: ODT


Aug 6, 2013

Translating presentations in memoQ: PowerPoint vs. OpenOffice Impress

Microsoft PowerPoint files can be a real nuisance to translate. One of the biggest challenges with these files is the haphazard formatting that many authors apply when working in that medium: line breaks and paragraph breaks in the most inconvenient places, which can cause some stress when working with many translation environment tools.

The PowerPoint filters in memoQ (version 6.5 build 10) are currently not as well developed as the filters for Microsoft Word and Excel files; in particular, the inability to configure the handling of "soft breaks" (line feeds) causes me no little grief. However, I can at least join segments to get complete sentences where I want them. That's something you can't do in SDL Trados Studio, though that tool at least represents the breaks as inline tags. Sometimes I prepare my PowerPoint files in Trados Studio and then translate the SDLXLIFF file in memoQ if there are a lot of breaks in the sentences. But then I miss the preview.

Recently I had occasion to look at a presentation created with OpenOffice Impress, a rather nice alternative to PowerPoint. Given the confusion over Microsoft's new licensing practices for MS Office 2013, I would not be surprised if more of my corporate clients begin to use the clever free alternative.

When I tried to import the Impress (ODP) files to memoQ, however, I found that the files were not recognized as a translatable format. That problem was quickly solved, and the technique for translating ODP files in the current and older versions of memoQ is shown in the video below. One could, of course, convert these to PowerPoint formats, but you might not want to. With ODP files, it is possible to have breaks treated as inline tags.

Time Description
0:33 Importing the PowerPoint file to memoQ with options
1:10 Examining the segments of the imported PowerPoint file
1:35 Joining segments for "broken sentences" in the imported PowerPoint file
1:43 The presentation as an OpenOffice Impress (ODP) file
2:07 Importing the ODP file to memoQ
2:39 Setting the filter for the "unknown" file type
3:04 Configuring "soft" breaks as inline tags
3:34 Examining the segments of the imported ODP file

I hope to see a few more refinements of the PowerPoint and OpenOffice filters in future builds of memoQ!

Jan 2, 2012

ODT files in translation environment tools

After an interesting afternoon with a friend who was a bit frustrated with the behavior of her translation assistance technology with an ODT (Open Office text) source file, I decided to have a look at how a variety of common tools handle this format. I created a small test file which contained some of the troublesome elements and saved it as *.odt for testing. The test file looked like this:

The ordered list was created using the numbering feature.

When the file was imported to OmegaT, the segmentation looked as follows:

Fairly clean, though the segmentation is a bit off due to the encoding of the space after the end of the sentence in the second block of text. Nine segments where there should have been ten.
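For anyone curious where such a space is hiding, an ODT file is a ZIP archive whose text lives in content.xml, and the OpenDocument format encodes certain spaces as a text:s element rather than as a plain character. The following is only a rough sketch for inspecting a file, not a description of how any of the tools tested here work internally; the function name is my own invention.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OpenDocument "text" elements (text:p, text:s, ...).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def paragraphs_with_space_tags(odf_file):
    """List paragraphs that contain text:s (encoded space) elements.

    odf_file may be a path or a file-like object. Returns a list of
    (paragraph_text, number_of_space_tags) tuples. The returned text
    omits the encoded spaces themselves, so "tool.Here" in the output
    marks a spot where a filter has to decide how to segment.
    """
    with zipfile.ZipFile(odf_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    results = []
    for paragraph in root.iter(f"{{{TEXT_NS}}}p"):
        count = len(list(paragraph.iter(f"{{{TEXT_NS}}}s")))
        if count:
            results.append(("".join(paragraph.itertext()), count))
    return results
```

Calling paragraphs_with_space_tags("test.odt") on a file of your own (the file name is just an example) shows which paragraphs might be affected.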

With memoQ, the result was:

Altogether there were a dozen segments after import. The part with the hyperlink was segmented incorrectly in three parts instead of one. However, memoQ did handle the space tag after "tool." correctly and start a new segment at "Here". One can, of course, use the segment joining function

to correct the segmentation until Kilgray gets around to fixing the segmentation on the hyperlink tag:

Update 9 January 2012: The developers at Kilgray have informed me now that this quirk in the ODT filter has been corrected and will be included in the next build released.

When I tried to test my SDL Trados Studio 2009 license, at first it refused to join the party:

Never a dull moment with SDL as we all know. Of course SDL Trados 2007 was in fact installed, but when I upgraded to Studio 2009, of course it trashed my 2007 installation, and I had been too irritated to do anything about it for over half a year since I don't use Trados for anything more than file preparation and compatibility testing anymore, and I was still able to do that for my projects with the damaged installation. However, when I discovered that the ODT file caused TagEditor to run and hide without even saying goodbye, I sighed deeply and wasted half an hour reinstalling SDL Trados 2007. At least I didn't have to go through that insane check-in/check-out license procedure online. I trusted in God and my Windows Registry entries, and the location of my license file was remembered, so all was well.

The second attempt at SDL Trados Studio 2009 was much better:

Same segmentation problem as OmegaT, and examining the tags reveals where the issue might be addressed in a tweak of the filter.

I haven't got the latest upgrade, but someone was kind enough to run my test file through SDL Trados Studio 2011, which appears to offer the best results for filtering ODT (the settings were slightly different, with the URL included, but that is also possible with some other tools):

SDL Trados TagEditor also worked after re-installation. The results were:

Oh dear. Well, it works, but if I still used TagEditor, I would run, not walk, to the much cleaner interface of OmegaT for this sort of thing if I didn't have the good sense to upgrade to Studio or something else commercial. Note the same segmentation issue and the need for filter modification.

Victor Dewsbery was kind enough to import my test file to the original Atril DVX and the newer DVX2 and send me the results:

DVX import of the test file

DVX2 import of the test file.

I also tried to test SDLX, Wordfast Pro and Wordfast Anywhere. The first two tools don't support ODT. Wordfast Anywhere claims to, but went nowhere, with the following status message displayed in my browser for about half an hour before I gave up and went to lunch:

Of course I canceled. I had a blog post to write and a New Year to get on with. Anyone who wants to try the test file in another tool (to compare apples with apples) can get it here.

Dec 26, 2011

OmegaT: Best practice for translating content from memoQ

OmegaT is popular in some circles because it is Java-based and thus cross-platform, and it is free. Although rather limited in many respects compared with full-featured commercial tools such as SDL Trados Studio or memoQ, this Open Source tool can handle quite a number of formats well and offers interoperability pathways with the leading commercial tools, and there are a good number of excellent professional translators who are satisfied with its features. Thus outsourcers using memoQ should understand the best procedures to follow when working with translators using OmegaT in order to avoid difficulties.

In the past, I have recommended using the bilingual XLIFF exports from memoQ for compatibility with OmegaT. In theory, it's a nice approach, but I am encountering difficulties with memoQ-generated XLIFF files (possibly a Kilgray problem or a problem specific to my installation, not one having to do with OmegaT, which handled XLIFF from other sources properly in my tests). So for now I would say that a workflow involving memoQ's bilingual RTF tables is the best approach. Do the following to prepare the content for the translator:

Create a bilingual RTF table export from memoQ of the content to be translated. Use the "mqInternal" option for tags in order to change their color and facilitate proofreading of the final result.
Copy the source content cells into an empty DOCX or ODT file. OmegaT cannot read RTF and requires one of these two formats to be used in this case. The translator will be able to read these directly and translate.
Other resources such as TMs and glossaries:

OmegaT uses TMX for its translation memory. If you have a TM, provide it to the translator in this format.

The OmegaT glossary format is:
source term [tab] target term [tab] additional information
Provide terminology to the translator in this format if possible.
OmegaT is also capable of reading TBX, the industry standard for glossary files.
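
If the terminology lives in a spreadsheet or database, a few lines of script can produce the tab-delimited layout described above. This is a minimal sketch, assuming one term pair per line with an optional third field; the function names are hypothetical, and the file name and encoding the translator's OmegaT version expects should be checked against its documentation.

```python
def glossary_line(source, target, comment=""):
    """Build one tab-delimited glossary line: source, target, optional comment."""
    fields = [source, target] + ([comment] if comment else [])
    for field in fields:
        # A tab or line break inside a field would corrupt the column layout.
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a tab or line break: {field!r}")
    return "\t".join(fields)


def write_glossary(path, entries):
    """Write (source, target[, comment]) tuples to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for entry in entries:
            handle.write(glossary_line(*entry) + "\n")
```

For example, write_glossary("glossary.txt", [("Haus", "house", "noun"), ("Baum", "tree")]) writes two lines, the second without a comment field.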

The table cell content from the prepared file will look something like this in OmegaT:

Note that the memoQ tags are surrounded by additional OmegaT tags. Since OmegaT does not actually protect tags in its working environment, it is important that the translator verify the tags and proofread carefully, checking that all tags are present and applied correctly.
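
A quick mechanical check can back up the proofreading before the text goes back into the RTF table. The sketch below compares the tags found in a source cell with those in its translation. The tag pattern is an assumption on my part (numbered placeholders such as {1}, [1} or {1]); adjust it to whatever the tags actually look like in your export. It only checks that tags are present, not that they are placed correctly, and it is no substitute for the QA check in memoQ.

```python
import re
from collections import Counter

# Assumed tag shape: a number between a bracket or brace pair, e.g. {1}, [1}, {1].
# This pattern is a guess for illustration; change it to match the real tags.
TAG_PATTERN = re.compile(r"[\[{]\d+[\]}]")


def tag_differences(source, target, pattern=TAG_PATTERN):
    """Return (missing, extra): tags lost from and added to the target."""
    source_tags = Counter(pattern.findall(source))
    target_tags = Counter(pattern.findall(target))
    missing = sorted((source_tags - target_tags).elements())
    extra = sorted((target_tags - source_tags).elements())
    return missing, extra
```

An empty result for both lists means every source tag appears in the target the same number of times; anything else deserves a second look at that segment.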

Once the translation is ready in the target DOCX or ODT file, open it in Microsoft Word, copy the translated table cells and paste them into the target column of the bilingual RTF file. Add any necessary comments to the Comments column of the table (if present). After the bilingual RTF is re-imported to memoQ, run a QA check to verify the tags again. The work can then be proofread for content in memoQ or in a bilingual export of an appropriate kind, after which the target file can be generated and delivered.
