Jan 4, 2019

Translating Microsoft Publisher files

Every few months or so I run across a question in social media or am confronted with a project like this:



Some time ago, Paul Filkin published an interesting discussion of an Open Exchange application that enables SDL Trados Studio users to deal with the Microsoft Publisher format, with some limitations. In the article, he also discussed other approaches, including one I have known about for some time: the use of Western Standard's Fluency.

I looked at Fluency some years ago, and while I found some interesting things there, such as its transcription module, on the whole the application never seemed ready for prime time with its sloppy programming of details. I spent some time trying to persuade its underfunded team to correct some of the problems I saw, but after a while it became clear that the company and its product were not able to cope with the demanding technical challenges routinely faced by language service providers today.

The discussion which followed the posted question suggested a number of approaches, but if the colleague's client expected to receive a translated PUB file instead of some other format, the only realistic option for this possibly one-off job would be to use Fluency in some way. I assumed (and suggested) that a workflow involving
  *.pub <-> Fluency <-> (exchange format) <-> memoQ
might do the trick, with the exchange format probably being XLIFF or, failing that, the bilingual RTF format that I remembered from my tests of Fluency long ago.

And so it proved to be. But the Devil is in the details.

The first sign of trouble came from a colleague - a professor at a local university who is known for his technical curiosity and flexibility in translation courses - who told me that Fluency does indeed offer an XLIFF export but that memoQ experienced problems importing it. His description of the error message sounded a lot to me like the typical mistakes that CAT tool programmers who are XLIFF newbies make when implementing a spec that they are probably too lazy to read and test. (I found the same error myself and submitted it to memoQ Support for comment a few hours ago.) He said that he had then tried the RTF export, but it wasn't clear to me what the result was and he was under time pressure, so I didn't press the matter but resolved to have a look myself.

I used a modified English template file for an invitation as my PUB file to test. The file imported easily into Fluency:

I assume that the "terminology" download is some silly, unhelpful public domain dictionary I would never use.

The Fluency user interface offered a sort of WYSIWYG representation for the text, which makes it appear not bad for work, though appearances are deceiving. In fact, this proved to be a source of some trouble later.

As mentioned, the XLIFF export could not be used in memoQ, and although I am capable enough of analyzing structure problems in a tagged file, I wasn't in the mood to clean up someone else's mess, so I exported a "Fluency Work File" as my next attempt. That is app jargon for a bilingual RTF file similar to that found in other applications.
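
A first pass at diagnosing a rejected XLIFF export could look something like the sketch below. It is only an illustration, not what memoQ or Fluency do internally, and it rests on assumptions the post does not confirm: that the export claims to be XLIFF 1.2, and that the problem is one of the basic structural kind (malformed XML, wrong root element, missing or duplicated trans-unit ids, missing source elements). The function name check_xliff is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the export claims to be XLIFF 1.2 (the post does not say which version).
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def check_xliff(xml_text):
    """Return a list of problems found in an XLIFF 1.2 document (hypothetical helper)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Broken nesting, unescaped characters and similar errors end up here.
        return ["not well-formed XML: %s" % exc]

    problems = []
    if root.tag != "{%s}xliff" % XLIFF_NS:
        problems.append("unexpected root element: %s" % root.tag)

    unit_tag = "{%s}trans-unit" % XLIFF_NS
    source_tag = "{%s}source" % XLIFF_NS
    unit_count = 0
    # Ids are checked for uniqueness within each <file> element.
    for file_el in root.iter("{%s}file" % XLIFF_NS):
        seen_ids = set()
        for unit in file_el.iter(unit_tag):
            unit_count += 1
            unit_id = unit.get("id")
            if unit_id is None:
                problems.append("trans-unit without an id attribute")
            elif unit_id in seen_ids:
                problems.append("duplicate trans-unit id: %s" % unit_id)
            else:
                seen_ids.add(unit_id)
            if unit.find(source_tag) is None:
                problems.append("trans-unit %s has no source element" % unit_id)
    if unit_count == 0:
        problems.append("no trans-unit elements found")
    return problems
```

An empty list means only that these few checks passed, not that memoQ or any other tool will accept the file.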


The difference with Fluency RTFs is that they include the WYSIWYG text representation. Nice, really, and this makes the work in another environment a little easier. I copied the source text column and pasted it into a new file (DOCX), then imported that into memoQ for translation:


Afterward, the translation exported from memoQ was pasted into the target column of the Fluency Work File (bilingual RTF exchange file). I imported that bilingual file back into Fluency and then exported a translated PUB file using the File / Save As command. I got a strange error message saying that there had been some trouble with the export and that some manual adjustment might be needed in Microsoft Publisher.


At first glance I thought, "Looks OK" and then... WTF??? Everything was OK except the title. Not only was the text cut off, it was not even the text I had translated into German. When I copied the text out of the field and pasted it into Notepad, this is what I saw:
Tag der Tag der kulturellen Vielfalt
kulturellen Vielfalt
Vielfalt
kulturellen Vielfalt
kulturellen Vielfalt
Vielfalt
kulturellen Vielfalt
kulturellen Vielfalt
Vielfalt
No joke. Fluency somehow went berserk exporting the text of the title field, and sliced, diced and multiplied the whole mess in a truly bizarre way.

In my nearly 5 decades of casual and occasionally professional programming I have seen almost every stupidity imaginable, so in this case I imagined that somehow the problem lay in sloppy programming associated with text that is longer than the space provided in the field. Interestingly, Fluency enabled me to change the size of the target text in the translation window, so I reduced it by about half and tried to export a new target PUB file.


That did in fact work. So Fluency can indeed be used as a sort of filter for Microsoft Publisher files to be translated in other tools such as memoQ, but the process is not without trouble on the Fluency side, at least when text overruns the field size available, as one might expect to happen with some frequency.

Western Standard offers a 15-day trial of Fluency Now, their desktop tool for freelance translators, and the application can be paid for with a monthly subscription of only 15 US dollars. So perhaps for the occasional project or client that requires work with PUB files, that is an option. Microsoft Publisher is not taken seriously as a layout and publishing tool by graphics professionals and CAT tool providers, but because it is part of the Microsoft Office suite, one will find it in use from time to time, and this imperfect solution may be the best option for helping such clients.

2 comments:

  1. Oh dear, I've been browsing, reading tons of webpages and yet I'm not able to get this ton of .pub files translated with traditional tools and formats like xml or xliff.
    Will we have a fix finally for the new year 2020?

    1. Considering that professional graphic designers generally avoid using Microsoft Publisher, the best solution for the long term is to migrate the content to a different format. I would be very surprised if SDL or memoQ or any other major tool makes a serious effort to deal with the *.pub format, because they simply do not see the market for it. The situation with Fluency was perhaps a little bit different; I am told that the company has perpetual problems with its finances, so perhaps the needs of some particular corporate client willing to spend money on a solution might be given more weight. Perhaps they could feed themselves a bit better by doing some filter development for "competitors".

