Jun 14, 2018

Translating Wordfast GLP packages... elsewhere.


One reason to keep translation environment tool licenses up to date is that new formats continue to appear: new formats for translatable files as well as new file formats for the tools that help to process files for translation. Very often I have heard some "professional" say, "I'm a translator, not a [fill in the blank]. If the client wants this translated, I'll have to get it in a Microsoft Word file." Or something like that.

Let's get real for a moment.

  • That attitude is simply lazy and disrespectful toward translation consumers who would like to make use of one's services and
  • a lot of money is being left on the table here in many cases. I built a huge clientele at the start of the last decade, because my use of translation environment tools like Trados, Déjà Vu, STAR Transit and Wordfast enabled me as an individual to tackle translation challenges that many agencies at the time had no idea how to cope with.
As translation agencies have acquired more technical tools, most of them unfortunately remain unaware of how to use them properly or how to plan more than the simplest workflows well, but that's a subject for another day. Also...
  • ... by using tools and techniques that are compatible with what your clients require for a final format, you can save your client a lot of time and money for further layout work - and probably avoid the introduction of errors in your translation work in its final format as well.
  • And in my experience, showing technical and process competence to benefit clients usually leads to greater trust and better work together.
So what has all this got to do with Wordfast?

Well... I didn't like the Wordfast brand for a very long time. Its various incarnations were perhaps the weakest of the popular tools in a technical sense, and inevitably when agency friends called me, desperate to fix some massive translator screw-up (usually by somebody in France), Wordfast "Pro" was often involved in the disaster.

I looked at the "newer" Wordfast versions a number of times over the years, and honestly they always seemed like lobotomized wannabe tools. This was about the time that many other toolmakers were trying to decide if they should support XLIFF.

Well, a lot has changed since then. I became aware of the changes the other day when somebody posted a question in a social media forum for memoQ asking how to handle Wordfast Pro 5 GLP packages. I had never heard of these, so of course I was curious and decided to take a look. This finally led me to download a 30-day trial of the latest Wordfast Pro software to evaluate its potential for interoperable work with other translation environments. I see a lot of changes since my last look, and so far I think they are all positive. Along the way I also had good cause to look at Wordfast Anywhere, the free web-based CAT tool that I talked some university colleagues out of wasting their time with a while ago. Well, my recommendation in that regard might change, but that and commentary on the latest incarnation of WF Pro will have to wait for another day.

About those GLP packages....


Yes, those. This was the question:


Someone pointed out that GLP files - like every other translation "package" one finds from all the tool providers - are merely ZIP files with a particular structure inside and a renamed extension.


Gotta love Facebook. You'll always get an answer in some group, usually a wrong one. That's why I keep a blog. Good information gets buried in social media noise too often, and good luck finding it in any kind of search. In this case... we don' have no steenkeen TXML files as I learned... that's the old Wordfast Pro....

A colleague in Germany kindly provided me with a little GLP package to examine, which I promptly unzipped. I noticed that at least one tool (7-Zip) sees through the renamed extension nonsense, which saved me the usual trouble of renaming the file before unpacking.
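For anyone who would rather script this step, here is a minimal Python sketch of the same idea. It assumes only what is stated above, namely that a GLP file is an ordinary ZIP archive under another extension; the function names `list_package` and `unpack_package` are made up for illustration and have nothing to do with Wordfast itself.

```python
import zipfile
from pathlib import Path


def list_package(package_path):
    """Return the member names of a ZIP-based package, whatever its extension."""
    path = Path(package_path)
    # is_zipfile() looks at the file contents, not the name, so .glp is fine
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP archive")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def unpack_package(package_path, out_dir):
    """Extract every member into out_dir and return that directory."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(package_path) as archive:
        archive.extractall(out)
    return out
```

Calling `list_package("project.glp")` on a real package (the file name is a placeholder) should show the folder layout before anything is written to disk.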


So far, so good... inside the folder for the unpacked GLP file I found the following:


The test package was an English to Portuguese project. But source? Hello? Let's have a look there!


Very interesting. The original source files (English) came along for the ride. This is good, because I often like to translate source files in memoQ - taking advantage of the preview there for many file types - and then use the translation memory to translate the file that is created by the other tool (usually SDL Trados SDLXLIFF files in my work). Now let's have a look inside the pt target folder. There's actually another folder named txlf inside that one. And there I found:


No TXML files! TXLF is a new instance of the rather ubiquitous XLIFF files one finds in the translation world, some of which have some rather bothersome "extensions" that may require special handling in the translation process. In the simple test I performed, none of that was apparent; an ordinary XLIFF filter seemed to work well. I hope future tests will show me whether there are any quirks, but so far, so good.
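A quick way to peek inside such a file without any CAT tool is sketched below in Python. It rests on an assumption the simple test above suggests but does not prove: that a TXLF file uses the usual XLIFF 1.x element names (`trans-unit`, `source`, `target`). The code ignores XML namespaces on purpose, so it does not depend on which namespace a particular tool declares, and `read_segments` is just an illustrative name.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def read_segments(xliff_text):
    """Return (id, source, target) tuples for every trans-unit element."""
    root = ET.fromstring(xliff_text)
    segments = []
    for elem in root.iter():
        if local_name(elem.tag) != "trans-unit":
            continue
        source = target = ""
        for child in elem:
            name = local_name(child.tag)
            # itertext() flattens inline tag elements down to their plain text
            if name == "source":
                source = "".join(child.itertext())
            elif name == "target":
                target = "".join(child.itertext())
        segments.append((elem.get("id"), source, target))
    return segments
```

Inline tags are flattened to plain text here, so this is only good for inspection; any vendor-specific "extensions" would still need a proper XLIFF filter.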

So one strategy, with pretty much any CAT tool, would be to unpack the GLP file, get at those TXLF files and then bring them into another working environment using an XLIFF filter. Maybe also use my approach with the source files too, which will ensure that you can deliver a good target file even if quirky tags in the XLIFF lead you to produce a less than optimal result there.
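The unpack-and-grab-the-TXLF step can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the function name `extract_txlf`, the flattening of the folder structure, and the optional switch to an `.xlf` suffix, which is meant only as a convenience. Whether a given tool then picks the right filter on its own is untested.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_txlf(package_path, out_dir, new_suffix=".xlf"):
    """Copy every .txlf member of a ZIP-based package into out_dir.

    Files are written flat (no subfolders) with new_suffix in place of
    .txlf. Pass new_suffix=".txlf" to keep the original extension.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(package_path) as archive:
        for info in archive.infolist():
            member = PurePosixPath(info.filename)
            if info.is_dir() or member.suffix.lower() != ".txlf":
                continue
            # Keep only the file name; same-named files would overwrite each other
            target = out / (member.stem + new_suffix)
            target.write_bytes(archive.read(info))
            written.append(target)
    return written
```

Because the folders are flattened, a package with several target languages would need one output folder per language to avoid name collisions. The translated files also have to go back into the package under their original names and paths if the client expects a GLP in return.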


The current version of memoQ (8.4) does not recognize the TXLF extension, so as in all such cases, the All files option must be used and the correct filter applied in a later dialog. Unlike with some other tools, memoQ cannot be "trained" by the user to recognize new extensions as far as I know.

But what about importing the GLP files directly to memoQ? Wouldn't that be nice? And I thought it might be possible using the ZIP file filter recently introduced (and the same All files trick to get the GLP file and apply the ZIP filter later). Well...


It looked promising.


So much so that I even optimistically named and saved a custom configuration for the ZIP filter. All I need to do now is cascade an XLIFF filter!


Ack. Sooooo close. I've been here before. There are more things in heaven and down-to-earth cascading formats, Kilgray, than are dreamt of in your philosophy! Please, please expand the list of possible cascaded formats sensibly to make better use of this lovely new ZIP filter!

So for now, that's a no-go, but soon? Who knows? If you bother support@kilgray.com and tell the memoQ team how helpful it would be, maybe this and similar problems can be solved with relative ease.

In any case, for now it seems that the unpack-and-do-the-XLIFF approach will work for most anyone with a modern CAT tool. And that's good news, because in today's fast-changing technology environment for translation, interoperability of CAT tools is increasingly important. It is a foolish waste of time to translate in a large number of CAT tools and probably a bad idea to do so in two or three according to my old research. I've usually found that such jacks of all trades (JOATs) are, professionally, often stupid goats who lack the depth in a single major environment or two that would allow them to get the most out of their tools and serve their clients in the best way with their linguistic skills and subject matter knowledge.

So is the latest Wordfast a tool worth checking out? I don't know yet. But it may be used by colleagues and clients with whom I like to work, and understanding how to share projects and project resources in painless ways will benefit all of us, no matter what our tool preferences may be. Wordfast seems to be developing very much in that spirit, so I will revisit it for more collaboration scenarios in the future.


Jun 3, 2018

Survey for Translation Transcription and Dictation

The website with the survey and short explainer video is http://www.sightcat.net
The idea is to build a human transcription service. We just need a few translators per language who want to work with a transcriptionist due to RSI, productivity, etc., and we can use that data to build an ASR system for that language. There is also a good chance the ASR system will be accurate for domain-specific terminology and accents as it will be adaptive and use source language context.
memoQ Fest 2018 was, among other things, a good opportunity as always to spend time discussing things with some of the best and most interesting consultants, teachers, creative developers and brainstormers I know in the translation profession. One of these was my friend and colleague, John Moran, whose work on iOmegaT introduced me to the idea that properly designed, translator-controlled (voluntary) data logging could be a great boon to feature research and software development investment decisions. Sort of like SpyGate in translation, except that it isn't.

John and I have been talking, brainstorming and arguing about many aspects of translation technology for years now, dictation (voice recognition, ASR, whatever you want to call it) foremost among the topics. So I was very pleased to see him at the conference in Budapest last week, where he spoke about logging as a research tool in the program and a lot about speech recognition before and after in the breaks, bars, coffee houses and social event venues.

I think that one of the most memorable things about memoQ Fest 2018 was the introduction of the dictation tool currently called hey memoQ, which covers a lot of what John and I have discussed until the wee hours over the past four years or so and which also makes what I believe will be the first commercial use of source text guidance for target text dictation (not to mention switching to source text dictation when editing source texts!). John introduced that to me years ago based on some research that he follows. Fascinating stuff.

One of the things he has been interested in for a while for commercial, academic and ergonomic reasons is support for minor languages. Understandable for a guy who speaks Gaelic (I think) and has quite a lot of Gaelic resources which might contribute to a dictation solution some day. So while I'm excited about the coming memoQ release which will facilitate dictation in a CAT tool in 40 languages (more or less, probably a lot more in the future), John is thinking about smaller, underserved or unserved languages and those who rely on them in their working lives.

That's what his survey is about, and I hope you'll take the time to give him a piece of your mind... uh, share your thoughts I mean :-)

The Great Dictator in Translation.

I have no need for words. memoQ will have that covered in quite a few languages.


This is not your grandfather's memoQ!