Jun 17, 2018

Ferramentas de Tradução - CAT Tools Day at Universidade Nova de Lisboa

The Faculty of Social Sciences and Humanities (FCSH) held its first "CAT Tools Day" on June 16, 2018 with a diverse program intended to provide a lusophone overview of current best practices in the technologies that support professional translation work. The event offered standard presentations and demonstrations in a university auditorium, with parallel introductory software workshops for groups of up to 18 people in an instructional computer lab in another building.

The day began with morning sessions covering SDL Trados Studio and various aspects of speech recognition.

Dr. Helena Moniz explains aspects of speech analysis.

I found the presentation by Dr. Helena Moniz from the University of Lisbon faculty to be particularly interesting for its discussion of the many different voice models and how these are applied to speech recognition and text-to-speech synthesis. David Hardisty of FCSH at Universidade Nova also gave a good overview of the state of speech recognition for practical translation work, including his unobtrusive methods for utilizing machine pseudo-translation capabilities in dictated translations.

Parallel introductory workshops for software tools included memoQ 8, SDL Trados Studio 2017 and ABBYY FineReader - two sessions for each.

Attendees learned about ABBYY FineReader, SDL Trados Studio and memoQ in the translation computer lab

The ABBYY FineReader session I attended gave a good overview in Portuguese of basics and good practice, including a discussion of how to avoid common mistakes when converting scanned documents in a number of languages.

The afternoon featured several short, practical presentations by students, along with talks of my own on the upcoming integrated voice input solution for memoQ and on preparing PDF files for reference, translation, print deadline emergencies and customer relations.

Rúben Mata discusses Discord

The final session of the day was a "tools clinic" - an open Q&A about any aspect of translation technology and workflow challenges. This was a good opportunity to reinforce and elaborate on the many useful concepts and practical approaches shown throughout the day and to share ideas on how to adapt and thrive as a professional in the language services sector today.

Hosts David Hardisty and Marco Neves of FCSH plan to make this an annual event to exchange knowledge on technology and best practices in translation and editing work in discussions between practicing professionals and academics in the lusophone community. So watch for announcements of the next event in 2019!

Some of the topics of this year's conference will be explored in greater depth in three 25-hour courses offered in Portuguese and English this summer at Universidade Nova in Lisbon. On July 9th there will be a thorough course on memoQ Basics and workflows, followed by a Best Practices course on July 19th, covering memoQ and many other aspects of professional work. On September 3rd the university will offer a course on project management skills for language services, including the memoQ Server, project management business tools, file preparation and more. It is apparently also possible to get inexpensive housing at the university to attend these courses, which is quite a good thing given the rapidly rising cost of accommodation in Lisbon. Details on the housing option will be posted on this blog when I can find them.

iPhone Google Maps in translation

When I first moved to Portugal I had a TomTom navigation system that I had used for a few years when I traveled. Upon crossing a border, I would usually change the language for audio cues, because listening to street names in one language pronounced badly in another was simply too confusing and possibly dangerous. Eventually, the navigation device died as crappy electronics inevitably do, and I changed over to smartphone navigation systems, first Apple Maps on my iPhone and, after I tired of getting sent down impossible goat trails in Minho, Google Maps, which generally did a better job of not getting me lost and into danger.
For the most part, the experience with Google Maps has been good. It's particularly nice for calling up restaurant information (hours, phone numbers, etc.) on the same display where I can initiate navigation to find the restaurant. The only problem was that using audio cues was painful, because the awful American woman's voice butchering Portuguese street names meant that my only hope of finding anything was to keep my eyes on the actual map and try to shut out (or simply turn off) the audio.

What I wanted was navigation instructions in Portuguese, at least while I am in Portugal; across the border in Spain it would be nice to have Spanish to avoid confusion. Not the spoken English voice of some clueless tourist from Oklahoma looking to find the nearest McDonald's and asking for prices in "real money". But although I found that I could at least dictate street names in a given language if I switched the input "keyboard" to that language, the app always spoke that awful, ignorant English.

And then it occurred to me: switch the entire interface language of the phone! Set your iPhone's language to German and Google Maps will pronounce German place names correctly. Same story for Portuguese, Spanish, etc. Presumably Hungarian too; I'll have to try that in Budapest next time. And that may have an additional benefit: fewer puzzled looks when someone asks where I'm staying and I can't even pronounce the street name.

It's a little disconcerting now to see all my notifications on the phone in Portuguese. But that's also useful, as the puzzle pieces of the language are mostly falling into place these days, and the only time I get completely confused now is if someone drops a Portuguese bomb into the middle of an English sentence when I'm not expecting it. Street names make sense now; I'm less distracted by the navigation voice when I drive.

And if some level of discomfort means that I use the damned smartphone less, that's a good thing too.

(Kevin Lossner)

Jun 15, 2018

Better WordPress translation with memoQ

Translating websites is mostly a royal pain in the tush. And I avoid it most of the time. Why? Several reasons.
  • Those who request website translations often have no idea what platform is used nor do they really know how much content is present.
  • They have very little understanding of the technical details or importance of translatable information hidden in tag attributes, selection lists, etc. and so there are often misunderstandings about the true volume to be translated.
  • There are a lot of sloppy cowboys slogging through the bog, glibly bidding low rates to translate sites they neither understand nor truly care about, and their victims... uh, prospects, customers, whatever... usually lack the expertise or the patience to understand the difference between a wild-ass lowball guess from someone lacking the skills and tools to do the job right and a carefully researched, reasonably accurate estimate of time and effort from a professional.
Shopping for "quotes" when neither you nor the one submitting a "bid" actually understand the technical basis of the project is a process with no guarantee of a satisfactory outcome. And too often this process turns out badly.

These days, many small companies use the popular content management system WordPress to manage their web sites. It may not be the best by some technical standards, but sometimes it is better to define "best" according to the likelihood of finding someone to provide services involving a platform and of there being such experts available not only now but for a reasonable amount of time in the future. I think it is fair to say that WordPress has met that standard for some time and will probably do so for some time more.

I have had a good number of requests for translating WordPress content in the past, but none of the estimates given were accepted, because typically the content to be translated was an order of magnitude greater than the client realized or nobody could commit to a clear decision on what parts were to be translated and what parts were unimportant. And then we have the problem that many sites use themes which are poorly designed as multilingual structures.

The WordPress Multilingual Plug-in (WPML) makes sensible, professional management of websites with content in more than one language much easier. When I learned about this technology more than a year ago, I suggested its use to the person who requested a quotation for translation services, but that suggestion is probably still echoing somewhere out there in the Void.

At memoQ Fest 2018 in Budapest, I had the pleasure of attending a superb presentation by Stefan Weimar on how to cope with the translation of WordPress sites and some of what you need to know to use the WPML technology right. I was inspired and hoped to have an opportunity to look at things more closely some day.

That day turned out to be a week later. Funny how that goes.

Three or four years ago I translated a small web site for a friend's company. At the time, the site used the Typo3 content management system, which proved to be troublesome. Not so much because of the technology, but because of the service provider using it, who rejected any suggestions for providing the content to be translated in a form that would not require his manual intervention at the text level. He copied, pasted and improved (German: verschlimmbesserte) my translation as only a German with full confidence in his grade school English skills could. It was... not what anyone had hoped for, and I never found the heart to mention all the mistakes in the final result.

So now, when someone asked me to have a look at their new site, I felt a bit queasy. Nunca mais, I thought. No way, José. Or Wolfgang as it were. But in the meantime, unbeknownst to me, he had switched service providers and CMS platforms, and the new provider managing his web content is a professional with a professional understanding of sites for international clients in many languages. And he uses WPML. The right way!

Now it was up to me to figure out what's what in memoQ. First I used the memoQ XLIFF filter on all the little XLIFFs supported by the plug-in. I quickly saw that a few other things were needed, like a cascaded HTML filter...

Somewhat messy, but doable once the HTML tags get properly protected by a chained filter.

Then I tried again, this time applying memoQ's WordPress (WPML) filter. And this was the result:

That was easy. Hmm. I think I know which method I prefer.

So for translations of WordPress websites properly configured to use WPML technology, the new memoQ filter looks like a winner!

Jun 14, 2018

Translating Wordfast GLP packages... elsewhere.

One reason to keep translation environment tool licenses up to date is that new formats continue to appear: new formats for translatable files as well as new file formats for the tools that help to process files for translation. Very often I have heard some "professional" say "I'm a translator, not a [fill in the blank]. If the client wants this translated, I'll have to get it in a Microsoft Word file." Or something like that.

Let's get real for a moment.

  • That attitude is simply lazy and disrespectful toward translation consumers who would like to make use of one's services and
  • a lot of money is being left on the table here in many cases. I built a huge clientele at the start of the last decade, because my use of translation environment tools like Trados, Déjà Vu, STAR Transit and Wordfast enabled me as an individual to tackle translation challenges that many agencies at the time had no concept of how to cope with.
As translation agencies have acquired more technical tools, most of them still remain unfortunately unaware of how to use them properly or plan more than the simplest workflows well, but that's a subject for another day. Also...
  • ... by using tools and techniques that are compatible with what your clients require for a final format, you can save your client a lot of time and money for further layout work - and probably avoid the introduction of errors in your translation work in its final format as well.
  • And in my experience, showing technical and process competence to benefit clients usually leads to greater trust and better work together.
So what has all this got to do with Wordfast?

Well... I didn't like the Wordfast brand for a very long time. Its various incarnations were perhaps the weakest of the popular tools in a technical sense, and inevitably when agency friends called me, desperate to fix some massive translator screw-up (usually by somebody in France), Wordfast "Pro" was often involved in the disaster.

I looked at the "newer" Wordfast versions a number of times over the years, and honestly they always seemed like lobotomized wannabe tools. This was about the time that many other toolmakers were trying to decide if they should support XLIFF.

Well, a lot has changed since then. I became aware of the changes the other day when somebody posted a question in a social media forum for memoQ asking how to handle Wordfast Pro 5 GLP packages. I had never heard of these, so of course I was curious and decided to take a look. This finally led me to download a 30-day trial of the latest Wordfast Pro software to evaluate its potential for interoperable work with other translation environments. I see a lot of changes since my last look, and so far I think they are all positive. Along the way I also had good cause to look at Wordfast Anywhere, the free web-based CAT tool that I talked some university colleagues out of wasting their time with a while ago. Well, my recommendation in that regard might change, but that and commentary on the latest incarnation of WF Pro will have to wait for another day.

About those GLP packages....

Yes, those. This was the question:

Someone pointed out that GLP files - like every other translation "package" one finds from all the tool providers - are merely ZIP files with particular structure inside and the extension re-named. 

Gotta love Facebook. You'll always get an answer in some group, usually a wrong one. That's why I keep a blog. Good information gets buried in social media noise too often, and good luck finding it in any kind of search. In this case... we don' have no steenkeen TXML files as I learned... that's the old Wordfast Pro....

A colleague in Germany kindly provided me with a little GLP package to examine, which I promptly unzipped. I noticed that at least one tool (7-Zip) sees through the renamed extension nonsense and saved me the usual trouble of renaming it before unpacking.
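For anyone who would rather peek inside a package with a script than with an archive tool, here is a minimal sketch using only the Python standard library. It assumes nothing more than what was said above, namely that a GLP file is an ordinary ZIP archive with a different extension; the function name `list_package_contents` is my own invention for illustration and not part of any Wordfast or memoQ tooling.

```python
import zipfile
from pathlib import Path


def list_package_contents(package_path):
    """Return the sorted member names of a ZIP-based translation package.

    Assumption: the package (e.g. a .glp file) is a plain ZIP archive
    with a renamed extension, so no renaming is needed before reading it.
    """
    path = Path(package_path)
    # is_zipfile checks the file's actual structure, not its extension
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP archive")
    with zipfile.ZipFile(path) as package:
        return sorted(package.namelist())
```

Calling `list_package_contents("project.glp")` (the file name is hypothetical) returns the folder and file paths stored in the package, which is enough to see whether the source files and the bilingual files are where you expect them.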

So far, so good... inside the folder for the unpacked GLP file I found the following:

The test package was an English to Portuguese project. But source? Hello? Let's have a look there!

Very interesting. The original source files (English) came along for the ride. This is good, because I often like to translate source files in memoQ - taking advantage of the preview there for many file types - and then use the translation memory to translate the file that is created by the other tool (usually SDL Trados SDLXLIFF files in my work). Now let's have a look inside the pt target folder. There's actually another folder named txlf inside that one. And there I found:

No TXML files! TXLF is a new instance of the rather ubiquitous XLIFF files one finds in the translation world, some of which have some rather bothersome "extensions" that may require special handling in the translation process. In the simple test I performed, none of that was apparent; an ordinary XLIFF filter seemed to work well. I hope future tests will show me whether there are any quirks, but so far, so good.

So one strategy, with pretty much any CAT tool, would be to unpack the GLP file, get at those TXLF files and then bring them into another working environment using an XLIFF filter. Maybe also use my approach with the source files too, which will ensure that you can deliver a good target file even if quirky tags in the XLIFF lead you to produce less than an optimal result there. 
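The unpacking step of that strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that the package is a plain ZIP archive and that the bilingual files carry the .txlf extension, as in my test package. The function name `extract_txlf` and the optional `new_suffix` parameter are hypothetical; appending something like ".xlf" to the copies is merely a convenience for tools that pick a filter by extension, and I have not verified how any particular tool reacts to it.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_txlf(package_path, out_dir, new_suffix=""):
    """Copy every .txlf member of a ZIP-based package into out_dir.

    Assumptions: the package is a plain ZIP archive, and the bilingual
    files end in .txlf. The folder structure is flattened, so members
    with the same file name in different folders would overwrite each
    other. new_suffix (e.g. ".xlf") is appended to each copied name.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(package_path) as package:
        for info in package.infolist():
            # ZIP member names always use forward slashes
            member = PurePosixPath(info.filename)
            if info.is_dir() or member.suffix.lower() != ".txlf":
                continue
            target = out / (member.name + new_suffix)
            target.write_bytes(package.read(info))
            written.append(target)
    return written
```

The copies can then be imported with an XLIFF filter in the other environment. If a suffix was appended, remember to remove it again before putting the translated files back where Wordfast expects them.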

The current version of memoQ (8.4) does not recognize the TXLF extension, so as in all such cases, the All files option must be used and the correct filter applied in a later dialog. Unlike with some other tools, memoQ cannot be "trained" by the user to recognize new extensions as far as I know.

But what about importing the GLP files directly to memoQ? Wouldn't that be nice? And I thought it might be possible using the ZIP file filter recently introduced (and the same All files trick to get the GLP file and apply the ZIP filter later). Well...

It looked promising.

So much so that I even optimistically named and saved a custom configuration for the ZIP filter. All I need to do now is cascade an XLIFF filter!

Ack. Sooooo close. I've been here before. There are more things in heaven and down-to-earth cascading formats, Kilgray, than are dreamt of in your philosophy! Please, please expand the list of possible cascaded formats sensibly to make better use of this lovely new ZIP filter!

So for now, that's a no-go, but soon? Who knows? If you bother and tell the memoQ team how helpful it would be, maybe this and similar problems can be solved with relative ease.

In any case, for now it seems that the unpack-and-do-the-XLIFF approach will work for most anyone with a modern CAT tool. And that's good news, because in today's fast-changing technology environment for translation, interoperability of CAT tools is increasingly important. It is a foolish waste of time to translate in a large number of CAT tools and probably a bad idea to do so in two or three according to my old research. I've usually found that such JOATs are, professionally, often stupid goats who lack the depth in a single major environment or two, which could allow them to get the most out of their tools and serve their clients in the best way with their linguistic skills and subject matter knowledge.

So is the latest Wordfast a tool worth checking out? I don't know yet. But it may be used by colleagues and clients with whom I like to work, and understanding how to share projects and project resources in painless ways will benefit all of us, no matter what our tool preferences may be. Wordfast seems to be developing very much in that spirit, so I will revisit it for more collaboration scenarios in the future.

Jun 3, 2018

Survey for Translation Transcription and Dictation

The website with the survey and short explainer video is
The idea is to build a human transcription service. We just need a few translators per language that want to work with a transcriptionist due to RSI, productivity etc. and we can use that data to build an ASR system for that language. There is also a good chance the ASR system will be accurate for domain-specific terminology and accents as it will be adaptive and use source language context. 
Take the Sight CAT survey - click here
memoQ Fest 2018 was, among other things, a good opportunity as always to spend time discussing things with some of the best and most interesting consultants, teachers, creative developers and brainstormers I know in the translation profession. One of these was my friend and colleague, John Moran, whose work on iOmegaT introduced me to the idea that properly designed, translator-controlled (voluntary) data logging could be a great boon to feature research and software development investment decisions. Sort of like SpyGate in translation, except that it isn't.

John and I have been talking, brainstorming and arguing about many aspects of translation technology for years now, dictation (voice recognition, ASR, whatever you want to call it) foremost among the topics. So I was very pleased to see him at the conference in Budapest last week, where he spoke about logging as a research tool in the program and a lot about speech recognition before and after in the breaks, bars, coffee houses and social event venues.

I think that one of the most memorable things about memoQ Fest 2018 was the introduction of the dictation tool currently called hey memoQ, which covers a lot of what John and I have discussed until the wee hours over the past four years or so and which also makes what I believe will be the first commercial use of source text guidance for target text dictation (not to mention switching to source text dictation when editing source texts!). John introduced that to me years ago based on some research that he follows. Fascinating stuff.

One of the things he has been interested in for a while for commercial, academic and ergonomic reasons is support for minor languages. Understandable for a guy who speaks Gaelic (I think) and has quite a lot of Gaelic resources which might contribute to a dictation solution some day. So while I'm excited about the coming memoQ release which will facilitate dictation in a CAT tool in 40 languages (more or less, probably a lot more in the future), John is thinking about smaller, underserved or unserved languages and those who rely on them in their working lives.

That's what his survey is about, and I hope you'll take the time to give him a piece of your mind... uh, share your thoughts I mean :-)

The Great Dictator in Translation.

I have no need for words. memoQ will have that covered in quite a few languages.

This is not your grandfather's memoQ!