Apr 14, 2018

memoQ filter for MS Outlook e-mail

A few days ago I was preparing screenshots in memoQ for lecture slides. As I tried to select a PDF file to import, the defective trackpad on my laptop caused a file farther down in the list to be selected, and I got a surprise. Not believing my eyes, I tried again and confirmed that, yes, what I had seen was indeed possible...

... saved Microsoft Outlook MSG files (e-mail) are imported to memoQ with all their graphics and attachments! Kilgray created a filter some time ago and simply forgot to document its existence publicly. As of the current versions of memoQ you won't see this in the documentation or the filter lists of the interface, but memoQ can "see" MSG files, and if they are selected, this hidden filter will appear in the import dialog.

And this also works for LiveDocs.

At the time of this discovery, I was working on a little job for a friend's agency, and her project manager had sent me a list of abbreviations in an e-mail. I was too lazy to make the entries in my termbase, so I simply imported the mail to the LiveDocs corpus I maintain for her shop so that it would show up in concordance searches:

So when people tell you memoQ is good, don't believe them. It's actually better than that, but the truth is a well-kept secret :-)

Apr 4, 2018

New in memoQ 8.4: easy stopword list creation!

This wasn't really on Kilgray's plan, but hey - it's now possible, and that makes my life easier. An accidental "feature".

Four years ago, frustrated by memoQ's inability to import stopword lists obtained from other sources, I published a somewhat complex workaround, which I have used in workshops and classes when I teach terminology mining techniques. For years I had suggested that adding and merging such lists be facilitated in some way, because the memoQ stopword list editor really sucks (and still does). Alas, the suggestion was not taken up, so translators of most source languages were left high and dry if they wanted to do term extraction in memoQ and avoid the noise of common, uninteresting words.

Enter memoQ version 8.4... with a lot of very nice improvements in terminology management features, which will be the subject of other posts in the future. I've had a lot of very interesting discussions with the Kilgray team since last autumn, and the directions they've indicated for terminology in memoQ have been very encouraging. The most recent versions (8.3 and 8.4) have delivered on quite a number of those promises.

I have used memoQ's term extraction module since it was first introduced in version 5, but it was really a prototype, not a properly finished tool, despite its superiority over many others in a lot of ways. One of its biggest weaknesses was the handling of stopwords (used to filter out unwanted "word noise"). It was difficult to build lists that did not already exist, and it was also difficult to add words to the list, because both the editor and the term extraction module allowed only one word to be added at a time. Quite a nuisance.

In memoQ 8.4, however, we can now add any number of selected words in an extraction session to the stopword list. This eliminates my main gripe with the term extraction module. And this afternoon, while I was chatting with Kilgray's Peter Reynolds about what I like about terminology in memoQ 8.4, a remark from him inspired the realization that it is now very easy to create a memoQ stopword list from any old stopword lists for any language.

How? Let me show you with a couple of Dutch stopword lists I pulled off the Internet :-)

I've been collecting stopword lists for friends and colleagues for years; I probably have 40 or 50 languages covered by now. I use these when I teach about AntConc for term extraction, but the manual process of converting these to use in memoQ has simply been too intimidating for most people.

But now we can import and combine these lists easily with a bogus term extraction session!

First I create a project in memoQ, setting the source language to the one for which I want to build or expand a stopword list. The target language does not matter. Then I import the stopword lists into that project as "translation documents".

On the Preparation ribbon in the open project, I then choose Extract Terms and tell the program to use the stopword lists I imported as "translation documents". Some special settings are required for this extraction:

The two areas marked with red boxes are critical. Change all the values there to "1" to ensure that every word is included. Ordinarily, these values are higher, because the term extraction module in memoQ is designed to pick words based on their frequencies, and a typical minimum frequency used is 3 or 4 occurrences. Some stopword lists I have seen include multiple word expressions, but memoQ stopword lists work with single words, so the maximum length in words needs to be one.
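For anyone who wants to pre-clean downloaded lists before importing them, or just see what those two settings amount to, here is a minimal Python sketch of the same merge done outside memoQ. It assumes plain-text lists with one entry per line; the function name is my own invention, this is not memoQ code, and treating "De" and "de" as the same word is an assumption you may not want for every language.

```python
def merge_stopword_lists(*raw_lists: str) -> list[str]:
    """Merge stopword lists (one entry per line) into one sorted, de-duplicated list.

    Mirrors the bogus extraction settings: every single word is kept
    (minimum frequency 1) and multi-word entries are dropped (maximum length 1).
    """
    kept: dict[str, str] = {}
    for raw in raw_lists:
        for line in raw.splitlines():
            entry = line.strip()
            if not entry:
                continue  # skip blank lines
            if len(entry.split()) > 1:
                continue  # memoQ stopword lists hold single words only
            # Assumption: case-insensitive duplicates; keep the first spelling seen.
            kept.setdefault(entry.casefold(), entry)
    return sorted(kept.values(), key=str.casefold)


if __name__ == "__main__":
    dutch_a = "de\nhet\neen\nen\n"
    dutch_b = "een\nvan\nin plaats van\nDe\n"
    print(merge_stopword_lists(dutch_a, dutch_b))
    # ['de', 'een', 'en', 'het', 'van']
```

The phrase "in plaats van" is dropped for the same reason the maximum length in words is set to one, and the duplicates collapse just as they do when the same word turns up in several imported lists.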

Select all the words in the list (by selecting the first entry, scrolling to the bottom and then clicking on the last entry while holding down the Shift key to get everything), and then select the command from the ribbon to add the selected candidates to the stopword list.

But we don't have a Dutch stopword list! No matter:

Just create a new one when the dialog appears!

After the OK button is clicked to create the list, the new list appears with all the selected candidates included. When you close that dialog, be sure to click Yes to save the changes or the words will not be added!

Now my Dutch stopword list is available for term extraction in Dutch documents in the future and will appear in the dropdown menu of the term extraction session's settings dialog when a session is created or restarted. And with the new features in memoQ 8.4, it's a very simple matter to select and add more words to the list in the future, including all "dropped" terms if you want to do that.

More sophisticated use of your new list would include changing the 3-digit codes which are used with stopwords in memoQ to allow certain words to appear at the beginning, in the middle, or at the end of phrases. If anyone is interested in that, they can read about it in my blog post from six years ago. But even without all that, the new stopword lists should be a great help for more efficient term extractions for your source languages in the future.

And, of course, like all memoQ light resources, these lists can be exported and shared with other memoQ users who work with the same source language.

Complicated XML in memoQ: a filtering case example

When I deal with XML files in memoQ, things are usually rather simple. In fact, most of the time I can use the default settings of the standard XML import filter, and everything works fine. (Maybe that's because a lot of my XML imports are extracted from PDF files using iceni InFix, the desktop alternative to the XLIFF exports from iceni's online TransPDF service; InFix avoids any confidentiality issues by keeping everything local.)

Sometimes, however, things are not so simple. Like with this XML file a client sent recently:

Now if you look at the file, you might think the XLIFF filter should be used. But if you do that, the following error message would result in memoQ:

That is because the monkey who programmed the "XLIFF" export from the CMS where the text resides was one of those fools who don't concern themselves with actual file format specifications. A number of the tags and attributes in the file simply do not conform to the XLIFF standard. There is a lot of that kind of stupidity to be found.

Fear not, however: one can work with this file using a modified XML filter in memoQ. But which one?

At first I thought to use the "Multilingual XML" filter, which I had heard about but never used, but this turned out to be a dead end. It is language-pair specific, and really not the best option in this case. I was concerned that there might be more files like this in the future involving other language pairs, and I did not want to be bothered with customizing for each possible case.

So I looked a little closer... and noted that this export has the source text copied exactly to the "target". So I concentrated on building a customized XML filter configuration that would just pull the text to translate from between the target tags. I created a custom configuration of the XML filter by populating the tag list and excluding the content of the "source" tag:
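A rough way to picture what that configuration does: read only the text between the target tags and ignore the source entirely. The Python sketch below uses the standard library's `xml.etree.ElementTree` on an invented file shaped loosely like the client's export; the element names `export`, `unit`, `source` and `target` are assumptions, not the real CMS format, and the HTML is assumed to be stored as escaped text.

```python
import xml.etree.ElementTree as ET

# Invented sample; element names are assumptions, not the client's real export format.
SAMPLE = """<export>
  <unit id="1">
    <source>&lt;p&gt;Welcome to the &lt;b&gt;portal&lt;/b&gt;.&lt;/p&gt;</source>
    <target>&lt;p&gt;Welcome to the &lt;b&gt;portal&lt;/b&gt;.&lt;/p&gt;</target>
  </unit>
  <unit id="2">
    <source>Log out</source>
    <target>Log out</target>
  </unit>
</export>"""


def extract_targets(xml_text: str) -> list[str]:
    """Return the text of every <target> element; <source> content is never read."""
    root = ET.fromstring(xml_text)
    # itertext() joins all text inside the element, so escaped HTML comes back as
    # a plain string like "<p>Welcome ...</p>" that still needs a second filter.
    return ["".join(target.itertext()) for target in root.iter("target")]


if __name__ == "__main__":
    for text in extract_targets(SAMPLE):
        print(text)
```

If the real file declared a default namespace, ElementTree would only find the elements under their `{namespace-uri}target` form, which is one more way a "nearly XLIFF" file can trip up a naive import.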

That worked, but not well enough. In the screenshot below, the excluded source content is shown with a gray background, but the imported content has a lot of HTML, for which the tags must be protected:

The next step is to do the import again, but this time including an HTML filter after the customized XML filter. In memoQ jargon, this sort of configuration is known as a "cascading filter" - where various filters are sequenced to handle compounded formats. Make sure, however, that the customized XML filter configuration has been saved first:

Then choose that custom configuration when you import the file using Import with Options:

This cascaded configuration can also be saved using the corresponding icon button.

This saved custom cascading filter configuration is available for later use, and like any memoQ "light resource", it can be exported to other memoQ installations.

The final import looks much better, and the segmentation is also correct now that the HTML tags have been properly filtered:

If you encounter a "special" XML case to translate, the actual format will surely be different, and the specific steps needed may differ somewhat as well. But if you break the problem down into stages and consider at each stage what more needs to be done to get a workable result with all the non-translatable content protected, you or your technical support associates can almost always build a customized, reusable import filter in reasonable time. That gives you an advantage over those who lack the proper tools and knowledge, and it ensures that your client's content can be translated without undue technical risk.
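To make the idea of stages a bit more concrete, here is a rough Python sketch of what the second filter in an XML-then-HTML cascade does with each string the first stage hands it: tags become protected pieces, and only the text between them is left for translation. It uses the standard library's `html.parser` and is only an analogy for memoQ's HTML filter, not its implementation; the class and function names and the sample string are invented.

```python
from html.parser import HTMLParser


class TagProtector(HTMLParser):
    """Second stage of a hypothetical cascade: split HTML into protected tags and text."""

    def __init__(self) -> None:
        super().__init__()
        self.pieces: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.pieces.append(("tag", self.get_starttag_text()))  # raw tag, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.pieces.append(("tag", self.get_starttag_text()))  # e.g. <br/>

    def handle_endtag(self, tag):
        self.pieces.append(("tag", f"</{tag}>"))

    def handle_data(self, data):
        self.pieces.append(("text", data))  # only these pieces would be translated


def protect_html(fragment: str) -> list[tuple[str, str]]:
    parser = TagProtector()
    parser.feed(fragment)
    parser.close()
    return parser.pieces


if __name__ == "__main__":
    # Output of the first (XML) stage goes in; tags and text come out separated.
    for kind, chunk in protect_html("<p>Welcome to the <b>portal</b>.</p>"):
        print(kind, repr(chunk))
```

Each stage only has to solve one problem: the first decides *where* the translatable content is, and the second decides *what inside it* must be protected.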

Apr 3, 2018

Dealing with tagged translatable text in memoQ

Lately I've been doing a bit of custom filter development for some translation agency clients. Most of it has been relatively simple stuff, like chaining an HTML filter after an Excel filter to protect HTML tags around the text in the Excel cells, but some of it is more involved; in a few cases, three levels of filters had to be combined using memoQ's cascading filter feature.

And sometimes things go too far....

A client had quite a number of JSON files, which were the basis for some online programming tutorials. There was quite a lot of non-translatable content that made it past memoQ's default JSON filter, much of which, if modified in any way, would mess up the functionality of the translated content and require a lot of troublesome post-editing and correction. In the example above, "Seconds in a day:" is clearly translatable text, but the special rules used with the Regex Tagger turned that text (and others) into protected tags. And unfortunately the rules could not be edited efficiently to avoid this without leaving a lot of untranslatable content unprotected and driving up the cost (due to increased word count) for the client.
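To see how that happens, here is a minimal Python sketch of the idea behind a regex tagger: every match of a rule becomes a protected tag, and only what is left over counts as translatable text. The tutorial line and the rule are invented for illustration; they are not the client's file or rules, and this is not memoQ's implementation. The rule is deliberately broad enough to protect a whole method call, string literal included.

```python
import re


def apply_regex_tagger(text: str, patterns: list[str]) -> list[tuple[str, str]]:
    """Split text into ("tag", ...) and ("text", ...) pieces.

    Every match of any pattern becomes a protected tag; everything else stays
    translatable. An illustration of the idea only, not memoQ's Regex Tagger.
    """
    rule = re.compile("|".join(f"(?:{p})" for p in patterns))
    pieces: list[tuple[str, str]] = []
    pos = 0
    for match in rule.finditer(text):
        if match.start() == match.end():
            continue  # ignore zero-length matches
        if match.start() > pos:
            pieces.append(("text", text[pos:match.start()]))
        pieces.append(("tag", match.group()))
        pos = match.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces


def translatable_words(pieces: list[tuple[str, str]]) -> int:
    """Count whitespace-separated words outside protected tags."""
    return sum(len(chunk.split()) for kind, chunk in pieces if kind == "text")


# Hypothetical over-broad rule: protect any method call, arguments and all.
OVERBROAD_RULES = [r"\w+\.\w+\([^)]*\)"]

if __name__ == "__main__":
    line = 'Run console.log("Seconds in a day:", 60 * 60 * 24) to check the result.'
    pieces = apply_regex_tagger(line, OVERBROAD_RULES)
    for kind, chunk in pieces:
        print(f"{kind:>4}: {chunk}")
    print(translatable_words(pieces))  # 5: "Seconds in a day:" is hidden inside the tag
```

Loosening the rule would free the string but also expose code that should stay protected, which is exactly the trade-off described above; editing the few affected tags avoids it.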

In situations like this, there is only one proper thing to do in memoQ: edit the tags!

There are two ways to do this:

  • use the inline tag editing features of memoQ or
  • edit the tags on the target side of a memoQ RTF bilingual review file.
The second approach can be carried out by someone (like the client) in any reasonable text editor; tags in an RTF bilingual are represented as red text:

If, however, you go the RTF bilingual route, it's important to specify that the full text of the tags is to be exported, or all you'll get are numbers in brackets as placeholders:

Editing tags in the memoQ working environment is also straightforward:

On the Edit ribbon, select Tag Commands and choose the option Edit Inline Tag

When you change the tag content as required, remember to click the Save button in the editing dialog each time, or your changes will be lost.

These methods can be applied to cases such as HTML or XML attribute text which needs to be translated but which instead has been embedded in a tag due to an incorrectly configured filter. I've seen that rather often, unfortunately.

The effort involved here is greater than typical word- or character-based compensation schemes can fairly cover, so it should be charged at a decent hourly rate or included in project management fees.

A lot of translators are rather "tag-phobic", but the reality of translation today is that tags are an essential part of the translatable content, serving to format translatable content in some cases and containing (unfortunately) embedded text which needs to be translated in other (fortunately less common) cases. Correct handling of tags by translation service providers delivers considerable value to end clients by enabling translations to be produced directly in the file formats needed, saving a great deal of time and money for the client in many cases.

One reasonable objection that many translators have is that the flawed compensation models typically used in the bulk market bog do not fairly include the extra effort of working with tags. In simple cases where the tags are just part of the format (or are residual garbage from a poorly prepared OCR file, for example), a fair way of dealing with this is to count the tags as words or as an average character equivalent. This is what I usually do, but for tags that need editing, this is not enough, and an hourly charge would apply.
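Counting tags as words or as a character equivalent is simple arithmetic. A minimal sketch, where the job size and the 8-character equivalent per tag are made-up illustration values, not rates from this post; it only covers the simple case, since tags that need editing are billed by the hour.

```python
def weighted_count(text_units: float, tags: int, units_per_tag: float = 1.0) -> float:
    """Billable volume when each tag is counted as a fixed number of words or characters.

    units_per_tag=1.0 counts every tag as one word; for character-based billing,
    pass whatever average character equivalent per tag has been agreed (assumed here).
    """
    return text_units + tags * units_per_tag


if __name__ == "__main__":
    # Hypothetical job: 1,000 words and 120 simple formatting tags.
    print(weighted_count(1_000, 120))  # 1120.0
    # Same idea billed by characters, assuming an 8-character equivalent per tag.
    print(weighted_count(6_500, 120, units_per_tag=8))  # 7460
```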

In the filter development project for the JSON files received by my agency client, the text used was initially analyzed at
14,985 words; 111,085 characters; 65 tags
and after proper tagging of the coded content to be protected it was
8,766 words; 46,949 characters; 2,718 tags.
The reduction in text count more than covered the cost of the few hours required to build the cascading filter for this client's case, and it largely ensured that the translator could not alter text that would impair the function of the product.
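A quick check of the analysis figures above shows how large that reduction was: roughly 41.5% fewer words and 57.7% fewer characters. The small Python script below reproduces the percentages from the quoted counts; the function name is just for illustration.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage by which a count shrank from `before` to `after`."""
    return 100.0 * (before - after) / before


# Analysis figures quoted in the post, before and after the cascading filter.
BEFORE = {"words": 14_985, "characters": 111_085, "tags": 65}
AFTER = {"words": 8_766, "characters": 46_949, "tags": 2_718}

if __name__ == "__main__":
    for key in ("words", "characters"):
        drop = percent_reduction(BEFORE[key], AFTER[key])
        print(f"{key}: {BEFORE[key]:,} -> {AFTER[key]:,} ({drop:.1f}% fewer)")
    print(f"tags: {BEFORE['tags']:,} -> {AFTER['tags']:,}")
    # words: 14,985 -> 8,766 (41.5% fewer)
    # characters: 111,085 -> 46,949 (57.7% fewer)
    # tags: 65 -> 2,718
```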

Mar 14, 2018

Come to Terms in Amsterdam, June 30th

At the end of June this year I'll be doing an expanded, in-person reboot of my occasional terminology workshop with new material and workflows for those who want to do more to control quality and improve communicative vocabularies in interpreting, translation and review projects.

Space is limited at the All-Round Translator event, but I hope you can join us to learn about
  • Better teamwork through timely terminology sharing
  • Faster, more effective discovery of frequently occurring specialist terminology
  • Better access to critical terminology in many environments
  • More efficient and accurate QA for terminology
  • More accurate, efficient and fault-tolerant term use when translating with memoQ
  • Greater flexibility to meet client terminology needs
The Early Bird rate for the workshop is €99 + VAT until the end of April, €120 + VAT thereafter.

The content is applicable to work with many translation environments, but some segments will share particular tips for maximum productivity using the unbeatably practical memoQ environments.

Farewell, Mr. Hawking.

It's often quite interesting to see what Google offers to auto-complete what one types. Today, on the birthday of Albert Einstein, another great physicist, Stephen Hawking is in the news again: the man born on the 300th anniversary of Galileo's death has taken this day to shed his mortal form.

Professor Hawking has been on my mind quite a bit in recent years, though his disciplines are quite different from any I have practiced. But as life has happened to the son of a colleague, and the debate on quality of life and euthanasia has evolved, my thoughts have returned time and again to the man who was supposed to die more than half a century ago but who instead lived a full life and made enormous contributions to science and human understanding of the Universe. A man who, by some modern practice, would have been aborted in the womb after DNA testing or who might have been put down like a stray dog if some Dutch or Belgian doctor decided his life was an undue cost burden on the State.

Arrogant fools make such a calculation, and their House of Evil too often has its foundation on such "good" intentions. And humanity is poorer for that, immeasurably so.

Education, vocational training and professional practice without well-considered ethical foundations that are subject to frequent, careful integrity checks rest on uncertain ground, which can too easily turn to deep mud or quicksand and swallow all that sets us apart from the most degraded of beasts. If you want examples of the grand architecture of the human spirit, look not to the Coliseum, the great aqueducts or the pyramids. Look to the tubercular Keats. To Helen Keller. To Freddie Mercury. To Mr. Hawking. Time held them green and dying, but their songs carry on as the sound of the sea.

Mar 10, 2018

Virtual symposium on AI, MpT and language processing March 26-29, 2018

The worlds of artificial intelligence and machine pseudo-translation are largely ones of delusion, wishful thinking, deceit and professional manipulation, but once in a while one encounters a few people in these fields who are worth the time to listen to and talk with. Dion "Donny the Wig" Wiggins of Omniscien, formerly Asia Online, is one of these: a researcher at heart, it seems to me, and someone with a good appreciation of processes, even those having little to do with the technology he represents in his day job. Although he is an established godfather of the MT Mafia, his approaches to application show a large dose of common sense that is mostly absent among the ignorant masses who place their faith in technologies they do not actually understand in detail. More than once he has shared workflow "revelations" that back up old research and testing of mine, but with more and better data to show how great productivity gains can be achieved by simple reorganization of common tasks. So when he told me about his company's upcoming symposium, I knew that it probably wouldn't be the usual bullshit-tinted fluff drifting through the professional atmosphere of translation these days.

Click the graphic above to see the symposium program - attendance is free. You'll see some familiar names and perhaps conclude that there might in fact be a bit of BS in the air, but there is likely to be a good bit of substance to consider and to apply even in areas not covered by the program.

One of the biggest problems I have with machine pseudo-translation technologies is the utter ignorance and dishonesty of many of its promoters and the massive social engineering which takes place to persuade and intimidate people to become its willing victims in areas where it offers little or no real value. The continued disregard for documented occupational health issues and language skills distortion in post-editing processes, and the vile corruption of academic programs to produce a new generation of linguistic dullards who cannot distinguish algorithmic spew from real human language are all matters of significant concern. But if we are to engage the Forces of Evil and know our Enemies and keep them within their wholly legitimate domain, this event might be a place to start :-) See you there.

Feb 21, 2018

Double Vision with MS PowerPoint to Check Translations

I learned an interesting thing tonight thanks to David Hardisty, who teaches translation technical skills at Universidade Nova in Lisbon and translates quite a bit from European Portuguese to English. For all the years - decades? - that I have used Microsoft PowerPoint, I was never aware that it is possible to run two different presentations - as presentations - at the same time.

Why is this useful? If you want to do a detailed, screen-by-screen comparison of two different presentations, say a German original and its Arabic translation, and ensure that everything looks OK with the animations, effects, text sizing in fields, etc., this can enable more focused, accurate comparisons. Just clicking through the slides in two windows in edit mode might miss something that depends on dynamic elements in the application.

Here's how you do this:

1. Open the first PowerPoint file.

2. Go to Slide Show > Set Up Slide Show and choose Browsed by an individual (window).

3. Click OK. Then press F5. Resize the presentation window as you like and put it wherever you want on one of your screens.
Then repeat Steps 1 through 3 for the second presentation. And the third if you like. And....

Jan 30, 2018

Doing memSource better in memoQ with @wasaty!

This post has been updated. The good two-template solution has been improved to make a one-template solution. This is user engagement at its best in the world of memoQ.

Marek Pawelec (aka @wasaty), one of my favorite technical solution finders in translation, has published an effective improvement for those who prefer to do memSource projects in memoQ. I have done a good bit of this in the past, as I greatly dislike the limitations of the memSource local editor and dislike browser environments (from any firm) even more for translation, but the funky interpretation of XLIFF used by that tool requires some custom filter configuration to enable work to proceed without putting unrecognized tags at risk. Even so, the inability to transfer match percentage information and locked status for segments gave me more than a few headaches with these projects.

Someone at Kilgray mentioned a while ago that a proper memSource filter had been considered, but that resources were, alas, focused on other priorities, like 8.x "fixes" to features that weren't broken so that life would become more interesting for legal and financial translators whose work was becoming too easy with memoQ 7.8. No matter: once again, Marek has come through with an excellent professional solution for doing memSource better in memoQ.

Some highlights of the template provided:
  • memSource match rates are visible in memoQ
  • locked segments stay locked!
  • "translated" status will be kept
  • machine pseudo-translated garbage is marked with "MT" status in memoQ
  • memSource tags can be converted to memoQ tags
  • populated segments can be given "edited" status
Currently, this template is the best technical solution for working more efficiently and accurately with memSource MXLIFF files in memoQ and will probably remain so until Kilgray does get around to creating a properly integrated filter with configurable options. So if you have valued customers who use memSource but you want to leverage all your memoQ resources to do the work better, Marek's template is for you. Check out the detailed description and instructions on his blog!

Jan 29, 2018

Contract Language Explained for Translation - with Paula Arturo

On February 5th at 5:00 pm GMT (noon EST, 9 am PST, 6 pm CET), translating attorney Paula Arturo will be presenting a webinar for the American Translators Association on the application of language categories in contract translation. This should be an interesting and useful session for persons working into or out of English.

For more information, have a look at the presenter's announcement - and check out the rest of her interesting legal translation blog, Language with a Pinch of Law.

Jan 19, 2018

Call for proposals: 2018 Mediterranean Editors & Translators Meeting in Girona, Spain

The submission deadline for presentation abstracts is February 28, 2018.

The Mediterranean Editors and Translators annual meeting is an opportunity for professional education and exchange which has been on my radar for quite a few years. It was a publication by a few of its members which got me started more than a decade ago with corpus linguistics and better approaches to terminology identification and management, and the group's workshops are among the best value-for-money CPD programs I've seen. When I attended the meeting near Madrid a few years ago, I was deeply impressed by the way in which highly experienced, top-notch colleagues mixed well with rank beginners. This year, I'll be extending my time in Spain after the IAPTI conference in Valencia and going a bit farther up the coast to learn from and share ideas with the excellent professional peers there. Why don't you join me?

The 2018 METM in Girona, Spain offers a day and a half of presentations and keynotes, two half-days of pre-conference workshops, and a program of additional events. The city is located about 440 km from Valencia and 100 km from Barcelona and has good local air and train connections. The venue, Centre Cultural La Mercè, is in the heart of the old town, on the site of a 14th-century convent, which is now a municipal cultural center.

Come to METM 2018 for the professional atmosphere and enrichment, stay to enjoy the beautiful Spanish culture and cuisine.

Jan 18, 2018

memoQfest 2018 call for papers and the memoQ Trend Report

The tenth annual conference for memoQ technology will be held by Kilgray in Budapest from May 30 to June 1, 2018. Even though the programs for these events sometimes seem overly skewed toward the bulk market, there is no better opportunity for anyone interested in the productive use of translation technology to meet and consult with experts on how to get the most out of memoQ as an individual translator, a language services broker, a corporate translation manager or someone in another technical or managerial role related to translation.

Presentation proposals for memoQfest 2018 are now being accepted; the final deadline for submissions is February 5, 2018. Why not share your expertise and help move the discussions for product development and use in a direction you feel they should go?

Today, Kilgray also published a new site discussing "trends" in translation technology and the thoughts and opinions of key personnel and memoQ users in that regard. The memoQ Trend Report isn't really a report; I'm not sure what to call it or what its purpose is, but many of the points discussed are interesting and worth thinking about. More than the content, I liked the technical implementation of the new site, particularly how it works on a smartphone. The success of adaptive design here gives me something to aspire to when I get around to remaking my personal business web site one of these days. Have a look on both mobile devices and large screens, and add your thoughts to the discussions!