Feb 20, 2017

Building a regex-savvy "termbase" in memoQ

For years I have been frustrated by and dissatisfied with how abbreviations are handled in the current memoQ termbase model. The crux of the problem is the handling of the periods in the expressions. This can be seen with termbase entries like the following, for example:

If the abbreviation "Art." appears in the source text, only the second source entry - the one without the period - will give a match result in memoQ. The first entry is simply ignored.

An additional problem which one would face, even if the terminal period character in the term did not pose a problem, is that authors are often notoriously variable in the way they write abbreviations. Take, for example, the abbreviation for the German expression "in Verbindung mit", usually written as "i.V.m."

In recent legal translation work, I have encountered this expression written as above, but also as "i. V. m." (with spaces), "iVm" (no spaces, no periods) and sloppily typed variations like "iV.m" or "i. V.m." What's a poor wordworker to do?

The answer came to me while refining a set of auto-translation rules for bibliography formatting and legal references. These, too, can suffer from similar troubles: "page 7" might be abbreviated as "p. 7", but in the sloppy chaos of poorly edited source texts one might find "p.7", "p 7", "p7" or even variations with the letter capitalized, like "P.7". If you are translating nearly 1000 references in a bibliography, robust shortcuts are very helpful and save a lot of time, and if those shortcuts are based on memoQ auto-translation rules, they can also be used in a QA profile to ensure that every bit matches correctly.

As the screen capture from a memoQ Facebook group above suggests, the way to go about this is to identify which parts of the expression might vary with different deliberate and accidental typing. These are usually spaces and periods in the case of abbreviations; sometimes, particularly with German legal abbreviations, capitalization and dashes may play roles as well. (I tore my hair out not long ago trying to understand an Austrian legal text referring to two laws, which differed in their three-letter abbreviations only by a dash inserted after the first letter of one.)

In regular expressions, the question mark character means "zero or one" of whatever character precedes the question mark. So if I want a rule that acts in the case of one or no periods, I put a question mark after the period character. And because in the language of regular expressions, a period is shorthand for any character, if I want to talk about an actual period ("."), I have to precede that character by a backslash ("\."). In the technical jargon of Nerdworld that is known as "escaping the period" and there is no escaping such syntax if you want a regular expression rule about periods, period.
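A quick way to see the difference between an escaped and an unescaped period is to try both outside memoQ. The sketch below uses Python's re module purely for illustration; memoQ's own regex flavor may differ in details, and the "Art." pattern is just an example of mine.

```python
import re

# "\." is a literal period; the "?" after it makes that period optional.
pattern = re.compile(r"Art\.?")

for text in ["Art. 5", "Art 5"]:
    print(text, "->", repr(pattern.match(text).group(0)))

# An unescaped period stands for any character, so "Art." also matches "Arts".
unescaped_hits_arts = re.fullmatch(r"Art.", "Arts") is not None
escaped_hits_arts = re.fullmatch(r"Art\.", "Arts") is not None
print(unescaped_hits_arts, escaped_hits_arts)
```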

Spaces (normal or non-breaking ones) are represented by an escaped lowercase "s": "\s". So a matching rule for the English abbreviation "e.g." which catches a lot of typing variations might be


And in German, the target replacement rule might be


Of course, if a typist is sloppy, there might be more than one space, or a comma might be typed accidentally instead of a period (the keys are adjacent, and if your screen is as dirty as mine gets sometimes, your eyes might not notice); capitalization might also differ accidentally or based on context. The regular expressions for matching can be adapted to handle all these cases if need be.
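As a rough sketch of such an adapted rule (my own guess, not the rule from the missing screenshots), the pattern below tolerates a comma typed for the first period, any number of spaces and either capitalization. The German target "z. B." and the name EG are assumptions for illustration, and Python's re module stands in for memoQ here.

```python
import re

# Hypothetical pattern: "e", optional period or comma, any number of spaces
# (normal or non-breaking), "g", then an optional final period.
EG = re.compile(r"\be[.,]?\s*g\b\.?", re.IGNORECASE)

# Assumed German target; adjust to the house style you need.
TARGET = "z. B."

samples = ["e.g. apples", "e. g. apples", "e,g. apples", "E.G. apples", "eg apples"]
for s in samples:
    print(repr(s), "->", repr(EG.sub(TARGET, s)))
```

The word boundaries ("\b") keep the rule from firing inside ordinary words such as "egg".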

Rules of this type are not particularly difficult to construct, but refining them to accommodate all the variations you are likely to encounter may require an expert hand. Thus, as I have suggested before, the average user should focus on documenting all the possible source variations clearly in a table which includes the desired target equivalents, and this table should be given to an expert (Kilgray support, a qualified consultant like Marek Pawelec or a technical programmer familiar with regular expressions and their use in memoQ). Trust me, this will save a lot of frayed nerves and probably significant time and money as well.
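Such a table of variations also makes a handy test list for whoever writes the rule. Here is a minimal sketch in Python using the "i.V.m." variants mentioned earlier; the pattern, the names IVM and uncovered, and the English target are my assumptions, not a finished memoQ rule.

```python
import re

# Hypothetical rule: an optional period and an optional space after each letter.
IVM = re.compile(r"\bi\.?\s?V\.?\s?m\b\.?")

# Documented source variants, paired with an assumed English target.
TARGET = "in conjunction with"
VARIANTS = ["i.V.m.", "i. V. m.", "iVm", "iV.m", "i. V.m."]

def uncovered(pattern, variants):
    """Return the variants that the pattern fails to match in full."""
    return [v for v in variants if pattern.fullmatch(v) is None]

# An empty list means every documented variant is covered by the rule.
print(uncovered(IVM, VARIANTS))
print(IVM.sub(TARGET, "§ 5 i. V.m. § 7"))
```

Any variant that shows up in the first printed list is one the expert still needs to handle.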

So now I am building a few memoQ auto-translation rulesets which are essentially fault-tolerant abbreviation glossaries. These, together with the similar rulesets for formatting bibliographical references and references to sections, paragraphs, lines, margin notes, etc. in laws, have been very helpful in reducing the time spent translating messy legal source texts, and the accuracy of the work has been improved significantly. Give it a try for your translation challenges!
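For the page references mentioned above, the same approach might look like the following sketch. It is written in Python for illustration only; the pattern and the function name normalize_pages are hypothetical, and in memoQ the captured number would be referenced with that tool's own replacement syntax rather than Python's "\1".

```python
import re

# Hypothetical rule: "p" or "P", optional period, optional space, then the page number.
PAGE = re.compile(r"\b[pP]\.?\s?(\d+)")

def normalize_pages(text):
    """Rewrite every tolerated page-reference variant as 'p. <number>'."""
    return PAGE.sub(r"p. \1", text)

for variant in ["p. 7", "p.7", "p 7", "p7", "P.7"]:
    print(repr(variant), "->", repr(normalize_pages(variant)))
```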

Jan 12, 2017

The ART of all-round translation....

There is a certain mythology that in Ye Goode Olde Days, life was simpler and more generalist and a whole lot easier. I suspect that is mostly bunk. The stresses and pressures were different, but probably no less when considered objectively. I remember trying to help my wife, a sometime English to German translator, find clients in the early 1990s, and back then if you weren't local, the clients mostly did not want to know. And don't get me started on the time and effort of terminology research for my own translations then and in the decades before.

But I think it is fair to say that today, even the specialist must be a JOAT of sorts, at least when it comes to the bag of technological and project management tricks to subdue the unruly projects that many of us often face. Colleagues Dorota Pawlak and Ellen Singer recognized the difficulties faced by many language specialists in acquiring some of the specialist and non-linguistic skills needed to cope with particular work challenges and designed a program of quarterly, half-day small workshops to provide just the environment needed to cultivate this new knowledge and establish bonds with others in the same endeavor.

Upcoming workshops I find particularly interesting include:

Transcreation with Alessandra Martelli on February 4, 2017 in Leiden and

no kidding, the regex workshop on April Fool's Day 2017 with my favorite tech guru, the brilliant but articulate Marek Pawelec, a first-rate teacher who can make even nasty stuff like regular expressions seem simple for the rest of us. And as I have pointed out in various articles, this knowledge can be extremely useful for those who work with tools like SDL Trados Studio, memoQ, Xbench and more.

I encourage you to have a look at the ART project site and see what else is on the menu; it seems to me that they have the right approach for those looking for a good start in interesting new areas.

And keep up to date with them on Twitter....

Jan 6, 2017

A matter of priority in memoQ

Every memoQ user knows the Translation Results pane.

It's that subwindow on the upper right part of the memoQ translation/editing environment which shows content matches from various sources, including translation memories, LiveDocs corpora, term bases, etc.

Most of us don't really do much with it. And why should we? Well..........

Sometimes there are an awful lot of "hits" displayed in that pane. Lots of matches from the TM, and if you're like me and record a lot of specialized terminology and company names not to be translated, sometimes the entry you need to see is not apparent at a glance; you must scroll down some way to find it.

This is a real problem when I am doing financial or legal translations using specialized autotranslatables, or when certain names and nontranslatable acronyms come up very often and cannot be seen conveniently in the visible part of the list in the results pane.

So what's a memoQ user to do? Change the order of data types displayed, for example.

Under Options > Appearance, you are able to change the relative display priority of hits from every kind of memoQ data shown (as well as change the color codes, though I think this is usually a bad idea). The example above has the autotranslatable matches (coded green) set to display at the top of the list. If I had a lot of proper names saved in nontranslatable lists, I would move that category toward the top as well to take advantage of improved visibility and better keyboard shortcuts.

Some jobs definitely benefit from a customized display order in the Translation Results pane. You can change the order in the Options each time to meet the needs of a particular job, or...

... more conveniently, you can have several different configuration files with particular settings for certain work. The relevant configuration is saved in the file Preferences-editor.xml, which is found at C:\Users\{username}\AppData\Roaming\MemoQ.

There are, of course, a lot of other files in that folder. I keep a shortcut on my Desktop now so I can get to the various configuration files quickly when I want to make changes.

The relevant changes to make in Preferences-editor.xml are found between the tag sets for <hitorderex> and <disabledhittypes>:
The first is the order in which the various types of translation hit results are to appear. The second lists those types in the sequence which are not to be displayed. Note that <hitorderex> also includes the types that will not be shown, so that if their display is re-enabled, memoQ will know where they belong.

The correlation of the numeric codes used here to the hit types is as follows:
100 = Translation memory
200 = Term base
300 = Non-translatable
400 = Auto-translation
500 = Fragment assembly
600 = LSC
700 = Machine translation
So in the example above, the display of TM, fragment, LSC and machine translation results has been suppressed.
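To make the interplay of the two lists concrete, here is a small Python sketch that works on the numeric codes only. It makes no assumption about how the values are written inside the XML file; the order list is hypothetical, and only the suppressed types are taken from the example described above.

```python
# Code-to-name table as listed above.
HIT_TYPES = {
    100: "Translation memory",
    200: "Term base",
    300: "Non-translatable",
    400: "Auto-translation",
    500: "Fragment assembly",
    600: "LSC",
    700: "Machine translation",
}

def visible_hit_types(order, disabled):
    """Return the names of the hit types that stay visible, in display order."""
    hidden = set(disabled)
    return [HIT_TYPES[code] for code in order if code not in hidden]

# Hypothetical display order with auto-translation hits first.
order = [400, 300, 200, 100, 500, 600, 700]
# Suppressed types from the example: TM, fragments, LSC and machine translation.
disabled = [100, 500, 600, 700]

print(visible_hit_types(order, disabled))
```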

One convenient way to switch quickly between configuration "profiles" is to keep versions of the XML configuration files with descriptive suffixes in the filename and put an alias (shortcut) for that file somewhere convenient, like on your Desktop. Such a file where autotranslatables and nontranslatables are shown at the top might be Preferences-editor_Autotrans-Nontrans.xml

Before starting memoQ, I find the shortcut for the configuration file I want loaded, open it by double-clicking and use Save As... with the additions to the filename deleted. This will overwrite the preferences file that was used previously. To switch back, I quit memoQ, open the backup copy of the preferences file I usually use and save it under the name Preferences-editor.xml. Until Kilgray implements actual saveable/loadable user profiles, this is as easy as it will get. Of course this method can also encompass other aspects of configuration.
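If you would rather not do the open-and-Save-As routine by hand, the swap could be scripted. The following Python sketch is one possible way to do it, not anything provided by memoQ; the function name activate_profile and the ".bak" backup naming are my own assumptions, and it should only be run while memoQ is closed.

```python
import shutil
from pathlib import Path

def activate_profile(config_dir, profile_name, active_name="Preferences-editor.xml"):
    """Back up the active preferences file, then overwrite it with the chosen profile."""
    config_dir = Path(config_dir)
    profile = config_dir / profile_name
    active = config_dir / active_name
    if not profile.is_file():
        raise FileNotFoundError(profile)
    if active.exists():
        # Keep a copy of the current settings next to the original (assumed naming).
        shutil.copy2(active, active.with_name(active.name + ".bak"))
    shutil.copy2(profile, active)
    return active

# Example call, assuming the folder named in the post:
# activate_profile(Path.home() / "AppData" / "Roaming" / "MemoQ",
#                  "Preferences-editor_Autotrans-Nontrans.xml")
```

Copying rather than renaming leaves the named profile file in place, so it can be activated again later.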