
Feb 23, 2017

memoQuickie: version 8.0 begins public "beta" testing

At breakfast in the Social Media Cafe this morning:


You may have seen the hype behind the "memoQ Adriatic" rollout yesterday. AFAIK this is the first version of the software released without beta-testing, so the release is essentially a beta test. Beware.

The early reaction of one LSP project manager on the memoQ Facebook group makes many of the relevant points. The "new" features are mostly quite beside the point for most of us and are dealt with better elsewhere.

The choice of version "name" also strikes many as bizarre and out of touch. When Kilgray began to ape Microsoft and SDL by including years in the release designation, I said it was a bad idea. This apparent attempt to take cues from Apple's marketing is even worse.

I think this version can be ignored for the most part. Certainly for now in this dangerous beta (or perhaps alpha?) phase. Style is all very pretty, folks, but we need some real substance to address the challenges of translation technology today. Really.

For a "management summary" of new features it seems that the online Help file is your best bet.



Feb 20, 2017

Building a regex-savvy "termbase" in memoQ


For years I have been frustrated by and dissatisfied with how abbreviations are handled in the current memoQ termbase model. The crux of the problem is the handling of the periods in the expressions. This can be seen with termbase entries like the following, for example:


If the abbreviation "Art." appears in the source text, only the second source entry - the one without the period - will give a match result in memoQ. The first entry is simply ignored.

An additional problem which one would face, even if the terminal period character in the term did not pose a problem, is that authors are often notoriously variable in the way they write abbreviations. Take, for example, the abbreviation for the German expression "in Verbindung mit", usually written as "i.V.m."

In recent legal translation work, I have encountered this expression written as above, but also as "i. V. m." (with spaces), "iVm" (no spaces, no periods) and sloppily typed variations like "iV.m" or "i. V.m." What's a poor wordworker to do?

The answer came to me while refining a set of auto-translation rules for bibliography formatting and legal references. These, too, can suffer from similar troubles: "page 7" might be abbreviated as "p. 7", but in the sloppy chaos of poorly edited source texts one might find "p.7", "p 7", "p7" or even variations with the letter capitalized, like "P.7". If you are translating nearly 1000 references in a bibliography, robust shortcuts are very helpful and save a lot of time, and if those shortcuts are based on memoQ auto-translation rules, they can also be used in a QA profile to ensure that every bit matches correctly.

As the screen capture from a memoQ Facebook group above suggests, the way to go about this is to identify which parts of the expression might vary with different deliberate and accidental typing. These are usually spaces and periods in the case of abbreviations; sometimes, particularly with German legal abbreviations, capitalization and dashes may play roles as well. (I tore my hair out not long ago trying to understand an Austrian legal text referring to two laws, which differed in their three-letter abbreviations only by a dash inserted after the first letter of one.)

In regular expressions, the question mark character means "zero or one" of whatever character precedes the question mark. So if I want a rule that acts in the case of one or no periods, I put a question mark after the period character. And because in the language of regular expressions, a period is shorthand for any character, if I want to talk about an actual period ("."), I have to precede that character by a backslash ("\."). In the technical jargon of Nerdworld that is known as "escaping the period" and there is no escaping such syntax if you want a regular expression rule about periods, period.
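Both points are easy to try outside memoQ. The little sketch below uses Python's re module purely as a stand-in (memoQ's own regex engine may differ in details), and the sample strings are made up for illustration:

```python
import re

# An unescaped period matches any single character, so "Art." also matches "Arts".
unescaped = re.compile(r"Art.")
# An escaped period followed by "?" matches a literal period zero or one times.
optional_period = re.compile(r"Art\.?")

samples = ["Art.", "Art", "Arts"]

unescaped_hits = [s for s in samples if unescaped.fullmatch(s)]
optional_hits = [s for s in samples if optional_period.fullmatch(s)]

print(unescaped_hits)  # ['Art.', 'Arts']
print(optional_hits)   # ['Art.', 'Art']
```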

Spaces (normal or non-breaking ones) are represented by an escaped lowercase "s": "\s". So a matching rule for the English abbreviation "e.g." which catches a lot of typing variations might be

e\.?\s?g\.?

And in German, the target replacement rule might be

z.B.
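To see what the matching rule actually catches, here is a minimal sketch in Python, again only as a stand-in for memoQ's engine. The names EG_RULE and TARGET and the list of variants are my own assumptions for the test; the target string is written as "z.B.", the usual German counterpart of "e.g."

```python
import re

# Source-side rule from the text: optional periods, at most one whitespace character.
EG_RULE = re.compile(r"e\.?\s?g\.?")
TARGET = "z.B."  # assumed German target form

# Typing variations, the last one with a non-breaking space (U+00A0).
variants = ["e.g.", "e. g.", "eg", "e.g", "e.\u00a0g."]
normalized = [EG_RULE.sub(TARGET, v) for v in variants]

print(normalized)
```

All five variations come out the same. Note that the bare pattern would also fire inside ordinary words such as "egg", so a real rule probably wants some kind of boundary check around it.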

Of course, if a typist is sloppy, there might be more than one space, or a comma might be typed accidentally instead of a period (the keys are adjacent, and if your screen is as dirty as mine gets sometimes, your eyes might not notice); capitalization might also differ accidentally or based on context. The regular expressions for matching can be adapted to handle all these cases if need be.
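One possible way to harden the rule along these lines is sketched below in Python, as a stand-in for memoQ's engine. The pattern name EG_TOLERANT and the test strings are hypothetical, and the choices made (a comma tolerated only between the letters, lookarounds to stay out of ordinary words) are assumptions rather than a finished rule:

```python
import re

# Hypothetical hardened rule for "e.g.":
#   [.,]?   a period or a mistyped comma after the "e", or neither
#   \s*     any number of whitespace characters
#   \.?     an optional final period
#   (?<!\w) and (?!\w) keep the rule from firing inside words like "egg" or "leg"
EG_TOLERANT = re.compile(r"(?<!\w)e[.,]?\s*g\.?(?!\w)", re.IGNORECASE)

variants = ["e.g.", "e,g.", "e.  g.", "E.G.", "eg"]
matched = [v for v in variants if EG_TOLERANT.fullmatch(v)]

# Ordinary words that merely contain the letters should not trigger the rule.
false_hits = [w for w in ["egg", "leg"] if EG_TOLERANT.search(w)]

print(matched)
print(false_hits)
```

A trailing comma is deliberately left alone here, because "e.g.," followed by a comma is perfectly legitimate and the rule should not swallow it.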

Rules of this type are not particularly difficult to construct, but refining them to accommodate all the variations you are likely to encounter may require an expert hand. Thus, as I have suggested before, the average user should focus on documenting all the possible source variations clearly in a table which includes the desired target equivalents, and this table should be given to an expert (Kilgray support, a qualified consultant like Marek Pawelec or a technical programmer familiar with regular expressions and their use in memoQ). Trust me, this will save a lot of frayed nerves and probably significant time and money as well.

So now I am building a few memoQ auto-translation rulesets which are essentially fault-tolerant abbreviation glossaries. These, together with the similar rulesets for formatting bibliographical references and references to sections, paragraphs, lines, margin notes, etc. in laws, have been very helpful in reducing the time spent translating messy legal source texts, and the accuracy of the work has been improved significantly. Give it a try for your translation challenges!
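For the curious, the idea of such a ruleset can be sketched as a list of pattern and replacement pairs, checked against a table of documented source variations. This is Python, not memoQ's rule format, and everything named here (RULES, CASES, apply_rules, the English target "in conjunction with", the canonical form "p. 7") is an assumption for illustration, not one of my actual rulesets:

```python
import re

# Hypothetical fault-tolerant glossary: (source pattern, replacement).
RULES = [
    # "i.V.m." with optional periods and optional single spaces
    (re.compile(r"(?<!\w)i\.?\s?V\.?\s?m\.?(?!\w)"), "in conjunction with"),
    # page references such as "p.7", "p 7", "p7" or "P.7"
    (re.compile(r"(?<!\w)[pP]\.?\s?(\d+)"), r"p. \1"),
]

def apply_rules(text: str) -> str:
    """Apply every rule in order and return the rewritten text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

# Table of documented source variations with the desired result for each.
CASES = [
    ("i.V.m.", "in conjunction with"),
    ("i. V. m.", "in conjunction with"),
    ("iVm", "in conjunction with"),
    ("iV.m", "in conjunction with"),
    ("i. V.m.", "in conjunction with"),
    ("p. 7", "p. 7"),
    ("p.7", "p. 7"),
    ("p 7", "p. 7"),
    ("p7", "p. 7"),
    ("P.7", "p. 7"),
]

# Any row listed here is a variation the rules do not yet handle correctly.
failures = [(src, want, apply_rules(src)) for src, want in CASES
            if apply_rules(src) != want]
print(failures)
print(apply_rules("Art. 5 iVm Art. 7, P.12"))
```

The table of cases is the part worth writing down yourself; the patterns are what you hand over to the expert to refine.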

Jan 12, 2017

The ART of all-round translation....


There is a certain mythology that in Ye Goode Olde Days, life was simpler and more generalist and a whole lot easier. I suspect that is mostly bunk. The stresses and pressures were different, but probably no less when considered objectively. I remember trying to help my wife, a sometime English to German translator, find clients in the early 1990s, and back then if you weren't local, the clients mostly did not want to know. And don't get me started on the time and effort of terminology research for my own translations then and in the decades before.

But I think it is fair to say that today, even the specialist must be a JOAT of sorts, at least when it comes to the bag of technological and project management tricks to subdue the unruly projects that many of us often face. Colleagues Dorota Pawlak and Ellen Singer recognized the difficulties faced by many language specialists in acquiring some of the specialist and non-linguistic skills needed to cope with particular work challenges and designed a program of quarterly, half-day small workshops to provide just the environment needed to cultivate this new knowledge and establish bonds with others in the same endeavor.

Upcoming workshops I find particularly interesting include:

Transcreation with Alessandra Martelli on February 4, 2017 in Leiden and

no kidding, the regex workshop on April Fool's Day 2017 with my favorite tech guru, the brilliant but articulate Marek Pawelec, a first-rate teacher who can make even nasty stuff like regular expressions seem simple for the rest of us. And as I have pointed out in various articles, this knowledge can be extremely useful for those who work with tools like SDL Trados Studio, memoQ, Xbench and more.

I encourage you to have a look at the ART project site and see what else is on the menu; it seems to me that they have the right approach for those looking for a good start in interesting new areas.

And keep up to date with them on Twitter....