Dec 13, 2016

The irregularities of regular expressions in #memoQ


Sometime back in the time-distant swamps where memoQ evolved, regex mysteriously became part of the software's virtual genes. It was never clear exactly which third-party engine or bacterial life form had been its source, and solution developers often had no way of knowing which advanced syntax would work except by trying it (and very often failing).

Many of us begged and pleaded for some kind of definitive documentation of allowed syntax for memoQ's regular expressions, which are an important feature for filtering (in recent versions), segmentation rules, special text import filters, autotranslatables rules and probably a few other things I've forgotten. But begging, threats - even bribery - led to no useful reference information, just some useless suggestions to read beginner's tutorials for other dialects somewhere on the Web.

Then, quite by accident, I learned yesterday that Kilgray uses the engine in Microsoft's .NET framework. Doh. Who'da thunk? Now, at last, I can get some definitive syntax information to help me solve more sophisticated problems for legal reference formats and other challenges in my translations with memoQ.

Even with accurate syntax guidance (at last!!!), regex development with memoQ is often not a simple matter. The integrated editors are often useless, especially for things like complex autotranslatables, where the editor's nasty habit of changing the order of rules after an edit can kill a ruleset. (Kilgray Support long claimed that rule order does not matter, which is patently untrue. They simply did not look at the right test cases.)
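To see why rule order can matter, here is a minimal sketch in Python. It is not memoQ's engine and does not claim to reproduce how memoQ picks a rule; it simply assumes a "first matching rule wins" scheme. The two rules and the function name apply_first_match are hypothetical, made up for this illustration.

```python
import re

# Hypothetical rules as (pattern, replacement template) pairs.
# DATE is specific: dd.mm.yyyy -> mm/dd/yyyy.
DATE = (r"(\d{1,2})\.(\d{1,2})\.(\d{4})", r"\2/\1/\3")
# DOTTED is loose: any three dot-separated digit groups, dots -> commas.
DOTTED = (r"(\d+)\.(\d+)\.(\d+)", r"\1,\2,\3")


def apply_first_match(token, rules):
    """Return the output of the first rule whose pattern matches the whole token.

    Assumption: first match wins. If no rule matches, the token is unchanged.
    """
    for pattern, template in rules:
        match = re.fullmatch(pattern, token)
        if match:
            return match.expand(template)
    return token


# Same token, same two rules, different order, different result.
print(apply_first_match("13.12.2016", [DATE, DOTTED]))  # 12/13/2016
print(apply_first_match("13.12.2016", [DOTTED, DATE]))  # 13,12,2016
```

Both patterns match the same text, so whichever rule comes first decides the output. That is exactly the kind of test case where a silently reshuffled ruleset starts producing garbage.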

Good code of any kind should usually be documented to facilitate maintenance. This is simply not possible with the editors for regex integrated in memoQ. So instead, I do all my rule-writing work in an external editor (such as Notepad++), where I can add extensive <!-- comments so I know what the heck I did when I have to revise the rules later --> and import the rulesets for testing into a memoQ project with appropriate test data included as "translation" documents. The hardest part of this workflow is remembering to enable the imported ruleset I want to test under Project home>Settings>Auto-translation rules; often I forget and think I really screwed up until I go back to the settings and mark the checkbox by the rules to test. Keep a lot of carb sources at your desk when you do regex work. Your brain will need them.

A lot of memoQ users think that regex is irrelevant to their working lives, but for hardcore financial and legal translators at least, this is an entirely mistaken idea. Correctly constructed rules can save much time and a lot of frayed nerves dealing with citations, dates, currency expressions and more, and the rules also decrease QA time while increasing accuracy.
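As a rough idea of what such a rule does, here is a small Python sketch that rewrites a German statute citation. The target format ("Section 15(2) BGB") is just an assumed house style, the names CITATION and convert_citation are hypothetical, and Python's regex dialect is not the .NET one: in .NET the named groups would be written (?<name>...) and referenced as ${name} in the replacement.

```python
import re

# Commented pattern (Python's verbose mode) for citations like "§ 15 Abs. 2 BGB".
CITATION = re.compile(
    r"""
    §\s*                          # section sign, optional whitespace
    (?P<section>\d+[a-z]?)        # section number, e.g. 15 or 15a
    \s+Abs\.\s*                   # "Abs." (Absatz, i.e. subsection)
    (?P<sub>\d+)                  # subsection number
    \s+
    (?P<statute>[A-Z][A-Za-z]*)   # statute abbreviation, e.g. BGB
    """,
    re.VERBOSE,
)


def convert_citation(text):
    """Rewrite every matching citation into the assumed English house style."""
    return CITATION.sub(r"Section \g<section>(\g<sub>) \g<statute>", text)


print(convert_citation("gemäß § 15 Abs. 2 BGB"))  # gemäß Section 15(2) BGB
```

Text that does not match the pattern passes through untouched, so one rule like this handles every citation of that shape in a document without anyone retyping section numbers.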

I have quite a number of custom rulesets I have put together for my work and for some colleagues and clients. Regex is hard shit, no matter what anyone tells you. I have programmed computers in a host of languages since 1970 more or less and used to be known for a good memory for syntax rules, but I find regex so non-intuitive at anything more than a very basic level that if I use it only a few times a year, I have to re-learn it nearly every time. That's no fun. So the key to mastering regex is not to learn it. The massahs usually don't know sheet about workin' the fields, but if they are going to survive in this competitive world, they'll know which specialist to put on the job and reward him or her appropriately. Get to know a competent consulting specialist for memoQ regex, like colleague Marek Pawelec, and let that person's expertise save you many hours of typing and QA, not to mention undetected errors.

Not long ago, Kilgray also established a Professional Services department at last, and that team can help you with these and other problems in optimizing the use of translation technologies. This is very often a better option than using consultants primarily focused on SDL solutions who do a bit of memoQ on the side, because even the best of these are often not really aware of the best approaches to use, and the consequences of this are sometimes dire. Are they at the memoQ wordface nearly every day, dealing with a wide range of challenges that push the technical envelope of the software to its limits? Or would they really rather do a beginner's workshop for SDL Trados Studio 2017 and show you all the cool features that memoQ has had for years and they probably never learned very well anyway? If it's not the first case, caveat emptor no matter the source.

1 comment:

  1. A good free tool to use to build, test, and store regular expressions is Expresso (http://www.ultrapico.com/expresso.htm) - I believe that it is based on the .NET framework, so it should work well for memoQ regex. An even better tool (commercial) is RegexBuddy (https://www.regexbuddy.com/), which lets you build and test regular expressions and select which flavor of regex to use (so .NET for memoQ, POSIX for Xbench, etc.).
