Mar 17, 2016
Dynamic filtering with regular expressions in memoQ
Regular expressions (aka regex) are not a tool for everyone, though this is something that the nerdily inclined often fail to appreciate. For average users, a plain language query interface, perhaps with more limited options, is generally more accessible and used. However, sometimes it's nice to have such "shortcuts" available to select particular structures in a text for translation or editing, and the many people who complained for years that Kilgray did not provide a dynamic regex filter for the working translation grid - a feature of SDL Trados Studio for quite a while now - did have a point worth addressing in development. Now that has happened, though still a bit incompletely when considered in the full scope of memoQ's usual features for selecting text.
memoQ uses regex in a number of its modules, and Kilgray has several webinars which describe these applications, though they require some stamina to watch, and I expect that most people will become hopelessly confused if they try to take in more than one area of application in a single sitting. The uses of regex for segmentation rules, tagging, autotranslatables and text filtering on document import (with the Regex Text Filter) are very different in their approach, even though the underlying syntax of the regex is the same. However, all of these applications allow the configured rules to be saved and re-used, so one could ask an expert to create the settings needed and provide these in a resource file, and many users do exactly that. Thus as long as one understand that regex can be used for a particular problem, the details can be hired out.
This new application of regex for dynamically filtering, introduced in recent builds of memoQ 2015, is a little different (at present). Although the Find/Replace dialog will "remember" regex syntax in its dropdown menu of recent expressions, there is no way to store these expressions, and they must be entered manually to use them. This means that, for now, the average user will have to collect useful expressions like a tourist might scribble phrases in a notebook to use on holiday in a foreign country, and those with a little more sense of adventure might find themselves with a hovercraft full of eels and wonder why.
One such phrase might be the example in the screenshot above. I was translating some financial statements with several formats present for digits in account numbers, dates and monetary expressions. In order to work more systematically with these various formats, I used several different regex expressions to sort and separate them. In the example I was looking for instances where at least four digits were written together in a source segment. That isn't terribly selective, but most of these occurrences in my documents were account numbers, and this helpfully cleaned up the text a lot and allowed me to work a little faster. Other expressions were used to QA date formats and monetary expressions more specifically.
In the working grid for translation and editing, regular expressions can be used in one or both of the fields for the source and target text when the checkbox in the toolbar at the right is marked. Or the regular expressions option in the Find/Replace dialog can be used.
It is somewhat disappointing that regex cannot be used to create static views at the present time. While marking can be used in the Find dialog to enable one to go back and forth between the filter criteria and other configurations of the working grid, there is no way to make a permanent "record" of the filtered segments. For quite a few years, I have wished for the possibility to save the results of my filtering in the working grid in some sort of view, but I was always able at least to recreate the filtering criteria in the dialog to create a memoQ View, which could then be opened at any time or exported in various formats for clients and project collaborators. However, at the moment that is not possible with regex filtering. (There are workarounds involving a change in segment status, but these are often inconvenient in a project in progress.)
The addition of regex filtering to the working grid in memoQ is a welcome feature for many, which I hope will be expanded by Kilgray in the future to achieve more of its potential. But to take advantage of this potential in any way, the average user will indeed need a "phrase book" of sorts, and an efficient way of managing useful collected regex snippets (and naming them for easier re-use in searches and filtering) would be very desirable. If these "regex phrase books" for dynamic filtering and view creation were able to be saved as shareable light resources, it would be possible to build many useful collections to help users at all levels in the translation, editing and quality assurance tasks.