Jun 24, 2011

My first look at the new custom tagger in memoQ 5.0

Many months ago, while I was doing some localization updates for the Online Translation Manager (OTM), the project's editor asked me if there wasn't some easy way to protect the many placeholders used for standard customer correspondence and other parts of the application. These typically looked something like [% variable %]; for a company name, for example, the placeholder might be [% COMPANY_NAME %]. During translation, care had to be taken not to omit the spaces around the variable name, mistype it or accidentally edit the characters. This usually meant copying the source to the target as a precaution, but that approach is inefficient, as is copying placeholders from the source to insert into a fuzzy match.
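To make the risk concrete, here is a minimal sketch in Python of the kind of check one would otherwise have to do by eye. It is not part of memoQ or OTM. The function name `placeholder_problems` and the pattern are assumptions for illustration, and the pattern only covers placeholders of the form [% UPPER_CASE_NAME %] with exactly one space on each side.

```python
import re
from collections import Counter

# Assumed placeholder shape: "[% NAME %]" with single spaces and an
# upper-case name. Real projects may need a looser or stricter pattern.
PLACEHOLDER = re.compile(r"\[% [A-Z_]+ %\]")

def placeholder_problems(source, target):
    """Return the placeholders whose counts differ between source and target.

    A placeholder that was mistyped in the target (for example with a
    missing space) no longer matches the pattern, so it shows up here
    as missing.
    """
    src = Counter(PLACEHOLDER.findall(source))
    tgt = Counter(PLACEHOLDER.findall(target))
    return sorted(p for p in src.keys() | tgt.keys() if src[p] != tgt[p])

if __name__ == "__main__":
    # The target lost the space after "[%", so the placeholder is reported.
    print(placeholder_problems("Dear [% COMPANY_NAME %],",
                               "Sehr geehrte [%COMPANY_NAME %],"))
```

Protecting the placeholders as tags makes this kind of after-the-fact check largely unnecessary, because the characters can no longer be edited by accident.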

When I asked the support team at Kilgray if there was some way in memoQ to protect these placeholders, Gábor Ugray, the head of development, told me "not now" but that a solution would be at hand with the release of memoQ 5.0 and its custom tagger. I passed on that bit of news and promptly forgot it.

More recently I had an irritating small translation with a lot of markup like [B]for boldface type[/B], [U]for underline[/U] and so on. The markup played havoc with the spellchecker and was generally a nuisance. Only a few hours after I sent the finished job to my customer, I saw the solution to my problem in the introductory webinar for memoQ 5.0: "cascading" filters and the custom tagger, which uses regular expressions.

A few days later I had my first opportunity to try the technology myself. By then I had forgotten the work sequence from the demo, and I tried an approach which had not yet been fully debugged (it now works perfectly in the current build). Some generous hints and good application examples from the developers soon put me on the right track.

I imported the files as I had before, using the Microsoft Word filter. Then I opened a file which contained the tags that concerned me and selected the command from the Format menu to run the regular expressions tagger.

In the dialog that appeared, I tested expressions for the bracketed content I wanted to convert to tags and viewed the results (saving my configuration for future use once I had what I wanted).

When I ran the tagger, the text in the working area then appeared with the markup protected as tags.
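For readers who want to experiment with such expressions outside memoQ, the following Python sketch shows the general idea of separating bracketed markup from translatable text. It is only an illustration under assumptions: the patterns are guesses at what would fit the markup and placeholders described above, not the expressions I saved in my configuration, and memoQ's own regular expression dialect may differ in details from Python's.

```python
import re

# Hypothetical patterns for illustration only.
MARKUP = r"\[/?[A-Za-z]+\]"        # e.g. [B], [/B], [U], [/U]
PLACEHOLDER = r"\[% [A-Z_]+ %\]"   # e.g. [% COMPANY_NAME %]
TAG_PATTERN = re.compile(f"{PLACEHOLDER}|{MARKUP}")

def split_segment(text):
    """Split text into ("tag", ...) and ("text", ...) pieces, in order."""
    pieces = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        if match.start() > pos:
            # Plain text between the previous tag and this one.
            pieces.append(("text", text[pos:match.start()]))
        pieces.append(("tag", match.group()))
        pos = match.end()
    if pos < len(text):
        # Trailing text after the last tag.
        pieces.append(("text", text[pos:]))
    return pieces

if __name__ == "__main__":
    for kind, value in split_segment("[B]Dear[/B] [% COMPANY_NAME %]!"):
        print(kind, repr(value))
```

Only the pieces marked "text" would be exposed for editing and spellchecking; the pieces marked "tag" correspond to what the tagger protects.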

I then made a view with the rest of the files in the project and ran the tagger configuration I had saved so that all the files were properly tagged. I should have made a view of everything in the first place and tested the tagger with it, but I only thought of this later.

Pretty slick. I don't encounter this sort of challenge every day, but it comes up about once a month or more in some job, and this will make those projects much easier. Once a custom tagging filter has been configured, it can be chained ("cascaded") with other filters to form exactly the configuration you need for your file import.

Addendum / June 27, 2011: Other users' reaction to this technology:

1 comment:

  1. Just took advantage of this to protect the Memsource tags in a mxliff file I'm translating with memoQ instead of the impossibly slow Memsource. Pretty slick!

