Mar 22, 2011

Sorting out errors with memoQ

Actually this technique applies to other tools as well, and I've used it in the past with Atril's Déjà Vu X. But combined with memoQ's other sophisticated filtering capabilities, it just seems to have a little extra punch with mQ.

What am I talking about? Many (nearly all?) of the translation environment tools (TEnTs) with column-based translation grids offer different ways to sort the data. The default order for the view is, of course, the natural order in which the segments occur in the parsed text (aside from a few oddities like header and footer info or footnotes getting stuffed at the beginning or end in some cases). Other sort orders, such as alphabetical ones, offer a fresh view of the data and can help you catch errors that might otherwise be overlooked in masses of data.

A current project I am praying to finish soon is a case in point. The text itself is mostly interesting, and the formatting in the PPTX files is clean, so I have no technical problems to burden me. However, the thousands upon thousands of lines are riddled with the pox of promiscuous, arcane acronyms. To add to the fun, it's a cooperative project with some great colleagues who use a not-so-great tool (Tradoze), and there is trash in the TMX files we exchange that just won't go away, even though they have long since adopted other target texts for the offending bits. (Yes, I know this is all a matter of tweaking the settings in the TWB TM, but tell that to the average SDL customer. It's easier to follow best practice with a tool that doesn't drive you to drink other than the occasional celebratory shot of Unicum when a job goes better than expected.)

Here's a typical view from my alphabetical "QA sort":

I've highlighted the drop-down sort command box with a red arrow. Here you can clearly see some of the errors that were buried in about 1000 segments of confirmed matches. Whoops! I missed those when scrolling through in natural order, because I was simply too fatigued by the length of the text. An alphabetical sort, perhaps combined with other filtering options, gives a necessary fresh perspective.
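For anyone who wants to try the same idea outside a CAT tool, here is a rough sketch in Python. It is not how memoQ does it internally, and everything in it is an assumption for illustration: the segments are plain (source, target) pairs, the sample data is invented, and the function names `sort_for_qa` and `inconsistent_targets` are hypothetical. The point is only that sorting alphabetically by source text puts identical or near-identical segments next to each other, so a stray translation stands out.

```python
from collections import defaultdict


def sort_for_qa(segments):
    """Return (source, target) pairs ordered alphabetically by source text.

    casefold() makes the ordering case-insensitive; the target is used as a
    tie-breaker so identical sources appear in a stable, readable order.
    """
    return sorted(segments, key=lambda seg: (seg[0].casefold(), seg[1].casefold()))


def inconsistent_targets(segments):
    """Return each source text that has more than one distinct target."""
    targets = defaultdict(set)
    for source, target in segments:
        targets[source].add(target)
    return {src: tgts for src, tgts in targets.items() if len(tgts) > 1}


# Invented sample data: one acronym was mistranslated in a single segment.
segments = [
    ("KPI-Übersicht", "KPI overview"),
    ("Umsatz nach Region", "Revenue by region"),
    ("KPI-Übersicht", "CPI overview"),
    ("Agenda", "Agenda"),
]

if __name__ == "__main__":
    for source, target in sort_for_qa(segments):
        print(f"{source}\t{target}")
    print(inconsistent_targets(segments))
```

In natural document order the two "KPI-Übersicht" segments could be hundreds of rows apart; after the sort they sit on adjacent rows, which is exactly the effect of the alphabetical view in the translation grid.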

This is just one of zillions of little "tricks" available to users of translation environment tools. It's sort of the electronic equivalent of reading your work upside-down, like some people do with hardcopy, but much less of a nuisance. Altogether I identified about 20 minor errors with those little acronyms, all caused by matching trash in shared TM data, despite the fact that I had already proofread in memoQ and in PowerPoint. So I am very, very grateful for technological options that help us go beyond the natural human limits we all share.


  1. Indeed, it is features such as these that make CAT tools indispensable even for not very repetitive text.

    A feature I love, for example, is the filtering in Studio 2009, which lets you see at a glance only the segments that contain (or that do not contain, if you want; the regex filter is very flexible) some word. It helps a lot in keeping things consistent.

  2. Exactly, Riccardo. As I mentioned, most modern tools do this stuff, and even SDL has caught up in some respects ;-) See those two little boxes at the top of the screen shot labeled "Target:" and "Source:"? Type a word, substring or quoted group of words in one of those and you can filter whatever view you have to contain only those segments with that sequence. These filters can, of course, be applied sequentially. I remember discovering a similar feature in DVX years ago and being annoyed with myself for noticing it only after several years of use. That feature became a QA workhorse for me.

    It's really rather sad that so many people fixate on ignorant clichés like these tools only being good for "repetitive text". Their superior capabilities for review, revision, editing, proofreading or whatever the heck you want to call such QA tasks make them well worth the investment. We could probably both polish off a keg of cider and use up half a ream of paper listing the benefits of a good tool (from any vendor) and still not even get around to writing "repetition" on that list.

