Oct 16, 2013

Small caveats for memoQ fuzzy term matching

In the months since it was introduced this year, the terminology fuzzy match feature of memoQ has proved to be a great help in my work. The authors of the German texts I translate are sometimes particularly challenged with respect to spelling, and I might find the same source word spelled five or six different ways in a text, with some or all of the variations repeated frequently: Scheidungsurteil, Schaidungsurteil, Scheidungurteil, Scheidungs-Urteil, Scheidung Urteil and so on. It can be a real nuisance trying to keep terms in the translated text consistent when the source text is out of control this way, and I've often made frustrated searches for a term I know I put in the termbase, only to find that it was spelled a little differently. Changes between plural and singular forms can add to the difficulty of staying consistent, particularly in large texts with the terms thinly sown.

For cases such as these, the fuzzy term matches have been enormously helpful. I no longer have to make a catalog of crappy spelling and enjoy my bit of Schadenfreude as I share the hard-won terminology with the client in a pretty PDF dictionary that proudly displays all the misspelled variants in the source mapped to a single clean target term. Now I can maintain cleaner termbases that may actually be useful for reversed application (with German as the target language, not loaded with garbage spelling just to catch the matches with German as the source language).

But there's a dark side to this too. In some cases, I am misled by how fuzzy term matches are highlighted. An example of this can be seen here:

Look at the highlighted fuzzy term match in segment 3. The prefix un- is a negative, so in this case we're talking about fake raccoon skin underwear. Not the real thing. The problem is compounded in the QA term check:

The translation in this case is actually correct, but it is flagged as an error because of the fuzzy match. I'm not actually sure there is anything to be done about this except perhaps to identify troublesome cases like this and change the term entries to custom with appropriate wildcards, or to some other setting less likely to report a false error or overlook a real one. However, I think it might help visual checking if the blue highlighting for fuzzy matches could be set not to extend to prefixes or trailing portions that go beyond the length of the term entry. Of course I do not know what the implications of this may be for other languages, so changes of any kind require careful thought.
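To make the mechanism concrete, here is a minimal Python sketch of why a negated form can slip through a similarity-based term match, and of how a prefix guard might suppress it. It makes no claim about how memoQ actually scores fuzzy term hits: the standard library's difflib ratio stands in for the real algorithm, and the 0.8 threshold, the prefix list and the function names are all assumptions made for illustration. The pair effizient/ineffizient is used simply as a typical German example of the pattern.

```python
from difflib import SequenceMatcher

# Hypothetical list of negating German prefixes (an assumption, not a memoQ setting).
NEGATING_PREFIXES = ("un", "in", "de", "ent")


def similarity(term: str, word: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the real scoring."""
    return SequenceMatcher(None, term.lower(), word.lower()).ratio()


def is_negated_form(term: str, word: str) -> bool:
    """True if word is exactly the term with one negating prefix attached."""
    t, w = term.lower(), word.lower()
    return any(w == prefix + t for prefix in NEGATING_PREFIXES)


def fuzzy_term_hit(term: str, word: str, threshold: float = 0.8) -> bool:
    """Report a fuzzy hit unless the word is merely the negated term."""
    if is_negated_form(term, word):
        return False
    return similarity(term, word) >= threshold


if __name__ == "__main__":
    pairs = [
        ("Scheidungsurteil", "Schaidungsurteil"),  # misspelling: a hit we want
        ("effizient", "ineffizient"),              # negation: a hit we do not want
    ]
    for term, word in pairs:
        print(term, word, round(similarity(term, word), 3), fuzzy_term_hit(term, word))
```

Without the guard, both pairs clear the threshold on raw similarity, which is exactly the trap: a misspelling and a negation look much the same to a character-based comparison. The guard is deliberately crude (it ignores inflected endings and words that legitimately begin with one of the prefixes), so any real list would need to be language-specific and curated by hand.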

Right after I posted this, I was contacted by a friend who had the same frustration with the misleading matches with some financial terms. This translator said that even adding the correct term for the mismatch did not correct the problem, and the proper match would not be displayed. That sounded very strange to me, so I had to have a look. I added the "fake raccoon underwear":

But my translation results pane showed both matches. What really bothered me, however, was that the worse match still took precedence for insertion as the tool tip indicates:

Oops. This doesn't change my positive opinion of the fuzzy matching for terms. It's still extremely helpful and overall helps me maintain better term consistency. Some things in the current version (6.5.15) do need a little tuning, like this goofy precedence problem, and even after any bugs are fixed there will still be a few inherent risks of which one may need to be aware and for which some particular QA strategies may need to be considered.

1 comment:

  1. There are tons of examples of this (e.g. effizient/ineffizient, motivieren/demotivieren, etc.). You might play around with the settings for these special entries or with a list of "forbidden terms". But this is a manual and complicated workaround. The best way would indeed be an (optional) feature with a language-specific list of prefixes (for German "un-", "de-", "ent-" etc.) and/or suffixes that automatically deactivate the fuzzy feature for terms.

