Jul 24, 2013

What good is memoQ fuzzy term matching?


When Kilgray introduced fuzzy term matching with the release of memoQ 2013, my first concern, after a few puzzling tests of the feature, was to understand how it actually worked. Discussions with the development team soon cleared up that mystery, and I wrote an article describing the current fuzzy state of term matching technology in the translation environment tool that has done such a fine job of waking SDL and others from the long slumber of innovation that prevailed in the last decade.

But questions still remained in the minds of most users: why should they care about this feature, and what good would it really do them?

The answer to that has become clearer to me as I have used the feature in recent weeks and noticed certain things, like the fact that crappy spelling in my source texts is no longer as much of a burden for term matching:


This actually applies to more than just bad spelling. Those who translate from English will benefit when the UK spelling of a term is in the glossary but the author of the text used the American spelling. I cope with problems caused by old and new spelling conventions in German, as well as the fact that a great many Germans cannot agree on how their compound words should be glued together. And my Portuguese friends tell me every week about the hassles of the spelling reform in progress in that linguistic corner.
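To make the principle concrete, here is a minimal sketch in Python of how a similarity-ratio approach tolerates such variants. This is purely my own illustration with an arbitrary 80% threshold and an invented helper function; Kilgray has not published the details of how memoQ actually computes these matches.

    # Minimal sketch of fuzzy glossary lookup with a similarity threshold.
    # Purely illustrative; not memoQ's actual algorithm or API.
    from difflib import SequenceMatcher

    def fuzzy_term_hits(source_text, glossary, threshold=0.8):
        """Return (word, term, score) for glossary terms that fuzzily
        match a word in the source text (hypothetical helper)."""
        hits = []
        for word in source_text.lower().split():
            for term in glossary:
                score = SequenceMatcher(None, word, term.lower()).ratio()
                if score >= threshold:
                    hits.append((word, term, round(score, 2)))
        return hits

    # A glossary entry with the UK spelling still matches the US spelling:
    print(fuzzy_term_hits("the color scheme", ["colour"]))
    # -> [('color', 'colour', 0.91)]

The same tolerance that bridges "color" and "colour" also bridges outright typos, which is why sloppy source texts no longer defeat my glossaries.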

Fuzzy term matching is currently not implemented for QA checking in memoQ, but I think it would make sense for Kilgray to add this capability on the source side, as sketched below. It could be a bit of a disaster to have it on the target side, however, for reasons I will leave readers to guess.
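If Kilgray ever does add it, a source-side check might behave something like this sketch, which reuses the similarity idea above. Again, the function and the logic are my own assumptions for illustration, not an existing memoQ feature or API.

    # Hypothetical source-side QA check: warn when a glossary source term
    # fuzzily appears in the source segment but the prescribed target term
    # is absent from the translation.
    from difflib import SequenceMatcher

    def qa_check_segment(src, tgt, glossary, threshold=0.8):
        warnings = []
        for src_term, tgt_term in glossary:
            in_source = any(
                SequenceMatcher(None, word, src_term.lower()).ratio() >= threshold
                for word in src.lower().split()
            )
            if in_source and tgt_term.lower() not in tgt.lower():
                warnings.append(f"'{src_term}' in source, but '{tgt_term}' missing in target")
        return warnings

    # A typo in the source ("memroy") still triggers the warning:
    print(qa_check_segment(
        "Clear the translation memroy first.",
        "Zuerst den Speicher leeren.",
        [("memory", "Translation-Memory")],
    ))

Note that the fuzziness applies only to the source comparison here; the target term is still required verbatim.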

For those who want to set their termbases to use fuzzy matching by default in a particular language, here is a short video that shows how to change the termbase properties and how to change the match settings for legacy terms to "fuzzy":


I was initially a bit skeptical of the latest version of memoQ, and I still don't feel comfortable with the company's hyperbole over new features like LQA, which is largely pointless for freelance translators. But as this feature and a few others have begun to "sink in", I feel confident in saying that fuzzy term matching is a reason for most of us to seriously consider upgrading to memoQ 2013. This will be even more the case if it is added to the QA features.

Ah, but what about the change to the comments function, Kevin? You really hated that!

There's more to say on that topic now. Some of it is even good.

5 comments:

  1. Interesting. I've been wondering about the point of this feature myself. So what would be the reason not to set all the TBs to fuzzy matching?

  2. Hmm, so I tried the tip in your second video, to change all my old termbase entries to ‘fuzzy’ in one fell swoop, but it didn't work.

    Last night, I selected all the entries in my main TB and then clicked on Fuzzy and, lo and behold, this morning memoQ is still churning away (‘Not responding’). Suppose I'll have to kill it and try again. I will report back later.

    Michael

  3. @Michael: What was the size of the termbase you were trying to do this with? The ones I have tested so far had a few hundred to a few thousand terms in them, but knowing your addiction to "big data" I suspect I'm off your scale by some orders of magnitude. Although memoQ handles some (not all) data operations faster than DVX or SDL Trados, when I am messing with very large data quantities (like XLIFF files with 100,000+ segments), I am able to choke the program.

    @Anette: Sometimes there might be good reason for exact matching of a term, especially if there are similar terms with which it might be confused. Also, the compound word feature is currently active only for German, so I can imagine cases (and I have them in German as well) where the 50% prefix setting might work better for an individual term. QA may also be an issue if fuzzy term matching is ever added to that - it could be disastrous if the target text check became tolerant of typos or bad spelling.

  4. Hmm. I just posted a comment on my iPad but it seems to have disappeared. Probably not a good idea to post from inside the Pocket app on the iPad.

    Anyway, yes, my TB is rather large. It weighs in at around 700,000 entries. However, since it didn't actually crash memoQ I suppose I might as well just try to be patient and wait.

    Michael

    Replies
    1. 700,000 entries? I can't conceive of how that would even be remotely useful, as it would tend to drown what I need in "noise". The technical and legal glossaries I began 13 years ago in Trados and Déjà Vu probably have around 50,000 entries (I haven't bothered to check the count in a few years), and that is really too much. These days my most valuable terminologies are QA tools with typically no more than a few hundred entries. Ah well, whatever I think of the mass data approach to getting work done, it is useful for stress testing. I remember years ago arguing with Kilgray developers that TM performance needed to be upgraded to handle my personal compendium of some 300,000 segments, back when they could not believe anyone would work with more than 50,000 segments in a TM. Now, as you know, they can swallow and use those 2 million TU TMX files from the EU DGT, though memoQ still gets indigestion if it sees that data in other formats.

      If you can afford the downtime to wait for the term editing operation to complete, I am very curious how long it actually takes. This may turn out to be an experiment like that tar drop that finally fell after a great-grandfather's lifetime of waiting.

