Mar 22, 2011

Celebrating diversity with memoQ

In discussions of machine translation, the concept of "controlled language" is mentioned often. I always get a laugh out of that. The reality of human language is that, even in the best cases, it's not nearly as under control as we might wish to believe. Take terminology, for example. Today's 3am-let's-have-another-triple-eggnog-latte challenge is a customer text on cable support systems that uses three different spellings of the German word for "cable support system", none of which, of course, was in the corporate terminology provided by the customer.

Being a memoQ user, I see this as an opportunity, not a problem :-)

Several functions for look-ups and editing in memoQ's integrated terminology module make it easy to:
         (1) find a term embedded in another term;
         (2) add the "new" term; and
         (3) map spelling variants in the source language correctly using target language lookups.

The third point is useful in helping clients identify areas in which their writers might need to exercise a little "control" or their authoring system's QA functions might need some tuning.

Finding terms embedded in other terms

Here the option to search for an entry as a substring in other terms enables me to maintain consistency in similar terms. The term "Tragsystem" is embedded in "Kabeltragsystem", which the customer has as "cable support system" in the corporate terminology. So I will use "support system" for the German "Tragsystem" as opposed to the less appropriate alternatives one might glean from the current Leo record ("load bearing system", "structural system"). Unfortunately, I can't add a completely new term directly from this dialog in the current version of memoQ (4.5.68), though I can edit the match entries found.
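For readers who like to see the idea outside the tool, the substring lookup can be sketched in a few lines of Python. This is only an illustration of the principle, not how memoQ implements it; the `TERMBASE` dictionary and the `find_embedded` function are hypothetical names made up for this example.

```python
# Hypothetical mini-termbase: German source term -> English target term.
# The single entry is the one from the customer's corporate terminology.
TERMBASE = {
    "Kabeltragsystem": "cable support system",
}


def find_embedded(term, termbase):
    """Return (source, target) pairs whose source term contains `term`.

    The comparison ignores case so that "Tragsystem" is found inside
    "Kabeltragsystem".
    """
    needle = term.casefold()
    return [
        (source, target)
        for source, target in termbase.items()
        if needle in source.casefold()
    ]


matches = find_embedded("Tragsystem", TERMBASE)
for source, target in matches:
    print(f"{source} -> {target}")
```

The hit on "Kabeltragsystem" is what suggests "support system" as the consistent choice for "Tragsystem".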

Adding a new term entry

To add a new term entry, simply select the (source) term and use the corresponding keyboard shortcut or toolbar icon. If you have selected both source and target terms, both will appear in the entry dialog (shown above - I selected only the source term in this case). Enter the translation of the term and any metadata you want and click OK to confirm.

Here be source term spelling variants!

In the lookup example above, we saw that the entry for "cable support system" in the customer's term list was "Kabeltragsystem". However, to write is human, so of course just a few lines into the text the author writes "Kabeltrag-System". And for good measure a sentence later, it's "Kabelträgersystem". I expect tomorrow it will be "Kabel-Trägersystem" or even the completely impermissible "Kabel Trägersystem" or the all-too-inevitable "Trägersystem Kabel", not to mention its close cousin "Tragsystem Kabel". Oh yes... and "Kabel-Tragsystem", "Kabelträger-System", "kabeltragendes System". Not being a native speaker of German, merely a willing victim thereof, I've probably missed a few impossibilities here. But the point is that good writing in a technical text requires a certain level of consistency, and those of us using pattern-matching term modules integrated in our translation environment tools need to capture the variants and map them to a desired term or two in the target language. A good tool will let us do this and record all the myriad source deviations for the client to consider - if the client wants to tighten up the in-house editorial work.
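To give a feel for how mechanical some of these deviations are, here is a rough Python sketch that groups spellings differing only in hyphens, spaces or capitalization. It is an assumption-laden toy, not a memoQ feature: the `normalize` and `group_variants` helpers are invented for this example, and the approach deliberately ignores the harder cases.

```python
import re


def normalize(term):
    """Strip hyphens and whitespace and ignore case; nothing more."""
    return re.sub(r"[\s\-]+", "", term).casefold()


def group_variants(terms):
    """Group terms that share the same normalized spelling."""
    groups = {}
    for term in terms:
        groups.setdefault(normalize(term), []).append(term)
    return groups


# Spellings taken from the examples in the text above.
seen = [
    "Kabeltragsystem",
    "Kabeltrag-System",
    "Kabel-Tragsystem",
    "Kabelträgersystem",
    "Kabelträger-System",
]

groups = group_variants(seen)
for key, variants in groups.items():
    print(key, variants)
```

The "Trag" and "Träger" forms still land in separate groups, and reordered variants such as "Trägersystem Kabel" would not be caught at all, which is why the variants still have to be mapped by hand in the termbase.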

Mapping source term spelling variants

To record the diversity of source variants, I type the translation (in this case "cable support system"), then use the memoQ term lookup function and specify a target term lookup. In the dialog above I have entered two more variants encountered (using the "+" icon on the source term side) and marked the selected one as "forbidden" in anticipation of the termbase possibly being used for translation into German some day.
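Conceptually, the result is one entry with a single target term and several source variants, some of which carry a "forbidden" flag. The Python sketch below models that under stated assumptions: the class and function names are hypothetical, this is not memoQ's internal data model, and the choice of which variant to mark as forbidden is arbitrary here.

```python
from dataclasses import dataclass, field


@dataclass
class SourceVariant:
    text: str
    forbidden: bool = False  # flag for a spelling that should not be used


@dataclass
class TermEntry:
    target: str
    sources: list = field(default_factory=list)


def lookup_by_target(entries, target):
    """Find the entry whose target term matches, ignoring case."""
    for entry in entries:
        if entry.target.casefold() == target.casefold():
            return entry
    return None


def add_variant(entry, text, forbidden=False):
    """Attach a source variant to an entry unless it is already recorded."""
    if all(variant.text != text for variant in entry.sources):
        entry.sources.append(SourceVariant(text, forbidden))


entries = [
    TermEntry("cable support system", [SourceVariant("Kabeltragsystem")]),
]

# Target-language lookup, then add the variants met in the text.
entry = lookup_by_target(entries, "cable support system")
add_variant(entry, "Kabeltrag-System")
# Which variant gets the flag is an arbitrary choice for this illustration.
add_variant(entry, "Kabelträgersystem", forbidden=True)

for variant in entry.sources:
    status = "forbidden" if variant.forbidden else "allowed"
    print(f"{variant.text} ({status}) -> {entry.target}")
```

Looking up by target term is what keeps all the source spellings in one entry instead of scattering them over several.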

Sharing the wealth of information

When I'm done with the project (or before), I can filter the term data based on entry date and other criteria if desired, then export it to MultiTerm or CSV format. In the latter, multiple terms in the source or target language appear as additional columns populated on the same row. This makes it easy to identify synonyms for discussion purposes. Of course, if the data are intended for actual use and important metadata such as the "forbidden" status are needed, these can be included in the export.
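The "additional columns on the same row" layout can be pictured with a small Python sketch. The column names (`German_1`, `English_1` and so on) and the `export_rows` helper are assumptions for illustration only; the header that memoQ actually writes will differ.

```python
import csv
import io

# Hypothetical entries: each has one or more source and target terms.
entries = [
    {
        "source": ["Kabeltragsystem", "Kabeltrag-System", "Kabelträgersystem"],
        "target": ["cable support system"],
    },
    {"source": ["Tragsystem"], "target": ["support system"]},
]


def export_rows(entries):
    """Build one row per entry, padding shorter entries with empty cells."""
    max_src = max(len(e["source"]) for e in entries)
    max_tgt = max(len(e["target"]) for e in entries)
    header = [f"German_{i + 1}" for i in range(max_src)]
    header += [f"English_{i + 1}" for i in range(max_tgt)]
    rows = [header]
    for e in entries:
        src = e["source"] + [""] * (max_src - len(e["source"]))
        tgt = e["target"] + [""] * (max_tgt - len(e["target"]))
        rows.append(src + tgt)
    return rows


buffer = io.StringIO()
csv.writer(buffer).writerows(export_rows(entries))
csv_text = buffer.getvalue()
print(csv_text)
```

With all the synonyms of an entry side by side on one row, a client can open the file in a spreadsheet and see at a glance how many spellings are competing for the same concept.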

1 comment:

  1. Great post, Kevin. Very clearly explained. It made me smile too as I encounter this sort of "diversity" all the time in technical texts.

