Translation Tribulations: language variants

May 30, 2022

Cleaning up language variants in memoQ term bases

While the idea of using sublanguage variants, such as UK, US or Canadian versions of English, sounds nice in principle, in practice these often create headaches for users of translation environments such as memoQ, particularly when exchanging glossaries with others but also when viewing and editing the data in the built-in editors. Many times I have heard colleagues and clients express a wish to "go back" and work only with generic variants of a language in order to simplify their management of terminology data. In the video below, I share one method to do so.

At 3:08 in the video, I share a little "aside" about how the exported term data can be edited to mark a term as forbidden (for instance, if its use is not desired by the translation buyer). Other changes to the information are also possible at this stage, such as the addition of context and use information for example. Other data fields from the term base can also be included in the export for cleanup if these play an important role in your memoQ term bases.

For years, users have requested an editing feature in memoQ that would make "unifying" language variants possible, but as you can see in this video tutorial, this possibility already exists and is neither difficult nor time-consuming to implement.

If you do not wish to create a new term base to import the cleaned-up data (as shown in the video) but would rather bring it in to the same term base, it is important to configure the settings for your import correctly so that the original data will be overwritten and you won't end up with messy duplication of information. This is achieved with the following setting marked in red:

However, it should be noted that the term base will still have all the now-unused language variants, albeit with no entries for them. These can be removed by unchecking the boxes for the respective language variants in the term base's Properties dialog.

Speaking of the Properties dialog, some may have noted that in recent versions of memoQ there is an automated option for cleaning up those unwanted language variants:

Why bother with the XSLX route then? Well, depending on what version of memoQ you use, you may not have that command available in the dialog. But more importantly, I find that when merging data from various language variants I often want to do additional editing of the term information, and that really isn't possible when merging language variants in the Properties dialog. Doing the edits in Microsoft Excel gives you an overview of the data and the option to make whatever adjustments may be needed. In Excel you can also make further changes, such as altering the match properties for better hit results or more accurate quality assurance.

Sep 16, 2015

Getting around language variant issues in memoQ LiveDocs

I was told by some other users that a fundamental change had been made in the way language data are accessed in LiveDocs. It was said that until a few versions ago it had been possible to use documents for reference in LiveDocs regardless of their sublanguage settings. So I was told. The truth is more complicated than that.

According to my tests, memoQ 2015 is the first version of memoQ to have a logically consistent treatment of language variants for both bilingual and monolingual documents in corpora. All the other versions tested (memoQ 2013R2, 2014, 2014R2) are equally screwed up and show the same results.

The "visibility" of a monolingual or bilingual document when viewed in a corpus attached to a project running under memoQ 2015 follows these rules:

the sublanguage (language variant) settings for source and target (of the document or the project) must match the project
or the language setting (of the document or the project) must be generic.

Two rules. Pretty simple. It doesn't matter what version of memoQ the project or corpus was created in, only which version is actively running.

I created a test corpus with the following document mix:

The corpus contained 11 documents, both bilingual and monolingual with a mix of generic language settings and settings with language variants specified (such as German for Germany, Switzerland and Liechtenstein and English for Zimbabwe, the US and UK).

In a project running under memoQ 2015 with the languages set to generic German and generic English, all 11 documents in the corpus were accessible.

So if you want access to all LiveDocs corpus data for the major languages of your project, it is necessary to use generic language settings, either when you load the data into LiveDocs (difficult unless you always use the resource console, since adding documents to a corpus from within a project automatically applies the project's language settings!) or in the languages specified for the project itself. And this will only work with memoQ 2015. If you want to apply penalties to particular language variants this can be done using keyword markers (as seen in the screenshot above) and configuring the More penalties tab of the LiveDocs settings file applied to that corpus.

If the same corpus is attached to a project running under memoQ 2015 with language settings for Swiss German and generic English, the documents available from the corpus are these:

For a Swiss German and UK English project under memoQ 2015, this is the picture:

And for a Germany's German and US English:

All the screenshots above can be predicted based on the two rules stated. Work it out.

"But what happens with earlier versions of memoQ?" you might wonder. It's messy. Here is a look at a Swiss German and UK English project under memoQ 2013 R2, 2014 and 2014 R2:

And here's a project with generic German and Generic English under memoQ 2013 R2, 2014 and 2014 R2:

In each case the five bilingual documents are visible no matter what the project's language settings are. However, there is strict adherence to language variants and the generic language setting for monolingual documents! In my opinion, that's for the birds. I see no good reason to follow a different rule for data availability in bilingual versus monolingual documents. So in a sense, Kilgray has cleaned up this inconsistency in the latest version of memoQ.

Some have expressed a desire for a "switch" setting to allow language variant settings to be ignored. And perhaps Kilgray will provide such a feature in the future. But the best way to get there now is simply to make your project's language settings generic.

Changing the language settings for bilingual data in an existing LiveDocs corpus

If you have a corpus with a mix of language settings and you want to convert these to generic settings or a particular variant, this can be done as follows currently only for bilingual documents:

Select the bilingual documents to export from the corpus and export them to a folder. (If you choose to zip them all together, unpack the *.zip file later to make a folder of the exported *.mqxlz files.
Re-import the *.mqxlz files to the LiveDocs corpus via the Resource Console so you are able to specify the exact language settings you want. In the import dialog, you'll have to change the filter setting manually from "binary" to "XLIFF". These *.mqxlz files are not the same as bilingual files from a translation document in a project and are not recognized automatically.

Unfortunately, there is no way to change the language settings of a monolingual document except to re-import it in the Resource Console in its original form and set the language variant (or generic value) there.

So really, for now, the best way to go seems to be to use memoQ 2015 with generic project language settings.

Search me!

May 30, 2022

Cleaning up language variants in memoQ term bases

Sep 16, 2015

Getting around language variant issues in memoQ LiveDocs