Dec 22, 2013

My favorite library: Project Gutenberg

I've been fond of Project Gutenberg since I first became aware of it long, long ago. However, since acquiring an e-book reader I have become especially appreciative of this resource. Time and again it has had exactly what I'm looking for in classic literature, and the portable library I've built with it has been a fine companion in my travels and on long nights when I need good words to send me to sleep.

The recent discovery of Andrew Lang's fairy books has been quite an interesting thing, and when my Portuguese teacher recently recommended José Maria Eça de Queirós as a good author to familiarize myself with dialog, I was delighted to find many of his works available there in the original language.

Why not have a browse yourself if you haven't been to the site for a while, and if you get as much out of it as I do, consider making a donation to support this good work.

Dec 12, 2013

In HAMPsTr We Trust?

So many times when I hear the bright and happy predictions of commercial interests spouting nonsense about "translation as a utility" and hoping to feast on the roadkill of communication, who claim the highest of motives and show the basest motivations in their real acts, I hear a saxophone in my mind and a strained voice declaring that some day "they may understand our rage".

Machine pseudo-translation (MpT) and human-assisted machine pseudo-translation (HAMPsTr) are big business for the profiteers offering pseudo-solutions which typically start in the low six figures of investment. "Get on the MT boat or drown!" declared one such profiteer, Asia Online CEO Dion Wiggins, at his unfortunate keynote presentation at memoQfest 2012 in Budapest.

It seems that each week a new story line to justify the linguistic lemmings' rush over the cliff appears. Recently I heard for the first time how translators suffer from the "blank page syndrome" (note: as of 25 December 2013 the entire blog with that "blank page" link has disappeared) and need machine generated babble for inspiration. I thought perhaps I was just an odd one, usually struggling with many ways to render a text from German into my native language and trying to choose the best, but experienced colleagues I asked about their fear of blank pages all asked me if I was joking.

This morning another colleague sent me a real screamer:
"Smaller language service providers (LSPs) process fewer words than larger ones... [this] puts them at a disadvantage when it comes to leveraging linguistic assets due to the smaller size of their terminology databases and translation memories (TMs). These less comprehensive language resources limit reuse on subsequent projects or for training statistical machine translation (SMT) software."
The author of that particular bucket of bilge is Don DePalma, head of the Common (Non)sense Advisory, an organization rightly seen as incompetent to interpret even third-grade level mathematics in their discredited report of dramatic rate decreases for translations, which turned out to be an artifact of calculations involving mismatched survey populations. In any case, the idea that small translation agencies or individual translators, who are generally more aware of and concerned with their clients' business, are at any disadvantage by not being buried under mountains of monkeyfied mumbo-jumbo from bulk trashlation nearly ruined my keyboard as I spit my coffee laughing. Don deserves an extra Christmas bonus for that transcreation of the truth.

But the best was yet to come:

This inspiring graphic accompanied an article on how to motivate those involved in post-editing MpT in the HAMPsTr process promoted by Asia Online and others. There has been some vigorous and interesting speculation on where that arrow is pointing :-)  The colleague who sent the link to me commented:
An interesting read from a humanitarian perspective. If they need to go to these lengths to "motivate" people, even those who are otherwise happy to swim in the muddy, toxic pond that these LSPs (your definition of the term) have created, one would have thought that they will understand that there is something wrong with their concept and goals. But why let the facts get in the way, I guess.
Indeed, those swimming in the pond do seem to have some real issues, even in cases frequently quoted as a HAMPsTr success. I long ago lost count of how many MpT advocates have told me of the wonderful words at Microsoft and Symantec, nicely extruded from controlled language sources and lovingly shaped into their final sausage form by happy hamsters. But this TAUS presentation by a Symantec insider tells another story:

And further indications that we are all getting mooned by the MpT Emperor can be heard in the excerpts of this recent GALA presentation in Berlin:

Unlike some of my colleagues, I have no fear of being replaced by Mr. Gurgle or any of his online Asian cousins however well-trained. What provokes some rage in me and more than a little concern is the callous dishonesty of the MpT profiteers and their transparent contempt for truth, the true interests of modern business and the health of those involved in language processes.

I have no little sympathy for the many businesses and individuals struggling to cope with the challenging changes in international business communication in the past 20 years. Nor do I feel that MpT has no role to play in communication processes; colleagues such as Steve Vitek have presented clear cases of value for screening of bulk information in legal discovery to identify documents which may need timely human translation and other applications. Kirti Vashee of Asia Online has commented honestly on numerous occasions on his blog and elsewhere about the functional train wreck of most "automated translation" processes one encounters, but still cannot take proper distance from the distortion and scaremongering practiced by the head of his team and others.

I am particularly concerned by the continued avoidance of the very real psychological dangers of post-editing MpT, which were discussed by Bevan and others in the decades before the lust for quick profits silenced discussions and research into appropriate occupational health measures. If Asia Online and others are truly concerned with developing sustainable HAMPsTr processes, then let them fund graduate research in psychology to understand how to protect the language skills and mental function of those routinely exposed to toxic machine language.

All this disregard for true value and truth reminds me so much of my days as an insider in the Y2K programmers' profit orgy: we all knew it was bullshit, but all the old COBOL programmers wanted to take their last chance to score big before they were swept into the dustbin of history. Some 60 years or so after it began, is machine translation ready to assume its place in that bin? The True Believers and profiteers will loudly say no, but at some point the dust will settle, the damage will be assessed, and we will find that the place of MpT is not at all what many imagine it to be today.

Dec 11, 2013

General settings for memoQ TMs

memoQ TM settings are found in the Resource Console, the Options and a project's Settings.
This is a very useful "light resource" which is well worth nearly every user's time.
To define the TM settings to be used in new projects, select a settings configuration under Tools > Options... >  Default resources > TM settings (in the row of icons) by marking its checkbox.

To define the default TM settings to be used in the project you have opened, go to Project home > Settings > TM settings (in the row of icons) and mark the checkbox for the desired project default.

Different settings for individual TMs in a project (for example to set higher or lower match criteria) may be applied by going to Project home > Translation memories, selecting the TM of interest, clicking the Settings command at the right of the window and choosing the settings to apply instead of the project's standard TM settings.

The General settings tab is the same for all currently supported versions of memoQ. Role options are included on another tab in memoQ 2013 R2, and the Project Manager editions of memoQ offer additional possibilities for filtering and/or applying penalties to content on a Filters tab.

Match thresholds
The first value here (minimum) controls the fuzzy percentage below which a match will not be displayed in the translation results pane at the upper right of the working translation window.

The "good match" threshold is relevant to pretranslation (though this is unfortunately not made obvious in the dialog). The default value of 95% is really too high and would only apply to matches with small differences in tags or numbers, since any small difference in words is penalized significantly in memoQ (something I find very helpful, as I can understand more quickly what differences to look for compared to working in Trados). I usually set my "good matches" to 80%.

Not a "good match" according to the memoQ TM default setting
In my work, an alignment penalty, which is a deduction from the match rate of a translation unit created by feeding an alignment to a translation memory, does not make a lot of sense. This is because
  • I almost never send alignments to a TM. Why bother? LiveDocs may be slower in pretranslation, but it provides context matching just like a TM, and you can actually read what you find in a concordance search in its original document context. TMs suck because you do not get the full context for your matching segment and are thus at greater risk for missing information which may be important for a translation. This is especially the case with short match segments.
  • if I happen to be aligning a dodgy translation and want to send it to a TM, I'll put it in a "quarantine TM" which already has its own penalty.
  • on those rare occasions when I might feed an alignment to a TM, it's because the content is going to a user of another CAT tool, and if that person uses Trados or another tool that can read XLIFF files or other available bilingual formats, I'll send the data as that instead, so it can be reviewed and modified more easily before feeding to a TM. This also gives the other person a bilingual reference with document context.
  • alignment for TMs is soooooo 1990s!
User penalties: If you have the misfortune to share a TM with someone whose work you do not trust completely and you want to avoid letting that person's 100% and context match segments slip past you unnoticed, apply a suitable penalty for the level of "risk" that person represents. If you want to be sure that user's content never gets used in a pretranslation and never appears in the translation results pane, apply a whopping big penalty like 80%. Those segments will not be shown or inserted but will still be there in a concordance search if you want them.

TM penalties: Sometimes a client provides you with a TM you do not trust completely, or you may have a "quarantine TM" with content of dubious quality. Or I might have a TM with good content in British English but need to deliver a translation in American English. Applying penalties to such TMs will reduce the priority of their matches and prevent 100% matches with inappropriate language from slipping past without more careful inspection. As in the case of user penalties, you can also apply a very large penalty to ensure that matches will never be displayed in the translation results pane or used in a pretranslation but still have the TM content available for concordance searches.
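
The interplay of thresholds and penalties described above can be sketched in a few lines of Python. This is only a simplified mental model, not memoQ's actual matching logic: it assumes that penalties are simply subtracted from the raw match rate as percentage points, the minimum threshold of 60 is an assumed value, and all names (TMSettings, effective_rate, classify, the TM and user names) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TMSettings:
    """Hypothetical stand-in for a set of TM settings."""
    minimum: int = 60      # assumed value: hits below this are not displayed
    good_match: int = 80   # the value preferred in this post (default is 95)
    tm_penalties: dict = field(default_factory=dict)    # TM name -> points
    user_penalties: dict = field(default_factory=dict)  # user name -> points


def effective_rate(raw_rate, tm_name, user, settings):
    """Subtract the TM and user penalties (assumed additive) from the raw rate."""
    penalty = settings.tm_penalties.get(tm_name, 0)
    penalty += settings.user_penalties.get(user, 0)
    return max(0, raw_rate - penalty)


def classify(rate, settings):
    """Say what happens to a hit with the given effective match rate."""
    if rate < settings.minimum:
        return "concordance only"
    if rate < settings.good_match:
        return "shown in results"
    return "shown and used in pretranslation"


settings = TMSettings(
    tm_penalties={"quarantine": 25},
    user_penalties={"untrusted": 80},
)

for tm_name, user in [("main", "me"), ("quarantine", "me"), ("main", "untrusted")]:
    rate = effective_rate(100, tm_name, user, settings)
    print(tm_name, user, rate, classify(rate, settings))
```

In this model a 100% match from the quarantine TM drops to 75%, so it is still displayed but no longer inserted by pretranslation, while the 80-point user penalty pushes the untrusted user's segments below the display threshold and leaves them available only in a concordance search.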

It seems to be a good idea generally to enable the adjustment of fuzzy hits and inline tags. In many (but not all) cases, this will correct small differences in numbers, punctuation, cases and inline tags.

The only significant effect I was able to determine in adjusting the inline tag strictness in my tests was that more permissive settings might count a match with different tags as a full match. While this might meet the requirements of some clients hoping to impose discount schemes, from a quality assurance perspective, this does not seem like a good idea, and I believe it is better to have a strict setting here to draw attention to differences and reduce the chance that errors might be overlooked.

Dec 8, 2013

memoQ TM settings: beware the Kilgray defaults!

memoQ 2013 R2 introduced a very significant change in the management of translation memory data which most users are likely not aware of. However, because the default behavior for information storage in translation memories was changed, it is important to be aware of this difference and what to do before your data are unacceptably compromised.

The screenshot above shows several different translations stored in my TM for the sentence in the second segment. In previous versions of memoQ, only one translation would be stored with the way this translation memory was configured. However, in memoQ 2013 R2, the role of the person editing the translation becomes an important part of the "context", and as a result, multiple translations can be stored for different roles. Personally, I find this a rather useless feature, because if I want to know previous translations for a segment, I consult the row history using the context menu. But I understand how in some processes, it may be desirable to maintain a record of translations entered by the translator and the first and second reviewer.

I have no use for these older translations, especially as these may contain errors (as seen in the example of the third entry in the screenshot). If I am proofreading my translation in a "reviewer" role and make changes, I want to overwrite the original entry in my TM and avoid the chance that its errors will be propagated in later work.

To avoid the problems that can result from this redundancy and preservation of errors in the translation memory, as of build 6.8.6 it is necessary for users to explicitly opt out of the current Kilgray TM settings defaults and create their own custom settings.

TM settings are "light resources" which can be managed in four places:
  • The Resource Console, where settings can be created, edited, imported, exported, etc.
  • The Options (Tools > Options... > Default resources > TM settings), where the default for new projects can also be set
  • Project Settings (Project home > Settings > TM settings) in a specific project, where the default settings for the current project can be set
  • Project home > Translation memories > (TM) > Settings, where alternative TM settings can be specified for a particular translation memory selected in a project. This would be the case where you want to apply a special set of penalties to the content of that TM, for example.
The last tab of the default TM settings dialog looks like this:

To avoid the trouble of multiple, role-based entries being written to a TM, settings must be created in which the option to Store modifying user's role in the TM entries is not selected, and these custom settings must be applied to the primary translation memory in the project (by default or explicit selection).

Here's the "fast path" for staying out of trouble:
  1. Go to Tools > Options > Default resources > TM settings and if you do not already have custom TM settings to edit, select and clone the default settings. Give them a suitable name like "My Own TM Settings".
  2. Click the Roles tab and unmark the setting to store the user's role in TM entries.
  3. Click OK.
  4. Ensure that the checkbox next to these custom settings is marked so they will be applied to all new projects. Then click OK to exit the options.
  5. In any currently open project to which the desired settings have not been applied, go to Project home > Settings > TM settings and select the desired settings as the default by marking the corresponding checkbox.
Multiple entries written to the TM when the roles are included will not be eliminated after the TM settings are corrected. They must be explicitly removed by editing the translation memory.

I hope that in the future Kilgray will reconsider these troublesome new default settings and make the new possibilities "opt-in" values in custom TM settings. But for now, users must actively change their settings and defaults if they want to avoid role-based additional TM entries. (The current version of the memoQ Help describes roles as being disabled here by default. Would that this were so!)

You can, of course, make other useful adjustments to your custom TM settings, such as defining what a "good" match is (for pre-translation) or adjusting the tag matching behavior or applying various kinds of penalties to reduce match values for content which might have quality problems. The memoQ Help offers guidance on these options.

Even after the settings are "fixed", "existing damage" in a TM caused by the storage of unwanted, role-based information is not repaired. Any messes will have to be cleaned up in the rather inadequate TM editor in memoQ or in an external TMX maintenance tool. At the present time, there is no "easy option" to clean up a large number of erroneous or redundant translations stored because of this role setting. This case unfortunately underscores the woefully inadequate maintenance facilities for translation memory resources in the current version of memoQ. Perhaps some of the sophisticated options developed for Kilgray's TM Repository will finally trickle down in some way in an integrated option with Language Terminal or some sophisticated filtering and editing options will be added directly to the desktop product so that users can finally maintain their TM data in a reasonable way. memoQ is, overall, the best option available to us for project work in most cases, and I recommend it to colleagues because I know they will be able to do most ordinary tasks with a minimum of grief and calls for help (or expressions of anger) directed to me. But in 2013 it is ridiculous that my ability to manage my TM in my tool of choice is inferior to what I could do when I started using Déjà Vu as my CAT tool 13 years ago. Please join me in encouraging Kilgray to raise their game - soon - with respect to translation memory maintenance by writing to them and expressing your need for better data management! (And more sensible default TM settings, of course.)

It looks like this default problem may end with the 6.8.6 build. One of the key people involved with memoQ and its features has stated that "After the [next] update, the default TM settings resource will have 'Store modifying user’s name in TM' unticked." Excellent.

For cases where there may already be data problems from older, erroneous entries being retained, the following workaround was suggested:
  • Export to TMX
  • Start up 6.5 and import into an empty TM
In the process, memoQ 2013 (version 6.5) will ignore the role information in the TMX, and entries with the same source will not create duplicates; translations with a later timestamp will be preserved in the TM if there are duplicates in the TMX.
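
For anyone who prefers to clean an exported TMX outside of memoQ, the same "keep only the latest translation per source" idea can be sketched in Python. This is a rough illustration and not a description of what memoQ does internally: it assumes a plain TMX 1.4 file whose units carry a changedate attribute in the usual YYYYMMDDThhmmssZ form (which sorts correctly as text), it ignores any context information stored with the units, and the function names and sample data are made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample data: two translations of the same source, one unrelated unit
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu changedate="20131201T100000Z"><tuv xml:lang="de"><seg>Hallo Welt</seg></tuv><tuv xml:lang="en"><seg>Hello wrold</seg></tuv></tu>
<tu changedate="20131208T100000Z"><tuv xml:lang="de"><seg>Hallo Welt</seg></tuv><tuv xml:lang="en"><seg>Hello world</seg></tuv></tu>
<tu changedate="20131205T100000Z"><tuv xml:lang="de"><seg>Guten Morgen</seg></tuv><tuv xml:lang="en"><seg>Good morning</seg></tuv></tu>
</body></tmx>"""


def segment_text(tu, lang_code):
    """Return the text of the first variant in the given language, or None."""
    for tuv in tu.findall("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        if lang.lower().startswith(lang_code.lower()):
            seg = tuv.find("seg")
            return "".join(seg.itertext()) if seg is not None else None
    return None


def latest_change(tu):
    """Newest changedate found on the unit or its variants ('' if none)."""
    stamps = [tu.get("changedate", "")]
    stamps += [tuv.get("changedate", "") for tuv in tu.findall("tuv")]
    return max(stamps)


def dedupe_tmx(root, source_lang):
    """Keep one unit per source text (the most recently changed one).

    Returns the number of units removed. Units without a source segment
    in the given language are left untouched.
    """
    body = root.find("body")
    newest = {}
    for tu in body.findall("tu"):
        key = segment_text(tu, source_lang)
        if key is None:
            continue
        if key not in newest or latest_change(tu) >= latest_change(newest[key]):
            newest[key] = tu
    removed = 0
    for tu in body.findall("tu"):  # findall returns a list, safe to remove
        key = segment_text(tu, source_lang)
        if key is not None and newest[key] is not tu:
            body.remove(tu)
            removed += 1
    return removed


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_TMX)
    print("removed:", dedupe_tmx(root, "de"))
    for tu in root.find("body").findall("tu"):
        print(segment_text(tu, "de"), "->", segment_text(tu, "en"))
```

For a real file one would load it with ET.parse(path), call dedupe_tmx on the root element and write the tree back out. Work on a copy and inspect the result before importing it anywhere, since a script like this knows nothing about why two entries differ.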

This still doesn't change the fact that we need better means of maintaining our data in memoQ, but it is good that once again, Kilgray has responded quickly to important concerns of its users and is on the way to solving the problem.

Dec 2, 2013

Segmentation in memoQ server projects

Segmentation difficulties are often one of the most troublesome aspects of working with translation environment tools. Learning to configure segmentation rules correctly and applying that knowledge can save many hours of wasted time in alignments and translations and avoid filling translation memory resources with garbage from fractured translations of partial sentences with missing verbs, subjects and whatnot.

The usual alternative remedy for inadequately configured segmentation rules which lack the segmentation exceptions needed for abbreviations, for example, is to use the "join" function (Ctrl+J), and sometimes the split function (to manage very long, unwieldy clauses such as one might find in a patent text, and then join the parts again later).
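
To make the effect of missing exceptions concrete, here is a small Python sketch of a naive sentence splitter with and without an abbreviation list. It is not how memoQ implements its segmentation rules; the break rule (whitespace after a full stop, question mark or exclamation mark), the abbreviation list and the function name are all assumptions chosen for illustration.

```python
import re

# Hypothetical exception list; a real project would need many more entries
ABBREVIATIONS = {"z.B.", "bzw.", "ca.", "Nr.", "Dr."}

# Naive break rule: whitespace that follows sentence-final punctuation
BREAK = re.compile(r"(?<=[.!?])\s+")


def segment(text, exceptions=ABBREVIATIONS):
    """Split text into segments, suppressing breaks after listed abbreviations."""
    segments = []
    carry = ""  # text held back because it ended in an abbreviation
    for piece in BREAK.split(text.strip()):
        if not piece:
            continue
        candidate = f"{carry} {piece}" if carry else piece
        if candidate.split()[-1] in exceptions:
            carry = candidate  # no break here, keep collecting
        else:
            segments.append(candidate)
            carry = ""
    if carry:
        segments.append(carry)
    return segments


text = "Bitte Schraube Nr. 5 verwenden. Danach z.B. leicht ölen."
print(segment(text, exceptions=set()))  # fractured at every abbreviation
print(segment(text))                    # breaks only at real sentence ends
```

Without the exceptions, the two sentences fall apart into four fragments of the kind that end up as garbage in a TM; with them, each sentence stays whole.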

There are situations where joining and splitting of segments is blocked. This is the case with any file which is part of a view, for example; the view must be deleted before segments can once again be joined or split. Segmentation changes are also not possible in a server project which has not been set up to allow them.

There are several options for documents available to project managers when setting up a memoQ server project. But to enable translators to correct unfavorable segmentation, there is really only one choice:

If Desktop documents (no web translation) is selected, then on the dialog page which follows, changes in segmentation can be enabled:

If a project manager does not configure a project to allow this, for example because a document is being split between multiple translators (which does not allow for segmentation changes for technical reasons), the full responsibility must be assumed by the project manager for any segmentation issues. The imported documents should be examined carefully, and if any problems are observed, the segmentation rules should be modified and the documents re-imported. Doing otherwise may unavoidably result in garbage being written to the project's translation memory.

This is a very important point for memoQ trainers to emphasize when they are teaching users of the memoQ server to set up projects. Segmentation topics should be covered thoroughly, and the potential for bad results should be understood clearly if translators are given badly segmented documents they cannot fix. Project managers should also be encouraged to avoid restricting translators' options in ways which are likely to harm the quality of the results and make parts of the translation unfit for later re-use.

A good rule of thumb is to choose the desktop documents option for projects always unless there are very urgent reasons not to do so. In this way, you will avoid upsetting your translators by forcing unmanageable, fractured sentence fragments on them, and you will be assured of better quality translation memory resources.

Gratefulness and respect

I have to thank my Romanian colleague Laurentiu Constantin for sharing a link to a TED lecture by the Benedictine monk David Steindl-Rast. I lost my taste for the TED material for the most part some time ago; most of it is rather lightweight cardboard wisdom or even crappy pseudoscience, and some may consider this old man's insights and his suggestion that it is not the happy who are grateful but rather the grateful who are happy to belong in the former category.

But as I walked my dogs back tonight from their evening routine in the park, I thought about the puzzle of some people I have known in my life who have such wonderful gifts and opportunities... and how much misery some of them create for themselves, their families and their wider circles. Misery is a popular subject with some in our circles of translation, and it's easy enough to identify persons, companies or technologies to stand in the roles of greater or lesser Satans, and the facts and arguments behind some of the finger pointing are sometimes clear and well-grounded.

If we stand firm-stanced and triumphant in the field, our dragons slain and scattered in pieces at our feet, will we be happy, and for how long? Are we grateful for a new opportunity or insight or are we more concerned with missed opportunities past, the prospect of failure or the cost of late understanding?

There was a passing comment from the old monk somewhere after the 12-minute mark in which he made an interesting contrast between equality and equality of respect. The former is difficult, sometimes impossible and probably even often undesirable to achieve. But the latter, I believe, is very much in the grasp of all, and it might go a long way toward alleviating the bad conditions for which we often call for some sort of "equality" as a remedy.

One of the greatest problems I see in current controversies in translation which involve crowdsourcing, machine pseudo-translation (MpT) and machine pseudo-translation post-editing, the creeping deprofessionalization and demonetization promoted by Translators Without Borders or similar programs and their well-paid corporate advocates et cetera is the lack of respect shown for individuals, their personal interests and health and their very necessary capacity for useful, sustainable contribution.

But equally troubling sometimes is how easy it is to forget the importance of respect and dehumanize those with whom we disagree. Stephen Fry explored the use of language as an enabler of otherwise unimaginable awfulness, and I think the historical record largely supports his analysis. So I think that those who would stigmatize persons with legitimate reservations about the viability and desirability of machine pseudo-translation in many of the fields where some would apply it by calling them "haters and naysayers" or comparing them to Tea Party political fanatics should take some more care with their choice of words in the service of their paymasters.

I think it is also good to remember that even some of the most damaging influences in our field are not devils, even if there does seem to be a lot of brimstone in their choice of deodorant sometimes. And if their devilish intents require appropriate actions in response, we should not deny ourselves the opportunity to look for some common ground to temper our actions or at least our pride in a successful action.

And if it all goes to shit, I hope that I can still find the quiet to appreciate and be grateful for a warm paw in my hand, the ache in the joint of my index finger which reminds me that I have a hand to grasp a cup, the light acid bite of juice passing over my tongue, the heat of my fire on a cold December night and a well-chosen word from my head or that of a friend.