Aug 24, 2016

memoQ autotranslatables: a partial antidote for drudgery

I'm currently working on a stack of legal pleadings for a patent nullity suit – lots of "urgent" words to churn by the end of the week. And after 10,000 or so of them, I got pretty damned tired of typing out the translation of text citations of the form "Spalte 7, Zeilen 34 bis 45" as "Column 7, Lines 34 to 45".

In fact, it was really starting to piss me off. In such situations, I try not to get mad but to get an autotranslatable ruleset instead. This is perhaps one of the most under-utilized productivity tools in memoQ.

So the next time I ran into a text that fit that format, the translation was offered as an autocompletable phrase as soon as I typed the first letter:

Of course life isn't usually that simple, at least not life with technology. And authors? Well, they seem to believe firmly in the old saying that "consistency is the hobgoblin of little minds". So of course the text also includes lots of references in the form "Spalte 7, Zeilen 34 - 45", with or without spaces around the hyphen. No problem, just add a rule for that (or if you are more clever, edit the single rule to cover the variations):

Now I am not one to advocate that the unwashed masses of translators – or even the washed ones – run out and learn to write regular expressions. I've programmed more computer languages and systems than I can possibly remember for about 45 years now, and I can't keep most of the autotranslatable rules in my head if I don't use them for a week or more after yet-another-refresher, so it would be stupid and hypocritical of me (or just bloody naive) to expect most people to mess with nerdy shit like this. But....

... a few simple rules and a couple of nice "recipe templates" to start can go a long way. And sometimes it pays not to be too clever; I have one highly sophisticated set of rules for complex legal citations that was written by a professional programmer, and it's unusable. Takes minutes to load even on a very fast computer, which is a huge pain in the backside every time a project is opened in memoQ. My more verbose, brute force approach to legal reference autotranslation may not be elegant, but it loads much faster and covers 90% or more of what I encounter. Maybe a case of where it's smart to be a little stupid.

There are lots of good tutorials out there on regex (regular expressions), including a few YouTube webinar videos from Kilgray, the memoQ Help, a few chapters in old books of mine, discussions in the Yahoogroups lists and more.

The examples above require the knowledge of only a few rules:
  • Chunks of the source text to be analyzed are grouped in parentheses. In the examples shown, those groups are merely where numbers occur.
  • A single digit is represented by the escape code "\d". If there might be more than one digit, add a plus sign, which means "one or more": \d+.
  • Spaces are represented by the escape code "\s". In the rules you can usually just type a space instead, but if you have to cover cases where it might be missing or where more than one might have been typed (usual sloppiness), then use the escape code, followed by an asterisk, which means "zero or more" of whatever it is put after: \s*.
  • For the rest of the text to match, you can usually type it just the way it occurs as I have done above. For the target translation rules, you can usually just type the literal text you want, with the groups represented by the numerical order in which they occur, preceded by a dollar sign. So the first group (parentheses set) in the source is $1, the second is $2, etc. Of course the order can be changed in the target; it's just not necessary in this case, but in autotranslatable rules for dates this happens rather often.
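
The rules in the list above can be tried out roughly in Python. This is a minimal sketch, not the actual memoQ ruleset: the patterns, the date format and the helper name autotranslate are assumptions made for illustration. Note that Python writes group references in the target as \1, \2 where memoQ uses $1, $2.

```python
import re

# Assumed source pattern: a single rule covering "bis" and the hyphen variants,
# with optional spaces (\s*) around the range separator.
CITATION = re.compile(r"Spalte (\d+), Zeilen (\d+)\s*(?:bis|-)\s*(\d+)")

# Target text: Python uses \1, \2, \3 where a memoQ rule would use $1, $2, $3.
CITATION_TARGET = r"Column \1, Lines \2 to \3"

# Hypothetical date rule showing reordered groups: DD.MM.YYYY -> MM/DD/YYYY.
DATE = re.compile(r"(\d{1,2})\.(\d{1,2})\.(\d{4})")
DATE_TARGET = r"\2/\1/\3"


def autotranslate(text: str) -> str:
    """Apply both illustrative rules to a source string."""
    text = CITATION.sub(CITATION_TARGET, text)
    return DATE.sub(DATE_TARGET, text)


if __name__ == "__main__":
    samples = (
        "Spalte 7, Zeilen 34 bis 45",
        "Spalte 7, Zeilen 34-45",
        "Spalte 7, Zeilen 34 - 45",
        "24.08.2016",
    )
    for sample in samples:
        print(sample, "->", autotranslate(sample))
```

The three citation variants all come out in the same target form, which is the point of folding the variations into one rule; the date rule only shows that the groups can appear in a different order in the target.
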
Not only will the little rules I wrote for this big job save me a lot of typing, I can also use them in a QA profile to check that I have made no errors by switching numbers, missing a space or anything else in my translation. That is done by marking the appropriate checkbox on the first tab of the QA profile you plan to use:

Perhaps such things are worth a little effort in your projects once in a while....

Aug 23, 2016

Reminder: web search tutorial this Friday!

Time is running out to register for Michael Farrell's webinar this Friday on the basics of IntelliWebSearch, a scripting tool that runs under Windows and enables multiple, simultaneous web searches using text selected in any application.

I used to be rather sceptical of this sort of tool, but in the past several years (since a similar, less powerful feature was introduced in memoQ) I have found this to be among the greatest contributors to my research and translation productivity. This saves time and reduces my work fatigue over the course of a long day.

The online workshop is free to IAPTI members and very affordable to everyone else (USD 25 or a bit less if you are a member of a partner association).

There will be a more advanced presentation to follow in September, which does not require participation in this one, but which does assume that you know the basics of IWS.

Aug 6, 2016

Approaching memSource Cloud

It has been interesting to see the behavior of my codornizes since I moved them from the confines of a rabbit hutch in a stall at my old quinta to the fenced, outdoor enclosures in the shade of a Quercus suber grove. In the hutch, they were fearful creatures, panicking each time I opened their prison to give water and food or to collect eggs. Their diet was also rather miserable; the German hunters who first introduced me to these birds for training very authoritatively told me that they ate "only wheat", and I felt bold to offer them anything different like cracked corn or rice. In the concentration camp-like conditions in which they lived, they also developed a serious case of mites and lost a lot of feathers. I thought about slaughtering and eating them as an act of mercy.

Then last spring I moved to a new place with a friend, who built a large enclosure for my goats and chickens. She didn't know about the quail. I brought them one day and hastily improvised an enclosure for them with a large circle of wire fence around a tree, because I was afraid the goats might trample them. There was far more space in this area than they had before, and real, dry dirt for taking dust baths. Soon the mite infestations improved (even before regular dunks in pyrethrin solution began), and the behavior of the birds began to change. They became less nervous, though sometimes when someone approached the enclosure they flew straight up in panic as quail sometimes do and bloodied themselves on the wire.

A few months later I built a much larger enclosure for a mother hen and her chicks to keep them out from under trampling feet or from wandering through the chain link fence of the enclosure into the hungry mouths of six dogs who watched the birds most of the day like Trump fans with a case of beer and an NFL game on the TV. The quail were moved in with the chickens as an afterthought. With nine square meters of sheltered space, the three little birds underwent further transformations, becoming much calmer, never flying in panic and allowing themselves to be approached and picked up with relative ease. They also exhibited a taste for quite a variety of foods, including fresh fruit and weeds such as purslane. Most astonishing of all, they began to lay eggs regularly in an overturned flower pot with a bit of dried grass. Nowhere else. All the reading I've done on quail on the Internet tells me that quail are stupid birds who drop their eggs anywhere, do not maintain nests and seem to have no maternal instincts whatsoever. I am beginning to doubt all that.

At various times in my life I have heard many statements made about the cultural proclivities of various ethnic minorities, but these assertions usually fail to take into account historical background and circumstances of poverty and prejudice, choosing instead to blame victims. In cases where I have seen people of this background offered the same opportunities I take for granted or far less than my cultural privilege has afforded me, I cannot see any result which would offer itself for objective negative commentary.

There are a lot of ignorant assumptions and assertions made about the class of digital sharecroppers known as translators. Some of the most offensive ones are heard from the linguistic equivalents of plantation owners, some of whom have long years of caring for these hapless, technophobic, unreliable "autistics" who simply could not survive without the patriarchal hand of their agencies.

Fortunately, technology continues to evolve in ways which make it ever easier to take up the White Man's Burden and extract value from these finicky, "artistic" human translation resources. The best of breed in this sense could make old King Leopold II envious with the civilization they have brought to us savage translators.

On many occasions, I have advocated the use of various server-based or shared online solutions for coordinating translation work with others. And I will continue to do so wherever that makes sense to me. However, I have observed a number of persistent, dangerous assumptions and practices which reduce or even eliminate the value to be obtained from this approach. It's not a matter of the platform per se, usually, unless it is Across to bear, but too often over the past decade, I have seen how the acquisition of a translation memory management server such as memoQ or memSource or a project management tool such as Plunet, OTM or home-rolled solutions has led to a serious deterioration in the business practices of an enterprise as they put their faith more in technology and less in the people who remain as cogs in their business engines.

As the emphasis has shifted more and more to technologies remote to the sharecroppers actually working the fields of words, a naive belief has established itself as the firm faith of many otherwise rational persons. This is expressed in many ways – sometimes as a pronouncement that browser-based tools are truly the future of translation, often in the dubious, self-serving utterances of bottom-feeding brokers and tool vendors who proclaim the primacy of machine pseudo-translation while hiding behind the fig leaf argument that we need such things to master the mass of data now being generated. It is fortunate for them perhaps that this leaf is opaque enough to hide their true linguistic and intellectual potency from public view.

A related error which I see too often is the failure to distinguish between the convenience of process and project managers and the optimum environment for translating professionals. I don't think this mistake is malicious or deliberately ignores the real factors for optimal work as a wordworker; it's simply damned hard much of the time to understand the needs of someone in a different role. I could say the same for translators not understanding the needs of project managers or even translation consumers, and in fact I often do.

So indeed, the best tool for a project manager or a corporate process coordinator might not be the best tool for the results these people desire from their translators. Fortunately, this is usually a situation where, with a little understanding and testing, both sides can win and work with what works best for them. The mechanism to achieve this is often referred to by the nerdy term "interoperability".

Riccardo Schiaffino, an Italian translator and team leader based in the US, recently published a few articles (trouble and memoQ interoperability) about memSource, a cloud-based tool whose popularity among translation agencies and corporate or public entities with large translation needs continues to grow. High-octane translators like Riccardo and others have trouble sometimes understanding why these parties would choose a tool with such great technical limitations compared to some market leaders like SDL or memoQ, but the simplicity of getting started and the convenience of infrastructure managed elsewhere on secure, high-performance servers with sufficient capacity available for peak use is an understandably powerful draw.

And the support team of memSource and the tools developers are noted for their competence and responsiveness, which is equal in weight to a fat basket full of sexy technical options.

So I will not argue against the use of memSource by agencies and organizational users whose technical needs are not particularly complex and who do not have concerns about a tool almost entirely dependent on reliable, high bandwidth internet connectivity at all times to fulfill its key promises. In fact, it's a good and easy place to start for many, perhaps more so than the rival memoQ Cloud at present, which suffers sometimes from limited capacities (at the same data center used by memSource and others!) during peak use. Unlike the barbed-wire, unstable and unfriendly solution Across, which has achieved some popularity in its native Germany and elsewhere through sales tactics relying on fear, uncertainty and doubt regarding illusionary or delusional data security, memSource works, works well, and the data are portable elsewhere if a company or individual makes another choice some day.

But damn... it's just not very efficient for professional work, especially not for those of us who have amassed considerable personal work resources and become habituated to other tools like SDL Trados Studio, Déjà Vu or memoQ like a carpenter is to his time- and work-tested favorite tools. Trading one of these for the memSource desktop editor or, God forbid, the browser-based translation interface feels worse than being forced to do carpentry with cheap Chinese tools cast from dodgy pot metal. Riccardo mentions a few of the disadvantages, and I could fill pages with a catalog of others. But compared to some other primitive tools, it's not so bad, and for those with little or no good experience with leading translation environment tools, it may seem perfectly OK. You don't miss a myriad of filtering options to edit text or sophisticated QA features if you are still amazed that a "translation memory" can spit out a sentence you translated once-upon-a-time if something similar shows up six months later.

And as mentioned, memSource - or some other tool - may indeed be the best solution on the project management side. So what's a professional translator to do if an interesting project is on offer but that platform is unavoidable? Riccardo's tips on how to process the MXLIFF files from memSource in memoQ offer part of a possible good solution which would work almost equally well in most other leading tools as well these days. One additional bit is needed in the memoQ Regex Tagger filter to handle the other tag type (dual curly brackets) in memSource, but otherwise the advice given will allow safe translation of the memSource files in other environments. I can even change the segmentation in memoQ if, as usual, the project manager has failed to create appropriate segmentation rules in memSource to account for some of the odd stuff one often sees in legal or financial texts, and this does not damage or change the segmentation seen later when the working file is returned to memSource.
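
The idea of covering both tag types with one tagging rule can be sketched in Python. The tag shapes used here ({1} and {{3}}) and the pattern itself are assumptions for illustration only; they are not taken from memSource documentation or from Riccardo's article, and the expression actually entered in the Regex Tagger may need to differ.

```python
import re

# Assumed tag shapes: single-brace tags such as {1} and dual-brace tags such as {{3}}.
# Without the dual-brace alternative, only the inner "{3}" of "{{3}}" would be
# matched, leaving stray outer braces exposed as editable text.
TAG = re.compile(r"\{\{[^{}]*\}\}|\{[^{}]*\}")


def find_tags(segment: str) -> list[str]:
    """Return every placeholder that a tagging rule of this shape would protect."""
    return TAG.findall(segment)


if __name__ == "__main__":
    print(find_tags("See {1}Section 4{2} of {{3}} for details."))
```

A quick check like this on a few real segments from the MXLIFF is a cheap way to confirm that nothing tag-like is left unprotected before translating.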

Even concerns about the "lack" of access to shared online resources in memSource if an MXLIFF is translated elsewhere are easily addressed. A few useful things for this include:

  • pretranslation of the memSource files to put matches into the target before transferring to other environments,
  • leaving the browser-based or desktop editor for memSource open in the background for online term base or TM look-ups, and
  • occasionally exporting and synchronizing the MXLIFF in memSource to make the data available to team members working in parallel on a large project - this takes just a minute or two and allows one as much time as needed for polishing text in the other environment.

The last tip is particularly helpful to calm the nerves of project managers who are like mother hens on a nest of eggs which they fear might in fact be hand grenades and who panic if they don't see "progress" on their project servers days before anything is due. One can show them "progress" every twenty minutes or so without much ado if so inclined.

I am past the point where I recommend any translation memory management server in particular for agency and corporate processes. There are advantages to each (except Across, where these are actually hallucinations) and disadvantages, and where I see real problems, it is seldom due to the choice of platform but rather the lack of training and process knowledge by those responsible for the processes. The bright and shining prospects of a translation server are easily sold with a slick tongue, but without an honest analysis and recommendation of needs for initial and ongoing staff training these too often end up being bright and shining lies. I think very often of a favorite German customer who invested heavily in such a system four or five years ago and has not managed one single successful project with the system in all that time. This makes me sick to think of the waste of resources and possibilities.

So on the project management and process ownership side, memSource may be a great choice. Certainly some of my clients think so, and the improvements in their business often back this belief up. And for those who work with gangs of indigent, migrant or sharecropping translators whose marginal existences make the investment in professional resources like SDL Trados Studio or memoQ seem difficult or undesirable, it may be all that is needed by anyone.

The good news for those who depend on the efficiency of a favored tool, however, is that with a few simple steps, we need not compromise and can get full value from our better desktop tools while supporting interesting projects based in memSource. So each side of the translation project can work with what works best for them, without loss, compromise, risk or recriminations.

And the translating quail who start out in a dark box with a stunting lack of possibilities can look forward to the real possibilities of work liberation in a larger environment richer in healthy possibilities and rewards.

Aug 1, 2016

Corpus Linguistics and AntConc in the 2016 US Presidential Contest

Professor Laurence Anthony's AntConc concordancing software remains my favorite tool for analyzing the word content of text collections for my professional translation purposes. Although a free tool, it offers some important functionality beyond what I can get from the integrated term extraction and concordancing means in my translation environment tools, particularly SDL MultiTerm Extract and memoQ. AntConc is my first recommendation to my friends who teach at university and want to introduce their students to practical corpus linguistics and to my clients in industry who need to produce useful glossaries which cover the most frequently discussed things in their range of products and services.

That is not to say that its features are the most wide-ranging, but in addition to dead-simple incorporation of stopword lists (still a problem for most memoQ users), AntConc (like many other academic concordancers) offers excellent facilities for studying collocations, those words which occur together in important contexts. For years I have begged that this useful feature be added to the tools for professional translators, because it is a great aid in studying the proper language of a particular field or subject matter, and although the memoQ concordance can in fact search for multiple terms at once so that one forms a visual impression of their co-occurrence in text, it lacks the simple precision of AntConc for specifying the proximity range of the words found together in a sentence.
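
For readers unfamiliar with the idea, counting collocates within a proximity range can be sketched in a few lines of Python. This is not how AntConc itself is implemented; the function name collocates, the tiny stopword list and the sample text are all assumptions for illustration, and the sketch ignores sentence boundaries for simplicity.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be loaded from a file.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}


def collocates(text: str, node: str, span: int = 2) -> Counter:
    """Count words found within `span` tokens left or right of each `node` hit."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Up to `span` tokens on each side of the node word, excluding the node itself.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(w for w in window if w not in STOPWORDS)
    return counts


if __name__ == "__main__":
    sample = (
        "The patent claim is invalid. The claim of the patent covers a valve. "
        "A valid claim survives."
    )
    print(collocates(sample, "claim", span=2).most_common())
```

Widening or narrowing span is the equivalent of adjusting the proximity range: a small value shows the immediate neighbors of a term, a larger one shows looser associations.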

In one form or another, tools for analyzing the frequency of words and the contexts in which they occur have been a part of my life for a very long time. And yet it did not occur to me to use them as a means of studying the many words that are part of the many political and social debates taking place in the countries that concern me. One can get a quick impression with fun word cloud pictures (such as those in this post, created from the convention speeches of The Orange One and The Infamous HRC using a free online tool). But AntConc lets you go deeper and achieve a greater understanding of how language is used to influence our thoughts and discussions.

Katelyn Guichelaar and Kristin Du Mez have done just that in an interesting article titled "Donald Trump and Hillary Clinton, By Their Words", which offers some interesting insights into the psychology and public postures of the two candidates. No spoilers here – go read the article and enjoy. Then think about the professionally and personally relevant ways in which you might use the practical tools of corpus linguistics.