Aug 27, 2011

Data exchange issues for translators & TAUS

Jost Zetzsche of the International Writers' Group has written an interesting short report on data exchange standards for translation content, how they affect individual translators, why we should care, and why freelancers should be involved in the development of these standards. The report can be downloaded from the link here or from the TAUS website, where you'll be asked to give contact information that you may or may not want to provide for a mailing list.

Jost's paper is a good overview which explains its points clearly for nontechnical audiences as well. He has a gift for that, which is why his Toolkit newsletter and Toolbox primer for translators are so enormously popular. In some ways I find it a shame that this latest information is distributed via TAUS, an organization regarding which I hold no little suspicion, but it is important that we keep an eye on those who pursue their agenda, and some limited collaboration can in fact be a good way to do this.

Why am I suspicious of TAUS? Two things.

First, I find the organization's case for sharing all TMs rather weak. In certain areas like standard error messages for software, I can see the point. Without even considering copyright and privacy issues, I simply find the idea foolish for much of the work I do. I don't care to see most of the trash TMs that LSPs accumulate, fail to maintain properly and try to "leverage" on the backs of their translating teams. Why would I be interested in greater quantities of linguistic sewage? Even my own TM content gets stale after a while and needs an overhaul as language and usage evolve. Those who plan to leverage their 100% matches for the next decade or two should hope that their customers and users like the taste of cardboard and old shoes or they may find themselves with communication and image problems at some point.

The second problem I have with TAUS is the organization's silly, shameless shilling for machine translation. Horsefeather-stuffed essays like "Want to ride the machine translation tidal wave?" or the intimidation set piece "What options do translators really have?" with its Darwinistic principles for how translating monkeys must evolve should give TAUS very little credibility with those in the industry who do not exhibit disturbing characteristics presumed to be part of a lemming's DNA.

This is not to say that I do not support TAUS. I think all my customers' competitors should hang on their every word and pursue the full program of data leveraging and process automation, eliminating as much of the human element from translation as possible. In Germany, I fully support initiatives by the Arbeitsamt, which previously retrained displaced coal miners as occupational therapists, to offer career path alternatives to call centers by certifying the long-term unemployed as MT post-editors (part of the "evolution" touted by TAUS), perhaps even providing their services to industry in the form of the beloved One Euro Jobs in a tradition of slave labor anchored in the middle of the last century.

I am perfectly content to be sidelined by history here, to stand lonely on a high hill, a translation creationist foolishly resistant to the industry's evolution, and miss the thrill of the MT tidal wave as it washes away common sense and quality and leaves consumers and business people picking through the strewn detritus of meaning.

Where will you be? The surf is up!


  1. I have consistently found TAUS' arguments to be fallacious at best and worrying at worst.

    Fallacious, because even sixth graders can see without effort the points where logic has to succumb for the sake of ideology.

    Worrying, because even highfalutin MBAs managing big corporations fail to see what is obvious to sixth graders.

  2. Thanks for the post, Kevin. I share some of your hesitation about TAUS, or more specifically, the TDA, but I also see some real benefit for translators.

    The terminology search feature is extremely helpful, especially (and only) when you work in fields that are covered by the material of the companies that have supplied their material.

    So far virtually none of the material that is available on the site comes from LSPs; it mostly comes from translation buyers, and mostly those who have done a good job maintaining their TMs, so the quality is very high. But, yes, I do share your concern about the age of the data.


  3. Laurent, I agree that some of the non sequitur conclusions that TAUS attempts to sell are rather distressing in their sheer intellectual sloppiness. But unfortunately if someone really, really wants to believe some idiotic thing, he almost inevitably will, at least for a while. Cults and confidence men depend on this and an MBA or Ph.D. is not an inoculation against such forms of stupidity.

    Jost, to the extent that HP, Oracle, Apple, Boeing or others of that class want to share well-maintained resources for common content, I do support initiatives such as those from TAUS. But unfortunately, as you noted, the goals appear to go far beyond this into territory of dubious legality and even more dubious utility.

    I had an interesting conversation earlier today with a very frustrated user of translation tools who noted how inadequate most of the resources in major CAT tools are for doing effective content maintenance. I agree with this very strongly, and I suspect that this is also a contributing factor to the lack of maintenance in LSP and individual translator TMs. I once spent two whole days cleaning up a TM for a chemical industry customer of a favored LSP, weeding out nearly 50% redundancy in the content while doing so. A year later I noted sadly that the TM was trashed worse than ever. Before TAUS or anyone else talks about the whole world sharing TMs, let's talk about ways to maintain them in a state that might perhaps be worth sharing.

  4. Hey Kevin, I am a little surprised at the aggressive tone of your post about TAUS and in particular TDA. Since I respect your comments, I would like a discussion to better understand your viewpoint.

    Willem Stoeller

  5. Willem, I simply consider TDA unproductive and misguided if extended beyond very carefully maintained datasets that are purged and updated on an ongoing basis. The longer I use TM technology, the more I am confronted by the "shelf life" of content. Consider this, and you'll realize what nonsense statements like the SDL claim that "you'll never have to translate the same sentence again" really are. Show me an IT manual I translated in 2001 and I'll bet that I would translate it quite differently today because the language itself has evolved noticeably.

    And really, what would be the usefulness of "sharing" a typical LSP TM with a historical collection of trash from a large number of translators? The frequency of errors is often enormous, and after seeing errors propagated time and again in "untouchable" 100% matches, I want no sip from that poisoned chalice.

    But TDA isn't nearly as bad as the dishonesty and intimidation practices that I see TAUS engaged in with respect to machine translation. Read the articles I referenced above and tell me to what extent your personal experience - and I do mean experience, not hopes and dreams - supports the claims. The only tidal wave I have seen with regard to MT in the past few years is a consistent campaign to convince skilled linguists that their future lies in cleaning linguistic latrines. And I am inclined to agree with Miguel Llorens that the much-hyped "content tsunami" is a self-serving myth.

    I am a great believer in sensible process automation, and I think there is a lot that can still be done to improve the efficiency and quality of translation processes in this way. Better review workflows, tools for terminology mining and management and QA routines to identify some classes of errors are examples of this.

    However, MT and its potential are greatly oversold by those who stand to profit from the selling at the expense of the gullible. You can tell me all about the wonders of controlled language, but I have seen some of this in action for years with a major German provider of technical documentation services, and I don't see anything close to the control and consistency that MT advocates call for. Another sad reality is that all the talk of the wonders of "statistical" approaches gives many people the impression that a much broader range of content is a reasonable starting point for MT. And this leads to trouble time and again.

    The effort and expense being sunk into MT today is simply better invested elsewhere in most cases. I am perfectly willing to be wrong on this or any other point, but the evidence I see now will not lead me to be a cheerleader for the illusive future the MT carnival barkers speak of.

  6. Hi, Kevin,
    I absolutely agree with every word you say! I have been a freelance translator (Germany, EN-DE) for more than 20 years now, mainly dealing with the "Big 5" in the IT business.
    Not only have our word rates been in a dive since 1992, with Trados averaging, reluctance to pay for 100% matches that have to be read anyway, hourly rates that are half of what my car mechanic charges, etc., but now MT also rears its ugly head again, and translators are asked to waive 1/3 of their "standard" word rate, although the MT quality is so poor that at least 75% (mostly more than 90%) of the content has to be translated from scratch again.
    In order to create more awareness of this topic I will give a talk at the TEKOM annual conference in Wiesbaden (Germany) in October this year and would like to publish two surveys, one for translators and one for LSPs and translation customers, on LinkedIn and other translation-related networks. I would be pleased if you were willing to support me: look out for these surveys and answer my questions. Maybe you are also in a position to forward them to other colleagues.
    Thanks a lot in advance and best regards

  7. I think this old quote describes MT limitations well:
    "Be careful about reading health books. You may die of a misprint." (c) Mark Twain


  8. There's a very funny thread on LinkedIn discussing this post. Being one of those antediluvian, progress-hating translators whose mind is too dulled by what Jaap described at the ATA conference in Ede as boring, repetitive tasks that are the lot of workers like us, I can only offer this link rather far down the comment chain. Y'all will have to scroll up, because I'm too mystified to figure out the URL for the top of the thread.

    All this is further evidence that I am incapable of evolving and will be washed off the beaches of prosperity by the coming MT tidal wave :-)

    I was quite entertained to hear that the new line from TAUS is that TM is dead and MT is the future. My foggy memory tells me that Jaap stole that line from Renato at some point. Unfortunately, due to a dog care crisis, I scuttled my plans to attend the conference in Ede, but I understand that my sources who attended will be submitting their detailed analyses and impressions. One of them expressed her great relief after hearing first hand what lead surfers of the tidal wave had to say. "Utter bosh!" was the summary more or less. She was quite surprised that no one was able to show any specific samples of the MT miracle, even in a workshop on post-editing. I wasn't.

  9. Kevin,

    First of all, thanks for taking the time to answer me. The usability of the TDA data depends on the willingness of its participants to deliver clean data. The main contributors are large high tech companies such as Microsoft, Cisco, Adobe, Intel and large public organizations such as the EC. These are TMs related to released product/content, so confidentiality is not an issue. The issue of copyright has also been debated in much detail, and the current TDA members are comfortable with the TDA regulations on that point. As you know, most translators and LSPs do not actually own TMs (work for hire), at least when working for US companies (I understand that there are exceptions).
    I fully agree with the issue of the half-life of the data, and again here the responsibility is with the members to refresh data (part of their agreement with TAUS). Far from an ideal solution.
    Now in regard to TAUS claims: sure, it's marketing :) But the reality is that, at least in the high tech world, raw MT output is heavily used for support knowledge databases with very good results. Additionally, MT is used by a growing number of companies to translate product documentation, UI strings and other technical publications. This is typically a three-step process: leverage from TMs, pre-translation using MT (preferably a customized system), followed by post-editing and review steps. This works because the end user is interested mainly in correct meaning, completeness and consistency, not that much in style.
    For me MT is still completely falling down when dealing with more conceptual content (sales, marketing, literature, etc.). I personally have done a number of technical translation projects using the approach outlined above, where the quality of the target content met customer expectations.
    I see MT becoming more and more a useful tool for translators, but it's a tool just as TMs are a tool.

    I would love to hear your opinion about community translation.



