Oct 13, 2013

Games agencies play, part 2: "word counts"

Last week a colleague called me up, very worried because her count of a rather tricky and somewhat long chemical text differed from the translation agency's count by more than 10%. I had only recently introduced her to that company and was vastly relieved to have competent backup for a recent flood of chemical manufacturing procedures, so the thought this might escalate into a serious misunderstanding put a sick feeling in my gut.

Fortunately, I had noticed some similar issues recently and had a conversation with the project manager involved about the unusual issues with her customer's texts and some of the technical challenges we face in overcoming legacy trash (format trash in this case, not content, thank God) and making fair and accurate estimates of the work involved, not only for fair compensation but also to plan some increasingly stressful schedules.

I discovered in the chat that the PMs at the agency were "in transition" with their working tools, and although they had SDL Trados Studio around for years, it was only being used about half the time for analysis and costing; the other half of the time the now-discontinued SDL Trados 2007 was used.

I spit my coffee in surprise. Well, I shouldn't have been surprised.

The old Trados tool generally gives much lower word counts, especially for the kinds of scientific texts I often do, with a good portion of dates and numbers to be dealt with. In addition to that, there are considerable differences in "leverage" (presumed matches from a translation memory, which in the case of the customer mentioned above are often useless and incorrect because of bad segmentation issues and massive crap in the TM from 10 years of failure to define appropriate segmentation exceptions). And then there are the tags, which are another matter as well: I love three or four words embedded in 20 or so tags in a segment. Whoever thinks something like that should be charged at a word rate with a count of 3 or 4 is a fool or a fiend or both.

But mostly these are just matters of ignorance and/or reluctance to understand the problem and consider it in costing and compensation.

Paul Filkin of SDL has an excellent presentation which I saw last year at TM Europe in Warsaw in which he showed systematic differences in text counts between tools. I suspect that information is available in some form somewhere, because it's also important for individuals and companies using Trados or other tools to understand just how pointless and arbitrary this focus on word counts actually is. (So far I've avoided bringing up the problem of graphics and embedded objects so frequently found in certain document types and how few of the tools in common use are able to count the text in these, much less the effort to access, translate and re-integrate that text. I've talked about that enough on other occasions, so not now.)
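To make the arbitrariness concrete, here is a minimal sketch of three counting rules applied to the same sentence. The rules and the sample sentence are assumptions made up for illustration; they are not the actual algorithms of SDL Trados, memoQ, MS Word or any other tool, which are more involved and not reproduced here.

```python
import re


def count_whitespace(text: str) -> int:
    """Rule A (hypothetical): every whitespace-separated token is a word."""
    return len(text.split())


def count_skip_numeric(text: str) -> int:
    """Rule B (hypothetical): tokens without any letter (numbers, dates) are not counted."""
    return sum(1 for tok in text.split() if any(ch.isalpha() for ch in tok))


def count_split_hyphens(text: str) -> int:
    """Rule C (hypothetical): like rule B, but hyphens also separate words."""
    parts = re.split(r"[\s\-]+", text)
    return sum(1 for tok in parts if any(ch.isalpha() for ch in tok))


# Invented sample with a date, a quantity and two hyphenated expressions.
sample = "On 13.10.2013 add 25 g of 2-propanol to the well-stirred mixture."

for rule in (count_whitespace, count_skip_numeric, count_split_hyphens):
    print(f"{rule.__name__}: {rule(sample)}")
```

On this one short sentence the three rules give 11, 9 and 10 "words" respectively. Nothing about the text or the work changed, only the definition of a word.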

So what's the agency game here? Well, in the case of my friend's concern, no more than an unconsidered resort to the wrong tool by a project manager under pressure and in a hurry, and once they talked about it, it became clear that matters would most likely get sorted out to nobody's disadvantage. Word counts, and the tools chosen to make those counts, can have a huge impact on translator compensation. This can be exploited systematically by unscrupulous agencies to screw their service providers thoroughly, and I suppose there are a few out there besides Pam the Evil PM in the Mox comics who plot such moves carefully.

However, I think it's usually a matter of ignorance, where a bit of education is all that is needed. Sometimes it's fear: I have heard some silly skirts tell me that they are aware of the problem but that quoting with accurate methods would inflate job costs to a level their price-sensitive customer cannot accept. Usually, though, this means that this person or those in her organization responsible for sales lack the communication skills to deal maturely with clients and help them understand what is reasonable and sustainable for a good business relationship. I seldom argue with such people. I note them on the list of Linguistic Sausage Producers and cross them off the list of viable partners for work, and when I hear later how they are circling ever nearer to that drain to the sewers I might offer a sad smile of understanding, but I have nothing more to give.

An agency that offers piece-rate quotation but does not even try to estimate the "pieces" and their relationship to time required very likely does not have a sustainable business model. But that is probably no more unsustainable than all the panting bilge one sees from all those acolytes in the MT temple who don't realize they are brought into the rituals to be relieved of their cash and goods by a greedy IT priesthood eager for another great scam to live off like the old Y2K scare.

What do word counts matter when words will be free or nearly so, given to us by Machines of Ever Loving Grace in LQA-blessed near-perfection, requiring just a bit of post-editing time to be fit for purpose?

Ah, time. That's really the crux of the problem, isn't it? How much time will something take? Proper project management in which the inputs are measured and assessed correctly is critical to understand this regardless of whatever piece rates may or may not be applied. An agency owner recently mentioned a job he had to "translate" date formats into something like 14 different local flavors. He pointed out, quite correctly, that any word count, even an accurate one, was meaningless there. (And he revealed himself as a user of the old Trados by saying that the word count was "zero" anyway, which brings us back to the stupid logic of SDL Trados which began this discourse.)

I'm not an advocate of billing strictly by time. Yes, attorneys do that, but it's not really a viable model all the time anyway for all services, and one could fill a library with volumes of true tales on the abuse of the billable hour by law firms. Sometimes hourly rates make sense, sometimes the value, an intangible requiring some judgment and risk to estimate, matters more.

Time, value or meaningless commodity units (words, lines, pages or pounds of sausage): these will surely still be sources of consideration and dispute in the translation profession long after we are all dead. Until then, it really does pay to become more aware of current practice and its implications and remain alert so that it does not work to your disadvantage, even if the other parties are not deliberately playing a game.


  1. Well, you should see the job I did some time ago. A chemical patent, full of names like "4-(2-([1,2,4]triazolo[4,3-a]pyridin-3-yl)quinolin-8-yloxy)pyrimidin-2-amine". I got the job as a bunch of TTX files, with an initial wordcount of about 50k words. Then, when I already had about 30% translated, the LSP contacted me with a problem: the client said that according to Word there were only about 37k words, and they wouldn't pay for more.
    I wonder how many of you know where the difference comes from - I knew and I was able to explain to the agency owner right away. Unfortunately, the end client was a prick and threatened to withdraw the job if the LSP wouldn't accept the changed wordcount. Fortunately, they were fair - we used the new wordcount as a basis, but the rate was raised, so I got only about 10% less money than the initial estimate - of course I could have refused and got the money for what I had already done from the LSP's pocket, but it's a good client, so it wasn't worth it.

    Now, about the difference - the name I quoted above is a single word for Word and almost any CAT tool on the market. Except for Deja Vu and Trados 2007 - it's 7 words for them, if I count correctly. These are still the best tools for chemistry work... unless you know a trick, which I'm going to present at Triconf next week: then you can use any modern tool to effectively translate long chemical names.

    1. István Lengyel wrote quite a nice explanation of the different word count behaviors in MS Word, memoQ and some other tools, talking about what constitutes a "word", something that is oddly much in dispute. I wanted to reference it in this post, but although I've used the link several times before, I just couldn't dig it up on the fly. But the problems you and I face with long chemical names are reflected the same way in all the hyphenated expressions one sees in English and German as well. And then of course there are the problems of compound words, where a lot of abuse occurs; that's one reason why I tend to use character counts or just make a reasonable overall estimate of time and effort in a fixed job price and tell a sausage-maker who wants that broken down into words to do the division and expect that number to change from job to job.

      I'll be curious to hear about your approach after the conference next week. Other than searching and replacing hyphens and parentheses or brackets with spaces for counting purposes, I can't imagine what it might be. It would be nice to see some attention paid to optimizing our working tools, but I waited long enough to be able to deal with subscripted variables and empirical formulae without a mess of tags, so I won't hold my breath.

  2. Very important article, Kevin.
    You touched upon a very important subject and described the complexity of it very nicely, as well as presented yet another mechanism for identifying Language Sausage Providers.

    Paul Filkin's explanation can be found here: http://multifarious.filkin.com/2012/11/13/wordcount/

    1. Thank you, Shai! I didn't realize that Paul had also made a blog post with that information; I remember his talk the month before at TM Europe where he presented similar comparisons. Someone else was also kind enough to send me similar data after this post went up, and I'll be sharing it as well after I've had a proper look.

  3. I've noticed that memoQ seems to differ much from Trados in its analyses, even up to 15%.

  4. Well, a recent client did the wordcount in Wordfast (he did not tell me that) and I did the job in Trados (he told me to use any CAT tool). The difference was 1100 words (22+%).
    However, I made a Wordfast analysis myself (with Wordfast online) and - surprise! - the difference is around 1000 words.
    Is this a con artist?

    1. I doubt there is any attempt to deceive there. This is the problem we all face: there is no real standard for arriving at these counts, yet some hair-splitters will argue for an hour to save 3 euros due to tool count differences rather than deal with the business transaction at a higher, more sensible level. People who know better still get caught in the stupid trap of thinking about these services as commodities like pallets full of toilet paper or pork bellies sold on the Chicago futures markets. This is why it's often better to deal with flat rate quotations after assessing all relevant parameters. Everyone needs to get beyond the sausage shop mindset of the festering bulk market bog and behave like people with services of value to offer who focus on the service and not what shows up after the decimal separator on the quotation or invoice.

  5. "Silly skirt" is a phrase I haven't seen before. It appears to be directly putting down female employees, doubting their intelligence and referencing them as objects. It seems quite rude, which is too bad because it turns people off from reading your writing in the future.

    1. The prejudice you exhibit here is appalling! The assumption that skirts are associated only with female employees is completely out of touch with modern, gender-fluid thinking.
      Although my blog is noted as a bastion of misogyny, I do try to include enough other causes for offense in my writing that all should have equal reason not to read it. This sort of sociolinguistic booby-trapping ensures that only the most intrepid adventurers reach the hidden treasure ;-)

  6. All joking aside, what did you mean when you wrote "I have heard some silly skirts tell me that they are aware of the problem ..."? Just noticed it after reading the anonymous post. I'm not usually very into all that PC stuff myself, but was a little puzzled by it.

