Showing posts with label word count. Show all posts
Oct 13, 2013

Games agencies play, part 2: "word counts"

Last week a colleague called me up, very worried because her count of a rather tricky and somewhat long chemical text differed from the translation agency's count by more than 10%. I had only recently introduced her to that company and was vastly relieved to have competent backup for a recent flood of chemical manufacturing procedures, so the thought this might escalate into a serious misunderstanding put a sick feeling in my gut.

Fortunately, I had noticed some similar issues recently and had a conversation with the project manager involved about the unusual issues with her customer's texts and some of the technical challenges we face in overcoming legacy trash (format trash in this case, not content, thank God) and in making fair and accurate estimates of the work involved, not only for fair compensation but also to plan some increasingly stressful schedules.

I discovered in the chat that the PMs at the agency were "in transition" with their working tools, and although they had SDL Trados Studio around for years, it was only being used about half the time for analysis and costing; the other half of the time the now-discontinued SDL Trados 2007 was used.

I spit my coffee in surprise. Well, I shouldn't have been surprised.

The old Trados tool generally gives much lower word counts, especially for the kinds of scientific texts I often do, with a good portion of dates and numbers to be dealt with. In addition to that, there are considerable differences in "leverage" (presumed matches from a translation memory, which in the case of the customer mentioned above are often useless and incorrect because of bad segmentation issues and massive crap in the TM from 10 years of failure to define appropriate segmentation exceptions). And then there are the tags, which are another matter as well: I love three or four words embedded in 20 or so tags in a segment. Whoever thinks something like that should be charged at a word rate with a count of 3 or 4 is a fool or a fiend or both.
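To make the effect concrete, here is a toy sketch in Python. It does not reproduce the counting algorithm of any Trados version or any other tool; the two rules (count every whitespace-separated token versus count only tokens containing at least one letter) and the function names are assumptions chosen purely for illustration. It shows how much a single choice about numbers and dates can move the total for a typical procedure sentence.

```python
import re

def count_all_tokens(text):
    """Hypothetical rule A: every whitespace-separated token is a word."""
    return len(text.split())

def count_lettered_tokens(text):
    """Hypothetical rule B: only tokens containing at least one letter count.

    Pure numbers and dates such as '25' or '13.10.2013' are skipped.
    """
    has_letter = re.compile(r"[^\W\d_]")  # any Unicode letter
    return sum(1 for tok in text.split() if has_letter.search(tok))

sentence = "Add 25 mL of toluene on 13.10.2013 at 80 °C"
print(count_all_tokens(sentence))       # 10
print(count_lettered_tokens(sentence))  # 7
```

On this one sentence the two rules differ by 30%, and a segment consisting of nothing but a date drops from one word to none under the second rule.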

But mostly these are just matters of ignorance and/or reluctance to understand the problem and consider it in costing and compensation.

Paul Filkin of SDL has an excellent presentation which I saw last year at TM Europe in Warsaw in which he showed systematic differences in text counts between tools. I suspect that information is available in some form somewhere, because it's also important for individuals and companies using Trados or other tools to understand just how pointless and arbitrary this focus on word counts actually is. (So far I've avoided bringing up the problem of graphics and embedded objects so frequently found in certain document types and how few of the tools in common use are able to count the text in these, much less the effort to access, translate and re-integrate that text. I've talked about that enough on other occasions, so not now.)

So what's the agency game here? Well, in the case of my friend's concern, no more than an unconsidered resort to the wrong tool by a project manager under pressure and in a hurry, and once they talked about it, it became clear that matters would most likely get sorted out to nobody's disadvantage. Word counts, and the tools chosen to make those counts, can have a huge impact on translator compensation. This can be exploited systematically by unscrupulous agencies to screw their service providers thoroughly, and I suppose there are a few out there besides Pam the Evil PM in the Mox comics who plot such moves carefully.

However, I think it's usually a matter of ignorance, where a bit of education is all that is needed. Sometimes it's fear: I have heard some silly skirts tell me that they are aware of the problem but that quoting with accurate methods would inflate job costs to a level their price-sensitive customer cannot accept. Usually, though, this means that this person or those in her organization responsible for sales lack the communication skills to deal maturely with clients and help them understand what is reasonable and sustainable for a good business relationship. I seldom argue with such people. I note them on the list of Linguistic Sausage Producers and cross them off the list of viable partners for work, and when I hear later how they are circling ever nearer to that drain to the sewers I might offer a sad smile of understanding, but I have nothing more to give.

An agency that offers piece-rate quotation but does not even try to estimate the "pieces" and their relationship to time required very likely does not have a sustainable business model. But that is probably no more unsustainable than all the panting bilge one sees from all those acolytes in the MT temple who don't realize they are brought into the rituals to be relieved of their cash and goods by a greedy IT priesthood eager for another great scam to live off like the old Y2K scare.

What do word counts matter when words will be free or nearly so, given to us by Machines of Ever Loving Grace in LQA-blessed near-perfection, requiring just a bit of post-editing time to be fit for purpose?

Ah, time. That's really the crux of the problem, isn't it? How much time will something take? Proper project management in which the inputs are measured and assessed correctly is critical to understand this regardless of whatever piece rates may or may not be applied. An agency owner recently mentioned a job he had to "translate" date formats into something like 14 different local flavors. He pointed out, quite correctly, that any word count, even an accurate one, was meaningless there. (And he revealed himself as a user of the old Trados by saying that the word count was "zero" anyway, which brings us back to the stupid logic of SDL Trados which began this discourse.)

I'm not an advocate of billing strictly by time. Yes, attorneys do that, but it's not really a viable model all the time for all services, and one could fill a library with volumes of true tales on the abuse of the billable hour by law firms. Sometimes hourly rates make sense; sometimes the value, an intangible requiring some judgment and risk to estimate, matters more.

Time, value or meaningless commodity units (words, lines, pages or pounds of sausage): these will surely still be sources of consideration and dispute in the translation profession long after we are all dead. Until then, it really does pay to become more aware of current practice and its implications and remain alert so that it does not work to your disadvantage, even if the other parties are not deliberately playing a game.

Jul 17, 2013

How would you translate the chart in this DOCX file?

Can anyone tell me quickly the best way to translate the chart in this DOCX file? Or how to get an accurate word count of the words to be translated in the file?

*****

I love to see the different approaches people take to this problem. It's one which I think is encountered with some frequency by translators, and in the past I took many different approaches to it - long ago I usually did something involving PDF conversion, editing of the PDF and making a screenshot. But that is inefficient and doesn't allow the use of CAT tools.

Yesterday I picked up a project with 18 of those silly charts embedded in it. A real nuisance. Here's what happens if you try to edit one of those charts in situ:


Hopeless, right? A lot of very authoritative web pages make it clear that without having the linked Excel files, you cannot modify the text. Not true, actually. With or without hints, a number of technically versatile colleagues found ways to solve the problem or at least made close guesses. Some of these are here in the comments. One very interesting exchange on Twitter showed that somehow the settings of the OmegaT import filters can be tweaked to solve this:




The thing about OmegaT is that it's sort of geeky - the solution looks pretty good here, but I can't actually make it work myself.

The solution I worked out last night is very similar to the one described by Stanislas in the comments.
  1. Change the file extension to ZIP
  2. Look inside the ZIP file with Windows Explorer or another suitable tool as described in other blog posts.
  3. Inside the "word" subfolder there is a folder named "charts". It contains XML data with all the chart headings, numbers and labels. Copy it.
  4. Paste a copy of the folder where you want your source files. Import the chart XML files into any CAT tool or XML editor. It's a good idea to configure a filter to exclude and protect the references to the original Excel files with the data. (Though I am curious whether deliberately spoiling these data can protect against the unwanted update that one person worried about in the comments. I'll have to try that.)
  5. When the translations are completed, paste the XML files back inside the charts folder in the file structure.
  6. Rename the extension back to what it was at the start (DOCX in this case). You're done. No refresh necessary (unlike with embedded Excel or PowerPoint objects).
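For anyone who would rather script this than click through Windows Explorer, the steps above can be sketched in Python with nothing but the standard library. This is a rough sketch, not a tested production tool. The function names are made up for illustration, and it assumes that the translatable chart text sits in DrawingML `a:t` elements and in the cached strings (`c:strCache`) of the chart XML. Numeric caches are deliberately ignored. The string list it returns can also serve as the basis for a rough word count of the chart text.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in DrawingML chart parts
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"

def chart_parts(docx_path):
    """Return {part name: raw XML bytes} for the XML files directly in word/charts/."""
    with zipfile.ZipFile(docx_path) as z:
        return {name: z.read(name) for name in z.namelist()
                if re.fullmatch(r"word/charts/[^/]+\.xml", name)}

def chart_strings(xml_bytes):
    """List chart text: rich-text runs first, then cached category/series strings."""
    root = ET.fromstring(xml_bytes)
    texts = [el.text for el in root.iter(f"{{{A_NS}}}t") if el.text]
    for cache in root.iter(f"{{{C_NS}}}strCache"):
        texts += [v.text for v in cache.iter(f"{{{C_NS}}}v") if v.text]
    return texts

def replace_parts(src_docx, dst_docx, new_parts):
    """Copy src_docx to dst_docx, swapping in translated chart XML by part name."""
    with zipfile.ZipFile(src_docx) as zin, \
         zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = new_parts.get(item.filename, zin.read(item.filename))
            zout.writestr(item, data)
```

A typical round trip would be to call `chart_parts("report.docx")` (the file name is a placeholder), write each part to disk for the CAT tool, read the translated files back as bytes, and pass them to `replace_parts` under their original part names. The translated XML is written back byte for byte as the CAT tool exported it, rather than being re-serialized by the script, so the namespace prefixes Word expects are left alone.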



A memoQ filter configuration for these XML files can now be found on Kilgray's Language Terminal.