Sep 25, 2011

Fixing a screwed-up OCR job

I was having a pleasant day doing remote tutoring of a friend who is a new memoQ user and talking about various file type issues and approaches to quotation. She mentioned a mutual acquaintance, Helen, who became so disgusted with the incompetent attempts of agencies and some direct clients to convert PDF documents and foist them off on her as "Word documents ready to translate" (which, as anyone familiar with such things understands, the results generally are not) that she has written into her general terms and conditions of business a special clause stipulating a surcharge for both PDFs and any OCR documents. In my opinion, the surcharge for bad OCR should probably be double that for a PDF, since a PDF can at least serve as the basis for a proper OCR conversion.

Experienced translators are well aware of the horrors of bad OCR: documents that look like the original but undergo disconcerting font changes when translated with the classic Trados macros, text blocks that disappear when embedded in wrong-sized boxes, section breaks that disrupt text, and words that display in CAT tools with tags embedded in the middle of them, screwing up terminology lookups, TM matches and more. I hope there is a special place in Hell for those who think that usual rates should apply to such time-wasting messes.

There are many remedies for these problems, such as Dave Turner's CodeZapper macros or the memoQ import option to ignore irrelevant tags for Word documents, but there really is no good substitute for doing the conversion right the first time. This is a skill I have taught to colleagues and clients on a number of occasions, because it saves everyone time and money.

After I ended my chat and tutorial with C, I went to work on a new project due tomorrow. I had been putting it off while working on some tutorials for next week, but I still had time for dealing with it at a relaxed pace. Then I opened the document and realized that while I was distracted earlier this week, the project manager had sent me the horror of all horrible OCR jobs, an automatic conversion that violates every principle of good OCR practice. And it's Sunday. I'm screwed. No PDF to re-do the OCR my way.

Then I realized there are two possible solutions to address this problem which do not involve medium-range missiles.

I could, if the document had a lot of text boxes in bad sequence, print the file to PDF and do another OCR of that. But the text flow in this case is mostly in one block, so that's really not an issue. I followed another procedure which gave me a raw file much like I create when doing a complex OCR, which I then quickly reformatted to give me a usable source file that would not be polluted with tag trash and inconvenient text breaks. The steps were as follows:
  1. Save the bad OCR DOC file as plain text with the desired encoding.
  2. Open the plain text document in Microsoft Word or another full-featured text editor.
  3. Check the sequence of the text flow to be sure that it is correct and complete.
  4. Correct any "broken" sentences caused by line breaks in the wrong place in the OCR document. A bit of clever search and replace can usually be used to protect desired paragraph breaks before converting the unwanted ones into spaces to restore the messed-up sentences.
  5. Do any other formatting you want for page numbering, bold text, subtitle styles or whatever. 
The resulting file can be saved as RTF, DOC, DOCX or whatever you need and then be used without trouble in your translation environment tool of choice. This will save a lot of time compared to what you will waste dealing with the poorly converted OCR document. But, as part of the customer service philosophy that encourages partners to work together in the most efficient way possible and respect the time and efforts of the other party, appropriate surcharges for the time do apply.
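For those who prefer a script to Word's search and replace, step 4 can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure I actually used: the function name `repair_line_breaks` and the placeholder `PARA_MARK` are made up for the example, and it assumes that wanted paragraph breaks show up as blank lines in the plain text export while single line breaks are the unwanted OCR ones. Real files often need a different rule for telling the two apart.

```python
import re

# Hypothetical placeholder; assumed never to occur in the real text.
PARA_MARK = "\u0000PARA\u0000"

def repair_line_breaks(text: str) -> str:
    """Rejoin sentences broken by OCR line breaks, keeping paragraph breaks."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # 1. Protect wanted paragraph breaks (assumption: one or more blank lines).
    text = re.sub(r"[ \t]*\n(?:[ \t]*\n)+[ \t]*", PARA_MARK, text)
    # 2. Convert the remaining, unwanted line breaks into single spaces.
    text = re.sub(r"[ \t]*\n[ \t]*", " ", text)
    # 3. Restore the protected paragraph breaks.
    text = text.replace(PARA_MARK, "\n\n")
    return text.strip()

if __name__ == "__main__":
    sample = "The quick brown\nfox jumps.\n\nNew paragraph\ncontinues here.\n"
    print(repair_line_breaks(sample))
```

The order matters: the paragraph breaks have to be protected before the single line breaks are replaced, or everything collapses into one block. The result still needs the visual check from step 3.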

Cleaning up my 18 pages of garbage from the PM took a bit over half an hour altogether, including some corrections of OCR errors, and if I want to get a clean source document after my translation, I simply correct any source errors I find in memoQ as I work and export a fixed source document later.

Sep 21, 2011

The future is here (targeting, profiling and clustering)

For personal reasons, I was unable to attend the recent conference on machine translation held in Ede in the Netherlands as I had planned. A colleague, Diane McCartney, was there and has graciously consented to share her impressions and extensive notes from the day, which will appear here in several blog posts, of which this is the third part. It was a busy day.


Renato Beninatto’s presentation on sales strategies was well worth attending. If Renato is good at anything, it's sales, and I was particularly keen to get even the slightest whiff of how he does it. I was also curious to find out if he could be anything other than provocative.

In terms of targeting, Renato explained that the market matters less than how the client buys: one buyer, several buyers, procurement department, or tender procedure. I agree that it is easier for bigger outfits to get their foot in the door of large companies, but one never knows, and participating in a tender could be a good experience – just make sure you have a good plan in place should you submit the winning bid!

Everyone goes after Fortune 500 companies, so you should target the ones that are not on the list, because there will be less competition. Makes total sense, and because these companies usually have procurement departments or work with tenders they’re pretty much out of reach for smaller outfits anyway. Fast growers, Renato tells us, are the companies to focus on, because they buy in an unstructured way. I’ve worked with many fast growers, and if “unstructured” means that you have several contacts in one company, I agree. If one department likes you, you are quickly referred to other departments. The nice thing about this is that you have variety in your work and get to know your customer’s business really well. (I wonder if this falls under the stupid and repetitive tasks that Jaap referred to so often.) It actually enables you to provide advice where necessary too. (It’ll be interesting to see how MT fills that gap.)

However, I’ve also found that the relationship with such companies has a best-by date: they either get taken over by a company that has its own translation vendor or your contact leaves the company (in which case, you might get taken along to the new employer and actually win a new customer), and the new guy brings in his/her own translator. But definitely go for it. These companies are both interesting and challenging to work with. One way of finding them, Renato tells us, is by monitoring press releases.

Here comes one of the statements I really liked: people hate being sold to but love to buy. Here, Renato asked the participants what they do when they walk into a store and the salesperson virtually pounces on them and asks if they need help. In most cases you say “no”, browse for a while and leave. Try to get the customer to come to you by getting them to talk about their needs, and then tell them how you can help.

Renato wraps up this part of his workshop by telling us that we should not sell translation. Translation is based more on the relationship you have or establish with your customer or potential customer than on the direct need; people buy from people they remember and like. Get the customer to start talking about translation. And here comes another statement that struck a chord with me: try hard not to talk about yourself – talking about yourself is equivalent to voicing an opinion; decisions are based on facts.

Clustering is also something to pay attention to: companies in a specific industry fall in groups because the conditions are right for them in a specific area. Targeting the cluster means you can offer services to similar companies at the same time. Market to the cluster, sell to their needs, organize and participate in industry events.

Profiling is used to determine the characteristics of the buyers. People who buy translations in companies are the localization manager, the documentation manager, the training manager, the legal department, the marketing department and, of course, the procurement or purchasing department. Men in their 40s seem to be better buyers than women in their 30s simply because they are at a higher level in the hierarchy. Profiles of buyers could be: men/women; young/old; kids/no kids etc.

Create a list containing the size of the companies, their budget, etc. and go after companies that have the same or very similar characteristics.

If you do advertise, advertise in magazines people read. As an example, Renato told us about the magazines that can be found in his bathroom, which are mostly women's and fashion magazines. If I remember correctly, Renato mentioned that he is married to a former buyer at HP …

Think up stories that people will remember you by – don’t necessarily publish an article about yourself on what you do. For example, Cosmopolitan ran an article about dads working at home that featured an interview with Renato. He was also featured in a full page advertorial on Buenos Aires in Fortune Magazine. The latter got him a lot of business. And, he reminds us, participate in, be active and be visible at industry events! Note that the term “industry events” does not necessarily refer to the industry you're in but to the industry your customer is in.

Define the type of sales/channel structure - "spray and pray" is not effective. Spray and pray is the mass e-mails you get offering you anything from Viagra to a web site. A web site and telesales don’t generate a lot of sales; in fact, telesales should be used to make appointments. There’s no such thing as a store that offers translations (although I kind of like the idea of being able to walk into a store and have my favorite poem translated). High revenues come from direct sales.

One thing I found really interesting is that Renato is not a particular fan of LSPs that offer all languages in all domains 24 hours a day.

I liked seeing the other side of Renato, and if he is giving this workshop at a conference near you, I recommend attending it.


Diane McCartney was born in California and raised in Germany where she attended a French-German school. She set up the translation department at ASK Computer Systems, where she used a UNIX program to prepare text for translation and review. Today she is based in the Netherlands and has been running her own company since 1997.

Sep 20, 2011

The future is here (Dr. Sharon O'Brien's presentation)

For personal reasons, I was unable to attend the recent conference on machine translation held in Ede in the Netherlands as I had planned. A colleague, Diane McCartney, was there and has graciously consented to share her impressions and extensive notes from the day, which will appear here in several blog posts, of which this is the second part. It was a busy day.

In my opinion, Dr. O’Brien’s presentation was the highlight of the day. I still have a hard time believing that someone who doesn’t believe in MT as it is being shoved down our throats had been invited to the conference. But boy, I’m sure everyone in the room was glad they had chosen this particular workshop. It was so encouraging to see that research is being done by people who are interested in the research and not in selling vaporware. 

She started with a short introduction of MT and its history. MT, we learned, has only really taken off in the last ten years when rule-based systems and statistic-based systems were married to create a hybrid paradigm. Rule-based systems consist of coding dictionaries, creating rules and ensuring the rules do what they’re supposed to do. TMs, so-called data-driven corpora, are used to create a statistic-based data-driven engine. The quality of the MT’s “training,” which is done by editing translated segments, is crucial to the quality of the output.

Symantec, which uses Systran, funded the MT research at Dublin City University, but as Dr. O’Brien says herself, she was not there to blow the Symantec or Systran horn but to give us a picture of MT that is based on a real scenario in a real, live environment. Symantec uses Systran because it enables them to quickly translate virus alerts. An engineer in Latvia, for example, doesn’t need a highly polished translation but a set of understandable instructions he needs to carry out. Here, the accuracy of the translation outweighs its style. This is a perfect example of “fit for purpose,” which is taught in translation theory and implies that a translation has to be accurate rather than polished. Symantec uses MT successfully because they know what they want, have taken the pre-processing steps, have involved the engineers and translators in the process and have implemented guidelines for writing for machine translation.

Dr. O’Brien ran her post-editing test in French and Spanish and used the LISA QA metric to assess it. The test was run with a good terminology database and a good MT. The results for French and Spanish were very similar, but would have varied if other, not so well-prepared, MT engines had been used. She pointed out that quality may be subjective but that we would probably all agree that “good quality” generally means a translation that accurately reflects the meaning of the source text and that one could rely on if one’s life were in danger. She also pointed out that Asian languages will produce different errors than Western European languages because the markers are different.

Quality being the hot topic of the day, she overtly disagreed with Renato’s statement and explained that she would talk a lot about quality. According to the research, the highest quality is achieved when there is a fit between the source text and the contents of the MT. Domain-driven engines are more successful than engines based on generic data. The assumption used to be the more data, the higher the quality, but new research has shown that the quality rather than the quantity of the data is crucial and that pre-processing steps are essential! If she didn’t have our full attention, she sure had it now!

So what does the post-editing challenge consist of? It consists of, well, trained bilingual translators fixing errors in a combined MT environment. MT developers are talking about monolingual post-editing, but no one really thinks that is a good idea because there is no way of checking the accuracy of a translation if the person reviewing the text doesn’t speak the source language. Throughout her presentation, Dr. O’Brien points out time and time again that tight control is the key in every area that touches on MT and that quality issues can and should be tackled at the source.

We also learned that there are in fact several levels of post-editing. Fast post-editing, which is also referred to as gist post-editing, rapid post-editing and light post-editing, consists of essential corrections only and therefore has a quick turnaround time. Conventional post-editing, which is also referred to as full post-editing, involves more corrections, which results in higher quality but a slower turnaround time.

These levels are problematic because there are no standard definitions for the terms and no agreement on what each level means, and this creates a mismatch of expectations. A good way of defining which level of post-editing a customer needs is to discuss:
Volume - How many words/pages?
Turnaround time - How much time has been planned for post-editing?
Quality - How polished does the translation have to be?
User requirements - Who are the readers and why will they be reading it?
Perishability - Time in the sense of when the translation is really needed
Text function - What is the purpose of the text?
The distinction between light and full post-editing is in fact useful. The level of post-editing needed depends on the effort involved, meaning the quality of the initial MT and the level of output quality expected. However, the customer may not know what they want themselves and may therefore be disappointed by what they get. It should, however, be clear whether the customer wants “good enough” quality, or quality that is similar or equal to human translation.

The nature of the post-editing task will vary depending on whether the quality of the output is good. If the quality is good, post-editing will consist mainly of minor changes, such as capitalization, numbers, gender, style and maybe a few sentences that need retranslating. If the quality is bad, the situation is reversed and post-editing will consist mainly of major changes, meaning more sentences that need retranslating and a few minor changes such as capitalization, numbers, gender etc.

There are many ways of measuring the quality of MT, some of which are more useful for post-editing and localization processes than others. The quality metric example in Dr. O’Brien’s presentation is that used by Symantec. There are, however, other metrics such as General Text Matcher (GTM) and Translation Edit Rate (TER). The post-task edit distance is measured by comparing raw MT output to the post-edited segment and gives a score based on the number of insertions, deletions, shifts, etc. Whichever metric is used, it is important to remember that quality issues can be tackled at the content creation and pre-processing stages.
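To make the idea of a post-task edit distance more concrete, here is a rough Python sketch. It is not the metric used by Symantec or shown in the presentation, and it is not a full TER implementation: it counts only word-level insertions, deletions and substitutions (a plain Levenshtein distance) and ignores shifts. The function names `word_edit_distance` and `edit_rate`, and the choice to divide by the length of the post-edited segment, are assumptions made for the illustration.

```python
def word_edit_distance(raw_mt: str, post_edited: str) -> int:
    """Word-level Levenshtein distance: insertions, deletions, substitutions."""
    a, b = raw_mt.split(), post_edited.split()
    # prev[j] holds the distance between the words of a seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete a word from the raw MT
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # keep or substitute a word
        prev = curr
    return prev[-1]

def edit_rate(raw_mt: str, post_edited: str) -> float:
    """Edits per word of the post-edited segment (assumed normalization)."""
    length = len(post_edited.split())
    if length == 0:
        return 0.0  # convention chosen here for an empty post-edited segment
    return word_edit_distance(raw_mt, post_edited) / length

if __name__ == "__main__":
    raw = "the cat sat on mat"
    edited = "the cat sat on the mat"
    print(word_edit_distance(raw, edited), edit_rate(raw, edited))
```

A low rate means the post-editor changed little; a high rate means much of the segment was effectively retranslated.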

In order to get around the cost and subjectivity of the evaluation of translation output, IBM developed Bleu scores. This metric consists of taking a raw MT sentence and comparing it to a human translation, which is the Gold metric. This metric, however, only determines the similarity between the two, not the quality. This score only works in conjunction with a reference translation. MT providers all have Bleu scores and compare them with each other, but they are only useful for system development and comparison – they are not meaningful for the post-editing effort.
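The point that such a score measures similarity rather than quality is easier to see in a small sketch. The Python below computes only one ingredient of Bleu, the clipped n-gram precision against a single reference translation. It leaves out the brevity penalty and the combination of several n-gram lengths, so it is not a complete Bleu score, and the function names `ngrams` and `clipped_precision` are invented for the example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count all runs of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Share of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the reference
    ("clipping"), so repeating a matching word does not inflate the score.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

if __name__ == "__main__":
    reference = "the cat is on the mat"
    print(clipped_precision("the the the the", reference, 1))
    print(clipped_precision("the cat sat", "the cat is", 2))
```

A perfectly good translation that happens to use different wording from the reference scores low here, which is why such numbers say little about the post-editing effort.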

Confidence scores are an alternative to Bleu scores. They are generated by the MT system itself, using its own knowledge of its probabilities and its confidence of producing a good-quality translation.

In terms of productivity, research has shown that post-editing is faster than translating and that the throughput rates vary between 3,000 and 9,000 words a day. However, comparisons are often made on first-pass translation versus post-editing, i.e. there is no revision. There will always be individual variations in speed that will differ across systems and languages. Experiments of post-editing using keyboard logging software show that post-editing involves less typing than translation, which probably matters more in terms of RSI than speed because translators are generally fast typists.

The cognitive effort required by translation and editing is rarely considered in research. However, translators report being more tired after post-editing and find post-editing more tedious, probably because they have to correct something they wouldn’t have written in the first place.

Dr. O’Brien didn’t spend much time on pricing, but she did make it clear that a whole new pricing model will have to be developed for MT post-editing. In her opinion, structured feedback to the system owner should be paid for and translators should be involved in the development of the system, terminology management, dictionary coding etc.

New generations of translators will benefit the most from post-editing because they will have grown up with technology and social networks and will be more flexible in terms of quality. Research suggests that students can learn about translation through post-editing.


Diane McCartney was born in California and raised in Germany where she attended a French-German school. She set up the translation department at ASK Computer Systems, where she used a UNIX program to prepare text for translation and review. Today she is based in the Netherlands and has been running her own company since 1997.

Sep 19, 2011

The future is here (artful manipulation)

For personal reasons, I was unable to attend the recent conference on machine translation held in Ede in the Netherlands as I had planned. A colleague, Diane McCartney, was there and has graciously consented to share her impressions and extensive notes from the day, which will appear here in several blog posts, of which this is the first. It was a busy day.


I was very skeptical about the conference that the ATA (Association of Translation Agencies in the Netherlands) held in Ede on machine translation. I usually avoid conferences like this because the high level of BS and marketing makes me sick to my stomach. But I don’t believe in opinions based on gut feeling, outdated knowledge and obsolete technology, so off I went.

The day was not at all what I expected. Renato Beninatto’s keynote speech was not half as provocative as I thought it would be, but that may be because there was no one there to talk back. The room was filled with 135 translators and a deafening silence that was only interrupted by a gasp as Renato calmly stated that “quality doesn’t matter”. Talk about setting the tone: nobody talked about anything else for the rest of the day! And no one seems to have heard what he added to that statement almost immediately after he made it, namely “…until it is missing”.

Yes, the agent provocateur had struck again, although I found him to be much tamer than at last year’s Tradulínguas conference where I heard him provoke for the first time. Renato is oh so happy to plant controversial statements in the middle of the room and wait for his audience to react, but the sober and ever-so-practical Calvinist Dutch were unshaken. Maybe that’s because the audience, most of which was between 40 and 60 years old, had heard it all, seen it all, done it all before and before and before. Am I repeating myself? Ah, yes, well, MT, we were informed at the end of the day by the Dutch visionary Jaap van der Meer, was to do away not only with the repetitive tasks translators found themselves confronted with, but with the stupid tasks they were forced to do too. Human translation is stupid, machine translation is smart, long live machine translation. Yes, indeed, the future has been here for 50 years, only we’re all too stupid to do something with it.

We learned that “projects” will disappear and “drops” will be replaced with “drips.” Fully automated, integrated project management and translation environments stored on the SaaS provider’s server will reduce project management to a monitoring activity enabling project managers to focus on exceptions. Yes, we have been working like this for many years – although I don’t see “projects” as such disappearing because one still needs a project number for all sorts of tracking and tracing purposes. SaaS, however, still has to learn how to walk before it can run – try telling that to a software developer – in order for anyone to claim that this part of the future is here.

In terms of translation resources, crowd sourcing is the answer to a quick turnaround time and the best translation of software because the translation is done by the users. I’m sure this works well in environments such as Facebook where adherence to terminology and style are of no import, but what do you do with IBM’s answer to a European request for proposal? Or a bank’s investment strategy? These very real concerns were dismissed with a wave of the hand and a “Sometimes it’s better to apologize than to ask for permission. If you're worried about confidentiality, talk to your lawyer.” The future sure looks bright for Justice.

Translators should focus on services and revenue, not on price: We have to give customers what they want when they want it at the price they’re willing to pay. The recipe for success is differentiation: Give your customers more than they’re getting now and sell an unlimited number of languages. And remember: you do not define what you do; your customers do. Companies need to channel and understand what people of all nationalities and languages are saying about their products on the web. Opinions voiced on social networks are very important because no one reads a company’s marketing material. (So, companies could achieve huge savings by firing their marketing departments, stopping their market research activities and no longer translating the marketing material created to tap into foreign markets. All they need to do is put their name and address on Facebook, fire the marketing department, get a machine to trawl the web and gather users’ opinions, publish them on their Facebook page and let MT translate them into the language of the user looking for information about the company. Very efficient and cost effective.)

So we produce translations of a lesser quality. Who gives a toss if that’s what the customer wants? And what’s quality anyway? A subjective notion that depends on your customer’s definition of it. You may think a translation is too literal, but your customer may think you’re the best translator on the planet. You may love the wit in your translation only to find that the customer is stumped because they have no clue what you’re talking about. Focusing on revenue is not as simple because it means focusing on volume and volume is usually paired with quantity breaks (thank you Microsoft and IBM). But since we do have colleagues who are willing to work for a pittance, it looks like rates are not going to stop falling anytime soon. It will be interesting to see the new pricing models agencies impose on us because they, and not the end customer, will be driving the MT bandwagon until more large corporations start implementing MT themselves. Somehow I have the feeling that it's Trados with another name ….

We talk about the future, Renato explains, because the present is boring or we are totally lost. So in other words, we would be neither bored nor lost if we hadn’t listened to the BS that has been shoved down our throats for the last 20 years. So the same guys who helped us up that creek and made sure we lost the paddle are the same ones who are going to bail us out? Sounds like another rollercoaster ride to hell to me. And what did Renato mean when he said that we had to “make sure we didn’t make the same mistake with Trados by not stopping Jochen Hummel from selling it to companies?” It’s impossible to keep anything secret these days, especially from those driving the development of a new technology! Click on Members on the TAUS Web site and tell me again that we need to keep MT to ourselves.

In Jaap van der Meer’s Translation Business Innovation workshop, we in fact learn that TAUS was founded by companies experimenting with their own MTs. Understanding language is a matter of collecting loads of data and putting it in the cloud. Our culture is one of reciprocal collaboration: I win, you win. MT is a utility that is here to stay: It is a basic human right, a utility, like roads, utilities etc. I am totally lost for words ….

Language is a social experience in which we include and exclude people, invent words and change grammar. MT is therefore not a threat in this area. (Whatever that means, because as far as I can tell, this is the whole problem with MT: The flexibility of language and the flexibility with which people use language. But then I’m not a visionary!)

We are also told that MT has not gained the place it has today because computing power has become cheaper and content has been exploding, but because WE the users have changed: we no longer demand fully automatic high-quality translation, but fully automatic useful translation. We accept poor quality because we need real-time translation of even the most trivial piece of information. (Listen to the BBC World Service’s “World have your say” program and think again. And how will such information be translated into every listener’s language? Will we be wearing the special glasses that were developed so the hearing impaired can enjoy movies at the cinema without inconveniencing those who are not hearing impaired? Or will we be wearing special earplugs? Anything’s possible, I’m sure, but when the latest gadget finally hits the market, who will guarantee that the on-the-fly translation is accurate? Will we have to be afraid of new conflicts arising as the result of a machine mistranslation? Is this the future we’re so keen on reaching?)

Because machine translation is now based on hybrid systems, i.e. a mixture of rule-based and statistical systems, the targeted correction of a text, i.e. MT post-editing, will no longer be necessary within the next 5 years. The post-editor will have put himself out of work because MT systems will have learned so much from the corrections. Translation engines are currently being produced in real time. For example, you can upload your documents to Systran to train the MT. But beware of the pirates: Google, IBM, and Microsoft are aligning everything they can find on the web to fill their databases. (I hate to say this, but the stuff that these companies are aligning off the web has in part been created by the same people who are filling the Systran database. How does that make Systran a better engine than Google, IBM and Microsoft? Or lesser pirates? Are Systran and TAUS actually paying you for the stuff you add to their databases?)

Companies need a language strategy. (Considering that all LSPs sell themselves as consultants “with the in-depth knowledge companies need to set up and/or review their translation strategies,” this statement rather surprised me. But then again, I have yet to meet anyone working at an LSP who can explain to me what a language strategy is.) The only thing companies have done so far is force LSPs to reduce their rates and squeeze more words through the funnel. This is all thanks to social media, which they use to do their marketing. They need to maintain multiple language spheres and we need quality definitions – several of them (which means, I guess a TAUS QA model as opposed to a LISA QA model, which, like the LISA model, will only ever apply to the translation of software). In five years time, translation will be really interesting because translators will have choices as opposed to earlier – they will be able to choose to move up the ladder and provide high-quality translations. (Think about this: the guy who wants you to upload your translations into his database is actually telling you that you are currently producing crap. And why should we need translators in five years time when MT has put editors out of business?)

Providers of international products want to differentiate themselves by providing very sphere-specific translations, so we need high-quality translations. This means specializing – transcreation, hyperlocalization. (Hang on a second – didn’t we just say that trawling the social networks for users’ opinions was the only thing that mattered because no one reads marketing material anyway? Didn’t someone also mention that MT will enable LSPs to offer more languages in more areas so we could service more customers? And didn’t Renato clearly state in his keynote speech that quality doesn’t matter, and was I really on a different planet when Jaap said that we the users are prepared to accept translations of lower quality? I don’t get it.)

Translation memories are a thing of the past, we are told, totally outdated technology in dire need of replacement. We are moving into the era of the semantic web in which translation memories will no longer be assets and therefore no longer need to be protected like crown jewels. Today’s MT produces much better results than the hopelessly outdated TM technology, which only leverages the segments you enter into it. New tools will soon be available that will have massive leveraging capabilities, and a positive by-product will be that we will be able to preserve endangered and less spoken languages, like Welsh. These languages will not disappear like others. This is all very exciting.

Don’t even bother holding that thought because the most interesting presentation of the day revealed quite the opposite, namely that translation memory maintenance will be more important than ever, because as most of us know, MT does not produce suitable results unless it is combined with a TM. The same goes for terminology management. Why? Because rule-based MT uses multi-lingual dictionaries to translate and you can’t expect a translation to have any level of accuracy if it cannot draw upon an accurate list of terms. Remember, MT consists of translating words in a dictionary based on rules, and NOT of translating concepts or meaning. Unlike humans, machines don’t know what meaning and context are. So MT may be able to create an accurate translation, but it is not able to create an idiomatic or meaningful one. This is why MT has sucked for the last 50 years and will continue to suck unless ...

We take on the pre- and post-editing challenge. This was the most interesting presentation of the day, but I didn’t know that when I rolled up my sleeves, cracked my knuckles and pumped myself up for the big battle. The presentation lasted 90 minutes, and all I did was nod in agreement the whole time. Dr. Sharon O’Brien from the School of Applied Language and Intercultural Studies at Dublin City University had saved the day for everyone in the room. We all could have listened to her for hours: no provocative statements, no condescending and humiliating remarks, just facts, facts and more facts. Results based on sound research that contradict everything everyone wants you to believe – especially the visionaries.

Time and time again she points out that not many companies have been successful at using MT. Those that are successful have their writers and translators on board and have involved their translators in the process from day one. Enabling the main players to take ownership of the process and carve out new roles for themselves is crucial to the success of MT. She also emphasizes that tight control is the key in every area that touches on MT and that quality issues have to be tackled at the source. This was really the best breath of fresh air I’d had in a long time!

After this extremely interesting and far too short workshop, the conference was closed by Jaap van der Meer. I have to say, the conference was almost worth going to just to hear what he had to say. Here are a few of the jewels I collected:

The 250,000 translators in the world aren’t enough to fill the translation demand, which is why we need MT. True if you believe that everything should be translated, including my English tweets. God forbid!

It’s time for the industry to define standards in the same way the banking industry did to facilitate electronic banking. What are TMX, SRX, XLIFF and TBX? Jost Zetzsche wrote a report on file format standards under the banner of TAUS that provides a clear overview of their status. Looks like Jaap doesn’t even know what his company commissions. In my opinion, it would be fair to say that it is time to review, consolidate and improve the standards, not define them. The comparison with the banking industry is also interesting when you think that it took some 15 years to develop a European standard for EFT and that MAESTRO, the development of which started in the 1980s, has been so successfully implemented across Europe that my Dutch bankcard still doesn’t work in many German stores.

Translators should fill the TAUS database and not be afraid. Why would anyone want to fill a database for a company FOR FREE so the company can use the database to SELL translations of dubious quality? If there is one thing Jaap did not address the whole day it is how TAUS intends to prepare the strings for entry in the database. The word pre-editing was not mentioned once.

According to Jaap, machine translation will relieve translators of their stupid and repetitive tasks. I admit there is nothing more stupid than translating articles about the development of new sources of energy or breakthroughs in diabetes research. It’s pretty clear that Jaap thinks that translators do nothing but translate software strings and poorly written help texts. Well, wake up, Jaap – software is not the only market segment feeding translators! And believe it or not, there are translators who have never translated software and have no intention of ever doing so.

I went to this conference ready for a fight and I left the conference understanding what should have been clear from the outset: old business practices were being rebranded and given cool names such as crowd sourcing (pool of translators and editors), MT post-editing (proofreading and editing), and service (quality assurance). In that sense, the future really is here. But surely you didn’t need a visionary to tell you that?


Diane McCartney was born in California and raised in Germany where she attended a French-German school. She set up the translation department at ASK Computer Systems, where she used a UNIX program to prepare text for translation and review. Today she is based in the Netherlands and has been running her own company since 1997.

Sep 18, 2011

Twenty months of OTM

It's been a while since I've written about the translation business workflow solution I use and localize. Its pace of change has slowed in the past half year as the major improvements planned were all implemented, and I'm rather pleased with where it is now. When the pilot program for the Online Translation Manager (OTM) began twenty months ago, the software was very specifically tailored to the workflows and working habits of a small group of companies. While the better parts of this early imprinting and a few interesting quirks remain, the working environment and communication platforms for customers and resources have been made quite adaptable to individual preferences, while still maintaining the provider's strong focus on good commercial practice, legal security of contracts and data privacy.

Problems that I have had after the last round of major upgrades in 2011 (and well before that, really) have largely been when I am so distracted by the pace of my working day that I "save" time by skipping simple administrative procedures that will cost me a few precious minutes. As with nearly any system, this often means more time spent sorting things out later, not a fault of the system, which at least provides me with options for catching up on what I should have done in the first place.

I have had quite a number of problems with my e-mail server in Florida this year. The reasons for the trouble are still not clear. Yet when I have used the secure OTM file areas for customers or the delivery e-mails with secure URLs, everything has worked even when my regular e-mail is hopelessly messed up. I have even found the system with its securely encrypted access for customers and subcontractors to be useful for passing sensitive information to colleagues appointed for review.

Let there be no mistake about it: this is a full-featured workflow tool for running a small to mid-sized agency (or larger operation perhaps). For the low SaaS subscription rates, one cannot expect the CAT and other integration options available in some alternatives, but I need those even less than I need many of OTM's sophisticated agency features.

I'm just a single translator, working alone or with an agency as an administrative backup partner, but with fairly defined, limited workflow needs. Yet I have found that this solution lets me 'punch above my weight' in many situations, especially when I offer a client a simple, safe way to receive or send highly confidential files and can guarantee subcontractors the same on those very rare occasions when I outsource and manage a project.

A number of my partners and colleagues have looked at the solution. Some rather liked it, others found particular "deal breakers", like the lack of support for HTML mail that can carry viruses (the developers are adamant about the security issues on this point). I've also had specific little "nice to haves" in mind that I still do not have. But what I do have is an infrastructure that is far more powerful and robust than anything else I could afford and which takes very little of my time while giving a lot of time back. Not bad, really.

Your Cheating Agency

"The translators have no bread," said the buyer with some alarm.
"Let them eat MT," the head of Purchasing replied.
O globalised world. That has such translation services in't!
The people who brought you "Each time we fire a 'professional linguist', our quality improves" parts 1, 2 & 3 now offer the world the Cheating Translators platform, which is not a new social media venue for those of us with libido issues, but rather is presented as an opportunity to identify the use of machine translation in work received.

The result of this stunt analysis of my work is very revealing:

There's a link there on the "printout" to the bozo agency who dreamed this one up. The Germans have a lovely word for this. Unseriös. These idiots promote some sort of "cloud" based scam for slicing and dicing texts to be translated and doling the bits out like bread and soup in a charity kitchen to save us all from the horrors of life as independent translators without a Benevolent Protector.

This is just one of the many carnival sideshows to be enjoyed as the Great Profits Prophets do the l10n circuit of technological inbreeding to promote systems and strategies of questionable intelligence.

I wish them good fortune and godspeed. All of them. I hope the softened pitch for postediting paradise succeeds alongside the full frontal flashers of purely automated workflow. I hope all the big translation agencies of the world join the party and inhale deeply of whatever solutions get passed with the pipe. Really. I can't imagine a more delightful result for me, my business and those whom I honor and respect in the written language service professions. It would be delightful also to see VoiceMT implemented for the benefit of my interpreting friends.

I dream of a world in which the dirty carpet of service providers is hoovered free from those who don't belong on it as these are sucked safely into the bag to "support" machine processes. Those remaining will then be less distracted as they focus on processes that really matter, waste less time in conversation with fools (thus enabling the difference to be better recognized) and produce results that make a real difference.

Sep 17, 2011

Productivity for tossers or tossing productivity

For more than a week now since a respected colleague shared the link, I have kept an essay in a tab of my browser and referred back to it time and again, reading and reflecting on its content. Time to share and anchor it here so my tired old eyes can find it later and to make space in my tab menu now.

Zen Habits is a blog to which I am referred on occasion, and which I read with interest, but which I never get around to bookmarking due to a personal aversion to labeling anything Zen. A word which means so many things to so many people is often useless for effective communication.

One of the reading experiences of my high school days which impressed me most was a short story by Heinrich Böll, in which an office manager ran about in a frantic manner exclaiming "Es muß was geschehen!" ("Something must happen!") or words to that effect. Now I do not recall the title, and, as I have learned time and again, my memory can betray me as to the exact words or circumstances, but what is important here is not the accuracy of the citation, but the abiding effect of the text 35 years later, so I'm uninterested in sorting out the actual details with the help of Google. As I remember, the little man falls dead at the end of the story. Something happened.

Productivity is spoken of widely, often by me. However, it is a much more complicated matter than we usually acknowledge. Sometimes metrics fail or are simply irrelevant, no matter how well they are construed. I can sing the praises of Dragon Naturally Speaking for taking dictation to a new level or show you marvelous tricks with memoQ for handling complex content, but I cannot make the more important judgment of whether that activity or that text is really a productive, enriching use of your time.

"Toss Productivity Out" is a nice reflection on various truisms which many of us hold dear. Read, enjoy and dispute its points.

Real life and productivity are about qualities, less about quantities. And as one sober colleague has noted, one should observe attempts to redefine quality with a very critical eye.

Sep 13, 2011

September translators' meeting in Birkenwerder

Dear colleagues,

Our next translators' meeting is coming up on:

                Thursday, September 15, 2011, from 7:00 p.m.

After a long break, we are returning to the:

                Ratskeller Birkenwerder
                Hauptstraße 32
                16547 Birkenwerder

A down-to-earth atmosphere, hearty food and reasonable prices await us.
Unfortunately, our usual alcove is not free this time; some delegates or other got in ahead of us.

Please let me know briefly whether or not you will be there.

See you Thursday!
Andreas Linke

The meeting after next will take place as usual on the third Thursday of the month: October 20, 2011.

Sep 8, 2011

New joint venture for enterprise language services

(Oranienburg) Industry experts were surprised today as leading companies announced a bold new cooperative language services initiative. Soylent SemantiX GmbH combines the logistics, organizational service excellence of LionWorX Corporation, Poorina, PrAdZ, Googel and one of Germany's oldest human processing centers to harness the power that resides in the machine and the wannabe human translator.

Social reform advocates around the world welcomed the announcement for Soylent's plans to rehabilitate and reintegrate neglected resource potentials worldwide. Concurrent with the announcement of the company's founding and bold expansion plans to locations which include Calcutta (India), an undisclosed location in Myanmar and Harare (Zimbabwe), the new company's CEO, Kurt Kuhgel, revealed the signing of a pilot contract with the Texas state prison system worth USD 30 million. There it is hoped that the large numbers of incarcerated bilingual persons can be repurposed to lower translation and interpreting costs for the state government and local industry. Previous studies of this approach in the People's Republic of China indicated powerful synergies to be unleashed for the benefit of deserving parties.

Concerns that the new approach might undermine the status of current independent language service providers were allayed by revelations of the company's green strategy for translator productivity. At the same time, the implemented flow of work and resources will lower nutritional costs in the Texan program and its scale-ups at other locations. Further efficiencies will be achieved by fully leveraging the potential of the MT/post-editing workflows. The first workshops have already been scheduled in Waikiki, Hawaii to help processing units master the skills needed to surf the machine translation tsunami.

Companies wanting to be part of the future and optimize their language service costs today can contact the SoylentCare Center for more information and a free quotation.

Sep 7, 2011

Have you considered applying for the Certified PRO Network?

Dear test123,

From the information available in your profile, I see that you offer translation services in German to English, a language pair that is currently under-represented in the Certified PRO Network. In the first part of this year, 1312 jobs in German to English were posted at, but there were only 104 members in the Certified PRO Network to respond to them. As you can see, then, this might represent an excellent growth opportunity for you.

If you would like to be included in the Certified PRO Network and increase your opportunities to meet clients at while distinguishing yourself as a professional, please complete your application here:

Make sure you also watch this short video on how to complete your application before submitting your application for review:

Find out more about the Certified PRO Network here:

Looking forward to welcoming you to the network!

Kind regards,

Maria Kopnitsky Certified PRO Network
The translation workhouse
------------------------- Headquarters
P.O. Box 903
Syracuse, NY 13201 USA

The preceding letter was copied exactly from the mail forwarded to me by a former ProZ moderator. The account "test123" is a testing profile set up years ago to explore certain new features of the site. He and I are both amused and puzzled by the invitation for a nonentity to join the fabled red pee network. If it is in fact true that there are only 104 "approved" red bubbles (as he calls them) for a language combination as common as German to English, it makes you wonder what's up. I know there has been a huge exodus of talent from the platform, especially in European countries where the disregard of data privacy and security concerns runs strongly counter to laws and expectations. In fact, a lot of the best translators I know for German and English are no longer ProZ members, and outsourcers who restrict their listings to the membership or the red bubbled ones are simply shutting themselves away from a lot of the best talent now.

It's also clear that these invitation processes are not personally reviewed by those responsible. I really can't imagine that Ms. Kopnitsky believes that "test123" is even a serious alias. Many of us now use aliases on PrAdZ to keep the search index trash for repetitive content in Tamil and god-knows-how-many-other languages from burying real content we rely on for professional publicity. But "test123"? It takes a special kind of faith to believe in that.

Mao's Revenge

E-mail like that shown above is perhaps offering me some irresistible bargain on an herbal Viagra alternative made of rhino horn or monkey feces. Hard to tell, though, since I don't read Chinese. Nonetheless, the hardy spammers behind The Great Firewall send me such missives daily, a few dozen times a day. The only result so far has been to increase my natural suspicion of doing business in any form with mainland China. Despite my best efforts to view such transactions neutrally, personal reports over the past three decades from those who have visited clothing factories proposing to make wet suits for a Cousteau company and found workers chained to their sewing machines, adhesives specialists describing the utterly irresponsible, uncontrolled use of solvents in production, factory managers shot for production problems in a refrigerator plant and more than my little brain wants to remember make me question the greedy, headlong rush to outsource everything to the Middle Kingdom and then let it buy up our assets at home with the profits.

Although I wish I knew how to block trash mail like this, for me it is also a useful reminder that we need to choose our business partners carefully. The entry of Chinese LSPs into the service market for European languages in combinations not involving languages in China's region has done nobody any good. Not even the Chinese, I suspect, because they are guaranteed to get only very mediocre results at best with the rate structures they favor. I do know one very respectable German colleague who would contradict me on this point. I haven't discussed actual euros and cents with him regarding this matter, but I'll concede the argument to him to the extent that his partner in China is an exception, like the Indian service provider who once paid me about 22 euro cents per German source word for a patent translation. A good partner can be found anywhere in the world, but when confronted with masses of data, a tsunami of cooperation proposals and the potential risks and pitfalls of cross-regional transactions, for those in the US and western Europe, it is usually a complete waste of time to talk translation business for major, non-Asian and non-Slavic language combinations with service providers in many parts of the world. That includes China.

So I have a friendly request for the Party censors there. Please take a little time in your day, stop trying to track down bloggers, artists and others to jail and harass for their "subversive" opinions, and do something to improve your country's image by catching its spammers and subjecting them to public trials (with competent simultaneous or consecutive interpreting, broadcast live on the Internet) and subsequent public execution. If you do, maybe I'll even consider doing a free test translation from German to English for you.

Translation industry survey until September 23rd

In preparation for the TM Europe conference in Warsaw at the end of this month, Peter Reynolds is once again conducting a survey on the translation industry, standards and tools. Respondents include translators and interpreters, consultants, translation agencies and other intermediaries as well as translation consumers (end customers). Those completing the survey will have access to the results of last year's survey - an excellent report with significant value as market intelligence for any of the aforementioned groups. I was personally surprised by how much useful information I got out of the analysis.

So if you haven't taken the survey yet, please take ten or fifteen minutes to do so now before September 23rd. Like any survey I've seen, there are questions I might have structured a little differently and items I think belong in a list (but adding them is usually an option). Nonetheless, it is well designed, and the workup of the data is very good.

The 2011 survey is here:

Sep 6, 2011

Stating conditions

Over the course of the past year I've worked from time to time on a post on the importance of general terms and conditions for freelance translators as a tool for avoiding trouble or defining procedures for problem resolution. But the hurly burly of daily life and business, along with my reluctance to publish on the subject before I get around to translating my own T&Cs from German into English, has kept me from finishing that particular article.

I do regret that. So many of the "problem" threads I read in public forums are matters where clearly stated terms and conditions might provide some guidance for a better resolution of a dispute. And today I got a little request for an opinion from a friend who might have been helped by such a document as well. In his case, he quoted a job to an end client for the translation of what he thought was a one-page PDF. The customer was some industrial client who requests translations in various languages on an occasional basis, some of which are in language combinations he outsources. The little job was outsourced, translated and delivered, presumably on time. Alles klar, as they say here. But not quite. There was a second page with quite a few more words. It was overlooked in the PDF. And the customer pointed out that this extra text should be translated at no additional cost, because a quotation had been accepted in the belief that it covered the entire document.

I don't know what the ultimate resolution of this will be for my friend; the costs involved are annoying, but not back-breaking if he ends up eating the charges for the second page. But I thought to myself "this is a case where terms and conditions should clearly reserve the right to amend quotations where serious errors are found". And of course he has no written terms and conditions for his business, even though he's been in this game more than twice as long as I have.

It's not enough just to have T&Cs, of course. To apply them to a business transaction, you must make them available, perhaps as an attachment to a quotation or, as I often do, by making them part of my e-mail signature block.

There are so many issues that come up routinely that can be dealt with this way. Rights to rework an order if deficiencies are found. Limitations of liability to the value of an order except in cases of malice or gross negligence. Interest on payment in arrears. Whatever is important to you and your business can and should be covered in your general terms of business.

But one thing I do not recommend. Often in T&Cs and contracts, I read some nonsense about a particular language version being "authoritative" and translations being for information purposes only. That's bullshit. It may work in some jurisdictions, but certainly not in all. And it is simply insulting. If you can't trust the quality of your contract translation, maybe it's time to stop shopping for language services from linguists with tails. Offering a customer binding terms they can understand (in a language they read well) is a matter of basic respect and in some places a precondition for having an enforceable contract.

Sep 4, 2011

Amazing rates

Sometimes, contrary to the frequently cited rule, collective intelligence is indeed greater than the sum of its parts. Networking with colleagues can often be a source of positive inspiration for dealing with complex, emotionally charged situations.

Many independent translators take umbrage at the modest rates proposed by enterprising LSPs committed to the cornerstone principles of capitalism. Inappropriate responses, such as the No Peanuts! movement, abound, giving the impression that we are whining, ungrateful wretches standing in the way of important evolutionary or devolutionary processes.

Fortunately, there are those among us who take a positive attitude toward the amazing opportunities with which many of us are confronted. Here is a response one colleague recently shared on Facebook, presented here as a template for your use:
Thank you for your interesting inquiry concerning translations from [source language] to [target language].
The proposal of [fill in ridiculous rate] for translation and [fill in other ridiculous rate] for proofreading is more than flattering. However, I do not think my humble work is worthy of such lordly compensation. Normally I charge about [even lower rate]. This client-friendly rate is only possible thanks to the fact that I work about 18-20 hours a day so I am still able to pay the rent. I enjoy working long hours for a good cause.
However, for this opportunity I would like to suggest a colleague of mine, Kurt Kuhgel, who could accept this generous rate. You can find his website under this address:
I hope for your understanding.

Please note that the URL in the template is adapted for my own language combination (German to English) and may be modified for your purposes by substituting the relevant language codes.

Sep 2, 2011

Bringing XLIFF content into SDL Trados Studio

When SDL released its new flagship product, SDL Trados Studio 2009, a few years ago, one encouraging feature was its use of a real standard... after a fashion. XLIFF has become widespread in recent years as a standard for exchanging translation project data, and SDL decided to "extend" this standard by adding their own unique tweaks to it. Nonetheless, a number of other tools are able to read and translate SDLXLIFF files, even if segment status marking is not performed for the deviant format. I have translated a number of these SDLXLIFF files generated by SDL Trados Studio 2009, and there have never been any problems.

When I was asked how data might be moved from an old memoQ project into the new Trados version, I naturally thought of using XLIFF as a simple option and an alternative to filtering and exporting TMX from the translation memory. It did in fact work once, but too often error messages like the following appeared:

I was utterly baffled by this one, as were the project managers at the agency using the SDL tool. I could not understand why importing an XLIFF worked on some systems some of the time but not on others. Then once again, my favorite internal information source at SDL explained that there is a bug in the XLIFF import filter that yields this error if the default source and target languages are not in the same group as the languages set in the XLIFF file. In this case, the default settings were:

The source language in the XLIFF file I attempted to open was German.

There are two ways to deal with this problem if you need to import an XLIFF file from another source into Trados Studio:
  1. Set the source and target languages in Studio to something in the same language group as the source and target language of the XLIFF file. In the case of my XLIFF file, generic German and English were set, so when I changed the defaults in Studio to DE-DE and EN-US, the file opened, but I was warned that the language abbreviation in the XLIFF file was not "fully qualified" (a common German obsession). But still, it worked, and the content could be edited or fed to a TM.
  2. Another way to deal with this if you are aware that Studio will be involved is to use sublanguages in the environment that generates your XLIFF file. In my case, that would mean setting the memoQ project to DE-DE and EN-US rather than just DE and EN for the source and target languages. Then SDL Trados Studio will identify the languages in the XLIFF file correctly and open the file without warnings or errors.
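If neither option is convenient, the same idea behind the second approach can be applied to a copy of the XLIFF file after the fact. What follows is only a rough sketch in Python, not something I have run against Studio: it assumes a plain XLIFF 1.2 file whose `<file>` element carries the `source-language` and `target-language` attributes defined in that standard, and the function name `qualify_languages` and the mapping of DE to de-DE and EN to en-US are my own illustrative choices. Tool-specific extension namespaces may be re-prefixed when the file is written out again, so work on a copy.

```python
import xml.etree.ElementTree as ET

# Namespace used by XLIFF 1.2 documents.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Hypothetical mapping from generic language codes to sublanguages.
# Adjust to whatever your Studio defaults are.
DEFAULT_SUBLANGUAGES = {"de": "de-DE", "en": "en-US"}


def qualify_languages(xliff_text, defaults=DEFAULT_SUBLANGUAGES):
    """Return (new_xml, changes) with generic language codes on each
    <file> element replaced by sublanguage codes from `defaults`.

    Only the source-language and target-language attributes of <file>
    are touched; xml:lang attributes elsewhere are left alone.
    """
    # Keep the XLIFF namespace as the default namespace on output.
    ET.register_namespace("", XLIFF_NS)
    root = ET.fromstring(xliff_text)
    changes = []
    for file_el in root.iter("{%s}file" % XLIFF_NS):
        for attr in ("source-language", "target-language"):
            code = file_el.get(attr)
            # A code without a hyphen is not "fully qualified".
            if code and "-" not in code and code.lower() in defaults:
                new_code = defaults[code.lower()]
                file_el.set(attr, new_code)
                changes.append((attr, code, new_code))
    return ET.tostring(root, encoding="unicode"), changes


if __name__ == "__main__":
    sample = (
        '<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">'
        '<file original="demo.docx" datatype="plaintext" '
        'source-language="de" target-language="en"><body/></file></xliff>'
    )
    new_xml, changes = qualify_languages(sample)
    for attr, old, new in changes:
        print(attr, old, "->", new)
```

Codes that already contain a sublanguage are left untouched, so running the function twice on the same file changes nothing the second time.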

Sep 1, 2011

Has Kilgray jilted freelance translators?

The latest post on the Kilgray blog is a strange one, though not entirely unexpected. It's an attempt to answer recent criticism that the company cares more about its corporate clientele than freelance translators. I've heard concerns about this for some months now, though honestly more concerns from within Kilgray than from the world of translation at large. I take the fact that the company's board even worries about such things to be a sign of the basic health of its concern for freelancers. I mean, really, can you imagine the board members of SDL losing sleep over a few grumbling translators?

I've heard a lot of internal concerns from members of the Kilgray team that the many features which have been added to memoQ since I began using it a bit over two years ago have made it harder to have a full overview of everything the product can do. I would actually agree with that, but it doesn't worry me any more than the fact that I use perhaps 10% of Microsoft Word's functions: I focus on what I need and ignore the rest. I know it's there and if something becomes relevant to me later, I'll learn about it then. Like the Star Transit project import feature. I hadn't done Transit projects in years and gave them no thought any more until Monday when an old client called with an urgent request. So I spent 5 minutes learning to use that memoQ feature and made my rent and then some once again with a day's work in an unusually slow week. You tell me if that's ignoring the needs of a freelance translator.

Kilgray's pursuit of LSP and enterprise business has in reality been a godsend to its freelance translator base. As major clients like the German post office adopt memoQ technology, its perceived legitimacy and the legitimacy of those using memoQ increases among potential clients. Even freelance translators using Trados have benefited considerably from Kilgray's user support and innovative skills as SDL has been forced to clean up its act in many ways and make substantial improvements in its product and support due to competitive pressure not felt before.

Corporate sales underwrite development of features interesting to freelancers in a way that individual "pro" licenses never could, and that is a good thing. Just look at the disaster of Atril over most of the last decade as the product which was once the most versatile, innovative tool available to freelancers languished with almost no development for years, and many users truly expected the Second Coming before the release of DVX2, which finally came out several years after the announcement of its imminent release. Kilgray is innovating and adding features that are directly useful to me as a freelancer at a pace far beyond my ability to keep an overview. Without making memoQ any more complicated in my daily routine! If corporate money helped pay to develop that superb Transit compatibility feature or the bilingual RTF table output I use almost every day, then I can only be grateful to Kilgray for having the wisdom to seek balanced development of its business in all areas of translation service.

But are they ignoring freelance user support requests and giving all the attention to big-ticket customers? I don't think so. I hear time and again from friends with no personal acquaintance with the Kilgray team how quickly and competently their questions are answered. Sometimes there's a little back and forth before the issues are understood clearly, but that is normal in any human interface in any company or industry.

A quick look at September's schedule of TEN free webinars by Kilgray shows a good balance of interests, with a number of presentations for all interest groups. Nobody is being ignored nor is any group receiving an undue share of attention.

Misunderstandings are inevitable in human interactions, even among intelligent people with good intentions. We are all preprogrammed to misread what's right in front of us rather often, because we cannot help but look through lenses of experience that is not always positive. Having been disappointed very often in my dealings with many companies offering products and services, I can be positively hairtrigger with respect to some organizations and unleash a torrent of criticism that may not be fair in a given instance. I look very closely at Kilgray and what it is up to, and I did so well before I began using the company's products, which I was very reluctant to do. But in the end, the team there won my trust, not with its often superior technology, but rather by its ability to admit mistakes, treat all its users with respect and always try to do better. They don't get everything right all the time, but their basic good sense and desire for stability and balance have carried them very far and very fast in recent years as freelancers, LSPs and enterprises recognize a partner who really can be trusted, one with no hidden agendas or corporate divisions looking for a wedge to sell its translation services to your customers.