Translation Tribulations: Coming clean on dirty machine translation

Jun 22, 2014


Click the tweet shot to decode his babble

Well, well. I started to dream in Portuguese two nights ago, and it happened again tonight in a dream where I discussed the sentimental value of some old items on their way to another life or recycling. One of these was the first color computer I bought and why; no idea why I should be telling my daughter all these things in Portuguese. I'm supposed to write a letter in English to minha noiva and have it translated into Portuguese by a friend to be sure that no meaning is lost, because I'm too far from mastering all but the most primitive grammar. I think maybe instead I had better just do it in the language in which it will be received, and the message will come through as well as or better than any translation, though certainly it will be checked. Maybe not. We'll see.

I responded to the incessant, poisonous spew of Luigi Muzii (@ilbarbaro) on Twitter last night, because the little puffed toad obnoxiously insists on croaking nonsense in debates (mostly with himself) which the dim bulb of his mind can never illuminate, and he does so in a tortured, incomprehensible and of course incorrect English which leaves readers I know unimpressed and utterly baffled, which has made him a frequent poster child for nonsense examples at conference presentations and which has convinced some that the man knows no English at all and merely machine translates his disordered thoughts from Italian.

Communication is seldom about the correctness of language or the degree of its mastery. Certainly it can be useful for some of us to command the subtleties of grammar, and I'm one of the guilty who enjoy that fine edge to carve patterns which will sometimes be appreciated by almost none. But sometimes the most eloquent expression can be in the most broken speech, supplemented by tone and gesture and scribbles on paper, signs in the air. And the howls of a dog. I realized this last night as I sat at a table with my Portuguese tutor and one of her many nephews, telling and understanding jokes and completely at ease in their language and culture in our negotiated register, where two weeks before I could do little more than say my dog doesn't bite, order 200 or 300 grams of anything at a butcher's counter (pointing at the item) or perhaps get half a dozen eggs, coffee and some pastry. I remember the eloquence of a Greek mechanic who shared tea with me on the floor of his shop years ago and told wonderful, funny stories I understood and laughed at though I knew about five words of his language.

The desire to communicate and to understand in ordinary situations of interaction is often a more effective facilitator than technical skill. Sometimes a friend and/or colleague will call my attention with some outrage to a web page or a message with "horrible" errors and I look and see none, only fluid expressions of thought and meaning or at least a fit-for-purpose text. A computer program has no motivation, no matter how great the motivation of its creator. It can have adaptive, event-based routines, but these are seldom adaptive in the way we know for the least of human minds. The messaging of machine pseudo-translation profiteers and their snake oil sidekicks pushing a fix of crowdsourcing, rightsourcing and workflow is quite adaptive to hide the static concepts and rotten nature of the repackaged Gammelfleisch they sell in pretty packages to hungry cost-cutters.

The MpT talking heads, Friend Muzii among them, have turned up the volume of their megaphone marketing lately, offering HAMPsTr'd hope to translation buyers that the lapis philosophorum sold by language carnival barkers can transform merda to gold with just the right six- or seven-figure engineering investment and straightjacketed expression we call controlled language. They babble and bark of so-called professionals who are "scared" but it is those unprofessional and MpT charlatans who are running scared at the thought that, like with the naked emperor in the story, their glorious equipment will be revealed to all and found to be of more limited use and interest than most might imagine.

I use machine pseudo-translation (MpT) every day, effectively, to aid in many critical tasks, and I see great value for it in its proper place. But what is that? Certainly not what the greedy HAMPsTr'izers say it is as they seek fresh mental sacrifices for their unholy altar. I believe there are a number of excellent, honest and profitable applications for MpT processes, and I know some translation agency principals and others who profit clearly and honestly from them, and I can find few points of disagreement with these people. But they are also not the more prominent Jungle Book characters on the international scene singing sweetly "Trust in me...."

Come to the IAPTI conference in Athens this September and hear my confession of how MpT technology has worked for me. Or better yet, go to Athens, skip the conference, get drunk on ouzo and tell the natives how much better their lives will be thanks to the transformative powers of MpT.

Please note: no underage girls were anesthetized and abused in the making of this blog post about the technologies and advocates of the bulk market bog (BMB)!

28 comments:

Biscuit, June 22, 2014 9:02 AM
Now I know what your confession will be about, in September... shame... no garlic!

Biscuit, June 22, 2014 9:17 AM
sorry, I used GT to translate your article into Italian... I thought it was strange it took place in Grecian 2000...

Biscuit, June 22, 2014 11:24 AM
Bad jokes apart, it would be interesting to know which specific PEMT systems the big turds are offering to their clients... it seems to be a highly guarded secret, which is understandable... maybe I should pose as a client and see what they come up with... :-)

Biscuit, June 22, 2014 11:42 AM
we'll have a look, thanks... I suggest you install bug zapper in Portugal too... according to GT, it's called a "bu zapper" in Portuguese...:o)

Dan Newland, June 22, 2014 2:26 PM
Thanks Kevin. Wickedly brilliant as usual.

John Moran, June 22, 2014 4:54 PM
I am sure those numbers didn't come from me, Kevin. I would never be so indiscreet (NDAs and such). But seriously, at least they had three engines standing. Hub* and iOmegaT was a pretty powerful two-prong attack. HP are now using iOmegaT to cut through the MT provider malarkey (to borrow from Chris D's vocabulary) and there are more big buyers to follow as we integrate with WorldServer. The odd thing is that IBM figured this out years ago but the CAT tool developers didn't notice. Luckily, it might create a situation where we solve the autosuggest problem (helps some but not others).

I'm finally starting to write the damn thesis but I keep vacillating between the title "User Activity Data analysis as the basis of a framework to test computational linguistics technology in CAT tools" and "iOmegaT a samurai sword to cut through the steaming pile of horses##t that certain MT providers have been spouting for years about productivity improvements from MT". 150-300% my arse. You might as well say 3000% if it is light PE on User Generated Content.

Here is one for you. What kind of person puts a picture of a turkey beside six thousand words per day? An ignoramus who has never translated 6,000 words in a day and not been able to sleep because his brain is still racing, I suspect. I couldn't believe it when I saw him present it in Limerick a few years ago.

http://www.asiaonline.net/images/WordsPerDay.png

That man personalises technology arguments online (yes, mate, I am still pissed about the reference to the call with PK about me) and intentionally dehumanises the translation process with insulting references to animal figures. It is the same technique used to spur people to genocide. I am not kidding - take the fucking thing down before Vancouver or it will become part of my iOmegaT slide deck as an example of the disdain the MT cottage industry is showing to the people who do the actual work that pays for their self-congratulatory conferences.

Hub* = The shorthand name used by MT engineers for Microsoft Translator Hub and the bane of all but the best MT providers. 1 million characters per month, for free, of trained, relatively good pseudo-translation (especially for IT material).

By the way, I agree. That Italian guy sounds like a moron looking for attention. Translators are about as scared of MT as they are of fuzzy matches or translations done by junior translators / the crowd.

Apologies if I sound Smaug but I know something the MT providers don't want the translator community to hear and - most importantly - I have a travel budget. Roll on 2015.

Unknown, June 23, 2014 4:49 PM
I could be wrong of course, but the recent increase in volume from the MT/technology-abuse proponents and their recent personal attacks on the profession seem to me to stem out of distress, not power.

Technology and businesses that base their sales on fallacy and FUD, and that resort to demagogic attacks on the profession they dare to claim to represent, might be in a weaker position than they would like others to believe.

There are technology developers that speak honestly about the technology, its limitations, and proper use cases (i.e. being a tool that can create win-win situations; not a platform to support the greedy business model of irresponsible intermediaries); even the less-than-ethical bunch provide sneak peeks into what is really going on behind the scenes of the demagogic propaganda attempts that are masked as "opinion pieces" in media outlets, and the stream of brainwashing attempts in social media. There is enough information to build a more rounded picture instead of falling into the trap of those who are simply more vocal.

Whenever I engage in a debate about technology abuse, I always encounter predictions, estimations, FUD-driven claims and statements, and personal attacks, but very little substance behind it all. I always have the urge to sign off my part in the "conversation" by saying "The pot calling the kettle black".

"Translators should join the MT boat or drown", "Translators are scared", or the recent "Why So Many Translators Hate Translation Technology" (by Nataly Kelly of Smartling: could you think of any conflict of interest? Yet there isn't even a disclaimer) represent the evolution of this propaganda and how it became more personal, false, and vile with time.

Dion Wiggins, June 23, 2014 7:34 PM
Dear John and Kevin. We have been challenged several times on the numbers that we publish about our customers’ productivity gains. However, these numbers are not made-up marketing speak; they come directly from our customers’ reports. We do know that 150%-300% productivity gains are much higher than what our competitors are achieving, which is exactly why we frequently have our customers performing their own metrics and presenting them in their own words. Our website has many such case studies with webinars that have customers speaking in their own words that make it clear that this level of productivity is now occurring on a regular basis and is not an exception.

Next week we have a webinar from one such customer, Hunnect, that achieved over 300% productivity gains on English to Hungarian in both the IT and Life Sciences domains. As you know, Hungarian is one of the more complex languages for MT, so this may surprise you even further. Hunnect’s team went from 250 words an hour with human only translation to 900 words an hour post-editing MT which is a 360% productivity gain. This is partly attributable to the quality of the MT and also partly to the post-editor training that Hunnect developed. Sándor Sojnóczky, the Managing Director of Hunnect, will be speaking in his own words and be presenting a number of projects where he did achieve the productivity gains that you are calling into question.

I would like to personally invite both of you to hear directly from this customer and ask any questions that you feel are appropriate to put your doubts to rest. The registration page is available at https://www1.gotomeeting.com/register/987382297. Additionally, if you would like introductions to some of our other customers, I would be happy to make these introductions so that you can further validate the productivity levels that we and our customers refer to.

You may also wish to review some of the other case studies with customers that are achieving similar results and talk about it in their own words. This is available at http://www.asiaonline.net/EN/Resources/Casestudies/Default.aspx

Kevin, you have previously been quite outspoken on our technology, but to my knowledge you have never tried using it. So I would like to also offer to you a free trial where you can make your own judgments based on the results. Kilgray memoQ has added productivity metrics in the latest version, so you can even measure it from one of your favorite environments. If you take us up on this offer, we will take you through the Language Studio customization process and you can see for yourself how we customize and refine an engine to deliver the productivity levels that our customers talk about.

Finally, the speedometer used in some of our presentations uses images of animals as a metaphor for speed. The original idea came from this wallpaper (http://screen-wallpapers.com/wallpapers/view/1954) and we validated the speeds against actual animal speeds from several websites. There is no intent here other than to show a graphical metaphor for speed.

Regards

Dion Wiggins
CEO, Asia Online

Charlie Bavington, June 24, 2014 3:09 PM
I should confess, I was intrigued by Luigi M's Twitter teaser, and to some extent took the bait, in the sense that I replied to it, agreeing to some extent as regards my interpretation of its basic premise. Then came the blog post. It was lucky, frankly, that he'd tweeted some kind of summary to me - the bloke really does need an editor!

That said, I do k-i-n-d o-f see what he means in the last paragraph, as a (part-time, i.e. doubtless relatively ill-informed) observer of the goings on.

And yes, I am guilty of double standards. Here, I wonder if the message should not be viewed in isolation from the messenger. But in the case of Ms Kelly's article mentioned above, I was one of those (including on a short-lived p**z thread, now gone, seemingly at Jeff Whittaker's request, he being the one who started it, which adopted an early and vigorous "anti" stance) saying "yeah, but look who's saying it and look what else she says".

Anyway, moving on... Kevin, if and when you do get involved in any challenges with Dion Wiggins, check the bloke's maths first. 900 is 360% of 250, true. But in terms of increases in words per hour, while impressive, it's only actually 260%. It's a common enough mistake (the *increase* is 650, which is 260% of 250), but worrying from a man in his position, for all sorts of reasons.
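
The difference between the two figures is easy to check. A minimal Python sketch, using only the 250 and 900 words per hour quoted above; the function names are made up for illustration:

```python
def percent_of(new_rate, base_rate):
    """New rate expressed as a percentage of the base rate."""
    return new_rate * 100 / base_rate


def percent_increase(new_rate, base_rate):
    """Gain over the base rate, as a percentage of the base rate."""
    return (new_rate - base_rate) * 100 / base_rate


human_only = 250     # words per hour, translation from scratch
post_editing = 900   # words per hour, post-editing MT

print(percent_of(post_editing, human_only))        # 900 as a share of 250
print(percent_increase(post_editing, human_only))  # the 650-word gain as a share of 250
```

The first call gives the "360% of 250" figure, the second the actual increase of 260%.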

Aurora Humarán, June 25, 2014 1:58 AM
@ Shai Navé: I couldn't agree more with you. I even see this as unfair competition because her title (and many people only read titles) highlights a lie. Smartling is unfairly competing with translators by depicting us as Luddites vs the smart guys (Smartling).

Loek van Kooten, June 25, 2014 2:11 PM
It is very well possible Smartling did not have the interests of translators in mind when they started their business. However, there is no doubt that no matter the past, Smartling is changing its course and focussing more and more on human translation. I'd follow their website very closely, as a huge update is about to take place with new texts that strongly encourage human translation at the cost of machine translation and crowdsourcing.
Possible altruistic reasons aside, Jack Welde told me personally that also from a business point of view, human translation is more interesting for Smartling: they get paid per pageview, and it just happens to turn out that high-quality translations give more pageviews than low-quality translations. Using crowdsourcing and machine translation instead of human translation therefore would be a stupid decision business-wise.
I believe in Occam’s razor. I don’t believe in conspiracy theories saying that Smartling is about to take over the world at the cost of translators. As they have declared themselves, these days they are basically selling a CAT tool. That puts them in the same league as other CAT tool makers like SDL, Kilgray, Atril and Across. I don’t believe CAT tool makers are evil, not even if they implement functionality for machine translation in their software. Kilgray did the same thing and nobody hates them for it. Stupid, yes. Evil, not. Smartling is encouraging its users to use human translators only, but if a client decides to use machine translation after all, that won’t stop them from selling their product. And why would it?
Indeed, there's an obvious angle in the article trying to sell Smartling as a platform, but if trying to sell something is a sin, then we are all sinners. Because no one here is posting just for altruistic reasons. All posters here want to have a high profile so that people remember our name whenever they need us. I agree the column is not neutral, but then again I don't know one left-wing or right-wing newspaper that is.
I’m definitely not a fan of Smartling’s user interface and would charge more than the normal rate to work in it (it’s still a toy compared to say memoQ), but I also understand how this company is filling the void in the market by addressing the needs and pains of website and app developers that every LSP and translator has so far failed to address. This software enables website owners to localize their website without any preparation whatsoever and distribute its translated contents worldwide so that it loads fast no matter from where you access it. For translators the software provides a real-time in-context view so that you no longer need to guess the meaning of isolated words and strings. If Smartling manages to get the translator UI up to par, they have a product everybody should watch, like it or not.
Sure, some parties won’t like the idea of giving Smartling full control over their website and contents, but then again, Smartling is not for everyone. Being the control freak that I am I wouldn’t use their service personally no matter how good it is, but quite a few developers disagree.
Now I’ve been slaughtered already on several forums for merely being realistic. If you want Smartling to be the evil empire that kills little translators to make a living, then please be my guest. I think Smartling is here to stay though and that a little extra competition on the CAT market will only benefit us. As said, at this moment the translator’s user interface is still a major PITA, but watch this space. They may improve it, and then Smartling may have become a reality. Maybe it already is. (continued)

Loek van Kooten, June 25, 2014 2:11 PM
(continued) Smartling and I made a very bad start, but currently they are a client of mine. Not a very big client, but still a client. I have always been very open about that, because I want you to know exactly who is feeding me, so that you can weigh my words yourself. I’m not a fan of their software yet, but I refuse to judge people on things that may or may not have happened in the past, and I refuse to be part of butchering people in public just because that’s the fashionable thing to do.
I believe there’s also a middle road. I say: watch them and see. Let them prove themselves and see if they can get rid of their image of crowdsourced and machine translations. We have nothing to lose and everything to win, and I am 100% sure that even in a worst-case scenario good translators have nothing to fear from any technology or company whatsoever: good translations will always be easy to sell for high prices. Crowdsourced translations and machine translations will only win if those are better than your translations, and if that is the case, you don’t belong in this business anyway. And even if you believe all doom scenarios and think that somehow Jack Welde can make your translations worse than they are, you give the man way too much credit, fighter pilot or not ;-)

Unknown, June 26, 2014 6:35 AM
Interesting discussion. Here are my thoughts on translators and their technology: http://foxdocs.biz/BetweenTranslations/why-so-many-translators-love-translation-technology/

Kevin Hendzel, June 26, 2014 6:29 PM
Here's what I posted on the same subject in response to a comment by Chris Durban in another forum today:

I think it's helpful to keep in mind that Smartling's Nataly Kelly is serving a decidedly different agenda when writing blog posts (it's not an article) with absurd and completely bogus titles like "Why So Many Translators Hate Translation Technology."

Aside from being wildly inaccurate, it's belittling and insulting to a whole class of professional translators who have worked very hard to leverage translation technology -- largely TM technology -- in productive and thoughtful ways (in markets where such technology makes sense) for decades.

That includes pretty much everybody reading this post right now.

What I can say with some degree of authority is that "so many translators" were early adopters of new translation technologies when Ms. Kelly was still in grade school and long before she became a telephone interpreter (!!), a linguistic skill set that misses by a country mile the experience necessary for making informed judgments about language technology.

I also thought it was instructive that far more experienced and knowledgeable translators with hands-on experience in translation technologies successfully challenged and refuted her points one-by-one in the comments section to that blog post, including John Moran, who's completing a PhD on MT and human translation in the UK, and with whom I've spoken at length and who shares my view that automated voice recognition is the real disruptive technology more so than MT as commonly understood, as well as Jayne Fox, Jon Johanning, Valerij Tomarenko and Aurora Humaran.

The Smartling agenda is to paint translators as whining, uninformed, quivering-in-the-corner technophobes who need to be rescued by Sophisticated VC-Funded Technology Companies With (What They Think Are) Clever Names. The target audience for this article is Smartling clients, not translators, of course.

It's a sly attempt to turn the tables on where the real power lies.

Smartling wouldn't last to the end of business today without technology-savvy professional translators, but those same technology-savvy professional translators don't need Smartling at all.

And of course there are many professional translators who actually prosper by working in markets far above Smartling's audible hearing range.

I suppose one can write whatever one wants as long as the VC money flows in to keep you afloat -- and people forget that your claim to language expertise is as a telephone interpreter -- and you don't need to even pretend to survive on the open market where a P&L actually determines whether you survive or sink. :)

Dion Wiggins, July 02, 2014 4:35 PM
Dear John,

I note that you have not accepted my personal invitation to today’s webinar. In case you forgot to register, I am sending a reminder. If you attend, you can talk directly to one of our many customers that have achieved productivity improvements well above 150% and some engines have even achieved above 300% productivity gains. You can register at https://www1.gotomeeting.com/register/987382297 if you wish to attend. If you cannot make the webinar today, the video replay will be online in a few days.

If you would like to talk to this customer directly after the webinar, I would also be happy to arrange a call. We have many such customers and I would be happy to introduce you to them so that you can ask anything that you wish about their experience with our products and services in order to put your doubts to rest. By talking to our customers directly, you do not have to take my word for it. You can see for yourself that these productivity gains are very real and that our customers achieve them on a regular basis. Please let me know if you would like me to make these introductions.

Regards

Dion Wiggins
CEO, Asia Online

John Moran, July 15, 2014 4:03 AM
Dear Dion,

Thank you for the invitation. I am afraid I am only reading the remainder of the discussion now so I could not participate in your webinar or talk to your client.

However, I am afraid I am unimpressed by your offer. You wish to introduce me to a client for whom MT facilitates a 200%+ translation speed improvement over translation from scratch. How do I know if this is an outlier or a norm? All the data I see suggests the former and I see a pattern in your statements that suggests you confuse the two.

I accept that MT can improve translator productivity. As you know I develop software to measure exactly that using a technique I call Segment Level A/B (SLAB) testing* (google iOmegaT). There would hardly be a market for such a tool if it didn't. I even accept that MT can even do so without measurably impacting on quality when translators rewrite sentences they deem unacceptable. I'll do you one better and say that it can help improve consistency as a sort of parallel concordance if you admit MT introduces risks with regard to style. But sure who cares about that these days? Good writing style is soooo old fashioned.

Here is what would impress me:

1) I am also aware of a number of companies that started with Asia Online and found that Microsoft Hub delivers them the same or better improvements with no "consulting" using the same training data. Ditto for Kantan.

As MS Hub is free for the first million characters, consume its service in the editing tool you provide your customers. Provide them with a feature where translators post-edit Hub output in some segments and AO output in others, as a blind test. Record their working speed over a longish period in seconds per word. In short, allow them to see the utility of the MT your company provides relative to that of others. I have access to the AO system so I'll check back in a few months to see if such a feature has been added.

On your website you warn of the dangers of upload and pray. I am willing to accept that consulting based MT can be a good solution for very large volume work and in other limited circumstances but, again, we are talking about outliers and norms.

2) Reduce this risk for your prospects by providing free samples of AO output via an API for all of your engines so consumers can judge for themselves whether your core system (prior to the addition of their training data) is worth building on. It needs to be an API so they can quickly compare it to others that do the same.

3) Introduce me to a company that got its money back if a promised 150% improvement was not forthcoming following a consulting engagement. This is the lower end of the large graphic on your home page so surely it is the minimum they can expect. I know a few that didn't (though Asia Online is not alone in that regard).
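
The blind comparison proposed in point 1 could be sketched roughly as follows. This is only an illustration in Python, not how iOmegaT or any vendor tool actually works: the engine labels `hub` and `ao`, the function names and the timing records are all hypothetical, and a real test would need far more data and controls.

```python
import random
from collections import defaultdict


def assign_engines(segment_ids, engines=("hub", "ao"), seed=0):
    """Randomly pick which engine's output is shown for each segment.

    The translator is not told which engine produced a suggestion,
    which is what makes the comparison blind.
    """
    rng = random.Random(seed)
    return {seg_id: rng.choice(engines) for seg_id in segment_ids}


def seconds_per_word(records):
    """Aggregate editing speed per engine.

    records: iterable of (engine, seconds_spent, word_count) tuples,
    one per post-edited segment.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for engine, seconds, words in records:
        totals[engine][0] += seconds
        totals[engine][1] += words
    return {e: s / w for e, (s, w) in totals.items() if w > 0}


# Hypothetical timings for three post-edited segments.
sample = [("hub", 30, 10), ("ao", 40, 10), ("hub", 10, 10)]
print(seconds_per_word(sample))
```

Lower seconds per word means faster post-editing for that engine. Summing seconds and words before dividing weights long segments more heavily than short ones, which is one possible design choice among several.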

I am all for getting on a boat to save my teeny-tiny little LSP from drowning but, to paraphrase another Irishman, I think I'll pay the ferryman when he gets me to the other side.

As you would see if you had attended my talk at FEISGILT prior to LocWorld, the problem I am trying to solve is that most of what you say has to be taken at face value only because the CAT tool publishers are not publishing data on true MT speed improvements. Here is one exception:

https://www.youtube.com/watch?v=KViuQLcyq0c

Scroll to the end to see a 5-10% cost saving. Not very close to 150%, is it? Is that because they were not Asia Online systems? More info would be nice. What I would like to see is a breakdown of language pairs, domains, providers etc. How would this infringe on an NDA? It wouldn't...

John Moran, July 15, 2014 4:04 AM
...I am perfectly well aware that your intent was to use animals as a metaphor for speed. The problem is you used a pig and a turkey. While I personally think these are useful and tasty animals (particularly together in a sandwich with mayo and lettuce), in the English-speaking world they are commonly used as metaphors for negative sentiment. You have placed them beside daily word count numbers that are much higher than most translators achieve (except with Dragon NaturallySpeaking). It betrays a lot about the way you think of the people who ultimately pay your salary. I have noticed this and we communicated privately about this in the past. That exchange and the way you shouted down that guy who was pretending to be a translator even inspired me to set up a LinkedIn group (Technology Agnostic Translators) so translators could talk about MT in an environment where no MT vendors were present.

One final point, I should not have used an expletive - even if I thought you were comparing me and my fellow translators to a pig / turkey. Sorry Mum!

*When I told the Head of CNGL (the 170 person language technology research organisation I work for) I was calling the instrumentation format in iOmegaT CAT-UAD (User Activity Data) he winced and said I needed something that rolls off the tongue better - like XLIFF. So now we have HT/MT SLAB scores. C’mon CAT tool publishers! Let's see 'em! Publish or perish!

John Moran
Research student, CNGL
CEO, transpiral.com