Jul 26, 2013

The trouble with voice recognition in translation environment tools....

I had not planned to make a video on voice recognition tools any time soon, but a few remarks by my American colleague Kevin Hendzel well down in the many comments about thepigturd's letter to translators sort of goaded me into it. I thought, "What the heck, I'll just grab some text from Wikipedia, record a bit of the work with Camtasia, and post a quick demo of how easy it is to work with Dragon NaturallySpeaking." So I got a text about chickens. And activated the screencast recorder. And then the trouble started.

It really sucked. Working with Dragon in memoQ is usually a fairly painless process, but tonight the dogs were anxious and kept poking me in the ribs, and I never did get the microphone adjusted quite right. Some days, microphone position is everything to my scaly transcriptionist. So I suffered with a lot more editing than usual, as anyone watching the video above will see. I worked in my usual "mixed mode" manner, with both keyboard and voice control. Some colleagues who swear by DNS like to do everything by voice and would probably wipe their backsides in the WC that way as well if they could, but that's way too geeky for me. After watching my copywriting partner fly through some 10,000 words of legal translation - and edit it - in a short working day while I slogged through my 3,000 and finished long after she called it a day, I realized that I could work in the relaxed way she did with thoughtful stares at the screen, muttered bursts and the occasional keyboard touch.

But today was a bad day with the Dragon. I might have gone a bit faster with the text. After all, chickens aren't rocket science or even chemistry, with its tag-ridden notation. I could have just dictated in a word processor and everything would have gone faster. And if I really want a TM or want to check the terminology, alignment is fast and also a good environment for editing my first draft. I know a number of translators who work that way now. Even with a dictaphone.

In his comments on the other post, Kevin Hendzel expressed a similar feeling to mine when translating with voice recognition: greater engagement and concentration on the text and its structure and meaning. But these tools are not without risk: any errors will in fact pass muster with a spelling checker, so proofreading workflows may have to be very different to be effective. I have noticed this myself - reading my text soon after I have translated it, I am very likely to overlook a missing or switched article or a homophone. Perhaps dictating into a word processor or - since I often look to the glossary hits and other hints on the right of my working window - exporting my text and re-aligning it in the CAT tool after an external rewrite may force my eyes to see things a little differently. In the two years that I have been making serious use of voice recognition I have not yet found the "perfect" workflow.

There are a lot of ways I can tease better results out of this work. But even on a bad day like today, things aren't all that awful. In fact, those familiar with some of the more honest estimates of output in optimized machine translation and post-editing scenarios will realize that today's lousy results (see the end of the video), maintained over the course of a working day, meet or beat the expectations for post-editing in a highly optimized scenario. Without the brain rot typically caused by PEMT! Now that's an advantage. Why don't we stop wasting time with machine translation and instead increase output by more research into the best ways of using voice recognition technology? Ah, but voice recognition is not yet optimized for every language! Ha ha ha... like MT is or ever will be. The millions that get flushed down the toilet with machine translation could and should buy a lot of improvement with voice recognition.

The real trouble with voice recognition is that you may not want your competition to use it. With or without CAT tools. Unlike machine translation.


  1. Very interesting and informative article.
    I usually bring up this argument when I discuss "productivity" with MT advocates.

    I too believe that the development and improvement of existing true productivity supporting technologies (i.e. those which enable one to focus on their core skill and work and less on the process of carrying it out) would be significantly more efficient in improving the quality workflow rather than trying to adapt the "standards" (if such even exist) to the limitations of a Machine Translation for the promotion of the narrow self-interests of the MT lobby.

  2. Thank you for your appeal to reconsider means for improving our productivity. I might well use this idea as inspiration for my Master's thesis and do some sort of comparison between MT post-editing and speech-recognition-supported translation. I also like your approach of dictating into a word processor and aligning the translation afterwards, combined with an editing cycle. I'd be interested in hearing about more workflow variations to optimize speech recognition in the translation environment.

