Apr 15, 2014

Speech recognition for translators: microphone tips

Guest post by Jim Wardell

Mark Myworts is heavy into a major procrastination project, pawing through moldy old copies of Popular Mechanics in Grandpa’s basement, when a small classified ad in the back pages catches his eye. He blows off the dust:
“Translators! Double your income overnight with amazing new technology! Works wonders for English, German, Spanish, French, Italian, Dutch. No obligation. Call ... Full confidentiality guaranteed.”
[Fade in “Twilight Zone” theme music.]

[Cut to Mark talking intently to his computer.] “... should be used instead of the diminutive terms ‘pud’ and ‘loser’” ...

[Fade to Mark and kiddies.] “Well, Daddy, can we? Can we? Can we go to the circus tonight?” “Sure, kids, I’m knocking off early today,” says Mark nonchalantly, getting a kiss and one of those sexy “well-what-about-after-the-circus” looks from his admiring wife.

Science fiction? 1950s social mythology? Perhaps.

But the simple fact remains, at least now in 2014, that translators into English, German, Spanish, French, Italian and Dutch really can double their productivity on average using some amazing, although not-so-new, technology: speech recognition.


I’ve been using speech recognition to translate from German into English for nearly 20 years. But it was not until about seven or eight years ago that computing power and speech recognition software had improved to the point where serious productivity gains became possible. It was at that point that it became imperative for me to find a CAT tool that was totally compatible with Dragon NaturallySpeaking. At the time, only two products met this requirement: Déjà Vu and memoQ. For various reasons, which I won’t get into now, I decided to go with memoQ, a decision I have never once regretted.

Almost any CAT tool can be made to work with Dragon by using the little “Dictation Box” text buffer that Dragon provides as a workaround for software that is not truly Dragon-compliant. However, translating average, moderately sophisticated technical documents this way in noncompliant CAT tools is often cumbersome and inefficient. One copies the contents of the current source segment into the Dictation Box, so that any strings that do not need to be translated can be left as is or moved around as desired, and so that sections of text that do need to be translated can be marked and overwritten by dictating the new translation “over the top”. Once the source segment has been duly massaged in the Dictation Box, the contents of the box are transferred to the target box in the noncompliant CAT tool. Of course, tags and formatting present in the source segment are often lost when the source text is pasted into the Dictation Box, so they need to be put back in after the contents of the Dictation Box have been pasted into the CAT target box. If this sounds gruesome, it is.

So why not just dictate straight into the noncompliant target box and fix the messes as they occur? The answer is simple: there are often too many messes, and, worse still, recognition errors cannot be corrected in a way that ensures that correct rather than erroneous data is fed to and saved in the Dragon speech recognition engine. Over time, this would degrade speech recognition accuracy! I spent some years trying to address this issue with publishers of CAT tools other than memoQ and Déjà Vu ... with zero success. So if you’re not already using Dragon and want to use it with a CAT tool, make sure that the CAT tool you are thinking of using is really fully compatible with Dragon and that you can get your money back if it’s not. Do not trust and do verify.[1]

At one point before I switched to memoQ, I was compelled to do a good bit of this acrobatics, moving text into and out of the Dragon Dictation Box. I began to feel that the cutting edge I was working on in a number of well-known CAT tools was so dull that I might just as well have been typing my translation the old-fashioned way. So I collected some statistics and discovered that this was indeed the case: my output using Dragon with noncompliant CAT tools was the same as my output touch typing without Dragon.

All that changed with memoQ’s full Dragon compatibility! Incidentally, memoQ is fully Dragon-compatible throughout its interface, not just in the translation grid. So if you want to dictate notes or definitions in term base entries, you can, and you can still use all of the selection and correction features you are accustomed to using in Dragon. Want to write a longish note to a client in a memoQ Comment box? No problem. Dictate away.

* * *

Anyone who has dealt with integrated technologies, or with any process for that matter, knows that the old saying “A chain is only as strong as its weakest link” is totally true. So to get really great speech recognition results, not only does one’s CAT tool have to be compatible and outstandingly good, but one’s computer also needs to be sufficiently powerful, and one needs to use the best available microphones. I heartily recommend KnowBrainer.com as a source of top-quality microphones for speech recognition. To my knowledge, KnowBrainer is the leading expert in the USA, and probably the world, when it comes to speech recognition products. People who want to achieve maximum accuracy with speech recognition software should make KnowBrainer’s Microphone Comparison page their first stop.[2]

For many years, I used KnowBrainer’s top-rated Samson Airline 77 microphone. This microphone was vastly superior to anything I had used before and came delightfully close to delivering 100% speech recognition accuracy. Earlier this year, however, I learned that the wireless channel used by my old Airline 77, which I bought while I was still living in the United States, was being reassigned to mobile phones in Europe, so the mic would no longer be legal to use here. So I checked out KnowBrainer again and learned about a relatively new microphone made specifically for speech recognition by a Belgian company, SpeechWare. After consulting with KnowBrainer’s Lunis Orcutt (Mr. Speech Recognition in my book!), I ordered the SpeechWare 3-1 TableMike. This desktop mic is a great product and just as good as my old Airline 77. It’s the mic that Lunis himself uses.

However, after using it for a week or so, I realized that it was not for me: I had to keep my mouth relatively close to the microphone and couldn’t move around as I was used to doing to relax my back muscles and stay fresh. I then ordered a FlexyMike headset mic from SpeechWare, which uses basically the same technology but allows one to move around freely. SpeechWare makes three models of the FlexyMike: the FlexyMike Basic (FMK01), the Single Ear (SE) and the Dual Ear. I chose the Dual Ear on the principle that distributing the weight of the mic over both ears would be more comfortable and stable for hours and hours of continuous use.

When I was still using the tabletop TableMike, I found that I tended to drift a little too far away from the microphone over time, which occasionally reduced speech recognition performance. The TableMike has two settings: a long-range setting, which allows one’s mouth to be as far as 30 cm (12 inches) from the microphone, and a “normal and VoIP” setting (with a maximum distance of 15 cm / 6 inches). The greatest accuracy is achieved at the closer distance. Lunis says he likes the TableMike because he moves around the office a lot and doesn’t want to fumble with a headset whenever he leaves his desk. For this reason, I would recommend the TableMike as the best choice for project managers and administrators who may frequently have to leave their desks and who mainly use Dragon in brief stretches to dictate e-mail messages or enter data in translation business management software. For hard-core translation work, the FlexyMike is the way to go. I find the accuracy of the FlexyMike to be perhaps a tad better than that of the Airline 77, which is saying a lot. I can use the FlexyMike while listening to the radio at moderate volume levels, so the noise cancellation is also quite good. KnowBrainer gives its noise cancellation a score of 9, which is better than that of the Sennheiser ME3 KB headset mic (which gets an 8), a mic I have used with good success for years in automobiles, trains and airplanes! All the same, anyone who has to dictate in an extremely noisy environment might want to check out “theBoom v4 KB”, which gets a high accuracy rating and a 10 for noise cancellation (but only a 9 for comfort!) from KnowBrainer. In my experience, KnowBrainer is pretty fanatical about these evaluations, and they are quite reliable.

For the average translator, who works long hours in a relatively quiet environment, accuracy and comfort are the two most important factors, more important than noise cancellation. I don’t need speakers on my headset, which means that the headset can be as light as a feather and can be worn comfortably all day long. If need be, Skype calls or music can be played through normal computer speakers. On the other hand, if one is working in an open-office setting with a number of other translators close by, one might want a headset with speakers covering both ears to block out distracting voices and concentrate better. In such cases, I’d consider the mono Umevoice “theBoom Pro-2 KB”, or the stereo hi-fi equivalent “... 3 KB” if you want to block out room noise and also listen to music while you translate. (I translate very complicated, detailed material and usually find it extremely distracting to listen to music while translating, but not all material that gets translated requires extreme concentration. I could also easily imagine listening to a high-bandwidth feed from, say, jazzradio.com premium (unabashed plug) to make routine administrative work more pleasant.)

Getting back to the FlexyMikes: SpeechWare was kind enough to also send me a single-ear model to test and evaluate, so I have used both versions extensively. Both the single-ear and the double-ear mics are extremely comfortable and both are very easy to adjust to get a custom fit that is secure and comfortable. The materials used in both mics are of exceptionally high quality and should provide many years of reliable service.

Both FlexyMikes connect to a computer USB port via a “SpeechMatic MultiAdapter”, which has been specially configured for high accuracy with speech recognition. I am convinced that the special design of the MultiAdapter is one of the main reasons why the FlexyMikes work so well.[3] Be sure to buy one along with your FlexyMike. The same circuitry that’s in the MultiAdapter is integrated into the TableMike units, so if you already have a TableMike, you don’t need to buy a MultiAdapter, unless of course you want something really small and light to use with a notebook computer when traveling.

I did not test the basic version of the FlexyMike; from the pictures, it didn’t look as comfortable as the other models.

KnowBrainer.com ships internationally. SpeechWare microphones are also available directly from SpeechWare in Europe.

[1] If you want to see what “fully compatible” means, have a look at http://kilgray.com/news/once-upon-time-there-was-dragon.
[2] http://www.knowbrainer.com/core/pages/miccompare.cfm
[3] So is KnowBrainer; see http://www.knowbrainer.com/NewStore/pc/viewPrd.asp?idproduct=464


Jim Wardell will be presenting optimized work methods for speech recognition once again at this year's memoQfest in Budapest, Hungary.


  1. Hi Jim/Kevin,

    Believe it or not, I am getting amazing results with a basic Logitech USB Desktop Microphone (it’s this one: http://www.amazon.co.uk/gp/product/B00009EHJV/ref=oh_o00_s00_i00_details). I don’t really use it to translate much, as I am still getting used to the whole using-Dragon-in-a-CAT-tool thing, but my wife also uses the same mic. She is a copywriter and dictates into hers all day long, with near-perfect results. Neither of us has done any rigorous testing, but they work very, very well, and this with a mic that you can leave on your desktop (I hate having anything on my head, except for my glasses), as long as it is located right in front of you. I have mine directly under my screen.

    It’s no longer being made, but I just ordered another one last week on eBay, where they can still be found. I have no doubt that the ones you mentioned are better, but I think I only paid £30 for this one, and given that it seems to work this well I don't feel the need to try a more expensive one.

    See also: http://www.proz.com/forum/speech_recognition/219118-dragon_naturally_speaking_with_desktop_mic.html


    1. An Israeli colleague with whom Jim and I occasionally correspond on this subject explained that people's differing experiences with microphones come down to the characteristics of their voices. He spoke about the importance of this in radio broadcasting as well. I'm not a musician like Jim or a radio expert like Moshe, so I can't say that I understood all the technical aspects of the discussion, but their explanations fit pretty well with the voices of colleagues who have had reasonable results with inexpensive equipment and those who find they must invest something on the order of €300 for best results. Everyone has found that the garbage equipment in the Dragon box set is easily surpassed, but after that you'll find many opinions. In Jim's case, though, you've got someone who, through very long use, has come to consider other factors that are of great importance along with accuracy, particularly comfort for long use. I have an excellent headset myself, but after several hours of use my ears have had enough. With Jim's equipment I wouldn't have that problem. However, your suggestion of the headset as a desktop mic is worth a try for now: it picks up the barking dogs next door, so there's no reason why it wouldn't pick up my voice 20 cm away. But after my latest move to a neighborhood where nearly everyone has undisciplined, yappy mutts, I think I'll have to consider what Jim says about noise cancellation.

  2. Jim, when you discuss CAT tool "compatibility" here and in other conversations we've had, what are your criteria for judgment? I know that colleagues use speech recognition in a variety of environments, but my own experience has shown me that results may differ considerably. For example, I tried to use Fluency's transcription module to dictate some old Civil War letters, but the results were very unsatisfying, particularly because the capitalization at the start of sentences was often incorrect. It seemed as if DNS would somehow lose its way partway through the work. But I've never formulated reasonable comparison criteria, for example, for a like-for-like comparison of SDL Trados Studio and memoQ for dictation performance. Any suggestions for that?

