Dec 7, 2018

Integrated iOS speech recognition in memoQ 8.7

Today, memoQ Translation Technologies (the artists formerly known as "Kilgray") officially released their iOS dictation app along with memoQ version 8.7, making that popular translation environment tool the first on the desktop to offer free integrated speech recognition and control.


My initial tests of the release version are encouraging. Some capitalization bugs which I identified in the beta test haven't been fixed yet, and some special characters which work fine in the iOS Notes app don't work at all, but on the whole it's a rather good start. The control commands implemented for memoQ work far better than I expected at this stage. I've got a very boring, clumsy (and unlisted) video of my initial function tests here if anyone cares to look.

Before long, I'll release a few command cheat sheets I've compiled for English (update: it's HERE), German and Portuguese, which show which iOS dictation functions are implemented so far in Hey memoQ and which don't perform as expected. There are no comprehensive lists of these commands, and even the ones that claim to cover everything have gaps and errors, which one can only sort out by trial and error. This isn't an issue with the memoQ development team for the most part, but rather with Apple's chaotic documentation.

The initial release only has a full set of commands implemented in English. Those who want to use control commands for navigating, selecting, inserting, etc. will have to enter their own localized commands for now, and this too involves some trial and error to come up with a good working set. I hope that before long the development team will implement the language-specific command sets as shareable light resources. That will make it much easier to get all the available languages sorted out properly for productive work.
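To make the idea of per-language command sets concrete, here is a minimal sketch of how spoken phrases might be looked up by language. It is not how memoQ stores or resolves commands: the command IDs, the `COMMAND_SETS` table and the `resolve_command` function are all hypothetical, and apart from "reset filter" and the German "Lösche Filter" (both mentioned in the comments below), the phrases are invented for illustration.

```python
# Hypothetical sketch: command IDs and most phrases are invented for illustration.
# Only "reset filter" and "Lösche Filter" are phrases mentioned in this post's comments.
COMMAND_SETS = {
    "en": {"reset filter": "RESET_FILTER", "next segment": "NEXT_SEGMENT"},
    "de": {"lösche filter": "RESET_FILTER", "nächstes segment": "NEXT_SEGMENT"},
}


def resolve_command(language, phrase):
    """Return the command ID for a spoken phrase, or None if it is not a command."""
    commands = COMMAND_SETS.get(language, {})
    # Normalize whitespace and case so "Lösche Filter" matches "lösche filter".
    return commands.get(phrase.strip().casefold())
```

Anything that does not resolve to a command would simply be treated as dictated text, which is why a poorly chosen localized phrase can collide with ordinary speech and needs some trial and error.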

I am very happy with what I see at the start. Here are a few highlights of the current state of Hey memoQ dictation:
  • Bilingual dictation, with source language dictation active when the cursor is on the source side and target language dictation active when the cursor is on the target side. Switching languages in my usual dictation tool - Dragon NaturallySpeaking - is a total pain in the butt.
  • No trainable vocabulary at present (an iOS API limitation), but this is balanced in a useful way by commands like "insert first" through "insert ninth", which enable direct insertion of the first nine items in the Translation Results pane. Thus if you maintain good termbases, the "no train" pain is minimized. And you can always work in "mixed mode" as I usually do, typing what is not convenient to speak and using keyboard shortcuts for commands not yet supported by voice control, like tag insertion.
  • Microphones connected (physically or via Bluetooth) with the iPhone or iPad work well if you don't want to use the integrated microphone in the iOS device. My Apple earphones worked great in a brief test.
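The "insert first" through "insert ninth" idea from the list above can be sketched roughly as follows. This is only an illustration under my own assumptions, not memoQ's implementation: the function names are hypothetical, and the results list stands in for whatever the Translation Results pane currently shows.

```python
# Hypothetical sketch of ordinal insertion commands; not memoQ's actual code.
ORDINALS = ["first", "second", "third", "fourth", "fifth",
            "sixth", "seventh", "eighth", "ninth"]


def parse_insert_command(phrase):
    """Return the 0-based result index for 'insert <ordinal>', else None."""
    words = phrase.lower().split()
    if len(words) == 2 and words[0] == "insert" and words[1] in ORDINALS:
        return ORDINALS.index(words[1])
    return None


def apply_command(phrase, results, segment_text):
    """Append the chosen result to the segment text; otherwise return it unchanged."""
    index = parse_insert_command(phrase)
    if index is None or index >= len(results):
        return segment_text
    return segment_text + results[index]
```

With a well-maintained termbase, saying "insert second" while the results list holds `["Haus", "Gebäude"]` would add "Gebäude" to the target text without the recognizer ever having to "know" the term.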
Some users are a bit miffed that they can't work directly with microphones connected to the computer or with Android devices, but at the present time, the iOS dictation API is the best option for the development team to explore integrated speech functions which include program control. That won't work with Chrome speech recognition, for example. As other APIs improve, we can probably expect some new options for memoQ dictation.

Moreover, with the release of iOS 12, I think many older devices (which are cheap on eBay or probably free from friends who don't use them) are now viable tools for Hey memoQ dictation. (Update: I found a list of iPhone and iPad devices compatible with iOS 12 here.)

Just for fun, I tested whether Hey memoQ and Dragon NaturallySpeaking interfere with one another. They don't, it seems. I switched back and forth from one to the other with no trouble. During the app's beta phase, I did not expect that I would take Hey memoQ as a serious alternative to DNS for English dictation, but with the current set of commands implemented, I can already work with greater comfort than expected, and I may in fact use this free tool quite a bit. And I think my friends working into Portuguese, Russian and other languages not supported by DNS will find Hey memoQ a better option than other dictation solutions I've seen so far.

This is just the beginning. But it's a damned good start really, and I expect very good things ahead from memoQ's development team. And I'm sure that, once again, SDL and others will follow the leader :-)

And last, but not least, here's an update to show how to connect the Hey memoQ app on your iOS device to memoQ 8.7+ on your computer to get started with dictation in translation:


16 comments:

  1. One small remark/correction: iOS 12 only supports iPhone 5s and newer models, so you won't be able to install it on a 4s.

  2. Thanks for the feedback, Kevin!

    Your iPhone 4S won't work. iPhone 5 is the minimum, because you must be able to install iOS version 10 at least. A used iPhone 5 costs around EUR 50 around here.

    Replies
    1. Thanks, Gergely, someone pointed that out to me the other day on FB, but I didn't get around to updating the information on the post yet. €50 isn't bad; I usually spend that much or more for a microphone. I think some people have gotten so emotionally invested in their Android phones or other technology that it's hard for them to think of the (iOS) input as simply a microphone. At least from what I have been able to determine so far in my research on other speech APIs and their ability to pass commands to a Windows application, it doesn't look like there are any good options right now which would have the language coverage available for iOS speech recognition. Cortana certainly wouldn't (that does not even offer European Portuguese right now) and as far as I can tell, Chrome won't handle that either, so users who need Chrome languages (like Slovene) might as well just use that Chrome app I wrote about recently and input commands manually.

  3. If you do same-language memoQ projects like I occasionally do (generic English to EN-US for revision, for example), there is a problem with the selection commands in the first release version of "Hey memoQ" at least. See the video record at https://youtu.be/GFGGE_5RQTU

    Replies
    1. I couldn't reproduce this; there might be something else going on. Maybe it is some leftover from the pre-release version you tested.

    2. I think I accidentally deleted your other reply, sorry. You posted in duplicate and I was trying to clean up and something went wrong. In any case, the problem is a little different than I explained it; I'll install on my office machine over the weekend just to be sure and then send you a little video via the support ticket. Don't let my grumbling over some of these details distract you. I think this is a great new feature with a lot of potential for a great number of colleagues, including my friends in Portuguese- and Arabic-speaking countries. Thank you so much for responding to the needs of so many translators of important "minor" languages that Nuance and others continue to leave unserved.

  4. There is also a bug with the "reset filter" command for the target text. If there IS no matching target text, you can apply the filter just fine, getting an empty result, but the filter cannot be reset by voice then. It has to be done manually. I made a little video record of this (https://youtu.be/XMkO3sD4Tc4), which also shows how to make source text filtering work in the release as of 10 December 2018. The German command used to reset the filter ("Lösche Filter") was configured by me, because there currently are no German settings in the shipping software. What's not shown in the video record is how I filtered the target (with content) and was able to reset, but then I filtered on the missing word "pirates" and was once again not able to reset.

    Replies
    1. Fixed for next build. We spotted this ourselves in QA too, the fix just didn't make it into the first release.

  5. Congratulations and a big THANK YOU to everyone at memoQ, who have always listened to translators through the years! Now, finally, translators into many of the non-Dragon-NaturallySpeaking languages have an effective way to reap the productivity, quality and ergonomic advantages of speech recognition while using the world's leading translation memory tool. I tested Hey memoQ today and came to the same conclusions as Kevin. This is an awesome step forward in translation technology. Translators into Portuguese, Russian, Arabic, Hebrew, Greek, Chinese, Japanese, Turkish, Hindi, Indonesian, Korean, Thai, and many others (https://www.memoq.com/en/news/hey-memoq-frequently-asked-questions), and, yes, Hungarian should be dancing in the streets. If you consider the number of translators working into these languages, it's going to be some big party. And, hey, thanks Kevin for your enthusiasm and dedication!
    Jim

    Replies
    1. Ha! I was going to write and ask what you thought of all this given that you have longer experience with voice transcription technology than anyone I know. Things sure have come a long way since we saw each other at memoQfest 2015 and you were showing a couple of solutions like this - I think one was an early Chrome speech recognition in memoQ Web. I'm hoping that you and Moshe will have some good ideas on optimizing the ergonomics for Hey memoQ. I usually do OK with it on a small tripod on my desktop - I think the Apple mikes are fairly good quality, but as they get farther away from the mouth there can be some issues with room acoustics. I think ultimately we'll be looking at some sort of external mike running through the iOS device, but I've just started to look at these options and I'm not terribly happy with the cheap solutions. Maybe a boom to keep the phone close to my mouth will work best.

  6. For those using an iOS device that has only those damned "lightning" plugs for both charging and sound input from an external microphone, there are splitters available. Searching Amazon with the keywords "iPhone" and "splitter" brings up a number of interesting options, including quite a few that include 3.5 mm jacks and one with a standard USB jack, a 3.5 mm jack and a lightning jack (i.e. female 3 ports) in the split. These might offer some good options to use high quality external microphone equipment. Here's a short URL for one such search: https://goo.gl/AiUc4S

    Replies
    1. c. Wireless or cables? Not having to worry about cords and cables is wonderful. I used a Samson Airline wireless system for many years, and really enjoyed not having the minor but nagging annoyance of always having wires hang around my head. On the other hand, my current mic, a FlexyMike single-ear model (SE), is so lightweight and its cable is so light and flexible that I barely know I'm wearing it. One intriguing solution that enters my mind is the Apple AirPods, which have directional mics in each ear that supposedly can "focus" on the sound coming from your mouth and cancel out extraneous noise. They cost somewhat less than a high-quality microphone system.
      Speaking of cost, a first step in the direction of an external mic might be to get an Apple Lightning-to-Earphone adapter (about 10 euros, and which apparently also can be used for 3.5 mm micro TRRS microphone jacks), then use the typical ear pods and built-in mic that come with iPhones and other smartphones. This might just be "good enough". On the other hand, Apple's speech recognition is good, but not nearly as accurate as Dragon NaturallySpeaking. So in the case of Hey memoQ it would seem to make good business sense to use a microphone system that gives Apple's speech recognition the best chance to achieve its maximum potential for accuracy.
      One needs to be careful about microphone connectors. There are at least 3 flavours of 3.5 mm jacks: mono (1 black band), stereo TRS (2 black bands), and stereo TRRS (you guessed it, 3 black bands). Simply connecting any wee plug into your Apple Lightning-to-Earphone connector may not work. There may also be impedance issues.
      So it seems to me that the easiest wired mic solution is to go with the standard approach, which is to say one having a final USB output. To do this, you'll need to have an Apple Lightning-to-USB Micro Adapter ($19) and probably a USB standard-to-micro adapter to connect your mic system output to your Apple Lightning-to-USB Micro Adapter. One needs to beware of splitters that look like they offer both headphone and USB ports. The headphone side might just be output-only (i.e. headphones only) and the USB port is probably just for 5-watt charging. Apple's proprietary headphone adapter seems to handle both input and output (Apple does not make a Lightning-to-mic adapter, so that must be the case). Cheers, Jim.

  7. Hi Kevin,
    1. The first stop for anyone exploring speech recognition microphones should always be knowbrainer.com.
    2. Before exploring the microphone selection guide at knowbrainer, a translator looking for an external microphone for speech recognition should consider:
    a. Do you need noise cancellation? (noisy roommates, kids, pets, neighborhood, traffic, fans/AC, possible use in cars/trains/planes, like to listen to music while working)
    b. The best speech recognition accuracy is always obtained when the distance between the mic and your mouth is short and constant. This argues for wearing a boom mic or a lavalier mic. Typical boom mic headsets come with earphones. Most people find it annoying and uncomfortable to wear earphones for more than an hour at a time. On the other hand, if you are working in a noisy open office environment, having earphones covering your ears and blocking out distracting conversations and room noise could be just the ticket. The same goes if you need to take lots of phone calls and can route the calls through your earphones. I have always used boom mics without earpieces. Good ones are light, comfortable and can be worn for hours and hours. A lavalier mic is even more comfortable, but one must find a way to fix the mic's position on one's clothing so that there is no rubbing between mic and clothing and so that the mic cable also is immobilized so that it does not physically conduct acoustic noise into the mic. Another option is a mic mounted on a stand on one's desk. This is the ultimate in comfort, but the disadvantage is that one must always remember to maintain a rather constant position relative to the mic. This can lead to back/shoulder/neck strain.

  8. Well, ah-hem, I'm as long-winded as ever. So I had to split a long post into two, which appear above in reverse order. Knowbrainer's ratings guide is hard to find on the site, so here's a link: http://www.knowbrainer.com/core/pages/miccompare.cfm. The Knowbrainer ratings for all three microphones that I've used at length are very accurate. All four factors matter, but which matters most will depend on the user's specific situation. Notice that the Knowbrainer folks are very skeptical about Bluetooth microphones. On the other hand the only one they recommend, a Sennheiser, looks like it MIGHT work especially well with iPhones. Kevin, sounds like the next step for you might be the €10 Lightning/Earphone adapter with Apple ear pods plugged in. The price is right. :-)

  9. Ok, so we have just come across this app, and before I use it with any of my clients, I have a question regarding data protection. Since it is using Apple's iOS technology, where is the dictated data stored/processed? Can I sleep easy knowing my clients' sensitive information isn't where it shouldn't be?

    Replies
    1. Apple's technology for speech recognition is based on Nuance's mobile APIs. That issue was discussed in detail during Q&A for Jim Wardell's talk at memoQ Fest 2015, which is available online. There were two Nuance reps there who shared details regarding how these remote servers work and how the use of this technology passed the strictest data security tests of the US government in the days before the head of that government made turning over anything and everything to foreign powers a matter of public policy.

      But... Nuance's way of handling speech data for Dragon Dictate, etc. has nothing to do with what Apple does, regardless of any Nuance roots their speech technology might have. There is an old article (https://www.zdnet.com/article/apple-stores-your-voice-data-for-two-years/) which raises some concerns, and I haven't looked to see if there are any updates on that, because I use local recognition solutions for my work in German and English, so none of this is directly relevant to me or my clientele.

