Microphone quality makes a great difference in speech recognition results. And although the microphones integrated in iOS devices are generally good and give decent results, positioning the device in a way that is ergonomically viable for efficient dictated translation - and concurrent keyboard use - is not always so easy. This is a potential barrier to the effective use of Hey memoQ speech recognition.
So a good external microphone may be needed. But with recent iPhone models lacking a separate microphone jack and using the Lightning port for both charging and microphone input, connecting that external microphone might not be as simple as one assumes - especially not for someone like me, who is rather ignorant of the different kinds of 3.5 mm audio connections. I have had a few failures so far trying to link my good headset to the iPhone 7.
Colleague Jim Wardell is not only the most experienced speech recognition expert for translation whom I am privileged to know; he is also a musician with extensive experience in microphones of all kinds and their connections. And recently he was kind enough to share the video below with me to clear up some misunderstandings about how to connect some good analog equipment to use with Hey memoQ on an iPhone 7 or later:
Today, memoQ Translation Technologies (the artists formerly known as "Kilgray") officially released their iOS dictation app along with memoQ version 8.7, making that popular translation environment tool the first on the desktop to offer free integrated speech recognition and control.
My initial tests of the release version are encouraging. Some capitalization bugs which I identified during the beta test haven't been fixed yet, and some special characters which work fine in the iOS Notes app don't work at all, but on the whole it's a rather good start. The control commands implemented for memoQ work far better than I expected at this stage. I've got a very boring, clumsy (and unlisted) video of my initial function tests here if anyone cares to look.
Before long, I'll release a few command cheat sheets I've compiled for English (update: it's HERE), German and Portuguese, which show which iOS dictation functions are implemented so far in Hey memoQ and which don't perform as expected. There are no truly comprehensive lists of these commands; even the ones that claim to cover everything have gaps and errors, which one can only sort out by trial and error. For the most part this isn't the fault of the memoQ development team, but rather of Apple's chaotic documentation.
The initial release has a full set of commands implemented only in English. Those who want to use control commands for navigating, selecting, inserting, etc. will have to enter their own localized commands for now, and this too involves some trial and error to come up with a good working set. I hope that before long the development team will implement the language-specific command sets as shareable light resources. That would make it much easier to get all the available languages sorted out properly for productive work.
I am very happy with what I see at the start. Here are a few highlights of the current state of Hey memoQ dictation:
Bilingual dictation, with source language dictation active when the cursor is on the source side and target language dictation active when the cursor is on the target side. Switching languages in my usual dictation tool - Dragon NaturallySpeaking - is a total pain in the butt.
No trainable vocabulary at present (an iOS API limitation), but this is balanced in a useful way by commands like "insert first" through "insert ninth", which enable direct insertion of the first nine items in the Translation Results pane. Thus if you maintain good termbases, the "no train" pain is minimized. And you can always work in "mixed mode" as I usually do, typing what is not convenient to speak and using keyboard shortcuts for commands not yet supported by voice control, like tag insertion.
Microphones connected (physically or via Bluetooth) to the iPhone or iPad work well if you don't want to use the integrated microphone in the iOS device. My Apple earphones worked great in a brief test.
Some users are a bit miffed that they can't work directly with microphones connected to the computer or with Android devices, but at present the iOS dictation API is the best option available to the development team for exploring integrated speech functions that include program control; that won't work with Chrome speech recognition, for example. As other APIs improve, we can probably expect some new options for memoQ dictation. (For the curious, a minimal sketch of the underlying iOS call follows this list.)
Moreover, with the release of iOS 12, I think many older devices (which are cheap on eBay or probably free from friends who don't use them) are now viable tools for Hey memoQ dictation. (Update: I found a list of iPhone and iPad devices compatible with iOS 12 here.)
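For the technically curious, here is a minimal sketch of live recognition with Apple's Speech framework, the kind of API an app like Hey memoQ presumably builds on. I don't know the app's internals; the function name, locales and structure below are my own illustration of how a per-language recognizer could be swapped when the cursor changes sides.

```swift
import AVFoundation
import Speech

// Minimal sketch: live speech recognition with Apple's Speech framework.
// Call SFSpeechRecognizer.requestAuthorization(_:) once before using this.
// Bilingual dictation could swap the locale when the cursor changes sides.
func startDictation(localeID: String, onText: @escaping (String) -> Void) throws {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: localeID)),
          recognizer.isAvailable else { return }
    let request = SFSpeechAudioBufferRecognitionRequest()
    let engine = AVAudioEngine()
    let input = engine.inputNode
    input.installTap(onBus: 0, bufferSize: 1024,
                     format: input.outputFormat(forBus: 0)) { buffer, _ in
        request.append(buffer)   // stream microphone audio to the recognizer
    }
    engine.prepare()
    try engine.start()
    recognizer.recognitionTask(with: request) { result, _ in
        if let result = result {
            onText(result.bestTranscription.formattedString)
        }
    }
}

// e.g. startDictation(localeID: "de-DE") on the source side,
//      startDictation(localeID: "pt-PT") on the target side
```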
Just for fun, I tested whether Hey memoQ and Dragon NaturallySpeaking interfere with one another. They don't, it seems; I switched back and forth from one to the other with no trouble. During the app's beta phase, I did not expect that I would come to take Hey memoQ seriously as an alternative to DNS for English dictation, but with the current set of commands implemented, I can already work with greater comfort than expected, and I may in fact use this free tool quite a bit. And I think my friends working into Portuguese, Russian and other languages not supported by DNS will find Hey memoQ a better option than other dictation solutions I've seen so far.
This is just the beginning. But it's a damned good start really, and I expect very good things ahead from memoQ's development team. And I'm sure that, once again, SDL and others will follow the leader :-)
And last, but not least, here's an update to show how to connect the Hey memoQ app on your iOS device to memoQ 8.7+ on your computer to get started with dictation in translation:
When I first moved to Portugal I had a TomTom navigation system that I had used for a few years when I traveled. Upon crossing a border, I would usually change the language for audio cues, because listening to street names in one language pronounced badly in another was simply too confusing and possibly dangerous. Eventually, the navigation device died as crappy electronics inevitably do, and I changed over to smartphone navigation systems, first Apple Maps on my iPhone and, after I tired of getting sent down impossible goat trails in Minho, Google Maps, which generally did a better job of not getting me lost and into danger.
For the most part, the experience with Google Maps has been good. It's particularly nice for calling up restaurant information (hours, phone numbers, etc.) on the same display where I can initiate navigation to find the restaurant. The only problem was that using audio cues was painful, because the awful American woman's voice butchering Portuguese street names meant that my only hope of finding anything was to keep my eyes on the actual map and try to shut out (or simply turn off) the audio.
What I wanted was navigation instructions in Portuguese, at least while I am in Portugal; across the border in Spain it would be nice to have Spanish to avoid confusion. Not the spoken English voice of some clueless tourist from Oklahoma looking to find the nearest McDonald's and asking for prices in "real money". But although I found that I could at least dictate street names in a given language if I switched the input "keyboard" to that language, the app always spoke that awful, ignorant English.
And then it occurred to me: switch the entire interface language of the phone! Set your iPhone's language to German and Google Maps will pronounce German place names correctly. Same story for Portuguese, Spanish, etc. Presumably Hungarian too; I'll have to try that in Budapest next time. And that may have an additional benefit: fewer puzzled looks when someone asks where I'm staying and I can't even pronounce the street name.
It's a little disconcerting now to see all my notifications on the phone in Portuguese. But that's also useful, as the puzzle pieces of the language are mostly falling into place these days, and the only time I get completely confused now is if someone drops a Portuguese bomb into the middle of an English sentence when I'm not expecting it. Street names make sense now; I'm less distracted by the navigation voice when I drive.
And if some level of discomfort means that I use the damned smartphone less, that's a good thing too.
This blog post was produced by voice dictation on my iPhone 4S in a crowded restaurant with a lot of background noise. This evening I came to my favorite hangout to work, to get away from home for a while after a very long and stressful day. I forgot my glasses when I left home, so I cannot see the screen of my computer well enough to type accurately. Essentially, I am working as if I were blind. I thought of driving home to fetch my glasses and then returning here to work, but I did not want to take the time. So I thought that this would be the ideal opportunity to test the dictation workflow which I have been showing to so many people in quite a few languages in the last few weeks. Of course I am doing this in my native language (English), but this would work just as well if I were a native speaker of Arabic or Romanian or Portuguese, for example.
What I am experiencing so far in this test is that after speaking for a certain amount of time, during which a text chunk of a certain size has been generated, the application stops and communicates with Nuance's online transcription server, producing the transcribed text in the language I am speaking. That does not pose a great difficulty, however; I can simply restart the recording, and the text continues. If I want to, I can make corrections with the on-screen keyboard of my mobile phone, but I prefer to e-mail the text after I am finished and make any changes or corrections on my computer.
The last few weeks have been very interesting. At the JABA Partner Summit in Porto, Portugal, and later at the GALA conference in Seville, Spain, I tested this workflow together with native speakers of many languages not supported by Dragon NaturallySpeaking from Nuance. In every case the results seemed to be excellent, but the texts generated during the tests were usually rather short, no more than one or two paragraphs.
This is the longest text that I have created by this process so far. I find that the "chunking" behavior of the application is actually helpful. It allows me to look at groups of text that are not too large (about enough to fill the screen of the iPhone) and make important corrections manually before I continue. On the whole, this is in fact a rather comfortable process. With it, I can hang out in the barn with my goats and chickens and a printout and translate comfortably with a beer in one hand. Not bad. The ergonomic aspects are excellent. I am dictating this text in English with a great deal of noise coming from the nearby kitchen and the television which is less than 3 m from me, blaring loudly in Portuguese.
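For illustration only, here is roughly what that chunked record-transcribe-resume loop might look like in code. Nuance's actual endpoint and protocol are not public; the URL, parameter and response handling below are invented placeholders.

```swift
import Foundation

// Hypothetical sketch of the chunking behavior described above: record a
// bounded chunk of audio, send it to a transcription server, append the
// returned text, then restart recording. The endpoint and response format
// are invented for illustration, not Nuance's real interface.
func transcribeChunk(_ audio: Data, language: String,
                     completion: @escaping (String) -> Void) {
    var request = URLRequest(
        url: URL(string: "https://transcription.example.com/dictate?lang=\(language)")!)
    request.httpMethod = "POST"
    request.setValue("audio/wav", forHTTPHeaderField: "Content-Type")
    request.httpBody = audio
    URLSession.shared.dataTask(with: request) { data, _, _ in
        guard let data = data,
              let text = String(data: data, encoding: .utf8) else { return }
        completion(text)   // caller appends the text and resumes recording
    }.resume()
}
```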
I am very satisfied with the results of tonight's test. And I hope that others will explore this workflow further, creating new possibilities for better, more profitable work in many languages using this new speech recognition capability. I think this is a game-changer.
This works on any Apple mobile device, such as the iPhone, iPad or iPod. The app to download from the App Store is called "Dragon Dictation". It is free. I discovered this particular possibility after reading time and again that the quality of speech recognition on mobile devices is actually superior to what is available on desktop computers, because that is where all of the research time and money is currently being invested. It took me a while to realize the implications of this, but now I see that many can benefit a great deal from the possibilities this makes available. I look forward to reports of work in other languages. (The only language that I have discovered to have significant restrictions so far is Japanese, where apparently the Kanji recognition is not very good and Hiragana characters are used too often, making a text difficult for a native speaker to read. Steve Vitek tells me that the problem is that there are too many homophones in Japanese, but that this should work well in another language such as Mandarin Chinese. The initial tests with Mandarin Chinese in Seville, Spain actually looked rather good.)
After dictation and transcription are complete, a few button presses can send the text to an e-mail server for manual or automated processing.
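How Dragon Dictation wires up its mail button is opaque to me, but the standard iOS route is the MessageUI composer, roughly as sketched below (the recipient address is a placeholder):

```swift
import MessageUI
import UIKit

// Sketch: hand a finished transcription to the standard iOS mail composer.
// The presenting view controller must conform to
// MFMailComposeViewControllerDelegate.
func mailTranscript(_ text: String,
                    from vc: UIViewController & MFMailComposeViewControllerDelegate) {
    guard MFMailComposeViewController.canSendMail() else { return }
    let mail = MFMailComposeViewController()
    mail.mailComposeDelegate = vc
    mail.setToRecipients(["dictation@example.com"])   // placeholder address
    mail.setSubject("Dictated draft")
    mail.setMessageBody(text, isHTML: false)
    vc.present(mail, animated: true)
}
```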
In recent years it seems that quite a few of my colleagues and customers have made smartphones part of their business. I'm not part of that crowd; years of being an early adopter of PDA technology (Palm devices, Sharp Wizards and a host of other long-forgotten gadgets) and other computer-related junk have made me a bit allergic to the technology, and the trauma of short-circuiting a 500 euro mobile phone by dropping it in a toilet and having another phone of the same class fall from my pocket as I ran to catch a train has made me a stubborn minimalist when it comes to phones and other electronics. Add to that the fact that I've found myself up to my hips in swamp water a few times in recent years while hunting boars, and I tell myself that the last thing I need is an iPhone in my back pocket (where I once sat on and broke two Sharp Wizards).
And yet... after a friend showed me the other day how to take my phone offline so I could avoid interrupted naps and sleeptalking with puzzled clients and friends, I began to think that maybe, just maybe, there could be a place in my routine for a little more gadgetry if indeed I can switch it off at those times when the 24/7 world does not interest me. Over the past year I have come to rely on the security and efficiency of the Online Translation Manager from LSP.net, which offers greater scope and scalability than any other business management tool for translation that I have seen and can fit in my budget and temperament. Now my #2 business tool (after my main squeeze, memoQ) has a rather functional interface for smartphones, and I'm very, very tempted. What the heck: I need a new digital camera, and my iPod died a horrible death a few years ago as I was backing up my translation archives onto it (how would you like to choke on a folder full of German patents?), so maybe I need to make a quick trip to the UK and get an unlocked iPhone. AFAIK the German market is still rather monopolistic. (Update: Perhaps that's not the case after all. FONIC sells the 8 GB iPhone 3GS without a contract for just under €500, which usually means it's not SIM- or net-locked. Combined with Skype for iPhone, the iPhone's WLAN capabilities and my Skype flat rates, this is starting to look very good. I've also been reminded that Apple's exclusive contract with a certain mobile service provider in Germany has expired, so the German Apple Stores sell devices with no lock. Nonetheless, I'll find an excuse to visit the UK. It's been too long.)
The iPhone screen shot here was made while the developers were showing me the new interface last week. It tells me as a project manager that there is a new quotation request (don't let the date fool you - this is on a test system where the wildest things happen), unread incoming correspondence on another project, and unread incoming e-mail and a delivery from a subcontractor on a third. I can now assign tasks, respond to the mails and carry out other necessary project management activities. If this interface had been available in recent months (and I had had a device to use it), a number of urgent requests from cherished clients during the busy holiday season wouldn't have gone to the dogs. (Other dogs, not mine.) For reasons I have not yet fathomed, a number of clients who used to pick up the phone to communicate an urgent request now assume that I'll be sitting in front of my screen to respond ASAP. Come to think of it, most of these send me e-mail where the footer indicates that the message was sent from a CrackBerry. OK, OK - I surrender.
Seriously, though: this is an extremely useful feature that I hope to see expanded soon to include the customer and supplier ("resource") interfaces. It makes a good business management environment even more useful and relevant.
Release update (compatibility): This feature was developed with and tested on Apple iPhones for the most part. A BlackBerry interface was developed as well using a simulator from RIM, owing to the wide variety of devices and the unavailability of many for testing. It seems that the display does in fact function differently on different models, so the optimized smartphone interface may not be ready for many BlackBerry models; these will have to rely on unoptimized browser access. Other smartphones may be able to use this feature by changing the value of the user agent (in Firefox there is a plug-in for this), which usually involves some seriously nerdy settings tweaks in the browser. In summary: iPhone? Not a problem. Anything else? Maybe, but expect to work at it. I'm betting that there will eventually be a generic small-device interface parallel to an optimized one for popular devices such as the iPhone. It would make sense. Even with one, however, I'll eventually join The Cult again and visit the Apple Store.
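To make the user-agent trick concrete (this is the general mechanism, not anything specific to OTM): the server decides which interface to serve by inspecting the User-Agent header, so a browser that presents an iPhone-style string gets the optimized view. A hypothetical request, with an invented URL and an abbreviated UA string:

```swift
import Foundation

// Sketch of user-agent spoofing: present an iPhone-style User-Agent so a
// server that branches on that header serves its iPhone-optimized interface.
// The URL and UA string here are illustrative placeholders.
var request = URLRequest(url: URL(string: "https://otm.example.com/pm")!)
request.setValue("Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_1_2 like Mac OS X)",
                 forHTTPHeaderField: "User-Agent")
URLSession.shared.dataTask(with: request) { data, _, _ in
    // `data` now holds whatever the server sends its mobile clients
}.resume()
```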