An exploration of language technologies, translation education, practice and politics, ethical market strategies, workflow optimization, resource reviews, controversies, coffee and other topics of possible interest to the language services community and those who associate with it. Service hours: Thursdays, GMT 09:00 to 13:00.
Showing posts with label speech recognition. Show all posts
Feb 6, 2020
Speech-to-text in language services and learning: an update (rescheduled)
This presentation has been rescheduled due to unanticipated conflicts. On March 4th at 4:00 pm Central European Time (10 am Eastern Standard Time), I'll be presenting an overview of some popular and/or possible platforms for generating text from spoken words for professional work and language learning. As those who have followed this blog for years know, I have written quite a bit about this in the past and made a number of demonstration and instruction videos for various platforms, but this field is subject to frequent change and many new developments, so it is sometimes difficult to judge the value of one tool against another for different applications.
The webinar is available free to anyone interested, and there will be time for questions afterward. We will compare and contrast Dragon NaturallySpeaking, iOS-based applications (including Hey memoQ), Google Chrome and Windows 10 for speech recognition work in translation and transcription, discussing some of the advantages and trade-offs with each platform working in translation environments and text-editing software, and the range of languages covered by each. Join us, and see if there are good fits for your speech recognition needs!
You can register here for the discussion.
Jan 28, 2020
Another look at Windows 10 speech recognition
A few years ago while on "holiday", I returned from dinner to find that my laptop had bluescreened. Panic time! It was Saturday night, and I still had quite a lot of text to translate and deliver on Monday morning. And up on the highest mountain in Portugal, I wasn't sure where I could find a replacement to finish the project, which was, at least, not utterly lost, because I had put it on a memoQ Cloud server for testing. The next day I got lucky: about 50 km away there was a Worten, where I picked up a gamer laptop with lots of RAM and an SSD. Well, not so lucky, as it was a Hewlett Packard Omen, with a fan prone to failure, but that's another story....
This new laptop was my first encounter with Windows 10. I had heard that this operating system offered improved speech recognition capabilities, and since I prefer to dictate my translations and downloading the 3 GB installation file for Dragon NaturallySpeaking (DNS) from my server at the office was going to take forever, I thought I would give Windows 10 speech recognition a try. I hadn't installed my CAT tool of choice yet, so I fired up Microsoft Word and began dictating. "Not bad," I thought. Then I tried it in my translation environment, and the results were a complete disaster. So I put that mess out of my mind.
Since then there have been some notable advances in speech-to-text capabilities on a number of platforms. But DNS, still the best solution for my languages (German and English), became increasingly cranky thanks to Nuance's neglect of the product. Every week I read new reports of trouble with DNS in a variety of environments in which it used to perform very well. Apple's iOS 13 was a great leap forward of sorts for speech recognition and voice-controlled editing, but the new features are only available in English, and having Voice Control activated totally screws up my otherwise rather good dictation in German and Portuguese (or any other language). And don't get me started on the crappy vocabulary addition feature, which uses text entry alone with no link to actual pronunciation. Good luck with that garbage. It's not a bad solution in Hey memoQ with the additional command features added, but iOS dictation is not completely up to reasonable professional standards yet.
I probably would have given no further thought to Windows 10's speech-to-text features if it weren't for Anthony Rudd. We've corresponded a bit since I bought his excellent book on regular expressions for translators (and there's another practical guide for us coming soon from him!), and in a recent discussion he alluded to the use of Unicode with regex as a simple way of dealing with some things another colleague was struggling with. I was intrigued by this, and so for about half a day, I ran down a rabbit hole, testing Unicode subscripts and superscripts for a variety of purposes like fixing bad OCR of footnote markers and empirical formulae, autocorrecting common expressions for subscripted variables and chemical terms, including subscripts and superscripts in term bases and much more. Fascinating and useful stuff on the whole, even if some fonts don't support it well.
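To make the idea concrete, here is a minimal Python sketch of one such cleanup – my own illustration, not a recipe from Anthony Rudd's book: a regex that turns digits after element symbols or closing parentheses into Unicode subscripts, the sort of thing that repairs bad OCR of empirical formulae.

```python
import re

# Map ASCII digits to their Unicode subscript equivalents.
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")

def fix_formula_digits(text: str) -> str:
    """Turn digits following a letter or ')' into subscripts, e.g. CO2 -> CO₂.
    A real rule set would be narrower to avoid touching things like 'A4'."""
    return re.sub(r"(?<=[A-Za-z)])(\d+)",
                  lambda m: m.group(1).translate(SUBSCRIPTS),
                  text)

print(fix_formula_digits("CO2 and Ca(NO3)2 dissolved in H2SO4"))
# -> CO₂ and Ca(NO₃)₂ dissolved in H₂SO₄
```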
And of course I looked at using these special Unicode characters in speech-to-text applications. DNS had some funky quirks (not allowing numbers in the "spoken" version of terms, for example), but it worked rather well, so I can now say "calcium nitrate formula" and get Ca(NO₃)₂ without much ado. And for some reason it occurred to me to give Windows 10 speech recognition a try, just because I was curious whether vocabulary could in fact be trained. Indeed it can, and that feature is better than iOS 13 or DNS by far.
But first I had to remember how to activate speech recognition for Windows on my laptop again. When in doubt, type what you're looking for in the search box....
[Screenshot: searching for Speech Recognition in Windows. Notice I've pinned Windows Speech Recognition to my taskbar, which is good for quick tasks.]
Sought and found. Unlike other speech recognition solutions, the one in Windows 10 works only for the language set for the operating system. And the options there are limited to English (United States, United Kingdom, Canada, India and Australia), French, German, Japanese, Mandarin (Chinese Simplified and Chinese Traditional) and Spanish.
I put on my trusty Plantronics earset (the best microphone I've used for dictation tasks or audio in my occasional webinars in the past year) and began to dictate, first in Microsoft Word, which had shown acceptable results in my tests long ago. I found that adding vocabulary in the Speech Dictionary (accessed via the context menu in the dictation control element shown as a graphic at the top of this post) was dead simple.
The option to record pronunciation enabled me to record non-English names and words in several languages. And sure enough, the Unicode subscripts and superscripts worked, so I can now say CO₂ (I just dictated that) to my heart's content.
I was expecting a mess when I tried to use Windows 10 speech-to-text in a CAT tool, but it was not to be. It was brilliant, actually. I tried it in my copy of SDL Trados Studio, and with the scratchpad disabled so I could dictate directly into the target it worked well. No voice-controlled editing like I'm used to with DNS in memoQ, but that DNS feature does not work in SDL Trados Studio anyway, so this is no worse. But with the scratchpad box enabled (see the screenshot below), I could use voice commands to select and correct text or perform other operations. Brilliant!
[Screenshot: the dictation scratchpad. After clicking or speaking "Insert", the text is written to the target field with the proper formatting.]
So users of SDL Trados Studio who translate into a target language supported by Windows 10 speech recognition are probably better off not giving their money to Nuance, which I'm told can't even be bothered to make a 64-bit version of DNS now (and that probably accounts for a lot of the trouble people have with the program).
I tested Wordfast Pro 5, which seems to confuse the speech recognition tool horribly, with source text displayed in the floating bar for some odd reason. But my earlier tests of Wordfast with DNS were equally unhappy, so somehow I'm not surprised. And I didn't test the Memsource desktop editor, which took the prize a few years ago for the worst-ever DNS dictation results with a CAT tool. I'll leave that to someone with a much wider masochistic streak.
But what about memoQ, my personal environment of choice for most translation work? Equally brilliant, works just the same as SDL Trados Studio. No voice control for editing without the dictation scratchpad enabled (there, DNS has an advantage in memoQ), but with the scratchpad you can use the voice commands to edit before inserting in the target text field.
Wanna see this in action? Have a look at this short demo video:
I hope that the future will bring us more language support for Windows 10 dictation (Portuguese, Russian and Arabic, please!) and that other providers (like Google, if you're listening, and Apple, which never listens to anyone anymore except to spy on them with Siri) will expand the speech-to-text features offered, particularly to include sound-linked vocabulary training and better adaptation to individual users' speech. Five years ago when I began to investigate alternatives for non-DNS languages, I expected we would have more by now, and we do, but professional needs require all providers to raise their game.
Addendum: Someone asked me if Windows Speech Recognition is a cloud resource or a locally installed one which will work without an Internet connection. It's definitely the latter. So if you have lousy bandwidth or find yourself disconnected from the Internet, you can still use speech-to-text features.
And more: I use a lot of spoken commands for keyboard shortcuts when I work, so I did a little research and testing. It seems that Windows 10 speech recognition gives full access to an application's keyboard shortcuts via voice. So in memoQ, for example, I can dictate the insertion of tags, items from the Translation Results pane and a lot more. Watch out, Nuance. Windows 10 is going to kick your Dragon's scaly butt!
Jul 11, 2019
iOS 13: interesting options for dictators
Given the deteriorating political situation of many countries in the world today, the title of this post may seem ominous to some; however, the actual situation for those who use Apple's iOS operating system seems to call for some optimism in the months ahead. Among all the myriad feature changes in the upcoming Apple iOS 13 (now in the Public Beta 2 phase), there are a few which may be of particular value to writers and translators who dictate using their iOS devices.
Attention Awareness
This is 2019, and not only is Big Brother watching you, but your iPhone will as well. The front-facing camera on some models will detect when you look away from the phone – perhaps to tell your dog to get off the couch – and switch off voice control. The scope of application for this feature isn't clear yet, and I have my doubts whether this would be relevant to more ergonomic ways of working with applications like Hey memoQ (which involve Bluetooth headsets or earsets to avoid directionality problems as the head may turn to examine references, etc.), but for some modes of text dictation work, this could prove useful. I have lost track of how often I've been interrupted by people and found my responses transcribed in one way or another, often as an amusing salad of errors when I switch languages.
Automatic language selection in Dictation
The iOS 13 features preview from Apple states, "Dictation automatically detects which language a user is speaking. The language will be chosen from the keyboard languages enabled on the device, up to a maximum of four." Well, well. I wonder how it will handle isolated sentences or paragraphs quoted in another language – or individual foreign words. I'm betting probably not. But I'll have great fun pushing this feature around with three or four spoken languages to find its limits.
Add custom words
This is what I have wanted for years. Custom audio recognition vocabulary – words and phrases – to ensure that unusual or specialist terms are recognized and transcribed correctly. BINGO!
On-device processing
All audio processing will be handled locally (on your iPhone or iPad), ensuring privacy if you believe the NSA and/or the Russians or other parties aren't tapped into your equipment.
Enhancements to voice editing and voice-driven app control
There are a lot of these. Read about them in the Accessibility section of the features description from Apple. My first impression of these possibilities is that editing and correcting text may become much easier on iOS devices, and the attractiveness of the three-stage dictation/alignment/pretranslation workflow may increase for some translators. (An example of this workflow is in an old YouTube video I prepared years ago for a remote conference presentation; the procedure works with any speech-to-text option and has the advantage of at least two revision steps.)
It's even more interesting to consider how some of these new features might be harnessed by apps designed to work with translation assistance environments. And if Google responds - as I believe the company is likely to do - with new features for Chrome speech recognition and voice control features in Android and desktop computers, then there could be some very, very interesting things ahead for wordworkers in the next year or two. Vamos ver!
Jun 4, 2019
Best Practices in Translation Technology - summer school in Lisbon (July 2019)
UPDATE: The course registration deadline is Sunday, June 23rd!
Once again this year, I'll be team teaching a course with David Hardisty and Marco Neves (in English, open to all, with enrollment limited) at Universidade Nova de Lisboa (the New University in Lisbon, Portugal) on topics for best practice applying technology for more efficient and effective translation processes.
Details for costs (about €150 for those not enrolled at the university) are available here in English; the course instructors will assist those who cannot read Portuguese registration pages to register and handle other details as needed.
This year's topics include:
– Good translation workflows.
– Using voice recognition in translation.
– Using machine translation in a humane, intelligent way.
– Using checklists to improve communication in translation.
– Using glossaries, bilingual texts and other references in multiplatform environments.
– Good practices for using terminology and reference texts in the target language.
– Planning and creating lists for "autotranslatables" and the basics of "regular expressions" for filters.
There are some unannounced extras, but those will remain secret for now :-) And as time permits, individual challenges of course participants can also be addressed by the experts leading the course.
This is about as good as it gets for training costs; the university tuition is ridiculously low, and the 25 hours of instruction during the week of 15 to 20 July cost less than most half-day workshops while delivering far more. If it were up to me, I would probably increase the cost ten-fold, but that's because I'm a practical business person who understands commercial value, not a university administrator :-)
The course requires a basic knowledge of memoQ in advance, but much of the material goes well beyond that CAT tool. As always, integrated work with other environments, such as SDL Trados Studio and Wordfast is taught and emphasized.
The complete cost information (in Portuguese) is here: http://fcsh.unl.pt/formacao-ao-longo-da-vida/escola-de-verao/inscricoes
The direct registration page in Portuguese (where you will need to mark "Boas Práticas de Tecnologia para Tradução/Best Practices in Translation Technology" at the bottom) is here: http://www.fcsh.unl.pt/formacao-ao-longo-da-vida/escola-de-verao/inscricoes/escola-de-verao
We hope you can join us!
Mar 4, 2019
What evil lurks in the results from your language service provider?
Let me start by disclosing that although I have a registered limited company through which I provide translation, training and technical consulting services for translation processes, I am essentially a sole trader who is not unreasonably, though not correctly, referred to as a freelancer much of the time. I have a long history of friendship and consulting support with the honorable owners of quite a few small and medium language service companies and of a few large ones. I vigorously dispute any foolish claims that there is no need for such companies, and I see a natural alliance and many shared interests between the best of them and the best of independent professionals in the same sector.
But as Sturgeon's Law states so well, "ninety percent of everything is crap", and that would apply in equal measure to translation brokers and translators I suspect, though of course this is influenced by context. But what context can justify this translation of a data privacy statement from German to English? Only the section headers are shown here to protect against sensory overload and blown mental circuits:
The rest of the text is actually worse. This is the kind of thing some unscrupulous agencies take money for these days.
Why, pray tell, was the section numbering translated so variously into English? Well, if you know anything about the mix-and-match statistical crapshoot that is SMpT (statistical machine pseudotranslation) and its not-as-good-as-you-think wannabe alternatives, it's easy to guess the frequency of certain correlations in English with German numbers followed by a period.
And clearly, the agency could not even be bothered to make corrections, and the robotic webmaster put the text up, noticing nothing, where it remained for about a year to embarrass a rather good company which I hold in high esteem.
What's the moral of this story? Take your pick from the many reasonable options. "Reasonable" does not include doing business with the liars and thieves who will try to sell you on the "value proposition" of machine translation to cut costs.
A skilled translator knowledgeable in the subject matter and trained in dictation techniques paired with a good speech recognition solution or transcriptionist can beat any human post-edited machine translation process for both volume and quality. And a skilled summarizer reading source texts and dictating summaries in another language can blow them both away as a "value proposition".
One thing that is too often forgotten in the fool's gold rush to cheap language (dis)service solutions is - as noted by Bevan et alia - that exposure to machine-translated output over any significant period of time has unfortunate effects on the language skills (reading, writing and comprehension) of the victims working with it. This has been confirmed time and again by translation company owners, slavelancers and other word workers. Serious occupational health measures are called for, but to date little or nothing has been done in this regard.
And when human intelligence is taken out of play or impaired by an automated linguistic lobotomy, the results inevitably fall in the lower quartile of the aforementioned 90%. Really crappy crap.
As another of my favorite fiction authors used to comment: TANSTAAFL. There ain't no such thing as a free lunch. And trust is always good, but these days you need to verify that your service providers really give you what you have paid for and don't pass off crap like you see in the example above.
Feb 4, 2019
Review: the Plantronics Voyager Legend monaural headset for translation
Ergonomics are often a challenge with the microphones used for dictated translation work. I've used quite a few over the years, usually with USB connections to my computer, though I've also had a few Logitech wireless headphones with integrated mikes that performed well. However, all of them have had some disadvantages.
The country where I live (Portugal) has a rather warm climate for more than a few months of the year. Wearing headphones can get rather uncomfortable on a hot day, and even on a cold one, the pressure on my ears starts to drive me nuts after an hour or so.
Desktop microphones seem like a good solution, and I get good results with my Blue Yeti. But sometimes, when I turn my head to look at something, the pickup is not so good, and my dictation is transcribed incorrectly.
The Hey memoQ app released by memoQ Translation Technologies Ltd. underscored the ergonomic challenges of dictation for me; the app uses iOS devices to access their speech recognition features, and positioning a phone well in such a way that one can still make use of a keyboard is not easy. And trying to connect a microphone or headset by cable to the dodgy Lightning port on my iPhone 7 is usually not a good experience.
So I was intrigued by a recommendation of Plantronics headsets from Dragos Ciobanu of Leeds University (also the author of the E-learning Bakery blog). A specific model mentioned by someone who had attended a dictation workshop with him recently was the Plantronics Voyager Legend, though when I asked Dragos about his experience, he spoke mostly about the Plantronics Voyager 5200, which is a little more expensive. I decided to go "cheap" for my first experience with this sort of equipment and ordered the Voyager Legend from Amazon in Spain. I did so with some trepidation, because the reviews I read were not entirely positive.
The product arrived in simple packaging, which lent some credence to the Amazon review suggesting that the "new" products sold might in fact be refurbished. But in the EU, all electronic gear comes with a two-year warranty, so I don't worry too much about that.
Complaints I read in the reviews about a short charger cable seem ridiculous; the cable I received was over half a meter long, and like anyone who has computers these days, I have more USB extension cords than I know what to do with should I require a longer cable for charging. The magnetic coupler for charging has a female mini-USB port, so it can be attached to another cable as well. Power connections include the most common EU two-pronged charger, the 3-pole UK charger and one for a car's cigarette lighter.
The package also included extra earpieces and covers of different sizes to customize the fit on one's ear.
I tested the microphone first with my laptop; the device was recognized easily, and the results with Dragon NaturallySpeaking were excellent. Getting the connection to my iPhone 7 proved more difficult, however. I read the Getting Started instructions carefully, tried updating the firmware (not necessary - everything was current) and tried various switching and reboot tricks, all to no avail.
Finally, I called the technical support line in the US in total frustration. I didn't expect an answer since it was still the wee hours of the morning in the US, but someone at a support call center did answer the phone. He instructed me to press and hold the "call" button on the device until its LED begins to flash blue and red.
I did that, and when the LED began flashing, "PLT_Legend" appeared in the list of available devices on my iPhone. Then I was ready to test the Voyager Legend for dictated translation with Hey memoQ.
Because I work with German and English, I rely on Dragon NaturallySpeaking for my dictation, and the iOS-based dictation of Hey memoQ will never compete with that. But I am very interested in testing and demonstrating the integrated memoQ app, because many other languages, such as Portuguese, are not available for speech recognition in Dragon NaturallySpeaking or any other readily accessible speech recognition solution of its class.
As I suspected, my dictation in Hey memoQ (and other iOS applications) was easier with the Voyager Legend. This is the first hardware configuration I have tested that really seems like it would offer acceptable ergonomics for Hey memoQ with my phone. And I can use it for Skype calls, listening to my audio books and other things, so I consider the Plantronics Voyager Legend to be money well spent. Now I'll see how it holds up for long sessions of dictated legal translation. The product literature and a little voice in my ear both claim that the device can operate for seven hours of speaking time on a battery charge, and the 90 minutes required for a full recharge will work well enough with the breaks I take in that time anyway.
Of course there are many Bluetooth microphone devices which can be used with speech recognition applications, but what distinguishes this one is its great comfort of wear and the secure fit on my ear. I look forward to a closer acquaintance.
Jan 3, 2019
Using analog microphones with newer iPhones
Microphone quality makes a great difference in the quality of speech recognition results. And although the microphones integrated in iOS devices are generally good and give decent results, positioning the device in a way that is ergonomically viable for efficient dictated translation - and concurrent keyboard use - is not always so easy. This is a potential barrier to the effective use of Hey memoQ speech recognition.
So a good external microphone may be needed. But with recent iPhone models lacking a separate microphone jack and using the Lightning port for both charging and microphone input, connecting that external microphone might not be as simple as one assumes, especially for someone like me, who is rather ignorant of the different kinds of 3.5 mm audio connections. I have had a few failures so far trying to link my good headset to the iPhone 7.
Colleague Jim Wardell is not only the most experienced speech recognition expert for translation whom I am privileged to know; he is also a musician with extensive experience in microphones of all kinds and their connections. And recently he was kind enough to share the video below with me to clear up some misunderstandings about how to connect some good analog equipment to use with Hey memoQ on an iPhone 7 or later:
Dec 22, 2018
German commands for "Hey memoQ"
With memoQ version 8.7, memoQ Translation Technologies Ltd. (formerly "Kilgray"; "mQtech" below) has introduced free, integrated speech recognition that promises a substantial gain in efficiency for work in many languages. For work in German, let one thing be clear from the outset: Dragon NaturallySpeaking (DNS) is and will remain the better choice for the foreseeable future. The same goes for all the languages supported by DNS (in the current version 15): German, English, Spanish, French, Italian and Dutch.
But for the Slavic languages, the Nordic languages, the other Romance languages, Arabic and many more, other solutions are needed if you want to work with speech recognition. About four years ago, when I began to research such solutions, they were hardly conceived for "exotic" languages like Russian or European Portuguese as part of translation work; today there are many half-decent options, and "Hey memoQ" is now among them. We are still waiting for solutions at the level of DNS for the other languages, and we will surely wait a long time yet before good recognition quality with easily extensible vocabulary and flexible, configurable commands for system control becomes generally available for languages like Danish or Hindi. At the moment we aren't even that far along with English, if you consider, for example, the dictation function on mobile phones. Speech recognition without user-extensible vocabulary is and remains a technology on crutches.
But Hey memoQ's crutches are not bad at all for an emerging technology. The app released with memoQ version 8.7 is, in my view, still "beta" – what else can you call it when control commands are preconfigured only for English? – but for the current state of affordable technology, the solution introduced by mQtech is best in class, and it even has a workable way around the problem of non-extensible vocabulary: the ability to insert the first nine hits from the results list for terminology, corpus searches, non-translatables and so on into the target text by voice. If you do sensible terminology work anyway and stock a memoQ glossary with the special terms you need, you can already work quite well. (And anyone who needs an introduction to statistically based extraction of frequent terms from a document or document collection can learn about it here.)
Hey memoQ also has other unique features, among them a change of recognition language when you place the cursor in the text field for the other working language. If I am dictating English target text, for example, but want to correct a typo in the German source text, or perhaps filter the whole document for a particular word in the source text, the language Hey memoQ understands switches from English to German as soon as I click on the source text side. It works like that for every supported language pair. Not bad.
Anyone already grumbling that this currently iOS-based solution is not available for the popular Android phones simply does not grasp the realities of software and product development. Even before mQtech began developing this solution, I had investigated the possible application programming interfaces (APIs) myself for personal reasons, and with most of them, command control of the kind found in Hey memoQ was not available – in most cases only the transfer of spoken, transcribed text. But we already have that. In myEcho, for example. Or the solution for Chrome speech recognition in any Windows or Linux program. What we urgently need is not yesterday's beer. We need forward-looking prototypes that drive the development of industry-standard technologies like memoQ, SDL Trados Studio, Wordfast and others in a better direction, and that is exactly what Hey memoQ is doing. So high praise to the memoQ team and its German head of development :-)
But even with a German head of development, the time pressure is sometimes such that the first release version ships without preconfigured control commands, probably because this is actually more work than most people would believe. In any language. Anyone who wants to dictate Polish, for example, and wants not just to have spoken phrases transcribed into the text field but also to edit text by voice or run filter commands or concordance searches, must first set up Polish commands in the program. And there you often run up unexpectedly against the limits and quirks of the particular recognition technology. A chosen phrase may, for instance, resemble a very common phrase, so that this other text often gets written when you actually wanted to execute a command. Unusual but recognizable phrases are therefore often the best choice for command texts. My tested command texts for German are shown in the screenshot below. As you will notice immediately, the configuration dialog is not yet localized for the German user interface. That will of course come in future versions. But whether the editing commands for Greek will someday arrive preconfigured from Hungary, I cannot guess. You can, however, configure them yourself today if you have patience.
One more note: iOS speech recognition needs good Internet bandwidth, since the recognition server is in the cloud. Data protection, data protection, yes, yes. Spare me the lecture, please, and let this technology develop a bit further first. The data protection questions were answered adequately by German representatives of Nuance years ago, and even the crazy US authorities have cleared such technology for internal use. But in Germany the world turns differently, and that's fine :-) Incidentally, I have more success when I speak in short, even dramatic phrases rather than long, wordy sentences – a completely different approach than is needed with DNS, for example. If you speak too fast, you will quickly notice that words get dropped. That has nothing to do with Hey memoQ; it is part of the current state of the art in iOS speech recognition and some other technologies of this kind.
And now a view of my self-configured Hey memoQ control commands for German. Anyone who doesn't like my choice of words can pick something better, hopefully test it, and then share it with all German-speaking colleagues in the comments below.
I worked out the iOS commands for punctuation and much more on the basis of the macOS commands published by Apple; in some cases there are slight differences (i.e. you have to experiment a little until you hit on the right command – if it actually exists), but this gives you a good start for special characters and the like, as I recently explained in a blog post in English. mQtech can't be blamed for missing information when not even Apple, the maker of iOS, publishes a complete and correct list. But given enough time, this cake will surely be baked well!
Dec 11, 2018
Your language in Hey memoQ: recognition information for speech
There are quite a number of issues facing memoQ users who wish to make use of the new speech recognition feature – Hey memoQ – released recently with memoQ version 8.7. Some of these are of a temporary nature (workarounds and efforts to deal with bugs or shortcomings in the current release which can reasonably be expected to change soon), others – like basic information on commands for iOS dictation and what options have been implemented for your language – might not be so easy to work out. My own research in this area for English, German and Portuguese has revealed a lot of errors in some of the information sources, so often I have to take what I find and try it out in chat dictation, e-mail messages or the Notes app (my favorite record-keeping tool for such things) on the iOS device. This is the "baseline" for evaluating how Hey memoQ should transcribe text in a given language.
But where do you find this information? One of the best ways might be a Google Advanced Search on Apple's support site – for example, searching for dictation command terms with the results restricted to support.apple.com. The same search (or another) can be made by adding the site specification (site:support.apple.com) after your search terms in an ordinary Google search.
The results lists from these searches reveal quite a number of relevant articles about iOS dictation in English. And by hacking the URLs on certain pages and substituting the desired language code, one can get to the information page on commands available for that language: the same page, reached with slightly modified URLs.
The Mac OS information pages are also a source of information on possible iOS commands that one might not find so easily otherwise. An English page with a lot of information on punctuation and symbols is here: https://support.apple.com/en-us/HT202584
The same information (if available) for other languages is found just by tweaking the URL:
- German (de-de)
- Portuguese (pt-pt, there is also a pt-br page, but I haven't read both to check differences)
- Polish (pl-pl)
- Norwegian (that's a no-no)
- Arabic (ar-ae)
- Turkish (tr-tr)
- French (fr-fr)
- Thai (th-th, see also this commentary on another site)
and so on. Some guidance on Apple's choice of codes for language variants is here, but I often end up getting to where I want to go by guesswork. The Microsoft Azure page for speech API support might be more helpful to figure out how to tweak the Apple Support URLs.
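For the curious, this URL pattern lends itself to a quick scripted check. Here is a hedged Python sketch (my own illustration) that builds the per-locale variants of the article above and tests whether they respond; the locale codes are the ones listed in this post, and note that Apple may redirect unsupported locales, so a 200 status alone doesn't guarantee a localized page.

```python
from urllib.request import Request, urlopen

# Locale codes mentioned above; others can be guessed or taken from the
# Microsoft Azure speech API language list.
LOCALES = ["en-us", "de-de", "pt-pt", "pt-br", "pl-pl",
           "no-no", "ar-ae", "tr-tr", "fr-fr", "th-th"]

def candidate_urls(article="HT202584"):
    """Build the per-locale variants of an Apple support article URL."""
    return [f"https://support.apple.com/{loc}/{article}" for loc in LOCALES]

for url in candidate_urls():
    try:
        status = urlopen(Request(url, method="HEAD"), timeout=10).status
        print(status, url)
    except OSError as err:
        print("failed:", url, err)
```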
When you edit the commands list, you should be aware of a few things to avoid errors.
- The current command lists in the first release may contain errors, such as mistakenly typing "phrase" in angle brackets as shown in the first example above; on editing, the commands that are followed by a phrase do not show the placeholder for that phrase, as you see in the example marked "2".
- Commands must be entered without quotation marks! Compare the marked examples 1 and 2 above. If quotes are typed when editing a command, this will not be revealed by the appearance of the command; it will look OK but won't work at all until the quote marks are removed by editing.
- Command creation is an iterative process that may entail a lot of frustrating failures. When I created my German command set, I started by copying some commands used for editing by Dragon NaturallySpeaking, but often the results were better if I chose other words. Sometimes iOS stubbornly insists on transcribing some other common expression, sometimes it just insists on interpreting your command as a word to transcribe. Just be patient and try something else.
At the present stage, I see the need for developing and/or fixing the Hey memoQ app in the following ways:
- Fix obvious bugs, which include:
- The apparently non-functional concordance insertions. In general, more voice control would be helpful in the memoQ Concordance.
- Capitalization errors which may affect a variety of commands, like Roman numerals, ALL CAPS, title capitalization (if the first word of the title is not at the start of the segment), etc.
- Dodgy responses to the commands to insert spaces, where it is often necessary to say the command twice and get stuck with two spaces, because a single command never responds properly by inserting a space. Why is that needed? Well, otherwise you have to type a space on the keyboard if you are going to use a Translation Results insertion command to insert specialized terminology, auto-translation rule results, etc. into your text.
- Address some potentially complicated issues, like considering what to do about source language text handling if there is no iOS support for the source language or the translator cannot dictate commands effectively in that language. I can manage in German or Portuguese, but I would be really screwed these days if I had to give commands in Russian or Japanese.
- Expand dictation functionality in environments like the QA resolution lists, term entry dialog, alignment editor and other editors.
- Look for simple ideas that could maximize returns for programming effort invested, like the "Press" command in Dragon NaturallySpeaking, which enables me to insert tags, for example, by saying "Press F9". This would eliminate the need for some commands (like confirmation and all the Translation Results insertion commands) and open up a host of possibilities by making keyboard shortcuts in any context controllable by voice; a minimal sketch of the idea follows this list. I've been thinking a lot about that since talking to a colleague with some pretty tough physical disabilities recently.
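To illustrate what such a "Press" bridge might look like in the simplest case, here is a hedged Python sketch – my own mock-up, not Hey memoQ or DNS code – that treats a recognized phrase like "press F9" or "press control s" as a keystroke command, using the third-party pyautogui package to send the keys:

```python
import re
import pyautogui  # third-party: pip install pyautogui

# pyautogui expects key names like "ctrl", "shift", "f9".
ALIASES = {"control": "ctrl", "escape": "esc"}

def handle_phrase(phrase: str) -> bool:
    """If the phrase is a 'press <keys>' command, send the keystroke chord;
    otherwise return False so the caller can transcribe the phrase as text."""
    m = re.fullmatch(r"press\s+([a-z0-9 ]+)", phrase.strip(), flags=re.I)
    if not m:
        return False
    keys = [ALIASES.get(k, k) for k in m.group(1).lower().split()]
    pyautogui.hotkey(*keys)  # e.g. ("ctrl", "s") or ("f9",)
    return True

handle_phrase("press F9")         # would insert a tag in memoQ
handle_phrase("press control s")  # Ctrl+S: save
```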
Overall, I think that Hey memoQ represents a great start in making speech recognition available in a useful way in a desktop translation environment tool and making the case for more extensive investments in speech recognition technology to improve accessibility and ergonomics for working translators.
Of course, speech recognition brings with it a number of different challenges for reviewing work: mistakes (or "dictos" as they are sometimes called, a riff on keyboard "typos") are often harder to catch, especially if one is reviewing directly after translating and the memory of intended text is perhaps fresh enough to override in perception what the eye actually sees. So maybe before long we'll see an integrated read-back feature in memoQ, which could also benefit people who don't work with speech recognition.
Since I began using speech recognition a lot for my work (to cope with occasionally unbearable pain from gout), I have had to adopt the habit of reading everything out loud after I translate, because I have found this to be the best way to catch my errors or to recognize where the text could use a rhetorical makeover. (The read-back function of Dragon NaturallySpeaking in English is a nightmare, randomly confusing definite and indefinite articles, but other tools might be usable now for external review and should probably be applied to target columns in an exported RTF bilingual file to facilitate re-import of corrections to the memoQ environment, though the monolingual review feature for importing edited target text files and keeping project resources up-to-date is also a good option.)
As I have worked with the first release of Hey memoQ, I have noticed quite a few little details where small refinements or extensions to the app could help my workflow. And the same will be true, I am sure, with most others who use this tool. It is particularly important at this stage that those of us who are using and/or testing this early version communicate with the development team (in the form of e-mail to memoQ Support - support@memoq.com - with suggestions or observations). This will be the fastest way to see improvements I think.
In the future, I would be surprised if applications like this did not develop to cover other input methods (besides an iOS device like an iPhone or iPad). But I think it's important to focus on taking this initial platform as far as it can go, so that we can all see what working functionality is missing and know what to ask for as the APIs of the relevant operating systems develop further to support speech recognition (especially the Holy Grail for many of us: trainable vocabulary like we have in Dragon NaturallySpeaking and a very few other applications). Some of what we are looking for may be in the Nuance software development kits (SDKs) for speech recognition, which I suggested using some years ago because they offer customizable vocabularies at higher levels of licensing, but this would represent a much greater and more speculative investment in an area of technology that is still subject to a lot of misunderstanding and misrepresentation.
Dec 10, 2018
"Hey memoQ" command tests
In my last post on the release of memoQ 8.7 with its new, integrated speech recognition feature I included a link to a long, boring video record of my first tests of the speech recognition facility, most of which consisted of testing various spoken iOS commands to generate text symbols, change capitalization, etc. I tested some of the integrated commands that are specific to memoQ, but not in an organized way really.
In a new testing video, I attempt to show all the memoQ-specific spoken command types and how the commands are affected by the environment (by which I mean whether the cursor is on the target text side, the source text side or in some other place, such as the concordance).
Most of the spoken commands work rather well, except for insertion from the concordance, which I could not get to work at all. When the cursor is in a source text cell, commands currently have to be given in the source text language, which is sure to prove interesting for people who don't speak their source language with a clean accent. Right now it's even more interesting, because English is the only language with a ready-made command list; for other languages you have to "roll your own" for now, which is a bit of a trial-and-error thing. I don't even want to think how this is going to work if the source language isn't supported at all; more thought will have to be given to how commands are used with source text. I assume that if it's copied to the target side it will be difficult to select unless, with butchered pronunciation, the text also happens to make sense in the target language.
It's best to watch this video on YouTube (start it, then click "YouTube" at the bottom of the running video). There you'll find a time code index in the description (after you click SEE MORE) which will enable you to navigate to specific commands or other things shown in the test video.
My ongoing work with Hey memoQ makes it clear that what I call "mixed mode" (dictation with concurrent use of the keyboard) is the best and (actually) a necessary way to use this feature. The style for successful dictation is also quite different from the style I need to use with Dragon NaturallySpeaking for best results. I have to discipline myself to speak in short phrases rather than longer ones, and to avoid long sentences, which may cause some text to be dropped.
There is also an issue with Translation Results insertions and the lack of spaces before them; the command to insert a space ("spacebar" in English) is dodgy, so I usually have to speak it twice and end up with a superfluous space. The video shows my workaround for this in one part: I speak a filler word (in one case I tried "dummy" which was rendered as "dumb he") and then select it later and insert an entry from the Translation Results pane over the selected text. This is in fact how we can deal with specialist terminology not recognized by the current speech dictionary until it becomes possible to train new words some day.
The sound in the video (spoken commands) is also of variable quality; with some commands I had to turn my head toward the iPhone on its little tripod next to my laptop, which caused the pickup of that speech to be bad on the built-in microphone on the laptop's screen. So this isn't a Hollywood-class recording; it's simply a slightly edited record of some of my tests to give other memoQ users some idea of what they can expect from the feature right now.
Those who will be dictating in supported languages other than English need some patience right now. It's not always easy coming up with commands that will be recognized easily but which are unlikely to occur as words to be transcribed in typical dictation work. During the beta test of Hey memoQ I used some bizarre and unusual German words which just happened to be recognized. I'm developing a set of more normal-sounding commands right now, but it's a work in progress.
The difficulties I am encountering making up new command phrases (or changing the English ones in some cases) simply reinforce my belief that these command lists should be made into portable light resources as soon as possible.
I am organizing summary tables of the memoQ-specific commands and useful iOS commands for symbols, capitals, spacing, etc. comparing their performance in other iOS apps with what we see right now in Hey memoQ.
Update: the summary file for English is available here. I will post links here for any other languages I can prepare later.
In a new testing video, I attempt to show all the memoQ-specific spoken command types and how the commands are affected by the environment (in this case I mean whether the cursor is on the target text side or the source text side or in some other place in the concordance, for example).
Most of the spoken commands work rather well, except for insertion from the concordance, which I could not get to work at all. When the cursor is in a source text cell, commands currently have to be given in the source text language, which is sure to prove interesting for people who don't speak their source language with a clean accent. Right now it's even more interesting, because English is the only language with a ready-made command list; other languages have to "roll their own" for now, which is a bit of a trial-and-error exercise. I don't even want to think how this is going to work if the source language isn't supported at all; some thought needs to be given to how commands are handled in source text. I assume that source text copied to the target side will be difficult to select by voice unless its butchered pronunciation happens to make sense in the target language.
It's best to watch this video on YouTube (start it, then click "YouTube" at the bottom of the running video). There you'll find a time code index in the description (after you click SEE MORE) which will enable you to navigate to specific commands or other things shown in the test video.
My ongoing work with Hey memoQ makes it clear that what I call "mixed mode" (dictation with concurrent use of the keyboard) is the best and, in fact, necessary way to use this feature. The style for successful dictation is also quite different from the style I need to use with Dragon NaturallySpeaking for best results. I have to discipline myself to speak in short phrases; longer phrases, and especially long sentences, may cause some text to be dropped.
There is also an issue with Translation Results insertions and the lack of spaces before them; the command to insert a space ("spacebar" in English) is dodgy, so I usually have to speak it twice and end up with a superfluous space. The video shows my workaround for this in one part: I speak a filler word (in one case I tried "dummy" which was rendered as "dumb he") and then select it later and insert an entry from the Translation Results pane over the selected text. This is in fact how we can deal with specialist terminology not recognized by the current speech dictionary until it becomes possible to train new words some day.
The sound of the spoken commands in the video is also of variable quality; for some commands I had to turn my head toward the iPhone on its little tripod next to my laptop, and the built-in microphone in the laptop's screen picked that speech up poorly. So this isn't a Hollywood-class recording; it's simply a slightly edited record of some of my tests to give other memoQ users an idea of what they can expect from the feature right now.
Those who will be dictating in supported languages other than English need some patience right now. It's not always easy coming up with commands that will be recognized easily but which are unlikely to occur as words to be transcribed in typical dictation work. During the beta test of Hey memoQ I used some bizarre and unusual German words which just happened to be recognized. I'm developing a set of more normal-sounding commands right now, but it's a work in progress.
The difficulties I am encountering in making up new command phrases (or changing the English ones in some cases) simply reinforce my belief that these command lists should be made into portable light resources as soon as possible.
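A command set is, at bottom, just a small table mapping actions to trigger phrases, which is exactly the kind of thing that exchanges well as a resource file. Here is a minimal sketch of how such a shareable set might be modeled; the schema and all the names in it are my own invention for illustration, not an actual memoQ resource format:

```typescript
// Hypothetical sketch of a shareable spoken-command set. The schema is
// invented for illustration and is NOT an actual memoQ light resource format.
interface SpokenCommandSet {
  language: string;                  // BCP 47 language tag, e.g. "de-DE"
  commands: Record<string, string>;  // action ID -> spoken trigger phrase
}

const germanCommands: SpokenCommandSet = {
  language: "de-DE",
  commands: {
    confirmSegment: "bestätigen",
    insertFirstResult: "erstes einfügen",
    selectPreviousWord: "vorheriges Wort markieren",
  },
};

// Data like this is trivial to serialize and exchange, which is exactly
// why portable command lists would make localized command sets easier to share:
console.log(JSON.stringify(germanCommands, null, 2));
```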
I am organizing summary tables of the memoQ-specific commands and of useful iOS commands for symbols, capitals, spacing and so on, comparing their performance in other iOS apps with what we see right now in Hey memoQ.
Update: the summary file for English is available here. I will post links here for any other languages I can prepare later.
Dec 7, 2018
Integrated iOS speech recognition in memoQ 8.7
Today, memoQ Translation Technologies (the artists formerly known as "Kilgray") officially released their iOS dictation app along with memoQ version 8.7, making that popular translation environment tool the first on the desktop to offer free integrated speech recognition and control.
The initial release has a full set of commands implemented only in English. Those who want to use control commands for navigating, selecting, inserting, etc. will have to enter their own localized commands for now, and this too involves some trial and error to come up with a good working set. I hope that before long the development team will implement the language-specific command sets as shareable light resources. That will make it much easier to get all the available languages sorted out properly for productive work.
My initial tests of the release version are encouraging. Some capitalization bugs which I identified during the beta test haven't been fixed yet, and some special characters which work fine in the iOS Notes app don't work at all, but on the whole it's a rather good start. The control commands implemented for memoQ work far better than I expected at this stage. I've got a very boring, clumsy (and unlisted) video of my initial function tests here if anyone cares to look.
Before long, I'll release a few command cheat sheets I've compiled for English (update: it's HERE), German and Portuguese, which show which iOS dictation functions are implemented so far in Hey memoQ and which don't perform as expected. There are no comprehensive lists of these commands, and even the ones that claim to cover everything have gaps and errors which one can only sort out by trial and error. For the most part this isn't an issue with the memoQ development team, but rather with Apple's chaotic documentation.
I am very happy with what I see at the start. Here are a few highlights of the current state of Hey memoQ dictation:
- Bilingual dictation, with source language dictation active when the cursor is on the source side and target language dictation active when the cursor is on the target side. Switching languages in my usual dictation tool - Dragon NaturallySpeaking - is a total pain in the butt.
- No trainable vocabulary at present (an iOS API limitation), but this is balanced in a useful way by commands like "insert first" through "insert ninth", which enable direct insertion of the first nine items in the Translation Results pane. Thus if you maintain good termbases, the "no train" pain is minimized. And you can always work in "mixed mode" as I usually do, typing what is not convenient to speak and using keyboard shortcuts for commands not yet supported by voice control, like tag insertion. (A small sketch of the idea behind those insertion commands follows this list.)
- Microphones connected (physically or via Bluetooth) with the iPhone or iPad work well if you don't want to use the integrated microphone in the iOS device. My Apple earphones worked great in a brief test.
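To illustrate the principle behind those "insert first" through "insert ninth" commands, mapping a spoken ordinal to an entry in a results list is a simple affair. This is purely my own sketch of the concept, not memoQ's actual code:

```typescript
// Illustrative sketch only: map spoken ordinal commands to entries in a
// results list. Names and logic are hypothetical, not memoQ internals.
const ordinals = [
  "first", "second", "third", "fourth", "fifth",
  "sixth", "seventh", "eighth", "ninth",
];

function resolveInsertCommand(utterance: string, results: string[]): string | null {
  const match = utterance.trim().toLowerCase().match(/^insert (\w+)$/);
  if (!match) return null;
  const index = ordinals.indexOf(match[1]);
  // Valid ordinal within range? Return the corresponding results entry.
  return index >= 0 && index < results.length ? results[index] : null;
}

console.log(resolveInsertCommand("insert second", ["Vertrag", "Auftrag", "Bestellung"]));
// -> "Auftrag"
```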
Some users are a bit miffed that they can't work directly with microphones connected to the computer or with Android devices, but at the present time, the iOS dictation API is the best option for the development team to explore integrated speech functions which include program control. That won't work with Chrome speech recognition, for example. As other APIs improve, we can probably expect some new options for memoQ dictation.
Moreover, with the release of iOS 12, I think many older devices (which are cheap on eBay or perhaps free from friends who no longer use them) are now viable tools for Hey memoQ dictation. (Update: I found a list of iPhone and iPad devices compatible with iOS 12 here.)
Just for fun, I tested whether Hey memoQ and Dragon NaturallySpeaking interfere with one another. They don't, it seems: I switched back and forth from one to the other with no trouble. During the app's beta phase, I did not expect that I would take Hey memoQ seriously as an alternative to DNS for English dictation, but with the current set of commands implemented, I can already work with greater comfort than expected, and I may in fact use this free tool quite a bit. And I think my friends working into Portuguese, Russian and other languages not supported by DNS will find Hey memoQ a better option than other dictation solutions I've seen so far.
This is just the beginning. But it's a damned good start really, and I expect very good things ahead from memoQ's development team. And I'm sure that, once again, SDL and others will follow the leader :-)
And last, but not least, here's an update to show how to connect the Hey memoQ app on your iOS device to memoQ 8.7+ on your computer to get started with dictation in translation:
Nov 9, 2018
Chrome speech recognition in all your Windows and Linux applications
In a recent social media discussion, a Slovenian colleague was asking me about the upcoming hey memoQ feature that I've been testing, and I found that iOS apparently doesn't support that language (nor does macOS, for that matter). But then she commented:
"I use Chrome's voice notebook plugin with memoQ. It works somehow for a while, then it gets laggy and I have to refresh Chrome or restart it. I miss the consistency and learning ability of DNS. But yes, the paid version allows you to use it with any app, including memoQ. The free version does not have this functionality. I love translating with dictation, I am not a fast typist and I rather hate typing..."

I had no idea what she was talking about, but a few more questions and a little investigation cleared up my confusion. Some years ago when Chrome's speech recognition feature was introduced, it seemed to me that it should be possible to adapt it for use in other (non-browser) applications, and I think this was even stated as a possibility. But at the time I could not find any application to do this, and I'm too out of practice these days to program my own.
Well, it seems that someone has addressed this deficiency.
The voice-to-text notebook extension for Chrome has additional tools available on the creator's website which enable its speech recognition functions to be used in any other application. This additional functionality is a paid service, but at USD 2.50 per month or USD 16.00 per year (via PayPal), it's unlikely to break the bank, and a free two-day trial can be activated once you have registered. I'm testing it now, and it's rather interesting. It's not perfect (as noted by the colleague who made me aware of this tool), but it may be an option for those wanting to use speech recognition in languages not currently supported by other applications.
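For the technically curious: what Chrome exposes to web pages is the Web Speech API, which is presumably what extensions like this build on. A minimal sketch of browser-side dictation looks like the code below; note that it runs only inside a web page, which is why a separate helper service is needed to type the results into desktop applications:

```typescript
// Minimal dictation sketch using the Web Speech API in Chrome.
// This works only inside a web page; forwarding the recognized text to
// desktop applications requires a separate helper, as described above.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.lang = "sl-SI";       // e.g. Slovenian, not served by iOS dictation
recognition.continuous = true;    // keep listening through pauses
recognition.interimResults = false;

recognition.onresult = (event: any) => {
  const result = event.results[event.results.length - 1];
  console.log("Recognized:", result[0].transcript);
};

recognition.start();
```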
Jun 3, 2018
Survey for Translation Transcription and Dictation
The website with the survey and short explainer video is http://www.sightcat.net
The idea is to build a human transcription service. We just need a few translators per language that want to work with a transcriptionist due to RSI, productivity etc. and we can use that data to build an ASR system for that language. There is also a good chance the ASR system will be accurate for domain-specific terminology and accents as it will be adaptive and use source language context.
[Image: click on the graphic to go to the survey]
John and I have been talking, brainstorming and arguing about many aspects of translation technology for years now, with dictation (voice recognition, ASR, whatever you want to call it) foremost among the topics. So I was very pleased to see him at the conference in Budapest last week, where he spoke about logging as a research tool in the conference program and talked a lot about speech recognition before and after in the breaks, bars, coffee houses and social event venues.
I think that one of the most memorable things about memoQ Fest 2018 was the introduction of the dictation tool currently called hey memoQ, which covers a lot of what John and I have discussed until the wee hours over the past four years or so and which also makes what I believe will be the first commercial use of source text guidance for target text dictation (not to mention switching to source text dictation when editing source texts!). John introduced that to me years ago based on some research that he follows. Fascinating stuff.
One of the things he has been interested in for a while for commercial, academic and ergonomic reasons is support for minor languages. Understandable for a guy who speaks Gaelic (I think) and has quite a lot of Gaelic resources which might contribute to a dictation solution some day. So while I'm excited about the coming memoQ release which will facilitate dictation in a CAT tool in 40 languages (more or less, probably a lot more in the future), John is thinking about smaller, underserved or unserved languages and those who rely on them in their working lives.
That's what his survey is about, and I hope you'll take the time to give him a piece of your mind... uh, share your thoughts I mean :-)
The Great Dictator in Translation.
I have no need for words. memoQ will have that covered in quite a few languages.
This is not your grandfather's memoQ!
May 21, 2018
Best Practices in Translation Technology: summer course in Lisbon July 16-21
As it does each year, the summer school at Universidade Nova de Lisboa is offering quite a variety of inexpensive, excellent intensive courses, including some for the practice of translation. This year's program includes a reprise of last year's Best Practices in Translation Technology from July 16th to 21st, with some different topics and approaches.
The course will be taught by the same team as last year – yours truly, Marco Neves and David Hardisty – and will cover the following areas:
- Good translation workflows.
- Using voice recognition in translation.
- Using machine translation in a humane, intelligent way.
- Using checklists to improve communication in translation.
- Using glossaries, bilingual texts and other references in multiplatform environments.
- Good practices for using terminology and reference texts in the target language.
- Planning and creating lists for auto-translation rules and the basics of regular expressions for filters (see the sketch after this list).
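To give a taste of that last item: memoQ's auto-translation rules are driven by regular expressions. The sketch below shows the general idea with a German date pattern; it is a plain TypeScript illustration of the concept, not memoQ's actual rule syntax:

```typescript
// Simplified illustration of the kind of regex behind an auto-translation
// rule: recognize a German date like "16.7.2018" and render it in an
// English pattern. A sketch of the concept, not memoQ's rule format.
const germanDate = /\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b/;

const months = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

function convertDate(source: string): string {
  return source.replace(germanDate, (_, d, m, y) => {
    const month = months[Number(m) - 1] ?? m;
    return `${month} ${Number(d)}, ${y}`;
  });
}

console.log(convertDate("Der Kurs beginnt am 16.7.2018."));
// -> "Der Kurs beginnt am July 16, 2018."
```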
Some knowledge of the memoQ translation environment and translation experience are required.
The course is offered in the evening from 6 pm to 10 pm Monday (July 16th) through Friday (July 20th), with a Saturday (July 21st) session for review and exams from 9 am to 2 pm. This allows free days to explore Lisbon and the surrounding region and get to know Portugal and its culture.
Tuition costs for the general public are €130 for the 25 hours of instruction. The university certainly can't be accused of price-gouging :-) Summer course registration instructions are here (currently available only in Portuguese; I'm not sure if/when an English version will be available, but the instructors can be contacted for assistance if necessary).
Two other courses offered this summer at Uni Nova with similar schedules and cost are: Introduction to memoQ (taught by David and Marco – a good place to get a solid grounding in memoQ prior to the Best Practices course) from July 9–14, 2018 and Translation Project Management Tools from September 3–8, 2018.
All courses are taught in English and Portuguese in a mix suitable for the participants in the individual courses.
Jun 24, 2017
Germany needs Porsches! And Microsoft has the Final Solution....
So he was left with no choice but to cut overhead using the latest technologies. Microsoft to the rescue! With Microsoft Dictate, his crew of intern sausage technologists now speak customer texts into high-quality microphones attached to their Windows 10 service stations, and these are translated instantly into sixty target languages. As part of the company's ISO 9001-certified process, the translated texts are then sent for review to experts who actually speak and perhaps even read the respective languages before the final, perfected result is returned to the customer. This Linguistic Inspection and Accurate Revision process is what distinguishes the value delivered by Globelinguatrans GmbHaha from the TEPid offerings of freelance "translators" who won't get with the program.
But his true process engineering genius is revealed in Stage Two: the Final Acquisition and Revision Technology Solution. There the fallible human element has been eliminated for tighter quality control: texts are extracted automatically from the attached documents in client e-mails or transferred by wireless network from the Automated Scanning Service department, where they are then read aloud by the latest text-to-speech solutions, captured by microphone and then rendered in the desired target language. Where customers require multiple languages, a circle of microphones is placed around the speaker, with each microphone attached to an independent, dedicated processing computer for the target language. Eliminating the error-prone human speakers prevents contamination of the text by ums, ahs and unedited interruptions by mobile phone calls from friends and lovers, so the downstream review processes are no longer needed and the text can be transferred electronically to the payment portal, with customer notification ensuing automatically via data extracted from the original e-mail.
Major buyers at leading corporations have expressed excitement over this innovative, 24/7 solution for globalized business and its potential for cost savings and quality improvements, and there are predictions that further applications of the Goldberg Principle will continue to disrupt and advance critical communications processes worldwide.
Articles have appeared in The Guardian, The Huffington Post, The Wall Street Journal, Forbes and other media extolling the potential and benefits of the LIAR process and FARTS. And the best part? With all that free publicity, my friend no longer needs his sales staff, so they are being laid off and he has upgraded his purchase plans to a Maserati.
Oct 21, 2016
A day in the life....
One of the things I enjoy most about professional translation is the range of activities and subject matters that one can encounter, even as a specialist in a few domains. I can't say the work is never boring, but when it does drift that way, very suddenly it isn't any more. Quite unpredictably.
Yesterday I typed translations. A bit more than expected after two sets of PowerPoint slides - a small one to translate from German and another to edit the rather acceptable English - turned out to have about 8,000 words of highly specialized slide notes about military command and control structures and the technology of fighting forest fires. (Note to self: no matter how busy you are, always import those presentations into memoQ with the options set to extract every kind of text as well as the bitmap graphics if you have to translate those too. Then do a word count! Appearances can be deceiving.)
Yesterday I dictated translations. The job started out as a bunch of text fragments from slides, where context über alles was the rule, lots of terminology required research, and voice recognition offered no particular advantages, then suddenly it became the translation of a rather long lecture using all that new terminology, and the deadline was tighter than thumbscrews operated by an angry ex-girlfriend. Dragon NaturallySpeaking to the rescue. Not only was this necessary to finish the text in a long workday rather than most of a week, but the more natural style of translation by dictation suited the purpose of the translated presentation particularly well. I could imagine myself in the room with equipment vendors, military commanders, firefighting specialists and freight forwarders, talking about the challenges faced and the technology required to avoid the tragedies of an out-of-control firestorm. And the words came out, transcribed from my voice directly into the target text fields of memoQ, exactly as they should be spoken to that audience. And at the end of that long day my hands still had feeling in them, which would not have been the case if I had typed even a third of the text.
Yesterday I made a specialized glossary to share with a presenter who will travel halfway around the world to lecture with the slides I translated for his talk. Long ago I discovered that the way I produce translations has the potential to provide additional benefits for those who will use my work. Sales representatives might need to write letters to their prospects, discussing their products in a language not mastered as a native, and the vocabulary from my work may help them to improve communication and avoid confusion that might result from using incorrect or simply different words to describe the same stuff. Or an attorney might need a quick overview of the language I used to translate the pleading she intends to file, to ensure that it is consistent with previous efforts and will not complicate discussions with her client. The terminology I research and record for each translation can be exported and reformatted quickly to produce glossaries or more complex dictionaries in a variety of formats suited for purpose. Little time and often a lot of benefits for my clients.
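As a rough illustration of that reformatting step: a termbase export is typically just delimited text, so turning one into a simple client glossary can take only a few lines of scripting. The file name and column layout below are assumptions for the example, since real exports vary by tool and settings:

```typescript
// Sketch: convert a tab-delimited termbase export into a simple two-column
// glossary. File names and column order are assumptions for illustration.
import { readFileSync, writeFileSync } from "node:fs";

const rows = readFileSync("terms-export.txt", "utf8")
  .split(/\r?\n/)
  .map((line) => line.split("\t"))      // expected: [German, English, ...]
  .filter((cols) => cols.length >= 2);

const glossary = rows
  .map(([de, en]) => `${de}\t${en}`)
  .sort((a, b) => a.localeCompare(b, "de"))
  .join("\n");

writeFileSync("client-glossary.txt", `German\tEnglish\n${glossary}\n`);
```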
Yesterday I translated bitmap graphics and not only had to deal with the editing tools for that but also had to consider the best strategy for transforming the original German graphics into English ones. Would those charts be translated again into other languages? Would the graphics be re-used in other types of documents, so that I should consider ease of portability in my approach to the translation? And how the Hell do I actually use that new bitmap graphics transcription and substitution for Microsoft Office files which was added to memoQ some time ago and sort out the five charts to translate from the fifty to ignore? (Maybe I should blog the solutions some day.)
And yesterday I was asked to write summaries of large, badly scanned articles so that the equipment manufacturer would understand how its latest technology was discussed by German reviewers. As a kid I had a silly fantasy about getting paid to read, and this is just one of the many ways it unexpectedly came true. But before I could get that far, these scanned files needed to be reworked so that they could be read and searched on screen, so as I described in a guest post on another blog some years ago, I converted them to searchable PDF/A with ABBYY FineReader, which in this case also reduced their size by about 75%. The video below also shows how this works. Strangely, when I describe this procedure to other translators, many of them don't get it, and they go on about converting PDF files into editable MS Word files or plain text, or, God help them, something really stupid like importing PDF files directly into a CAT tool for translation, though none of this relates to my purpose. Conversions often contain errors, and many texts are harder to interpret when the context of an accurate layout is lost. So "text-on-image" PDF files are often critical as a reference to the original source layout, and for files to summarize or consult sporadically (with many pages to look at and essentially nothing to translate), a searchable PDF is the gold standard for efficient work.
In the course of that day I had to work with two computers linked by remote access, using four networks at various times and working in German, English and Portuguese (the latter mostly involving questions to the housekeeper about how to place an online pizza delivery order so I could stay in the office and keep working). I used well over a dozen software applications for the necessary tasks. These, and the environments in which they operate, must be balanced carefully for efficient work. And even after some months in my new office, the balance isn't quite as good as I've had it before, and more attention to ergonomics is required.
Some colleagues are nostalgic for the "good old days" when they received a stack of paper to translate and sent off another stack of paper when the work was done, and they had a filing cabinet or a shelf of notebooks full of old work to use as reference material, and boxes of index cards stuffed full of scribbled notes on terminology next to seldom-dusty specialist dictionaries prepared by presumed experts, often full of marginalia commenting on errors or omissions and stuffed with papers bearing other scribbled notes. Not me. Since the day 30 years ago when I laboriously typed a text file full of file folder numbers and content descriptions for my research work and personal papers I have been a big believer in electronic retrieval of information wherever possible, and I miss retyping botched pages just as little as I miss the lines in the post office or the stress of dealing with delivery services.
I suspect that some feel a loss of control with the advent of new technologies in an old profession, and certainly the changes in the business environment for translation since the days of the typewriter often require a very different mentality to survive and thrive. What that mentality is, exactly, is a matter of healthy debate and often misunderstanding - again, because of the great diversity of the profession and the professionals and unprofessionals in it.
The greatest challenges of new technologies that I find are the same as those faced in many other kinds of work and in modern life in general. Filtering the overabundance of input for the few things that are truly of use or interest and maintaining focus and calm amidst omnipresent distractions. Not relying too much on technologies that are far more fallible than most people, even experts, realize or acknowledge. And remembering that a fool with a tool, however many features and failsafes it may offer, remains a fool.