Dec 11, 2018

Your language in Hey memoQ: information for speech recognition

There are quite a number of issues facing memoQ users who wish to make use of the new speech recognition feature – Hey memoQ – released recently with memoQ version 8.7. Some of these are of a temporary nature (workarounds and efforts to deal with bugs or shortcomings in the current release which can reasonably be expected to change soon), others – like basic information on commands for iOS dictation and what options have been implemented for your language – might not be so easy to work out. My own research in this area for English, German and Portuguese has revealed a lot of errors in some of the information sources, so often I have to take what I find and try it out in chat dictation, e-mail messages or the Notes app (my favorite record-keeping tool for such things) on the iOS device. This is the "baseline" for evaluating how Hey memoQ should transcribe text in a given language.

But where do you find this information? One of the best ways might be a Google Advanced Search on Apple's support site. Like this one, for example:


The same search (or another) can be made by adding the site specification after your search terms in an ordinary Google search:
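
Something of this shape, for example (the search terms here are only an illustration, not a recommendation):

    dictation commands site:support.apple.com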


The results lists from these searches reveal quite a number of relevant articles about iOS dictation in English. And by hacking the URLs on certain pages and substituting the desired language code, one can get to the information page on commands available for that language – all the same page, with slightly modified URLs.

The Mac OS information pages are also a source of information on possible iOS commands that one might not find so easily otherwise. An English page with a lot of information on punctuation and symbols is here: https://support.apple.com/en-us/HT202584

The same information (if available) for other languages is found just by tweaking the URL in the same way, and so on. Some guidance on Apple's choice of codes for language variants is here, but I often end up getting to where I want to go by guesswork. The Microsoft Azure page for speech API support might be more helpful for figuring out how to tweak the Apple Support URLs.
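
For anyone who would rather script the guesswork, here is a minimal sketch in Python. The locale codes in it are just the kind of guesses I mean, not a verified list:

    # Swap the locale segment of the known English support URL and see
    # which language variants actually exist.
    import urllib.request

    BASE = 'https://support.apple.com/{locale}/HT202584'

    for locale in ['en-us', 'de-de', 'pt-pt', 'fr-fr']:  # guesses to verify
        url = BASE.format(locale=locale)
        try:
            status = urllib.request.urlopen(url, timeout=10).status
            print(url, '->', status)
        except Exception as err:  # a wrong guess typically raises HTTPError
            print(url, '->', err)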

When you edit the commands list, you should be aware of a few things to avoid errors.
  • The command lists in the first release may contain errors, such as the placeholder "phrase" mistakenly typed in angle brackets, as shown in the example marked "1" above; when you edit, commands that are followed by a phrase do not show the placeholder for that phrase, as you can see in the example marked "2".
  • Commands must be entered without quotation marks! Compare the marked examples 1 and 2 above. If quotes are typed when editing a command, the command's appearance will not betray the problem; it will look OK but won't work at all until the quote marks are removed (a simple check for both pitfalls is sketched below).
  • Command creation is an iterative process that may entail a lot of frustrating failures. When I created my German command set, I started by copying some commands used for editing in Dragon NaturallySpeaking, but the results were often better when I chose other words. Sometimes iOS stubbornly transcribes some other common expression; sometimes it simply interprets your command as a word to transcribe. Just be patient and try something else.
The difficulties involved in command development at this stage are surely why only one finished command set (for the English variants) for memoQ-specific commands was released at first. But that makes it all the more important to make command sets "light resources" in memoQ, which can be easily exported and exchanged with others.
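
Since both of these pitfalls are invisible in the app, here is a purely hypothetical little lint script (nothing of the sort ships with memoQ) showing the kind of check that would have saved me some frustration:

    # Warn about quote marks and typed placeholders in a command list.
    QUOTE_CHARS = '"\u201c\u201d'  # straight and curly double quotes

    def lint_commands(commands):
        """Flag command entries that will silently fail."""
        for cmd in commands:
            if any(ch in cmd for ch in QUOTE_CHARS):
                print(f'Quote marks will break this command: {cmd!r}')
            if '[' in cmd or ']' in cmd:
                print(f'Do not type the placeholder yourself: {cmd!r}')

    lint_commands(['"select"', 'select [phrase]', 'go to next'])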

At the present stage, I see the need for developing and/or fixing the Hey memoQ app in the following ways:
  • Fix obvious bugs, which include:
      • the apparently non-functional concordance insertions (in general, more voice control would be helpful in the memoQ Concordance);
      • capitalization errors, which may affect a variety of commands: Roman numerals, ALL CAPS, title capitalization (if the first word of the title is not at the start of the segment), etc.;
      • dodgy responses to the commands for inserting spaces: a single command often fails to insert a space, so you have to say it twice and then get stuck with two spaces. Why does this matter? Because otherwise you have to type a space on the keyboard before using a Translation Results insertion command to insert specialized terminology, auto-translation rule results, etc. into your text.
  • Address some potentially complicated issues, like considering what to do about source language text handling if there is no iOS support for the source language or the translator cannot dictate commands effectively in that language. I can manage in German or Portuguese, but I would be really screwed these days if I had to give commands in Russian or Japanese.
  • Expand dictation functionality in environments like the QA resolution lists, term entry dialog, alignment editor and other editors.
  • Look for simple ideas that could maximize returns for programming effort invested, like the "Press" command in Dragon NaturallySpeaking, which enables me to insert tags, for example, by saying "Press F9". This would eliminate the need for some commands (like confirmation and all the Translation Results insertion commands) and open up a host of possibilities by making keyboard shortcuts in any context controllable by voice. I've been thinking a lot about that since talking to a colleague with some pretty tough physical disabilities recently.
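
To make that last idea concrete, here is a rough sketch of how a generic "Press" command might be wired up on the Windows side. It uses the third-party pyautogui library purely as an illustration; nothing here is how Hey memoQ actually works, and the spoken phrases are invented:

    import pyautogui  # third-party: pip install pyautogui

    # Invented examples mapping the tail of a dictated "press ..." phrase to
    # keystrokes; in memoQ, F9 inserts a tag and Ctrl+Enter confirms a segment.
    SPOKEN_KEYS = {
        'f9': ['f9'],
        'control enter': ['ctrl', 'enter'],
    }

    def handle_dictation(text):
        """Send a keystroke if the dictated text is a 'press' command."""
        phrase = text.lower().strip()
        if not phrase.startswith('press '):
            return False  # ordinary text: transcribe it instead
        keys = SPOKEN_KEYS.get(phrase[len('press '):])
        if keys is None:
            return False
        pyautogui.hotkey(*keys)  # handles single keys and chords alike
        return True

    handle_dictation('Press F9')  # would send F9 to the active window
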
Overall, I think that Hey memoQ represents a great start in making speech recognition available in a useful way in a desktop translation environment tool and making the case for more extensive investments in speech recognition technology to improve accessibility and ergonomics for working translators.

Of course, speech recognition brings with it a number of different challenges for reviewing work: mistakes (or "dictos" as they are sometimes called, a riff on keyboard "typos") are often harder to catch, especially if one is reviewing directly after translating and the memory of intended text is perhaps fresh enough to override in perception what the eye actually sees. So maybe before long we'll see an integrated read-back feature in memoQ, which could also benefit people who don't work with speech recognition. 

Since I began using speech recognition a lot for my work (to cope with occasionally unbearable pain from gout), I have had to adopt the habit of reading everything out loud after I translate, because I have found this to be the best way to catch my errors or to recognize where the text could use a rhetorical makeover. (The read-back function of Dragon NaturallySpeaking in English is a nightmare, randomly confusing definite and indefinite articles, but other tools might be usable now for external review; these should probably be applied to the target column of an exported RTF bilingual file to facilitate re-importing corrections into the memoQ environment, though the monolingual review feature for importing edited target text files and keeping project resources up to date is also a good option.)

As I have worked with the first release of Hey memoQ, I have noticed quite a few little details where small refinements or extensions to the app could help my workflow. And the same will be true, I am sure, for most others who use this tool. It is particularly important at this stage that those of us who are using and/or testing this early version communicate with the development team (in the form of e-mail to memoQ Support - support@memoq.com - with suggestions or observations). This will be the fastest way to see improvements, I think.

In the future, I would be surprised if applications like this did not develop to cover other input methods (besides an iOS device like an iPhone or iPad). But I think it's important to focus on taking this initial platform as far as it can go, so that we can all see what working functionality is missing as the APIs for the relevant operating systems develop further to support speech recognition (especially the Holy Grail for many of us: trainable vocabulary like we have in Dragon NaturallySpeaking and a very few other applications). Some of what we are looking for may be in the Nuance software development kits (SDKs) for speech recognition, which I suggested using some years ago because they offer customizable vocabularies at higher levels of licensing, but this would represent a much greater and more speculative investment in an area of technology that is still subject to a lot of misunderstanding and misrepresentation.

8 comments:

  1. Hi Kevin,

    "Press " commands are tricky, because you dictate in the target language of the project (most of the time at least). I don't think that "Enter", "Esc", "Tab", "Space", etc. would work fine. For example, Hungarians typically refer to these keys by their English names, typically with a good "Hunglish" accent.

    "Select [phrase]": all you need to know is that you only need to type your desired version of "Select". If we allowed the user to edit the thing, that would be an additional potential source of errors. The thingy is always placed automatically by memoQ at the end. This also means that you currently can't put any words after the [phrase]. These were all design decisions, which are of course not always perfect. :)

    The Concordance window was honestly "let go" for the initial release. We had limited time and resources and focused on the functionality in the translation grid itself. I still feel that refining the core task of dictating the translation is way more important. Same with read-back and adding dictation "everywhere else": I still think that we can bring some good improvements to the basics of just dictating translations, and that is worth more.

    Spaces: if I understand this correctly, my feeling is that spaces should be placed by the insertion command in the right way in the first place, instead of helping the user work around it.

    BR,
    Gergely

    1. Thanks, Gergely. I was afraid that maybe I was doing something wrong with that concordance command.

      I agree about the spaces. I tried to explain to Zsolt the other day that this is more than a dictation issue with insertions, and prepended spaces ought to be an option for certain types of translation results hits. But still, the response to a dictated "spacebar" command needs fixing!

    2. Ah, I see that what I wrote about the "Select [phrase]" thing is all messed up above, and totally incomprehensible at the moment, because the occurrences of [phrase] all vanished.

      So the point was that when you edit the "Select [phrase]" voice command, you can only change the "select" part of it to some other word(s). It is intentional that you do not see [phrase], and you cannot put that part anywhere else but the end of the command.

  2. I've been a paying active user of MemoQ since 2010, and I don't have any Apple products. How ironic that MemoQ doesn't run natively on Macs, but insists on using an iPhone for their Hey MemoQ feature. Why no Android version? I feel left out.

  3. Basically we had two free speech-to-text options, the one in iOS and the one in Android. (There might be paid ones that are better, but the cost makes them unrealistic. The translator would in the end spend as much or more on "dictation bills" as on memoQ itself. Also, frankly, the providers of these services weren't interested in talking with us back in the day.) Developing for one platform at a time makes a lot of sense for a new product. For example, if we go in some wrong directions, or have some expensive bugs, we would need to spend on fixing two platforms if we had two. Also, a new product like this is not a guaranteed success. If we spend twice the amount on two platforms, and it turns out that it isn't a success, that is twice the loss.

    Also, as we said in our Hey memoQ FAQ, the speech recognition in iOS (iPhone + iPad) is more feature-complete for our purposes. In Android, for many languages, if you say "comma" or "period" (in your language), it won't result in the punctuation sign in the text, but in the words themselves. It would need some additional language-specific work.

    Also, call me arrogant, but if you think that dictation could help your productivity, the minimal iOS device for Hey memoQ is an iPhone 5, which costs about 50 euros used. It might pay for itself the first day. I can assure you that an Android version of Hey memoQ would cost us somewhat more. :) Honestly, if we were hell-bent on increasing adoption, buying a lot of used iPhone 5s and sending them out to memoQ users with Android phones who are interested in dictation would be a more cost-efficient approach. :P

    1. I have spent several times what a used iPhone 5S or 6 would cost for various microphones I use with Dragon NaturallySpeaking, my Blue Yeti being one of the cheapest at about €130 on sale at the time. So yes, you are right about recouping costs quickly.

      But as I've mentioned a few times, the real barriers to effective use/testing right now are more:
      (1) working out reliable microphone ergonomics,
      (2) functional command sets for the dictation languages (only English was released by mQtech, and I published German commands on this blog, but these are really of interest only for source language command dictation, because those are languages covered by Dragon NS - Hey memoQ's potential is really for non-DNS languages as we know),
      (3) practice/guidance for mobile "dictation style" (which differs, as I have found, from the requirements of desktop computer recognition apps), and
      (4) really good bandwidth!! (testing on dodgy mobile data links where Google Translate fails half the time will not give you good dictation results no matter how great your mike and its positioning are!)

      Within the reasonably limited parameters of iOS devices coupled to memoQ running on Windows machines (on MacOS devices with Parallels, etc. you can use MacOS dictation with all its options), the critical task now is to figure out the best recommendations for successful and consistent sound pickup (microphone) in a way that leaves hands free to use a keyboard, and to configure commands for handling the controls in the respective target language. As I learned in my work on that in German, this last one is not trivial at all and clearly will have to be handled by knowledgeable, patient users in most cases, not by mQtech. But... it will be much easier to share the configuration results when these are exchangeable as light resources and we don't have to hack that XML configuration file as I described in another blog post.
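
      For the curious, the kind of thing I mean by hacking that file, as a purely generic Python sketch (the file name is a placeholder, and the app's real path and structure are not shown here):

          # Dump whatever text an XML configuration file contains so that
          # command sets can be compared or copied between users.
          import xml.etree.ElementTree as ET

          tree = ET.parse('heymemoq-commands.xml')  # placeholder name
          for elem in tree.iter():
              if elem.text and elem.text.strip():
                  print(elem.tag, '=', elem.text.strip())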

      This feature is at the proof-of-concept stage and basically looks good in that respect, but without issues 1-4 being sorted out, it will of course look rather bad in many or most tests. But then, when I follow discussions such as those in the Facebook forum for Translators Who Use Dictation (as I think it's called), a lot of people have very bad initial experiences with better tools like Dragon NaturallySpeaking because they don't understand the microphone issues or good speaking practices for dictation, or they don't look at the command cheat sheet to deal with issues like capitalization. In the initial release of Hey memoQ, as you may recall, there are serious, documented problems with a number of the capitalization-related iOS commands, and for my work this would be a major irritant in any case until it is fixed. For now I focus on other things, like figuring out good microphone configurations, because those are important, really, no matter what dictation solution I use.

    2. To necropost again,

      (1) Isn't this issue with microphones somewhat overblown? I'm convinced you (and Jim Wardell etc.) are trying to help, but I'm afraid you are just making it look more difficult than it is. A simple bluetooth headset should work. Or a quiet room.
      (2) We will be working on voice commands for other languages. It indeed might have been a mistake not to do this for the initial release. Probably not for all of them at the same time.
      (3) I'm not sure I understand the problem of "dictation style". Honestly, there could be factors I do not see, but the iOS speech-to-text thing seems to understand my "Hunglish" accent extremely well. With general English text, I can speak extremely fast and there are hardly any mistakes. But, again, there might be factors I'm missing. Other accents might be less "efficient", and I have even heard that speech-to-text might be biased towards men (meaning it is more likely to understand them better). No kidding, I heard it in a talk by a speech-to-text expert at a conference: the point was that the tech worked better for W.A.S.P. men than for women or any minority, sadly.
      (4) This is interesting, it might make sense for us to try and test with dodgy connections.

    3. I'm glad that iOS is kind about "Hunglish"; I think it was István who always complained that DNS hated his dictation :-) Can't see why, really, as he has a nice accent.

      You're probably right about "just a bluetooth headset" as screaming with my bluetooth earbuds works reasonably well. The only real problem is that all my expensive microphone equipment is USB-based, because all my recording studio gurus have emphasized "better results" for audio recordings with such equipment for years, so that's where I spent my money, and I use the same stuff with DNS. But I have had my eye on a nice bit of bluetooth equipment that a Portuguese author I know got recently to dictate his new book with that Chrome app I blogged about recently. I'm told it works well for him, though I don't know if we share the same level of expectation.

      By "dictation style" I refer to the speed and number of words used. I see different results generally with iOS (any iOS dictation) at my office with nice fiber optic-based high bandwidth versus at the farm, where weather conditions can play Hell with my sometimes-4G, sometimes-3G-if-I'm-lucky connection. At least I've got the WASP male part going for me ;-)

      Don't beat yourself up too much about the command thing. I thought it would be a no-brainer to whip up a quick functional command set in German, and it took more time than I expected. I appreciate that the app made it out when it did after a long wait; we could at least test more, identify little issues like those capitalization bugs with the iOS caps commands and think about the implications of command use (and the need to switch the command language to filter on source text, stuff like that). I think when some key non-DNS language command sets like Russian or Romanian are out and these are easily exchangeable as light resources (instead of by playing games with the XML file like I do), things will be looking up a good bit. When/if that resource implementation happens, maybe add an option to apply it to all sub-languages. It was a pain dealing separately with the four variants of German I work with, though this is really an academic/demo issue for me, because my working languages are the best-supported ones for DNS.


Notice to spammers: your locations are being traced and fed to the recreational target list for my new line of chemical weapon drones :-)