Dec 11, 2018

Your language in Hey memoQ: information for speech recognition

There are quite a number of issues facing memoQ users who wish to make use of the new speech recognition feature – Hey memoQ – released recently with memoQ version 8.7. Some of these are of a temporary nature (workarounds and efforts to deal with bugs or shortcomings in the current release which can reasonably be expected to change soon), others – like basic information on commands for iOS dictation and what options have been implemented for your language – might not be so easy to work out. My own research in this area for English, German and Portuguese has revealed a lot of errors in some of the information sources, so often I have to take what I find and try it out in chat dictation, e-mail messages or the Notes app (my favorite record-keeping tool for such things) on the iOS device. This is the "baseline" for evaluating how Hey memoQ should transcribe text in a given language.

But where do you find this information? One of the best ways might be a Google Advanced Search on Apple's support site. Like this one, for example:


The same search (or another) can be made by adding the site specification after your search terms in an ordinary Google search:
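
Something of this shape, for example (the search terms here are only an illustration, not a recommendation):

    dictation commands site:support.apple.com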


The results lists from these searches reveal quite a number of relevant articles about iOS dictation in English. And by hacking the URLs on certain pages and substituting the desired language code, one can get to the information page on commands available for that language – all the same page, with slightly modified URLs.

The Mac OS information pages are also a source of information on possible iOS commands that one might not find so easily otherwise. An English page with a lot of information on punctuation and symbols is here: https://support.apple.com/en-us/HT202584

The same information (if available) for other languages is found just by tweaking the URL in the same way, and so on. Some guidance on Apple's choice of codes for language variants is here, but I often end up getting to where I want to go by guesswork. The Microsoft Azure page for speech API support might be more helpful for figuring out how to tweak the Apple Support URLs.
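
For anyone who would rather script the guesswork, here is a minimal sketch in Python. The locale codes in it are just the kind of guesses I mean, not a verified list:

    # Swap the locale segment of the known English support URL and see
    # which language variants actually exist.
    import urllib.request

    BASE = 'https://support.apple.com/{locale}/HT202584'

    for locale in ['en-us', 'de-de', 'pt-pt', 'fr-fr']:  # guesses to verify
        url = BASE.format(locale=locale)
        try:
            status = urllib.request.urlopen(url, timeout=10).status
            print(url, '->', status)
        except Exception as err:  # a wrong guess typically raises HTTPError
            print(url, '->', err)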

When you edit the commands list, you should be aware of a few things to avoid errors.
  • The command lists in the first release may contain errors, such as the placeholder "phrase" mistakenly typed in angle brackets, as shown in the example marked "1" above; when you edit, commands that are followed by a phrase do not show the placeholder for that phrase, as you can see in the example marked "2".
  • Commands must be entered without quotation marks! Compare the marked examples 1 and 2 above. If quotes are typed when editing a command, the command's appearance will not betray the problem; it will look OK but won't work at all until the quote marks are removed (a simple check for both pitfalls is sketched below).
  • Command creation is an iterative process that may entail a lot of frustrating failures. When I created my German command set, I started by copying some commands used for editing in Dragon NaturallySpeaking, but the results were often better when I chose other words. Sometimes iOS stubbornly transcribes some other common expression; sometimes it simply interprets your command as a word to transcribe. Just be patient and try something else.
The difficulties involved in command development at this stage are surely why only one finished command set (for the English variants) for memoQ-specific commands was released at first. But that makes it all the more important to make command sets "light resources" in memoQ, which can be easily exported and exchanged with others.
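
Since both of these pitfalls are invisible in the app, here is a purely hypothetical little lint script (nothing of the sort ships with memoQ) showing the kind of check that would have saved me some frustration:

    # Warn about quote marks and typed placeholders in a command list.
    QUOTE_CHARS = '"\u201c\u201d'  # straight and curly double quotes

    def lint_commands(commands):
        """Flag command entries that will silently fail."""
        for cmd in commands:
            if any(ch in cmd for ch in QUOTE_CHARS):
                print(f'Quote marks will break this command: {cmd!r}')
            if '[' in cmd or ']' in cmd:
                print(f'Do not type the placeholder yourself: {cmd!r}')

    lint_commands(['"select"', 'select [phrase]', 'go to next'])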

At the present stage, I see the need for developing and/or fixing the Hey memoQ app in the following ways:
  • Fix obvious bugs, which include:
      • the apparently non-functional concordance insertions (in general, more voice control would be helpful in the memoQ Concordance);
      • capitalization errors, which may affect a variety of commands: Roman numerals, ALL CAPS, title capitalization (if the first word of the title is not at the start of the segment), etc.;
      • dodgy responses to the commands for inserting spaces: a single command often fails to insert a space, so you have to say it twice and then get stuck with two spaces. Why does this matter? Because otherwise you have to type a space on the keyboard before using a Translation Results insertion command to insert specialized terminology, auto-translation rule results, etc. into your text.
  • Address some potentially complicated issues, like considering what to do about source language text handling if there is no iOS support for the source language or the translator cannot dictate commands effectively in that language. I can manage in German or Portuguese, but I would be really screwed these days if I had to give commands in Russian or Japanese.
  • Expand dictation functionality in environments like the QA resolution lists, term entry dialog, alignment editor and other editors.
  • Look for simple ideas that could maximize returns for programming effort invested, like the "Press" command in Dragon NaturallySpeaking, which enables me to insert tags, for example, by saying "Press F9". This would eliminate the need for some commands (like confirmation and all the Translation Results insertion commands) and open up a host of possibilities by making keyboard shortcuts in any context controllable by voice. I've been thinking a lot about that since talking to a colleague with some pretty tough physical disabilities recently.
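
To make that last idea concrete, here is a rough sketch of how a generic "Press" command might be wired up on the Windows side. It uses the third-party pyautogui library purely as an illustration; nothing here is how Hey memoQ actually works, and the spoken phrases are invented:

    import pyautogui  # third-party: pip install pyautogui

    # Invented examples mapping the tail of a dictated "press ..." phrase to
    # keystrokes; in memoQ, F9 inserts a tag and Ctrl+Enter confirms a segment.
    SPOKEN_KEYS = {
        'f9': ['f9'],
        'control enter': ['ctrl', 'enter'],
    }

    def handle_dictation(text):
        """Send a keystroke if the dictated text is a 'press' command."""
        phrase = text.lower().strip()
        if not phrase.startswith('press '):
            return False  # ordinary text: transcribe it instead
        keys = SPOKEN_KEYS.get(phrase[len('press '):])
        if keys is None:
            return False
        pyautogui.hotkey(*keys)  # handles single keys and chords alike
        return True

    handle_dictation('Press F9')  # would send F9 to the active window
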
Overall, I think that Hey memoQ represents a great start in making speech recognition available in a useful way in a desktop translation environment tool and making the case for more extensive investments in speech recognition technology to improve accessibility and ergonomics for working translators.

Of course, speech recognition brings with it a number of different challenges for reviewing work: mistakes (or "dictos" as they are sometimes called, a riff on keyboard "typos") are often harder to catch, especially if one is reviewing directly after translating and the memory of intended text is perhaps fresh enough to override in perception what the eye actually sees. So maybe before long we'll see an integrated read-back feature in memoQ, which could also benefit people who don't work with speech recognition. 

Since I began using speech recognition a lot for my work (to cope with occasionally unbearable pain from gout), I have had to adopt the habit of reading everything out loud after I translate, because I have found this to be the best way to catch my errors or to recognize where the text could use a rhetorical makeover. (The read-back function of Dragon NaturallySpeaking in English is a nightmare, randomly confusing definite and indefinite articles, but other tools might be usable now for external review; these should probably be applied to the target column of an exported RTF bilingual file to facilitate re-importing corrections into the memoQ environment, though the monolingual review feature for importing edited target text files and keeping project resources up to date is also a good option.)

As I have worked with the first release of Hey memoQ, I have noticed quite a few little details where small refinements or extensions to the app could help my workflow. And the same will be true, I am sure, for most others who use this tool. It is particularly important at this stage that those of us who are using and/or testing this early version communicate with the development team (in the form of e-mail to memoQ Support - support@memoq.com - with suggestions or observations). This will be the fastest way to see improvements, I think.

In the future, I would be surprised if applications like this did not develop to cover other input methods (besides an iOS device like an iPhone or iPad). But I think it's important to focus on taking this initial platform as far as it can go, so that we can all see what working functionality is missing as the APIs for the relevant operating systems develop further to support speech recognition (especially the Holy Grail for many of us: trainable vocabulary like we have in Dragon NaturallySpeaking and a very few other applications). Some of what we are looking for may be in the Nuance software development kits (SDKs) for speech recognition, which I suggested using some years ago because they offer customizable vocabularies at higher levels of licensing, but this would represent a much greater and more speculative investment in an area of technology that is still subject to a lot of misunderstanding and misrepresentation.

8 comments:

  1. Hi Kevin,

    "Press " commands are tricky, because you dictate in the target language of the project (most of the time at least). I don't think that "Enter", "Esc", "Tab", "Space", etc. would work fine. For example, Hungarians typically refer to these keys by their English names, typically with a good "Hunglish" accent.

    "Select [phrase]": all you need to know is that you only need to type your desired version of "Select". If we allowed the user to edit the thing, that would be an additional potential source of errors. The thingy is always placed automatically by memoQ at the end. This also means that you currently can't put any words after the [phrase]. These were all design decisions, which are of course not always perfect. :)

    The Concordance window was honestly "let go" for the initial release. We had limited time and resources and focused on the functionality in the translation grid itself. I still feel that refining the core task of dictating the translation is way more important. Same with read-back and adding dictation "everywhere else": I still think that we can bring some good improvements to the basics of just dictating translations, and that is worth more.

    Spaces: if I understand this correctly, my feeling is that spaces should be placed by the insertion command in the right way in the first place, instead of helping the user work around it.

    BR,
    Gergely

    1. Thanks, Gergely. I was afraid that maybe I was doing something wrong with that concordance command.

      I agree about the spaces. I tried to explain to Zsolt the other day that this is more than a dictation issue with insertions, and prepended spaces ought to be an option for certain types of translation results hits. But still, the response to a dictated "spacebar" command needs fixing!

    2. Ah, I see that what I wrote about the "Select [phrase]" thing is all messed up above, and totally incomprehensible at the moment, because the occurrences of [phrase] all vanished.

      So the point was that when you edit the "Select [phrase]" voice command, you can only change the "select" part of it to some other word(s). It is intentional that you do not see [phrase], and you cannot put that part anywhere else but the end of the command.

  2. I've been a paying active user of MemoQ since 2010, and I don't have any Apple products. How ironic that MemoQ doesn't run natively on Macs, but insists on using an iPhone for their Hey MemoQ feature. Why no Android version? I feel left out.

  3. Basically we had two free speech-to-text options, the one in iOS and the one in Android. (There might be paid ones that are better, but the cost makes them unrealistic. The translator would in the end spend as much or more on "dictation bills" as on memoQ itself. Also, frankly, the providers of these services weren't interested in talking with us back in the day.) Developing for one platform at a time makes a lot of sense for a new product. For example, if we go in some wrong directions, or have some expensive bugs, we would need to spend on fixing two platforms if we had two. Also, a new product like this is not a guaranteed success. If we spend twice the amount on two platforms, and it turns out that it isn't a success, that is twice the loss.

    Also, as we said in our Hey memoQ FAQ, the speech recognition in iOS (iPhone + iPad) is more feature-complete for our purposes. In Android, for many languages, if you say "comma" or "period" (in your language), it won't result in the punctuation sign in the text, but in the words themselves. It would need some additional language-specific work.

    Also, call me arrogant, but if you think that dictation could help your productivity, the minimal iOS device for Hey memoQ is an iPhone 5, which costs about 50 euros used. It might pay for itself the first day. I can assure you that an Android version of Hey memoQ would cost us somewhat more. :) Honestly, if we were hell-bent on increasing adoption, buying a lot of used iPhone 5s and sending them out to memoQ users with Android phones who are interested in dictation would be a more cost-efficient approach. :P

    1. I have spent several times what a used iPhone 5S or 6 would cost for various microphones I use with Dragon NaturallySpeaking, my Blue Yeti being one of the cheapest at about €130 on sale at the time. So yes, you are right about recouping costs quickly.

      But as I've mentioned a few times, the real barriers to effective use/testing right now are more:
      (1) working out reliable microphone ergonomics,
      (2) functional command sets for the dictation languages (only English was released by mQtech, and I published German commands on this blog, but these are really of interest only for source language command dictation, because those are languages covered by Dragon NS - Hey memoQ's potential is really for non-DNS languages as we know),
      (3) practice/guidance for mobile "dictation style" (which differs, as I have found, from the requirements of desktop computer recognition apps), and
      (4) really good bandwidth!! (testing on dodgy mobile data links where Google Translate fails half the time will not give you good dictation results no matter how great your mike and its positioning are!)

      Within the reasonably limited parameters of iOS devices coupled to memoQ running on Windows machines (on MacOS devices with Parallels, etc. you can use MacOS dictation with all its options), the critical task now is to figure out the best recommendations for successful and consistent sound pickup (microphone) in a way that leaves hands free to use a keyboard, and to configure commands for handling the controls in the respective target language. As I learned in my work on that in German, this last one is not trivial at all and clearly will have to be handled by knowledgeable, patient users in most cases, not by mQtech. But... it will be much easier to share the configuration results when these are exchangeable as light resources and we don't have to hack that XML configuration file as I described in another blog post.
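
      For the curious, the kind of thing I mean by hacking that file, as a purely generic Python sketch (the file name is a placeholder, and the app's real path and structure are not shown here):

          # Dump whatever text an XML configuration file contains so that
          # command sets can be compared or copied between users.
          import xml.etree.ElementTree as ET

          tree = ET.parse('heymemoq-commands.xml')  # placeholder name
          for elem in tree.iter():
              if elem.text and elem.text.strip():
                  print(elem.tag, '=', elem.text.strip())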

      This feature is at the proof-of-concept stage and basically looks good in that respect, but without issues 1-4 being sorted out, it will of course look rather bad in many or most tests. But then, when I follow discussions such as those in the Facebook forum for Translators Who Use Dictation (as I think it's called), a lot of people have very bad initial experiences with better tools like Dragon NaturallySpeaking because they don't understand the microphone issues or good speaking practices for dictation, or they don't look at the command cheat sheet to deal with issues like capitalization. In the initial release of Hey memoQ, as you may recall, there are serious, documented problems with a number of the capitalization-related iOS commands, and for my work this would be a major irritant in any case until it is fixed. For now I focus on other things, like figuring out good microphone configurations, because those are important, really, no matter what dictation solution I use.

    2. To necropost again,

      (1) Isn't this issue with microphones somewhat overblown? I'm convinced you (and Jim Wardell etc.) are trying to help, but I'm afraid you are just making it look more difficult than it is. A simple bluetooth headset should work. Or a quiet room.
      (2) We will be working on voice commands for other languages. It indeed might have been a mistake not to do this for the initial release. Probably not for all of them at the same time.
      (3) I'm not sure I understand the problem of "dictation style". Honestly, there could be factors I do not see, but the iOS speech-to-text thing seems to understand my "Hunglish" accent extremely well. With general English text, I can speak extremely fast and there are hardly any mistakes. But, again, there might be factors I'm missing. Other accents might be less "efficient", and I have even heard that speech-to-text might be biased towards men (meaning it is more likely to understand them better). No kidding, I heard it in a talk by a speech-to-text expert at a conference: the point was that the tech worked better for W.A.S.P. men than for women or any minority, sadly.
      (4) This is interesting, it might make sense for us to try and test with dodgy connections.

    3. I'm glad that iOS is kind about "Hunglish"; I think it was István who always complained that DNS hated his dictation :-) Can't see why, really, as he has a nice accent.

      You're probably right about "just a bluetooth headset" as screaming with my bluetooth earbuds works reasonably well. The only real problem is that all my expensive microphone equipment is USB-based, because all my recording studio gurus have emphasized "better results" for audio recordings with such equipment for years, so that's where I spent my money, and I use the same stuff with DNS. But I have had my eye on a nice bit of bluetooth equipment that a Portuguese author I know got recently to dictate his new book with that Chrome app I blogged about recently. I'm told it works well for him, though I don't know if we share the same level of expectation.

      By "dictation style" I refer to the speed and number of words used. I see different results generally with iOS (any iOS dictation) at my office with nice fiber optic-based high bandwidth versus at the farm, where weather conditions can play Hell with my sometimes-4G, sometimes-3G-if-I'm-lucky connection. At least I've got the WASP male part going for me ;-)

      Don't beat yourself up too much about the command thing. I thought it would be a no-brainer to whip up a quick functional command set in German, and it took more time than I expected. I appreciate that the app made it out when it did after a long wait; we could at least test more, identify little issues like those capitalization bugs with the iOS caps commands and think about the implications of command use (and the need to switch the command language to filter on source text, stuff like that). I think when some key non-DNS language command sets like Russian or Romanian are out and these are easily exchangeable as light resources (instead of by playing games with the XML file like I do), things will be looking up a good bit. When/if that resource implementation happens, maybe add an option to apply it to all sub-languages. It was a pain dealing separately with the four variants of German I work with, though this is really an academic/demo issue for me, because my working languages are the best-supported ones for DNS.


Notice to spammers: your locations are being traced and fed to the recreational target list for my new line of chemical weapon drones :-)