
Apr 3, 2015

Free, good quality speech recognition for Portuguese, Arabic and more in your pocket!


This blog post was produced by voice dictation on my iPhone 4S in a crowded restaurant with a lot of background noise. This evening I came to my favorite hangout to work, to get away from home for a while after a very long and stressful day.

I forgot my glasses when I left home, so I cannot see the screen of my computer well enough to type accurately. Essentially, I am working as if I were blind. I thought of driving home to fetch my glasses and then returning here to work, but I did not want to take the time. So, I thought that this would be the ideal opportunity to test the dictation workflow which I have been showing to so many people in quite a few languages in the last few weeks. Of course I am doing this in my native language (English), but this would work just as well if I were a native speaker of Arabic or Romanian or Portuguese, for example.

What I am experiencing so far in this test is that after speaking for a certain amount of time, during which a text chunk of a certain size has been generated, the application stops and communicates with the transcription server from Nuance online, producing the transcribed text in the language which I am speaking. However, that does not pose a great difficulty; I can simply restart the recording and the text continues. If I want to, I can make corrections with an on-screen keyboard on my mobile phone, but I prefer to email the text after I am finished and make any changes or corrections on my computer.

The last few weeks have been very interesting. At the JABA Partner Summit in Porto, Portugal, and later at the GALA conference in Seville, Spain, I tested this workflow together with native speakers of many languages not supported by Dragon NaturallySpeaking from Nuance. In every case the results seemed to be excellent, but the texts generated during the tests were usually rather short, no more than one or two paragraphs.

This is the longest text that I have created by this process so far. I find that the "chunking" behavior of the application is actually helpful. It allows me to look at groups of text that are not too large (about enough to fill the screen of the iPhone) and make important corrections manually before I continue. On the whole, this is in fact a rather comfortable process. With it, I can hang out in the barn with my goats and chickens and a printout and translate comfortably with a beer in one hand. Not bad. The ergonomic aspects are excellent. I am dictating this text in English with a great deal of noise coming from the nearby kitchen and the television which is less than 3 m from me, blaring loudly in Portuguese.

I am very satisfied with the results of tonight's test. And I hope that others will explore this workflow further, creating new possibilities for better, more profitable work in many languages using this new speech recognition capability. I think this is a game-changer.

This works on any Apple mobile device, such as the iPhone, iPad or iPod. The app to download from the App Store is called "Dragon Dictation". It is free. I discovered this particular possibility after reading time and again that the quality of speech recognition on mobile devices is actually superior to what is available on desktop computers, because that is where all of the research time and money is currently being invested. It took me a while to realize the implications of this, but now I see that many can benefit a great deal from the possibilities that this makes available. I look forward to reports of work in other languages. (The only language that I have discovered to have significant restrictions so far is Japanese, where apparently the Kanji recognition is not very good and Hiragana characters are used too often, making a text difficult to read for a native speaker. Steve Vitek tells me that the problem is that there are too many homophones in Japanese, but that this should work well in another language such as Mandarin Chinese. The initial tests with Mandarin Chinese in Seville, Spain actually looked rather good.)

After dictation and transcription are complete, a few button presses can send the text to an e-mail server for manual or automated processing.
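
For the automated route, here is a minimal sketch of what such processing might look like. It is only an illustration under stated assumptions: the post does not say which tools receive the e-mail, so the use of Python, the function name `extract_dictation` and the idea of stripping the "Sent from my iPhone" signature are all hypothetical. The sketch takes the raw text of one e-mail, pulls out the plain-text body, drops the signature line and returns the dictated paragraphs.

```python
from email import message_from_string, policy

# Assumption: the phone appends this signature line to outgoing mail.
SIGNATURE = "Sent from my iPhone"


def extract_dictation(raw_email: str) -> list[str]:
    """Return the dictated paragraphs from one raw e-mail message (hypothetical helper)."""
    msg = message_from_string(raw_email, policy=policy.default)

    # Prefer the plain-text part; an e-mail without one yields no text.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""

    paragraphs, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == SIGNATURE:
            continue  # skip the signature line
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the current paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Each returned paragraph corresponds roughly to one block of dictated text, which could then be passed to a text editor or a translation tool for correction.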



Sent from my iPhone



Recorded at Cantinho Da Ti Bilete in Évora, Portugal. Photos by César Almeida.


2 comments:

  1. Hi Kevin,

    Just read through your various posts about Dragon. Interesting stuff. Very interesting indeed.

    I recently discovered something to make it even more useful: Vocola, Unimacro, Dragonfly, etc. That is, open source software that allows me to control my computer by voice, in addition to being able to dictate speech.

    For example, I can say "browser", and Vocola will switch to Yandex.Browser (my favorite browser at the moment). All kinds of stuff in my browser works, such as "close tab", "scroll down/up 1", "scroll down/up 2", "back", "forward", "search for", etc.

    If I say "CafeTran", I'm taken to CafeTran. In CafeTran, I can do all kinds of stuff, like: "confirm", apply various filters, search my TBs/TMs, translate selection, etc. I'm currently working on commands for text editing (moving stuff around, selecting words, deleting things, etc.). All commands for CafeTran go in a special txt file: https://dl.dropboxusercontent.com/u/6802597/Documents/java.vcl

    All global commands are written in another text file. Here's what I have so far, after playing with it a few days: https://dl.dropboxusercontent.com/u/6802597/Documents/java.vcl

    It is very easy to write basic Vocola commands. However, it gets better: the developer has also managed to integrate AutoHotkey. You can either call an AHK script with Vocola commands/code (your action is coded in AHK – and in a .ahk file on your computer – but called from a Vocola command), or ... you can stick AHK code right in your Vocola command (!). This is great for one-liners.

    I actually also recently discovered that KnowBrainer (the guys who resell Dragon stuff, mikes, etc.) have their own, commercial, version of something similar: KnowBrainer 2014/2015, which I haven't tried yet. Actually, I can't get their installer to work.

    I have only really scratched the surface and have only tried Vocola/Unimacro so far. Dragonfly (which is aimed at "voice coding") seems to be the next level up.

    Some useful links to get started:

    install natlink unimacro vocola
    https://www.youtube.com/watch?v=iViDXfyYPLo

    enable vocola and do global commands
    https://www.youtube.com/watch?v=jf5NV5fh420

    start unimacro and use grammar control
    https://www.youtube.com/watch?v=Q5cg7OGN0Ow

    Vocola:
    http://vocola.net/

    Unimacro:
    http://qh.antenna.nl/unimacro/index.html

    Unofficial Vocola 2 extensions
    http://vocola.net/unofficial/extensions.html

  2. Small update: I've since gotten KnowBrainer to work on my computer. In fact, I actually bought KnowBrainer 2015, Dragon NaturallySpeaking Professional 13 and a SpeechWare 3-in-1 TableMike. Having lots of speech recognition fun here in the Beijer household!


Notice to spammers: your locations are being traced and fed to the recreational target list for my new line of chemical weapon drones :-)