This blog post was produced by voice dictation on my iPhone 4S in a crowded restaurant with
a lot of background noise. This evening I came to my favorite hangout to work,
to get away from home for a while after a very long and stressful day.
I forgot my glasses when I left home, so I
cannot see the screen of my computer well enough to type accurately.
Essentially, I am working as if I were blind. I thought of driving home to
fetch my glasses and then returning here to work, but I did not want to take the
time. So, I thought that this would be the ideal opportunity
to test the dictation workflow which I have been showing to so many people in
quite a few languages in the last few weeks. Of course I am doing this in my
native language (English), but this would work just as well if I were a native
speaker of Arabic or Romanian or Portuguese, for example. What I am finding so
far in this test is that after I have spoken for a while, and a chunk of text of
a certain size has accumulated, the application stops and communicates with
Nuance's online transcription server, producing the transcribed text in the
language I am speaking. However, that does not pose a great difficulty; I can simply restart
the recording and the text continues. If I want to, I can make corrections with
an on-screen keyboard on my mobile phone, but I prefer to email the text after
I am finished and make any changes or corrections on my computer. The last few
weeks have been very interesting. At the JABA Partner Summit in Porto,
Portugal, and later at the GALA conference in Seville, Spain, I tested this
workflow together with native speakers of many languages not supported by
Dragon NaturallySpeaking from Nuance. In every case the results seemed to be
excellent, but the texts generated during the tests were usually rather short,
no more than one or two paragraphs.
This is the longest text that I have created by this
process so far. I find that the "chunking" behavior of the
application is actually helpful. It allows me to look at groups of text that
are not too large (about enough to fill the screen of the iPhone) and make important corrections manually before I continue. On
the whole, this is in fact a rather comfortable process. With it, I can hang
out in the barn with my goats and chickens and a printout and translate
comfortably with a beer in one hand. Not bad. The ergonomic aspects are
excellent. I am dictating this text in English with a great deal of noise
coming from the nearby kitchen and the television, which is less than three
meters from me, blaring loudly in Portuguese.
I am very satisfied with the results of tonight's test.
And I hope that others will explore this workflow further, creating new
possibilities for better, more profitable work in many languages using this new
speech recognition capability. I think this is a game-changer.
This works on any Apple mobile device, such as the
iPhone, iPad or iPod touch. The app to download from the App Store is called
"Dragon Dictation". It is free. I discovered this particular
possibility after reading time and again that the quality of speech recognition on
mobile devices is actually superior to what is available on desktop computers,
because that is where all of the research time and money is currently being
invested. It took me a while to realize the implications of this, but now I see that many people can benefit a great deal from the possibilities this makes
available. I look forward to reports of work in other languages. (The only
language that I have discovered to have significant restrictions so far is
Japanese, where apparently the Kanji recognition is not very good and Hiragana
characters are used too often, making the text difficult for a native speaker
to read. Steve Vitek tells me that the problem is that there are too many
homophones in Japanese, but that this should work well in another language such
as Mandarin Chinese. The initial tests with Mandarin Chinese in Seville
actually looked rather good.)
After dictation and transcription are complete, a few
button presses can send the text to an e-mail server for manual or automated
processing.
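
For anyone curious about the "automated processing" side, here is a minimal sketch in Python of how the emailed dictation might be collected automatically. It assumes an IMAP-accessible mailbox; the server name, account, password and the "Dictation" subject filter are placeholders for illustration, not part of any setup I actually use.

```python
import email
import imaplib
from pathlib import Path

# All of these values are placeholders for illustration only.
IMAP_HOST = "imap.example.com"        # hypothetical mail server
USER = "translator@example.com"       # hypothetical account
PASSWORD = "app-specific-password"    # hypothetical credential
OUTPUT_DIR = Path("dictation_inbox")  # where the dictated texts land


def extract_plain_text(msg):
    """Return the first text/plain part of the message, decoded to a string."""
    parts = msg.walk() if msg.is_multipart() else [msg]
    for part in parts:
        if part.get_content_type() == "text/plain":
            charset = part.get_content_charset() or "utf-8"
            return part.get_payload(decode=True).decode(charset, "replace")
    return None


def fetch_dictated_texts():
    """Download unread 'Dictation' messages and save each body as a text file."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    with imaplib.IMAP4_SSL(IMAP_HOST) as imap:
        imap.login(USER, PASSWORD)
        imap.select("INBOX")
        # The subject filter is an assumption about how the dictated texts
        # might be tagged when mailing them from the phone.
        status, data = imap.search(None, '(UNSEEN SUBJECT "Dictation")')
        if status != "OK":
            return
        for num in data[0].split():
            num = num.decode()
            status, msg_data = imap.fetch(num, "(RFC822)")
            if status != "OK":
                continue
            msg = email.message_from_bytes(msg_data[0][1])
            body = extract_plain_text(msg)
            if body:
                (OUTPUT_DIR / f"dictation_{num}.txt").write_text(
                    body, encoding="utf-8"
                )


if __name__ == "__main__":
    fetch_dictated_texts()
```

From there, the saved text files can be handed to whatever editing or translation-environment workflow one prefers.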