Can you repeat that? Understanding speech recognition
Posted: May 23, 2013
To me, speech recognition is nothing short of a miracle. A piece of software can take the words I speak (in my nasal, southwestern Ohio twang, no less), work its magic, and translate that messy sound into relatively coherent text. If you’ve ever transcribed spoken conversation using a tape recorder and your own fortitude, you can appreciate the enormity of this accomplishment.
My first experience with speech recognition came about five years ago when I purchased a copy of Dragon NaturallySpeaking. At that time the product was marketed mostly to people with disabilities, but I wanted to use it to record comments on student papers during a particularly grueling online summer course.
The idea was that you could skip the keyboard and “train” the computer to learn your voice. You could even teach it common acronyms or jargon.
Once you got the hang of it, the thing was pretty accurate, although it sure generated some amusing errors from time to time. (Here’s an entire blog devoted to that sort of thing.) In the end, though, using it for student comments caused some issues and didn’t really save me any time at all.
Fast forward to VoiceOver, Siri, Google Voice and all the rest. The technology is undeniably getting better, but there is still a missing piece. Much like a trained parrot, speech recognition software is really just a means of processing speech patterns and looking for distinct differences (“yes” versus “no,” for example). Natural language processing, which means understanding the semantics and syntax of those sounds, is still a ways off.
Finally, for an interesting exploration of the differences between speech recognition and natural language processing, check out Geoffrey Pullum’s excellent article, “Speech Recognition vs. Language Processing” (Lingua Franca, The Chronicle of Higher Education).