Intro to Speech Computing

John Gannon, April 6, 2009

This is the first in a series of articles about speech computing and how you can interact with your notebook by speaking to it.

Many people are easily confused by the various types of computerized speech and the terms that describe them. They often use a term in one context when it actually applies to another. I want to briefly describe, and hopefully clarify, three areas of computerized speech.

The first area is text-to-speech, the second is voice recognition, and the third is speech recognition. Each of the three does something different. All are extremely helpful, but not necessarily useful for everyone. Depending on your circumstances, you may want to try one of them or all three.

The history of text-to-speech goes back to the early 1960s, when Bell Laboratories succeeded in creating a synthesized voice. That voice so impressed Arthur C. Clarke, the author of 2001: A Space Odyssey, that he used the idea of a synthesized voice for HAL the computer. Over time, synthesized voices have become more natural sounding. Today’s voices sound fantastic! It can be hard to tell that they were created with a computer.

Text-to-speech is a phenomenal tool for those who have impaired vision. They can now download any written material and listen as it is played back to them via speech. I have just recently read that the use of Braille has dropped dramatically. This is probably a result of text-to-speech and the ease of use that computers supply those with impaired vision.

I have occasionally used text-to-speech to listen to material I have dictated. Sometimes I find it helps to listen to what I have written rather than read it.

Voice recognition is not speech recognition. Voice recognition works on the premise of who is speaking rather than what is being said. It requires an enrollment, during which the program learns how you speak; afterward it compares what it hears against its stored models and picks the best match. Noise levels directly affect its ability to pick the best match. That is why there have been so many problems with voice recognition in cars. Not only is the program trying to figure out what is being said, it also has to work out which sounds are not speech at all.
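For readers who like to see an idea in code, here is a minimal Python sketch of the enrollment and best-match steps described above. It is an illustration only, not how any particular product works: the three-number "feature vectors", the names alice and bob, and the 0.9 threshold are all made-up assumptions, and real systems use far richer acoustic features.

```python
import math


def cosine_similarity(a, b):
    """Return how closely two feature vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(samples):
    """Average several samples from one speaker into a single stored model."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def best_match(models, utterance, threshold=0.9):
    """Return the enrolled speaker closest to the utterance.

    Returns None when no model scores at least `threshold`
    (a hypothetical cutoff chosen for this example).
    """
    best_name, best_score = None, threshold
    for name, model in models.items():
        score = cosine_similarity(model, utterance)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrollment data: two samples per speaker.
models = {
    "alice": enroll([[0.9, 0.1, 0.3], [1.0, 0.2, 0.2]]),
    "bob": enroll([[0.1, 0.8, 0.7], [0.2, 0.9, 0.6]]),
}

clean = [0.95, 0.15, 0.25]  # resembles alice's enrollment samples
noisy = [0.6, 0.6, 0.6]     # noise has flattened the distinguishing features

print(best_match(models, clean))
print(best_match(models, noisy))
```

In this toy setup the clean sample is matched to alice, while the noisy sample does not resemble either stored model closely enough to be matched at all. That is the kind of failure background noise in a car can cause.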

Speech recognition converts spoken words into a form that machines can process.

Speech recognition is used in many different ways. GPS units, cellular telephones, and call centers are just some of its everyday uses. It has become a very common part of how we live our lives.

Another common use of speech recognition is translating spoken words into text. Future articles will talk in greater detail about speech recognition and computer usage. Vista speech and Dragon NaturallySpeaking are two highly rated programs. With the right equipment, it is not uncommon nowadays to achieve 99% accuracy with these programs. I personally have been using Dragon NaturallySpeaking for over a decade, and I have seen great improvements in recognition accuracy over that time. Speech recognition is no longer something to be looked at briefly and then tossed aside, as many people do. It truly is a misunderstood and largely underutilized input method.
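To put a figure like 99% accuracy in concrete terms, here is a small Python sketch of one common way to score dictation: count the word-level errors (substitutions, insertions, and deletions) between what was said and what the program typed. This is an assumption about how such a figure might be computed, not a description of how any vendor measures it, and the sample sentences are made up.

```python
def word_errors(reference, hypothesis):
    """Return (minimum number of word edits, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Classic edit distance, computed over words instead of characters.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # word dropped
                               current[j - 1] + 1,       # extra word typed
                               previous[j - 1] + cost))  # match or wrong word
        previous = current
    return previous[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Return the share of spoken words that were transcribed correctly."""
    errors, total = word_errors(reference, hypothesis)
    if total == 0:
        raise ValueError("reference text must contain at least one word")
    return 1 - errors / total


spoken = "the quick brown fox jumps over the lazy dog today"
typed = "the quick brown box jumps over the lazy dog today"

print(f"{word_accuracy(spoken, typed):.0%}")
```

Here one wrong word out of ten gives 90% accuracy. By the same arithmetic, 99% accuracy means roughly one error for every hundred words dictated.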
