Key Steps to High Speech Recognition Accuracy*
By Douglas Durham, with help from Martin Markoe, Susan Fulton and Judy Evans
Updated November 2007
*This is a living document. It has been added to and updated several times.
Purpose of this article
This article gives beginning Speech Recognition (SR) users a step-by-step guide to achieving high accuracy with the least trouble. It is based on personal experience and on what I have learned from others over the last eight months. My initial experience was with Dragon and IBM ViaVoice. I am particularly indebted to Susan Fulton and Martin Markoe, who are co-authors. They freely provided help when I was starting and substantially improved this article with their contributions. (See the resources section at the end of this article for their Web pages.)
At first glance, this list of things to do might seem onerous, for it involves a significant investment of time. However, if you prepare for and learn Speech Recognition in a systematic way, you’ll have a good chance of success. Following this methodical approach will increase your chances of reaching 99 percent accuracy. The difference between 97 percent accuracy and 99 percent accuracy is a great deal of time saved later in editing and correcting misrecognitions. As the old auto mechanic saying goes: “You can pay me a little now or a lot later. Take your pick.”
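To put rough numbers on the gap between 97 and 99 percent, here is a minimal Python sketch. It assumes accuracy is measured per word and that every misrecognized word needs one correction. The 1,000-word document length and the function name expected_errors are illustrative choices, not figures or terms from any SR vendor.

```python
def expected_errors(word_count: int, accuracy_percent: float) -> float:
    """Expected number of misrecognized words at a given per-word accuracy."""
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy_percent must be between 0 and 100")
    return word_count * (100 - accuracy_percent) / 100


words = 1000  # hypothetical document length, chosen for easy arithmetic
for accuracy in (97, 99):
    errors = expected_errors(words, accuracy)
    print(f"{accuracy}% accuracy: about {errors:.0f} errors per {words} words")
```

Under these assumptions, a 1,000-word dictation has about 30 errors to fix at 97 percent and about 10 at 99 percent, so the two-point difference cuts the correction work to roughly a third.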
This article attempts to be general enough that any user of Speech Recognition can benefit. The three of us use or have used Dragon NaturallySpeaking, IBM ViaVoice and VoiceXpress. We have tried to keep the material generic.
Seven key steps
There are seven key steps to achieving high speech recognition accuracy.
- A proper computer hardware system and microphone that produce clear sound input.
- Correct and consistent microphone placement.
- Performing a full enrollment to train the software.
- Testing the microphone and system.
- Using the vocabulary builder/expander to add context to the personal enrollment.
- Training and more training.
- Participation in a user group.
Having a proper computer system and microphone
This is the essential foundation you must lay before you start. Do not buy voice software until you have the right system.
System Specifications:
The minimum specifications that the various vendors list are far below optimal. If your objectives are to use Microsoft Word and Outlook for email while on the Internet, you want a computer with at least a 1.5 GHz Pentium 4 processor and 512 MB of RAM. This is a minimum for good results with the latest SR software.
You might do well with lesser specifications by running one application at a time and rebooting after using the Internet. Both actions keep system resources free. If you plan on doing considerable dictation, close all applications. Then reboot and just use Word or WordPerfect or the native word processor that comes with your speech recognition so