Questions on WSR Software
- I want to use the Transcription feature of the WSRToolkit to have my digital recordings of lectures turned into editable text. Is this possible?
#1 Answer:
This is a question we hear a lot. Speech recognition software works because it is speaker dependent. For accuracy to be acceptable, meaning 95% or better (below that, the errors make it difficult to understand what was said), the following must hold:
- The individual speaker must train the software to his or her voice to develop an acoustic model unique to the individual.
- You cannot talk conversationally to the software. Each word must be enunciated clearly, because the software's first step in working out what you said is to match each word acoustically by its sounds.
- After the software makes a best guess at each dictated word based on its sound, it compares that word to the words around it for context clues. Think of, "Two boys went to see a doctor because they ate too much food." Speech recognition software can figure out which of "two," "to," or "too" to use based on the surrounding words. It helps in this regard to speak in phrases, to enunciate clearly, and to dictate punctuation marks as you go.
In other words, the software is neither designed for nor able to do what you are asking. However, there is an alternative technique that works very well, called Echo Dictating: you listen to the playback of the digital recording and repeat what you hear directly into the speech recognition software. A foot pedal to stop and restart the recording is a real help when using this technique.
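
For readers curious how the context step described above might work, here is a toy sketch in Python. It is not how WSR or any real recognizer is implemented. The word-pair counts, the `PAIR_COUNTS` table, and the function names are all made up for illustration, and real products rely on far larger statistical language models. The idea is only that, once the sound has narrowed a word down to "two," "to," or "too," the neighboring words decide which spelling wins.

```python
# Toy illustration only: all names and counts here are hypothetical.
HOMOPHONES = {"two", "to", "too"}

# Made-up counts of how often one word follows another.
# "<s>" marks the start of the sentence.
PAIR_COUNTS = {
    ("<s>", "two"): 5, ("two", "boys"): 9,
    ("went", "to"): 9, ("to", "see"): 9,
    ("ate", "too"): 6, ("too", "much"): 9,
}

def pick_homophone(prev_word, next_word):
    """Return the spelling that best fits between its two neighbors."""
    def score(candidate):
        return (PAIR_COUNTS.get((prev_word, candidate), 0)
                + PAIR_COUNTS.get((candidate, next_word), 0))
    # Sorting makes the result deterministic if scores tie.
    return max(sorted(HOMOPHONES), key=score)

def resolve(words):
    """Replace every two/to/too with the spelling its context favors."""
    padded = ["<s>"] + [w.lower() for w in words] + ["</s>"]
    resolved = []
    for i in range(1, len(padded) - 1):
        word = padded[i]
        if word in HOMOPHONES:
            word = pick_homophone(padded[i - 1], padded[i + 1])
        resolved.append(word)
    return resolved

# The acoustic guess got every homophone wrong; context repairs it.
heard = "to boys went two see a doctor because they ate to much food".split()
print(" ".join(resolve(heard)))
```

With these made-up counts, the sketch prints the example sentence with "two," "to," and "too" each in the right place. It also shows why speaking in phrases matters: a word dictated on its own has no neighbors to score against.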
