RNID: For deaf and hard of hearing people.

Technological limits

Although speech recognition has been under development for many decades, it only works well within very specific constraints. Speech recognition systems use complex algorithms to try to convert spoken language into text. They do this by using statistical rules and pattern recognition to match the incoming speech to syllables, words and word groups, and from these to build sentences. However, such a system has no actual understanding of the meaning of the speech itself.

As a result, current speech recognition systems do a reasonable job with well-structured speech; that is, speech that is grammatically and syntactically well formed. However, with the natural speech used in conversations between people, in meetings and the like, speech recognition performs very poorly. The audio fed into the system also needs to be of high quality and free from background noise and other artefacts. In addition, most such systems rely on speaker training, so that the system can be tuned to better recognise specific speakers' voices.
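To illustrate the kind of statistical matching described above, here is a minimal sketch in Python. It is not how any particular recognition engine works: the tiny training text, the function names and the candidate sentences are all made up for illustration. It scores two candidate word sequences by how often neighbouring word pairs appeared in the training text, and picks the more likely one without any notion of what the words mean.

```python
from collections import Counter
import math

# Made-up training text (an assumption for this sketch); a real system
# would learn its statistics from vastly more data.
CORPUS = "recognise speech with a computer . it is hard to recognise speech ."


def train_bigrams(text):
    """Count single words and neighbouring word pairs in the text."""
    words = text.split()
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    return unigrams, bigrams


def score(sentence, unigrams, bigrams):
    """Log-probability of a word sequence under an add-one smoothed bigram model."""
    words = sentence.split()
    vocab_size = len(unigrams)
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing keeps unseen word pairs from scoring zero.
        probability = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(probability)
    return total


def pick_best(candidates, unigrams, bigrams):
    """Return the candidate transcription with the highest score."""
    return max(candidates, key=lambda s: score(s, unigrams, bigrams))


if __name__ == "__main__":
    unigrams, bigrams = train_bigrams(CORPUS)
    # Two similar-sounding candidates for the same stretch of audio.
    candidates = ["wreck a nice beach", "recognise speech"]
    print(pick_best(candidates, unigrams, bigrams))
```

The sketch prefers whichever word sequence looks most like text it has seen before. That is why well-formed, predictable speech suits this approach, while natural conversation does not.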

Harnessing speech recognition

For natural speech recognition to work, the system needs to understand the content of what is being said, and to use implicit knowledge about the speakers, their environment, their history and background, and the context of the conversation. Unfortunately, we haven’t yet reached the stage where computers are intelligent enough to make that possible.


Despite these limitations, there are already a few areas where speech recognition technology can bring benefits to deaf and hard of hearing people. For example, many broadcasters deliver subtitling through speech-recognition-assisted technology: trained operators revoice what is being said, using a form and structure for their speech that the system will recognise. They can also edit the recognised text before it goes live.

RNID is working on a speech recognition module for its SpeedText notetaking software, which would allow trained operators to improve their performance. We are also experimenting with processing telephone audio so that a system can recognise a limited set of predefined phrases and expressions. Processing telephone speech is more difficult because the audio streams of a telephone system are of lower quality and less clear than direct speech input. Many of the features that speech recognition engines rely upon are therefore no longer present, which makes good recognition harder to achieve.
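The idea of restricting recognition to a limited set of predefined phrases can be sketched as follows. This is only an illustration and not RNID's implementation: the phrase list, the function name and the similarity threshold are hypothetical. It also assumes that a recognition engine has already turned the noisy telephone audio into rough text. The sketch then snaps that text to the closest known phrase, or rejects it if nothing is close enough.

```python
import difflib

# Hypothetical phrase set (an assumption for this sketch).
PHRASES = ["yes", "no", "please repeat that", "call me back later"]


def match_phrase(recognised_text, phrases=PHRASES, cutoff=0.6):
    """Return the closest predefined phrase, or None if nothing is similar enough.

    cutoff is a similarity threshold between 0 and 1; the value used here
    is an arbitrary choice for illustration.
    """
    normalised = recognised_text.lower().strip()
    matches = difflib.get_close_matches(normalised, phrases, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Imperfect engine output is still mapped to a known phrase.
    print(match_phrase("please repeet that"))
    # Text unlike any known phrase is rejected.
    print(match_phrase("zzzz qqqq"))
```

Limiting the vocabulary in this way trades flexibility for robustness: the system only needs to tell a handful of phrases apart, so it can tolerate poorer audio.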

The future

Speech recognition is already bringing some benefits to deaf and hard of hearing people, but much more research is still needed to develop it further for natural speech processing. RNID will continue to be at the forefront of these developments.

Further information

You can find out more by contacting us by email at ict@rnid.org.uk.