Monday, February 8, 2010

Speaker Recognition Methods

These techniques can identify whether a speaker belongs to a set of known people.


Speaker recognition is the computational problem of establishing a speaker's identity from the characteristics of their voice. It differs from speech recognition, where the goal is to identify the words being spoken. One application of speaker recognition is building security, where a door opens only when an authorized person speaks into a microphone. Several methods can be used to accomplish this task.


Frequency Estimation


The spoken signal carries an unknown noise component, such as background noise and noise from the audio equipment. Frequency estimation methods work in three steps: they estimate the noise component, using techniques such as eigenvector decomposition (a linear algebra tool widely used in physics and engineering); they subtract the noise from the input to approximate the signal of interest; and they decompose that signal into a sum of complex frequency components. The key result is that the noise-free voice of a given speaker is reduced to a more manageable representation: the voice's intensity on a few frequency components, namely the most intense ones. This method works well when background noise is a problem and when the words spoken during authentication may differ from the words spoken when the system was trained.
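The last step, reducing a signal to its few most intense frequency components, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses a plain discrete Fourier transform rather than an eigenvector-based estimator, it skips noise subtraction entirely, and the function names (`dft_magnitudes`, `strongest_components`) are made up for this example.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the normalized magnitude of each DFT bin up to the Nyquist bin."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        magnitudes.append(abs(acc) / n)
    return magnitudes


def strongest_components(samples, sample_rate, count):
    """Return (frequency_hz, magnitude) pairs for the `count` most intense bins.

    The result is ordered by increasing frequency. This compact list stands in
    for the "manageable representation" of a voice described above.
    """
    n = len(samples)
    magnitudes = dft_magnitudes(samples)
    ranked = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)
    keep = sorted(ranked[:count])
    return [(k * sample_rate / n, magnitudes[k]) for k in keep]
```

For instance, a one-second signal sampled at 400 Hz that contains a 50 Hz tone and a weaker 120 Hz tone yields 50 Hz and 120 Hz as its two strongest components. A real system would work on short overlapping windows of speech, not on a whole recording at once.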


Hidden Markov Models


A hidden Markov model is always in one of a set of states, but the current state is not visible to the observer. The model repeatedly makes transitions from the current state to the next, with probabilities determined by the model's parameters. When making a transition, the model may emit an output with a known probability. The same output can be generated from multiple states, with different probabilities. In the particular case of speaker recognition, a hidden Markov model emits outputs representing phonemes with probabilities that depend on the current state, which in turn depends only on the state visited just before it. A speaker uttering a sequence of phonemes (i.e., talking) corresponds to the model visiting a sequence of states and emitting outputs corresponding to the same phonemes. This method works well for authenticating a speaker who utters a sequence of words forming complete sentences.
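One common way to use such a model for authentication is to ask how likely an observed phoneme sequence is under a given speaker's model, which the forward algorithm computes. The sketch below is a minimal illustration, not a production recognizer: it assumes the standard formulation in which each state emits one output per step, observations are integer symbol indices, and the parameter names (`start`, `trans`, `emit`) are chosen for this example.

```python
def forward_likelihood(start, trans, emit, observations):
    """Return P(observations | model) using the forward algorithm.

    start[s]     : probability of beginning in state s
    trans[p][s]  : probability of moving from state p to state s
    emit[s][o]   : probability that state s emits symbol o
    observations : sequence of integer symbol indices
    """
    if not observations:
        return 1.0  # the empty sequence is certain under any model
    states = range(len(start))
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][observations[0]] for s in states]
    for obs in observations[1:]:
        alpha = [
            emit[s][obs] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        ]
    return sum(alpha)
```

With one model per enrolled speaker, the same utterance can be scored against each model and the highest likelihood taken as the best match. For long utterances the products underflow, so practical implementations work with log probabilities or rescale `alpha` at every step.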


Pattern Recognition


This technique, among the most complex used for speaker recognition, compares two voice streams: the one recorded by the authenticated speaker while training the system, and the one spoken by the unknown speaker who is attempting to gain access. The speaker utters the same words when training the system and, later, when trying to prove their identity. The computer first aligns the training stream with the newly captured one, to account for small variations in rhythm and for delays in beginning to speak. It then splits each of the two streams into a sequence of frames and, for each pair of aligned frames, computes the probability that both were spoken by the same speaker by running the pair through a multilayer perceptron, a particular type of neural network trained for this task. This method works well in low-noise conditions and when the speaker utters exactly the same words used to train the system.
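The alignment step is often done with dynamic time warping, which the sketch below illustrates. It is a simplified example under stated assumptions: each frame is a short list of numbers, and a plain Euclidean distance stands in for the trained multilayer perceptron, which is not reproduced here. The names `dtw_align` and `frame_cost` are hypothetical.

```python
import math


def euclidean(a, b):
    """Placeholder frame comparison; a real system would use a trained scorer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(reference, candidate, frame_cost=euclidean):
    """Align two frame sequences with dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching reference[i] with candidate[j].
    """
    n, m = len(reference), len(candidate)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i and first j frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = frame_cost(reference[i - 1], candidate[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    # Walk back from the end to recover which frames were paired.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

The returned path lists the frame pairs that a per-pair scorer would then examine. Passing a different `frame_cost` function is where a learned comparison could be plugged in.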
