From human speech perception to considerations for features for automatic speech recognition
Different places along the cochlea respond to different frequencies in the incoming sound.
The cochlea's frequency scale is nonlinear in hertz but approximately linear in mels (the mel scale).
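To make "nonlinear in hertz, linear in mels" concrete, here is a minimal sketch of one widely used Hz-to-mel formula, mel(f) = 2595 log10(1 + f/700). Other variants exist, and the notes do not say which one is intended, so treat the exact constants as an assumption.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (one common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Equal 1000 Hz steps cover fewer and fewer mels as frequency rises,
# mirroring the cochlea's coarser resolution at high frequencies.
for f in (1000, 2000, 3000, 4000):
    print(f, "Hz ->", round(hz_to_mel(f), 1), "mel")
```

With these constants, 1000 Hz maps to roughly 1000 mel, and the scale becomes increasingly compressed above that.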
A simplified model of the cochlea is a bank of bandpass filters.
Each filter has a lower frequency limit and an upper frequency limit.
The filters get wider and wider at higher frequencies. Triangular filters are more appropriate than the rectangular ones shown in the figure above.
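The filterbank described above can be sketched as follows: triangular filters whose edge points are spaced equally on the mel scale, so they come out wider in Hz at higher frequencies. This is a minimal pure-Python sketch; the function name `triangular_mel_filterbank`, its parameters, and the choice of overlapping neighbours (each filter's peak sits on its neighbours' edges) are illustrative assumptions, not taken from the notes.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_mel_filterbank(n_filters, n_fft, sample_rate, f_low=0.0, f_high=None):
    """Hypothetical helper: return (bank, edges_hz).

    bank[j] holds one weight per DFT bin (n_fft // 2 + 1 bins) for filter j.
    edges_hz holds the n_filters + 2 edge frequencies; filter j spans
    edges_hz[j] .. edges_hz[j + 2] and peaks at edges_hz[j + 1].
    """
    if f_high is None:
        f_high = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    mel_lo, mel_hi = hz_to_mel(f_low), hz_to_mel(f_high)
    # Equal spacing in mels gives increasing spacing (wider filters) in Hz
    edges_hz = [mel_to_hz(mel_lo + i * (mel_hi - mel_lo) / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(n_filters):
        left, centre, right = edges_hz[j], edges_hz[j + 1], edges_hz[j + 2]
        weights = []
        for f in bin_hz:
            if left <= f <= centre:        # rising edge of the triangle
                w = (f - left) / (centre - left)
            elif centre < f <= right:      # falling edge of the triangle
                w = (right - f) / (right - centre)
            else:                          # outside this filter's band
                w = 0.0
            weights.append(w)
        bank.append(weights)
    return bank, edges_hz

bank, edges = triangular_mel_filterbank(n_filters=20, n_fft=512, sample_rate=16000)
widths = [edges[j + 2] - edges[j] for j in range(20)]
print("narrowest filter: %.0f Hz, widest filter: %.0f Hz" % (widths[0], widths[-1]))
```

Real toolkits differ in details such as the mel formula and whether each triangle is area-normalised, so this is one reasonable construction rather than a definitive one.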
Representing speech as a sequence of feature vectors
Raw waveform samples are not useful as features. The magnitude spectrum (from the DFT) is better, and the spectral envelope is better still. To capture the spectral envelope, we use a filter bank to encode it. Each feature vector stores the encoded spectral envelope, i.e. the filterbank outputs.
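The waveform-to-feature-vector pipeline just described (frame, magnitude spectrum, filterbank) can be sketched in a few lines. Assumptions not in the notes: a naive O(N^2) DFT for clarity instead of an FFT, no analysis window, 12 mel-spaced triangular filters, and taking the log of each filter's energy, which is a common but not universal choice. `filterbank_features` is a hypothetical name.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def magnitude_spectrum(frame):
    """|DFT| for bins 0 .. N/2, using a naive DFT (slow but easy to read)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def filterbank_features(frame, sample_rate, n_filters=12):
    """Hypothetical helper: one feature vector (log filterbank energies) per frame."""
    n = len(frame)
    mags = magnitude_spectrum(frame)
    mel_hi = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(i * mel_hi / (n_filters + 1)) for i in range(n_filters + 2)]
    feats = []
    for j in range(n_filters):
        lo, centre, hi = edges[j], edges[j + 1], edges[j + 2]
        energy = 0.0
        for k, mag in enumerate(mags):
            f = k * sample_rate / n
            if lo <= f <= centre:
                w = (f - lo) / (centre - lo)
            elif centre < f <= hi:
                w = (hi - f) / (hi - centre)
            else:
                w = 0.0
            energy += w * mag * mag          # weighted power falling in this band
        feats.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return feats

# A 25 ms frame (400 samples at 16 kHz) of a 1000 Hz sine wave
frame = [math.sin(2 * math.pi * 1000 * t / 16000) for t in range(400)]
feats = filterbank_features(frame, 16000)
print([round(v, 1) for v in feats])
```

The 12 numbers summarise the spectral envelope far more compactly than the 400 waveform samples or the 201 spectral magnitudes; the largest values come from the filters whose bands cover 1000 Hz.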
Sequences are everywhere in language
sequence of feature vectors
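A speech signal becomes a sequence of feature vectors by cutting it into short, overlapping frames and computing one vector per frame. The sketch below shows only the framing step. The 25 ms frame length and 10 ms shift are common defaults assumed here, not values given in the notes, and `frame_signal` is a hypothetical name.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0, shift_ms=10.0):
    """Split a signal into overlapping frames; each frame later yields one feature vector."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames

one_second = [0.0] * 16000                # 1 s of silence at 16 kHz
frames = frame_signal(one_second, 16000)
print(len(frames), "frames of", len(frames[0]), "samples")  # 98 frames of 400 samples
```

So one second of speech becomes a sequence of about 100 feature vectors, and sequences of different lengths are what the distance and alignment ideas below must cope with.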
We start to look at the concepts of distance and alignment between sequences of speech data
Exemplar: a stored sequence of feature vectors for a word.
Distance (dissimilarity) between two sequences of feature vectors.
Aligning the exemplar with the unknown sequence is the first step in calculating the global distance.
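Once an alignment is fixed, the global distance is just the sum of local (frame-to-frame) distances along it. This minimal sketch assumes Euclidean local distance, which is one common choice that the notes do not specify; the alignment is written by hand as a list of (exemplar frame, unknown frame) index pairs.

```python
import math

def local_distance(x, y):
    """Euclidean distance between two feature vectors (assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def global_distance(exemplar, unknown, alignment):
    """Sum of local distances over the aligned frame pairs (i, j)."""
    return sum(local_distance(exemplar[i], unknown[j]) for i, j in alignment)

exemplar = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
unknown = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [2.0, 2.0]]  # middle sound held longer

# Good alignment: exemplar frame 1 is matched with two unknown frames
good = [(0, 0), (1, 1), (1, 2), (2, 3)]
# Poorer alignment: exemplar frame 2 is stretched over the last two unknown frames
poor = [(0, 0), (1, 1), (2, 2), (2, 3)]
print(global_distance(exemplar, unknown, good))  # 0.0
print(global_distance(exemplar, unknown, poor))  # about 1.414 (sqrt 2)
```

The same pair of sequences gets a different global distance under different alignments, which is why finding a good alignment has to come first.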
Search a grid with Dynamic Time Warping
Dynamic time warping, pattern matching, aligning frames
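Dynamic time warping searches the grid of (exemplar frame, unknown frame) pairs for the alignment with the lowest global distance, reusing partial results instead of trying every path. The sketch below assumes Euclidean local distances and the simplest step pattern (horizontal, vertical, or diagonal moves, all unweighted); the course may use different transitions or weights, and `dtw` is a hypothetical name.

```python
import math

def local_distance(x, y):
    """Euclidean distance between two feature vectors (assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dtw(exemplar, unknown):
    """Return (global distance, best alignment path) between two sequences."""
    n, m = len(exemplar), len(unknown)
    inf = float("inf")
    # D[i][j]: lowest cumulative distance aligning the first i exemplar frames
    # with the first j unknown frames (row/column 0 is a boundary).
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_distance(exemplar[i - 1], unknown[j - 1])
            # Best predecessor: diagonal, from below, or from the left
            D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Trace back from the top-right corner to recover the alignment
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(D[i - 1][j - 1], i - 1, j - 1),
                 (D[i - 1][j], i - 1, j),
                 (D[i][j - 1], i, j - 1)]
        _, i, j = min(steps)  # ties favour the diagonal move
    path.reverse()
    return D[n][m], path

exemplar = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
unknown = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
print(dtw(exemplar, unknown))  # (0.0, [(0, 0), (1, 1), (1, 2), (2, 3)])
```

For isolated word recognition by pattern matching, the unknown would be compared with every stored exemplar and labelled with the word whose exemplar gives the smallest DTW distance.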