Cochlea, Mel-Scale, Filterbanks

From human speech perception to considerations for features for automatic speech recognition

Cochlea
Different places along the cochlea respond to different frequencies in the incoming sound.

Mel scale
The cochlea's response to frequency is nonlinear on the hertz scale but approximately linear on the mel scale.
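
To make the hertz/mel relationship concrete, here is a minimal sketch of the conversion. The notes do not give a formula, so the constants 2595 and 700 are an assumption: they come from one widely used variant of the mel formula, and other variants exist.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert hertz to mels (assumed formula: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# The same 100 Hz step covers fewer mels at high frequency than at low frequency.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(4100.0) - hz_to_mel(4000.0)
print(low_step, high_step)
```

The shrinking step size is the nonlinearity: equal distances in hertz are not equal distances in mels.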

Filter banks
A simplified model of the cochlea is a bank of bandpass filters, each with a lower and an upper frequency limit.

[Figure: a bank of rectangular bandpass filters]

The filters get wider and wider at higher frequencies. Triangular filters are more appropriate than the rectangular ones shown in the figure above.
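
A bank of triangular filters that widen with frequency can be sketched as below. Everything here is illustrative rather than taken from the lecture: the function name `triangular_filterbank`, its parameters, the mel formula, and the choice to space the filter edges equally on the mel scale are all assumptions.

```python
import math

def hz_to_mel(f_hz):
    # Assumed mel formula (one common variant).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(num_filters, num_bins, sample_rate, f_low, f_high):
    """Return (bank, edges_hz). bank has num_filters rows of num_bins weights.

    Spectrum bin k is assumed to sit at k * (sample_rate / 2) / (num_bins - 1) Hz.
    """
    # num_filters + 2 edge points, equally spaced on the mel scale.
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (num_filters + 1)
    edges_hz = [mel_to_hz(mel_low + i * step) for i in range(num_filters + 2)]

    bin_hz = [k * (sample_rate / 2.0) / (num_bins - 1) for k in range(num_bins)]
    bank = []
    for j in range(num_filters):
        left, centre, right = edges_hz[j], edges_hz[j + 1], edges_hz[j + 2]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))    # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)                             # outside the band
        bank.append(row)
    return bank, edges_hz

bank, edges_hz = triangular_filterbank(
    num_filters=10, num_bins=257, sample_rate=16000, f_low=100.0, f_high=8000.0
)
# Width in hertz of each filter (upper limit minus lower limit).
widths_hz = [edges_hz[j + 2] - edges_hz[j] for j in range(10)]
print([round(w) for w in widths_hz])
```

Because the edges are equally spaced in mels, the printed widths in hertz grow from one filter to the next.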

Feature vectors, sequences, and sequences of feature vectors

Representing speech as a sequence of feature vectors

Features
Raw waveform samples are not useful as features. The magnitude spectrum (from the DFT) is better, and the spectral envelope is better still. To capture the spectral envelope, we use a filter bank. The feature vector for each frame stores the filter bank outputs, which encode the spectral envelope.
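
The path from one frame of waveform to one feature vector can be sketched as follows. This is a toy illustration under stated assumptions: the DFT is computed naively, the two filters in `bank` are hypothetical hand-written triangles, and the function names are invented for this example.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for a sketch)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))
        mags.append(abs(total))
    return mags

def filterbank_features(frame, bank):
    """One feature vector: the weighted sum of the spectrum under each filter."""
    mags = magnitude_spectrum(frame)
    return [sum(w * m for w, m in zip(row, mags)) for row in bank]

# Toy frame: 16 samples of a sinusoid that falls exactly on DFT bin 2.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]

# Two hypothetical triangular filters over the 9 spectrum bins (0..8).
bank = [
    [0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0],  # low band, centred on bin 2
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 0.5, 0.0],  # high band, centred on bin 6
]
features = filterbank_features(frame, bank)
print(features)
```

All of the energy lands in the first element of the feature vector, since the sinusoid sits under the low-band filter only.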

Sequences are everywhere in language

Sequence of feature vectors

Exemplars and Distances

We start to look at the concepts of distance and alignment between sequences of speech data

Exemplar: a stored sequence of feature vectors for a word.
Distance: a measure of dissimilarity between two sequences of feature vectors.
Creating an alignment between an exemplar and the unknown sequence is the first step in calculating the global distance.
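
Given an alignment, the global distance can be sketched as the sum of local distances between the aligned frames. The notes do not name a local distance measure, so Euclidean distance is an assumption here, and the exemplar, unknown sequence, and alignments are made-up toy data.

```python
import math

def local_distance(u, v):
    """Euclidean distance between two feature vectors (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def global_distance(exemplar, unknown, alignment):
    """Sum of local distances over aligned (exemplar_frame, unknown_frame) pairs."""
    return sum(local_distance(exemplar[i], unknown[j]) for i, j in alignment)

# Toy data: the unknown is the exemplar with its first frame repeated.
exemplar = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
unknown = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]

good_alignment = [(0, 0), (0, 1), (1, 2), (2, 3)]
poor_alignment = [(0, 0), (1, 1), (2, 2), (2, 3)]

good_total = global_distance(exemplar, unknown, good_alignment)
poor_total = global_distance(exemplar, unknown, poor_alignment)
print(good_total, poor_total)
```

The two totals differ even though the sequences are the same, which is why the alignment has to be chosen before the global distance means anything.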

Pattern Matching, Alignment, Dynamic Time Warping (DTW)

Search a grid with Dynamic Time Warping

Dynamic programming:

Dynamic time warping, pattern matching, aligning frames
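
A minimal sketch of the grid search follows, assuming Euclidean local distance and unweighted diagonal, vertical, and horizontal moves. The notes do not specify either choice, and the function name `dtw` and the toy data are illustrative only.

```python
import math

def local_distance(u, v):
    """Euclidean distance between two feature vectors (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw(exemplar, unknown):
    """Return (global distance, alignment path) found by dynamic programming."""
    n, m = len(exemplar), len(unknown)
    cost = [[float("inf")] * m for _ in range(n)]  # best cost to reach each cell
    back = [[None] * m for _ in range(n)]          # predecessor of each cell
    for i in range(n):
        for j in range(m):
            d = local_distance(exemplar[i], unknown[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            # Assumed step pattern: arrive diagonally, vertically, or horizontally.
            candidates = []
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))
            best_cost, best_prev = min(candidates)
            cost[i][j] = best_cost + d
            back[i][j] = best_prev
    # Trace back from the final cell to recover the alignment.
    path = []
    cell = (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    return cost[n - 1][m - 1], path

exemplar = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
unknown = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
distance, path = dtw(exemplar, unknown)
print(distance, path)
```

Each cell is filled once from its already-computed neighbours, so the search costs on the order of n × m local distance computations instead of enumerating every possible alignment.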

More Dynamic Time Warping



Origin: Module 7 – Speech Recognition – Pattern matching
Translate + Edit: YangSier (Homepage)
