Cochlea, Mel-Scale, Filterbanks
From human speech perception to considerations for features for automatic speech recognition
Cochlea
Different places along the cochlea respond to different frequencies in the incoming sound.
Mel scale
The cochlea's frequency response is nonlinear on the hertz scale but approximately linear on the mel scale.
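To make the hertz-to-mel relationship concrete, here is a minimal sketch. It assumes one commonly used mel formula, m = 2595 * log10(1 + f / 700), which is not stated in these notes; the function names `hz_to_mel` and `mel_to_hz` are illustrative only.

```python
import math

def hz_to_mel(f_hz):
    """Convert hertz to mels using one common formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Five points equally spaced on the mel scale between 0 Hz and 8000 Hz.
low, high = hz_to_mel(0.0), hz_to_mel(8000.0)
mel_points = [low + i * (high - low) / 4 for i in range(5)]
hz_points = [mel_to_hz(m) for m in mel_points]

# Equal steps in mel become larger and larger steps in hertz.
hz_steps = [b - a for a, b in zip(hz_points, hz_points[1:])]
```

Here `hz_steps` grows from one step to the next, which is the nonlinearity described above.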
Filter banks
A simplified model of the cochlea is a bank of bandpass filters.
Each filter has a lower and an upper frequency limit, and the filters get wider and wider at higher frequencies. Triangular filters are more appropriate than the rectangular ones shown in the figure above.
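A triangular filter bank of this kind might be built roughly as follows. This is a sketch under stated assumptions: the filter edges are spaced evenly on the mel scale using the common 2595 * log10(1 + f / 700) formula, and the function name, parameters, and the choice of 10 filters over 257 bins at 16 kHz are all hypothetical.

```python
import math

def hz_to_mel(f_hz):
    # One common mel formula (assumed, not taken from the notes).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(num_filters, num_bins, sample_rate, f_low, f_high):
    """Return (bank, edges_hz): one row of weights per filter.

    Spectrum bin k is assumed to sit at k * (sample_rate / 2) / (num_bins - 1) Hz.
    """
    # num_filters + 2 edge frequencies, evenly spaced in mel.
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    edges_hz = [mel_to_hz(m_low + i * (m_high - m_low) / (num_filters + 1))
                for i in range(num_filters + 2)]
    bin_hz = [k * (sample_rate / 2.0) / (num_bins - 1) for k in range(num_bins)]

    bank = []
    for j in range(num_filters):
        left, centre, right = edges_hz[j], edges_hz[j + 1], edges_hz[j + 2]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                w = (f - left) / (centre - left)    # rising slope
            elif centre < f < right:
                w = (right - f) / (right - centre)  # falling slope
            else:
                w = 0.0                             # outside this filter
            row.append(w)
        bank.append(row)
    return bank, edges_hz

bank, edges_hz = triangular_filterbank(
    num_filters=10, num_bins=257, sample_rate=16000, f_low=0.0, f_high=8000.0)

# Bandwidth of each filter in hertz: upper limit minus lower limit.
widths = [edges_hz[j + 2] - edges_hz[j] for j in range(10)]
```

In this sketch `widths` increases with the filter index, matching the observation that the filters get wider at higher frequencies.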
Feature vectors, sequences, and sequences of feature vectors
Representing speech as a sequence of feature vectors
Features
Raw waveform samples are not useful as features. The magnitude spectrum (DFT) is better, and the spectral envelope is better still. To capture the spectral envelope, we use a filter bank. The feature vector stores the filter bank outputs, which encode the spectral envelope.
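The path from waveform to feature vectors can be sketched as below. Everything specific here is an assumption for illustration: the tiny 8-sample frames, the hop of 4, the hand-written `toy_bank` (a real system would use many more bins and mel-spaced triangular filters), and taking the log of each filter's energy.

```python
import cmath
import math

def split_into_frames(signal, frame_len, hop):
    """Cut the waveform into overlapping frames of frame_len samples."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def magnitude_spectrum(frame):
    """Naive DFT magnitude for bins 0 .. N/2 (slow, but dependency-free)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def filterbank_features(spectrum, bank):
    """One log energy per filter: a coarse encoding of the spectral envelope."""
    feats = []
    for weights in bank:
        energy = sum(w * m * m for w, m in zip(weights, spectrum))
        feats.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return feats

# Toy input: a sine wave that falls exactly on DFT bin 1 of an 8-sample frame.
signal = [math.sin(2 * math.pi * t / 8) for t in range(16)]

# Toy filter bank over the 5 spectrum bins (hypothetical weights).
toy_bank = [
    [0.5, 1.0, 0.5, 0.0, 0.0],  # low-frequency filter
    [0.0, 0.0, 0.5, 1.0, 0.5],  # high-frequency filter
]

# The utterance becomes a sequence of feature vectors, one per frame.
features = [filterbank_features(magnitude_spectrum(f), toy_bank)
            for f in split_into_frames(signal, frame_len=8, hop=4)]
```

Each entry of `features` is one feature vector; for this input the low-frequency filter carries almost all of the energy in every frame.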
sequences are everywhere in language
sequence of feature vectors
Exemplars and Distances
We start to look at the concepts of distance and alignment between sequences of speech data
Exemplar: a stored sequence of feature vectors for a word.
Distance (dissimilarity): a measure of how different two sequences of feature vectors are.
Creating an alignment between the exemplar and the unknown is the first step in calculating the global distance.
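As a rough illustration of how an alignment leads to a global distance, the sketch below sums a local distance over aligned frame pairs. The Euclidean local distance and the naive linear stretching in `linear_alignment` are assumptions chosen for simplicity; linear stretching is only a baseline, not an optimal alignment.

```python
import math

def local_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def linear_alignment(len_exemplar, len_unknown):
    """Pair each unknown frame j with an exemplar frame i by linear stretching."""
    if len_unknown == 1:
        return [(0, 0)]
    scale = (len_exemplar - 1) / (len_unknown - 1)
    return [(int(j * scale + 0.5), j) for j in range(len_unknown)]

def global_distance(exemplar, unknown, alignment):
    """Sum of local distances over all aligned frame pairs (i, j)."""
    return sum(local_distance(exemplar[i], unknown[j]) for i, j in alignment)

# One-dimensional feature vectors keep the example readable.
exemplar = [[0.0], [1.0], [2.0]]
unknown = [[0.0], [0.0], [1.0], [2.0], [2.0]]

alignment = linear_alignment(len(exemplar), len(unknown))
total = global_distance(exemplar, unknown, alignment)
```

The global distance depends on the alignment chosen, which is why a better alignment procedure than linear stretching is worth searching for.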
Pattern Matching, Alignment, Dynamic Time Warping (DTW)
Search a grid with Dynamic Time Warping
Dynamic programming:
Dynamic time warping, pattern matching, aligning frames
More Dynamic Time Warping
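The grid search with dynamic programming can be sketched as follows. This is a minimal version under assumptions not stated in the notes: Euclidean local distance, three allowed moves into each cell (from the left, from below, and diagonally), no path weights, and the path must start at the first frame pair and end at the last.

```python
import math

def local_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw(exemplar, unknown):
    """Return (global distance, alignment path) of the cheapest path on the grid."""
    n, m = len(exemplar), len(unknown)
    cost = [[float("inf")] * m for _ in range(n)]  # best cost to reach each cell
    back = [[None] * m for _ in range(n)]          # predecessor of each cell

    for i in range(n):
        for j in range(m):
            d = local_distance(exemplar[i], unknown[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            # Collect the costs of the allowed predecessor cells.
            candidates = []
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            best_cost, best_prev = min(candidates)
            cost[i][j] = d + best_cost
            back[i][j] = best_prev

    # Trace the best path back from the final cell to the start.
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        path.append(back[path[-1][0]][path[-1][1]])
    path.reverse()
    return cost[n - 1][m - 1], path

exemplar = [[0.0], [1.0], [2.0]]
unknown = [[0.0], [0.0], [1.0], [2.0], [2.0]]
distance, path = dtw(exemplar, unknown)
```

For this toy pair the warped alignment repeats the first and last exemplar frames, so `distance` is 0.0. To recognise a word, one could compute this distance against each stored exemplar and pick the word with the smallest value.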
Origin: Module 7 – Speech Recognition – Pattern matching
Translate + Edit: YangSier (Homepage)
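:four_leaf_clover:Chit-chat:four_leaf_clover:
Hello everyone, this is YangSier, currently studying in the UK. My blog focuses on programming, algorithms, robotics, artificial intelligence, mathematics, and more. Follow me for a steady stream of high-quality posts.
:cherry_blossom:QQ chat group: 兔叽的魔术工房 (942848525)
:star:Bilibili account: 白拾Official (active in the Knowledge and Animation sections)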