SP Module 7 Pattern Matching
Cochlea, Mel-Scale, Filterbanks
From human speech perception to considerations for features for automatic speech recognition
Cochlea
Different places along the cochlea respond to different incoming frequencies.
Mel scale
The cochlea's frequency response is nonlinear on the hertz scale but approximately linear on the mel scale.
Filter banks
A simplified model of the cochlea is a bank of bandpass filters, each with a lower and an upper frequency limit.
The filters get wider and wider at higher frequencies. Triangular filters are more appropriate than the rectangular one sho ...
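As a rough illustration of the filter bank idea in this excerpt, the sketch below places triangular filters at equal spacing on the mel scale and shows that their widths in hertz grow with frequency. The conversion formula (2595 and 700 constants) is one commonly used variant and is an assumption here, as are the function names, the 0 to 8000 Hz range, and the choice of 10 filters; none of these come from the original notes.

```python
import math


def hz_to_mel(f_hz):
    """Convert hertz to mel using one common formula (an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_edges(f_low, f_high, n_filters):
    """Return (lower, centre, upper) in Hz for each triangular filter.

    Edge points are equally spaced on the mel scale, and each filter
    spans two adjacent intervals, so neighbouring filters overlap.
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    points = [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]
    return [(points[i], points[i + 1], points[i + 2]) for i in range(n_filters)]


def triangular_weight(f_hz, lower, centre, upper):
    """Weight of frequency f_hz under one triangular filter (peak of 1.0)."""
    if f_hz <= lower or f_hz >= upper:
        return 0.0
    if f_hz <= centre:
        return (f_hz - lower) / (centre - lower)
    return (upper - f_hz) / (upper - centre)


# Hypothetical setup: 10 filters between 0 Hz and 8000 Hz.
edges = filter_edges(0.0, 8000.0, 10)
widths = [upper - lower for lower, _, upper in edges]

for (lower, centre, upper), width in zip(edges, widths):
    print(f"centre {centre:8.1f} Hz, width {width:8.1f} Hz")
```

Because the edge points are equally spaced in mel, each successive filter covers a wider band in hertz, which mirrors the "wider and wider" behaviour described above.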
SP Module 6 Prosody
Connected and Citation Speech
Connected speech differs from the citation form.
Connected Speech Processes
Connected speech forms are highly variable as the result of a number of processes that apply to consonants and vowels.
Prosodic Structure
Prosody is the combination of speech properties that break speech into units of time, indicate the boundaries of those units, and highlight certain constituents.
A constituent is a word or a group of words that function as a single unit ...
SP Modules Review Contents (2)
Module 5 TTS front-end
We want to generate speech that is
Intelligible: you can clearly perceive what words are being said
Natural: sounds like human speech
Appropriate: conveys the right meaning in a specific context
Front-end: Analyze text, generate a linguistic specification of what to actually generate
Front-end purpose: derive a linguistic specification from text that includes the necessary information to generate speech
Back-end: Waveform generation from the linguistic specification
...
MOB LEC11 Mapping and Occupancy Grid
Occupancy grid
Occupancy Map Calculus
Practical issues of Occupancy map
Inverse Measurement Model
Downsampling for lidar
Other Types of Map
Supplementary Readings
Note of Occupancy Maps. (MUST READ)
Lanelets: Efficient Map Representation for Autonomous Driving. (Optional)
Probabilistic robotics. Read Chapter 9 - Occupancy Grid Mapping for an overview of how occupancy grids are generated
Origin: Dr. Chris Lu (Homepage)
Translate + Edit: YangSier (Homepage)
MOB LEC10 LIDAR, Point Cloud and Iterative Closest Points
LiDAR
LiDAR calculus
State Estimation via Point Set Registration
Problem definition
ICP Algorithm
Origin: Dr. Chris Lu (Homepage)
Translate + Edit: YangSier (Homepage)
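Ramblings: Hello everyone, this is YangSier (杨丝儿), currently studying in the UK. My blog focuses on programming, algorithms, robotics, artificial intelligence, mathematics, and more. Follow along for a steady stream of high-quality posts.
Chat QQ group: 兔叽的魔术工房 (942848525)
Bilibili account: 白拾Official (active in the knowledge and animation sections)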
MOB LEC9 Reference Frames and GPS
The coordinate systems in mobile robotics
world coordinate system
ego-vehicle coordinate system
Rigid Body Assumption
A rigid body is a solid body in which deformation is zero or so small that it can be neglected.
The rigid body assumption implies that the motion of a sensor mounted on the robot is the same as that of other co-located sensors and of the whole platform.
Motion in general
Rigid-body motion can be described by a rotation and a translation. A rigid body has 6 degrees of freedom (DoF) in 3D space: three for translation and three for rotation.
...
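To make "a rotation and a translation" concrete, here is a minimal sketch that maps a point from the ego-vehicle coordinate system into the world coordinate system. For brevity it rotates about the vertical axis only (yaw), which is just one of the three rotational degrees of freedom; a full 6-DoF pose would also include roll and pitch. The function names and the example pose are assumptions made for illustration.

```python
import math


def yaw_rotation(yaw):
    """3x3 rotation matrix for a rotation of `yaw` radians about the z-axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0],
        [s, c, 0.0],
        [0.0, 0.0, 1.0],
    ]


def transform_point(rotation, translation, point):
    """Apply p_world = R * p_ego + t to a 3D point."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


# Hypothetical pose: vehicle at (10, 5, 0) in the world, heading rotated by 90 degrees.
R = yaw_rotation(math.pi / 2)
t = [10.0, 5.0, 0.0]

# A point 2 m ahead of the vehicle along its own x-axis.
p_ego = [2.0, 0.0, 0.0]
p_world = transform_point(R, t, p_ego)
print(p_world)
```

Under the rigid body assumption, the same rotation and translation apply to every sensor mounted on the platform, so one pose is enough to place all of their measurements in the world frame.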
SP Module 6 Speech Synthesis – Waveform Generation and Connected Speech
Diphone
Phones are not a suitable unit for waveform concatenation, so we use diphones, which capture co-articulation.
A diphone starts at the middle of one phone and ends at the middle of the next.
Coarticulation is the overlapping of adjacent articulations, or the influence of the target phoneme on surrounding phonemes. Because of coarticulation, the middles of phones are more stable in their spectral properties than the edges. So, concatenating diphones should lead to smoother joins.
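The relationship between a phone sequence and its diphone sequence can be sketched in a few lines. The phone symbols, the `a-b` naming convention, and the function name below are assumptions chosen for illustration, not the notation of any particular synthesiser.

```python
def phones_to_diphones(phones):
    """Pair each phone with its successor to form diphone unit names.

    Each unit runs from the middle of the first phone to the middle of
    the second, so the transition between them sits inside the unit.
    """
    return [f"{left}-{right}" for left, right in zip(phones, phones[1:])]


# Hypothetical phone string for the word "cat", padded with silence.
phones = ["sil", "k", "ae", "t", "sil"]
diphones = phones_to_diphones(phones)
print(diphones)
```

A sequence of N phones yields N - 1 diphones, and every join between units falls in the middle of a phone, where the spectrum is more stable.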
Waveform conca ...
SP Module 5 Speech Synthesis – Phonemes and the Front End
Tokenisation & normalisation
When processing almost any text, we need to find the words. This involves splitting the input character sequence into tokens and normalising each token into words.
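The two steps can be sketched as a toy pipeline: split the character sequence into tokens, then normalise each token into one or more words. The abbreviation table, the digit-by-digit reading of numbers, and all function names are simplifying assumptions; a real front end would need far richer rules (for example, deciding between "forty-two" and "four two").

```python
import re

# Toy lookup tables (assumptions for illustration only).
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def tokenise(text):
    """Split the input character sequence into whitespace-separated tokens."""
    return text.split()


def normalise(token):
    """Map one token to a list of lower-case words."""
    lower = token.lower()
    if lower in ABBREVIATIONS:
        return [ABBREVIATIONS[lower]]
    stripped = lower.strip(".,!?;:\"'()")
    if re.fullmatch(r"[0-9]+", stripped):
        # Simplification: read numbers one digit at a time.
        return [DIGITS[int(d)] for d in stripped]
    return [stripped] if stripped else []


def text_to_words(text):
    """Tokenise, then normalise every token, returning a flat word list."""
    words = []
    for token in tokenise(text):
        words.extend(normalise(token))
    return words


words = text_to_words("Dr. Smith lives at 42 Main St.")
print(words)
```

Note that a single token can expand into several words, which is why normalisation returns a list rather than a single string.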
Handwritten rules
Every user of a language holds a lot of knowledge about that language in their mind. One way to capture and make use of that knowledge is in the form of rules.
Finite state transducer
Finite State Transducers provide general-purpose machinery for rewriting an input sequen ...
MOB LEC8 Recursive and Kalman Filter
Prerequisite knowledge
States of a mobile robot, motion model, position, orientation, and velocity.
For more challenges, see: GPS
Kalman Filter
Predict, measure, combine
Prediction and correction
Linear Kalman Filter
Recursive Least Squares + Process Model
Extended Kalman Filter
Linear approximation, first-order term, still linear.
Linearized motion model, Linearized measurement model.
Jacobian matrix
Limitation of Kalman Filter
Summary
The Kalman Filter is very similar to RLS but includes a motion mode ...
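The predict-then-correct cycle named in this outline can be sketched for the simplest case: a one-dimensional linear Kalman filter tracking position. The motion model (position plus a commanded displacement), the noise variances, and the measurement values are all assumptions picked for illustration.

```python
def predict(x, p, u, q):
    """Prediction step with the motion model x_k = x_{k-1} + u.

    x: state estimate, p: its variance, u: commanded displacement,
    q: process-noise variance. Uncertainty grows during prediction.
    """
    return x + u, p + q


def correct(x, p, z, r):
    """Correction step: fuse the prediction with a measurement z of variance r."""
    k = p / (p + r)            # Kalman gain: how much to trust the measurement
    x_new = x + k * (z - x)    # move the estimate toward the measurement
    p_new = (1.0 - k) * p      # uncertainty shrinks after the correction
    return x_new, p_new


# Hypothetical run: the robot is commanded to move 1 m per step
# and receives a noisy position measurement after each move.
x, p = 0.0, 1.0
for u, z in [(1.0, 1.2), (1.0, 1.9), (1.0, 3.1)]:
    x, p = predict(x, p, u, q=0.1)
    x, p = correct(x, p, z, r=0.5)
    print(f"estimate {x:.3f}, variance {p:.3f}")
```

Dropping the `predict` step (no motion model, no process noise) leaves only the repeated measurement fusion, which is the sense in which the filter resembles recursive least squares.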
MOB LEC7 Semantic Segmentation
Problem Formulation
Pixel-level prediction
Challenges of Semantic Segmentation
General vision challenges: occlusion, truncation, scale and illumination changes
Challenges specific to segmentation: smooth boundaries are hard to obtain due to the intrinsic ambiguity and resolution limitation in the image space.
Segmentation with DNN
Semantic Segmentation for Scene Understanding
RANSAC algorithm
boundaries
Additional material
True Positive, False Positive, False Negative
individual object
help ...
SP Modules Review Contents
Module 1: Phonetics and Representations of Speech
Systems in Speech Production
Speech production involves three systems in the body: the respiratory system, the phonation system, and the articulation system (Figure 1.2).
The Respiratory System
The respiratory system supplies the air needed to initiate speech sounds (see Figure 1.3). It consists of parts of the body that allow us to breathe, including the lungs, the diaphragm, the muscles of the rib cage, and the abdominal muscles.
The Phonation S ...
SP Module 4 the Source-Filter Model
Harmonics
In the frequency domain, periodic signals have harmonic structure: they contain energy only at multiples of their fundamental frequency.
Voiced sounds differ from unvoiced sounds in that they have a repeating pattern, i.e. they are periodic. As a result, their harmonic peaks in the frequency domain are clear to observe.
Impulse train
An impulse train is the simplest periodic signal that has energy at all multiples of its fundamental frequency (with the energy evenly distributed across them).
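The claim about the impulse train can be checked numerically with a small discrete Fourier transform. The signal length (64 samples) and period (8 samples) are arbitrary assumptions; with these values the fundamental falls in DFT bin 8.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitude of each bin of the discrete Fourier transform (direct O(n^2) sum)."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


N = 64        # number of samples (assumption)
PERIOD = 8    # one impulse every 8 samples (assumption)

impulse_train = [1.0 if t % PERIOD == 0 else 0.0 for t in range(N)]
magnitudes = dft_magnitudes(impulse_train)

fundamental_bin = N // PERIOD
# Bins that carry energy (anything above numerical noise).
harmonic_bins = [k for k, m in enumerate(magnitudes) if m > 1e-9]

print("fundamental bin:", fundamental_bin)
print("bins with energy:", harmonic_bins)
print("their magnitudes:", [round(magnitudes[k], 6) for k in harmonic_bins])
```

Energy appears only at multiples of the fundamental bin (including bin 0, the zero-frequency term), and every one of those bins has the same magnitude. Bins above N/2 mirror the lower ones, as for any real-valued signal.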
Spectral envelope
Varying the shape o ...
PI Week6 the Human Element
Introduction
This week we continue the theme of looking at general kinds of harm technology can cause if we’re not careful with it.
The topic we’re looking at is one that, with a few exceptions, gets a lot less press than bias, I think, because it’s a lot harder to quantify (and, as established in week 1, computer scientists love quantitative data).
That topic is human beings, and specifically what we might unintentionally affect when we remove people from a system, or at least change their involvem ...
PI Week5 Bias and Fairness
Introduction
We’ve covered some important background concepts in talking about Responsibility, Power and Data. The idea behind these has been to motivate some of why we should be interested in ethics in computer science, and why we are often going to be at least partially responsible for anticipating and mitigating harms that technical artifacts can bring about.
For this middle portion of the course, we are looking at some broad categories of harm (some of the Data readings began this process). ...
PI Week4 Data Ownership
The increasing availability of digital data reflecting economic and human development, and in particular the availability of data emitted as a by-product of people’s use of technological devices and services, has both political and practical implications for the way people are seen and treated by the state and by the private sector. Yet the data revolution is so far primarily a technical one: the power of data to sort, categorise and intervene has not yet been explicitly connected to a social ju ...