Module 1: Phonetics and Representations of Speech

Systems in Speech Production

Speech production involves three systems in the body: the respiratory system, the phonation system, and the articulation system (Figure 1.2).

The Respiratory System

The respiratory system supplies the air needed to initiate speech sounds (see Figure 1.3). It consists of parts of the body that allow us to breathe, including the lungs, the diaphragm, the muscles of the rib cage, and the abdominal muscles.

The Phonation System

  • The phonation system comprises the larynx and its internal structure.
    • The larynx is formed by two major cartilages, the thyroid and the cricoid; the front of the thyroid cartilage forms the laryngeal prominence, commonly known as the Adam’s apple.
      • The larynx sits on top of the trachea, or windpipe, a tube made of connected rings of cartilage (Figure 1.4a).
    • Inside the larynx are the vocal folds (Figure 1.4b and History Box).
      • They are made up of layers of tissue attached to a pair of arytenoid cartilages at the back end, and to the thyroid cartilage at the front end.
      • Movements of the arytenoids either bring the vocal folds together (adduct) and close off the airflow from the lungs, or move them apart (abduct) to allow the upward flow of air without obstruction.
      • The space between the vocal folds is the glottis.

The Articulation System: The Vocal Tract

  • Airflow through the glottis is further modified inside the vocal tract, which consists of three main cavities: the pharyngeal cavity, the oral cavity, and the nasal cavity (Figure 1.6).
    • The upper surface of the oral cavity contains relatively stationary or passive articulators, including the upper lip, the teeth, the alveolar ridge, the hard palate, the soft palate (also called the velum), and the uvula.
    • The lower lip and the tongue are the main mobile or active articulators on the lower surface of the vocal tract.
  • Different parts of the tongue are involved in producing different sounds, so the tongue is divided into areas: the tip, blade, front, mid, body, back, and root.
  • Active articulators move toward passive articulators to form varying degrees of constriction, which shape the airflow before it leaves the vocal tract as distinct speech sounds.

Consonants

Voicing

  • Voicing occurs when air from the lungs pushes apart the closed (adducted) vocal folds, which then come back together; repeating this cycle makes the vocal folds vibrate.
    • Sounds produced with vocal fold vibration are voiced.
    • In contrast, voiceless sounds are made without vocal fold vibration.

Place of articulation

Manner

Stops, trills, fricatives, lateral fricatives, approximants.

Vowels

Monophthong: A vowel produced with the same tongue position throughout the syllable and perceived as having a single quality, such as the <ee> vowel in English ‘bee’.

Diphthong: A vowel produced with tongue movement from one vowel position to another within the same syllable, and thus perceived as having two qualities, such as the <uy> vowel in English ‘buy’.

Oral vowels: Vowels produced with the velum raised, so that air flows only through the oral cavity (no nasalization).

Nasal vowels: Vowels produced with the velum lowered, so that air flows through both the nasal and the oral cavities. In languages such as French they contrast in meaning with oral vowels, as in <bon> ‘good’ vs. <beau> ‘beautiful’.

Module 2: Acoustics of Consonants and Vowels

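Fundamental Frequency (F0): The lowest frequency component in a complex periodic sound. The other components (harmonics) are integer multiples of F0, and the whole waveform repeats once every 1/F0 seconds.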
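
To make the definition concrete, here is a small Python sketch (not from the course materials) that builds a complex periodic tone from three harmonics of a hypothetical F0 of 100 Hz and checks that the whole waveform repeats every 1/F0 = 10 ms, but not every 5 ms (the period of the second harmonic alone). The function names, the amplitudes and the 8 kHz sampling grid are illustrative assumptions.

```python
import math


def complex_tone(t, f0, amplitudes):
    """Value at time t (s) of a sum of sine harmonics at f0, 2*f0, 3*f0, ...

    amplitudes[k] is the amplitude of harmonic number k + 1 (hypothetical values).
    """
    return sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t)
               for k, a in enumerate(amplitudes))


def repeats_after(shift, f0, amplitudes, fs=8000, n=200):
    """True if the waveform at t and at t + shift agree at n sample times (fs Hz grid)."""
    times = [i / fs for i in range(n)]
    return all(
        math.isclose(complex_tone(t, f0, amplitudes),
                     complex_tone(t + shift, f0, amplitudes),
                     abs_tol=1e-9)
        for t in times)


F0 = 100.0                # hypothetical fundamental frequency (Hz)
AMPS = [1.0, 0.5, 0.25]   # hypothetical amplitudes of the 100, 200, 300 Hz components

components = [(k + 1) * F0 for k in range(len(AMPS))]
print("components (Hz):", components)
print("lowest component (F0):", min(components))
print("repeats after 1/F0:", repeats_after(1 / F0, F0, AMPS))
print("repeats after 1/(2*F0):", repeats_after(1 / (2 * F0), F0, AMPS))
```

The half-period shift fails because it flips the sign of the 100 Hz component, so only a shift of a whole F0 cycle brings every harmonic back into phase.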

Resonance: The condition in which a physical body, such as an object or an air column, vibrates with increased amplitude when driven by an outside force at or near its own natural frequency (sympathetic vibration).
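
As a worked example of air-column resonance, a uniform tube closed at one end (an idealization often used for a neutral, schwa-like vocal tract, closed at the glottis and open at the lips) resonates at odd multiples of c/(4L). The Python sketch below assumes a tube length of 17.5 cm and a speed of sound of 350 m/s; both numbers and the function name are illustrative, not taken from the course.

```python
def closed_tube_resonances(length_m, n_resonances=3, speed_of_sound=350.0):
    """Resonant frequencies (Hz) of a uniform tube closed at one end.

    Quarter-wavelength rule: F_n = (2n - 1) * c / (4 * L), for n = 1, 2, ...
    """
    base = speed_of_sound / (4.0 * length_m)  # lowest resonance, c / (4L)
    return [(2 * n - 1) * base for n in range(1, n_resonances + 1)]


# Hypothetical tube: 17.5 cm long, speed of sound 350 m/s.
for n, f in enumerate(closed_tube_resonances(0.175), start=1):
    print(f"R{n}: {f:.0f} Hz")
```

Under these assumptions the first three resonances come out at about 500, 1500 and 2500 Hz; a shorter tube shifts all of them upward.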

Module 3: Digital Speech Signals

pass

Module 4: Source-Filter Model

Q: Telephones typically do not transmit any frequencies below 300 Hz. What is the consequence for the listener’s perception of pitch when the speaker has a fundamental frequency of 200 Hz?

A: The listener will still perceive a pitch corresponding to 200 Hz, the pitch of the original speech. As we saw in class, even if a high-pass filter removes the frequencies that include F0, we can (somewhat surprisingly) still perceive the original pitch. This is because the harmonic structure remains in the signal: the higher harmonics (400, 600, 800 Hz, and so on, all integer multiples of F0) are still transmitted, and the brain can use them to reconstruct F0. (Module 4 lecture discussion; this is a stretch question!)
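
The effect can be simulated with a short Python sketch. It keeps only the harmonics of a 200 Hz voice that survive a 300 Hz cut (treated here as an ideal filter, with an assumed upper limit of 3400 Hz and an assumed 8 kHz sampling rate, both typical of telephony but not stated in the question), then estimates the period with a plain autocorrelation search. Autocorrelation is a standard signal-processing pitch estimator swapped in for illustration; it is not a model of what the brain does, and all names below are hypothetical.

```python
import math

FS = 8000        # assumed sampling rate (Hz)
F0 = 200.0       # speaker's fundamental frequency from the question (Hz)
CUTOFF = 300.0   # telephone lower band edge from the question (Hz)


def harmonic_signal(f0, cutoff, fs, duration=0.1, max_freq=3400.0):
    """Equal-amplitude harmonics k*f0 with cutoff <= k*f0 <= max_freq.

    Idealizes the telephone channel as a perfect band-pass filter.
    Returns (list of kept frequencies, list of samples).
    """
    freqs = [k * f0 for k in range(1, int(max_freq // f0) + 1) if k * f0 >= cutoff]
    n = int(duration * fs)
    samples = [sum(math.sin(2 * math.pi * f * i / fs) for f in freqs)
               for i in range(n)]
    return freqs, samples


def autocorr_f0(x, fs, fmin=50.0, fmax=500.0):
    """Estimate F0 as fs / lag, where lag maximizes the unnormalized autocorrelation."""
    best_lag, best_r = None, -math.inf
    for lag in range(int(fs // fmax), int(fs // fmin) + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


freqs, x = harmonic_signal(F0, CUTOFF, FS)
print("lowest frequency present:", min(freqs), "Hz")
print("estimated F0:", autocorr_f0(x, FS), "Hz")
```

Running it should report a lowest component of 400 Hz but an estimated F0 of 200 Hz: every remaining harmonic still completes a whole number of cycles in each 5 ms period, so the waveform keeps its 200 Hz periodicity even without the fundamental.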


Translated and edited by: YangSier

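:four_leaf_clover: Ramblings :four_leaf_clover:
Hello everyone (minna-san)! This is YangSier, currently studying in the UK. My blog focuses on programming, algorithms, robotics, artificial intelligence, mathematics and more. Give it a follow; I keep posting high-quality content.
:cherry_blossom: Chat QQ group: 兔叽的魔术工房 (942848525)
:star: Bilibili account: 白拾Official (active in the knowledge and animation sections)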