Module 1: Phonetics and Representations of Speech

Systems in Speech Production

Speech production involves three systems in the body: the respiratory system, the phonation system, and the articulation system (Figure 1.2).


The Respiratory System

The respiratory system supplies the air needed to initiate speech sounds (see Figure 1.3). It consists of parts of the body that allow us to breathe, including the lungs, the diaphragm, the muscles of the rib cage, and the abdominal muscles.


The Phonation System

  • The phonation system comprises the larynx and its internal structures.
    • Formed by two major cartilages, the thyroid and the cricoid, the larynx (commonly known as the voice box) sits on top of the trachea, or windpipe, a tube of connected cartilage rings (Figure 1.4a).
      • The front prominence of the thyroid cartilage is what is commonly called the Adam’s apple.
    • Inside the larynx are the vocal folds (Figure 1.4b, and History Box).
      • They are made up of layers of tissue attached to a pair of arytenoid cartilages at the back end, and to the thyroid cartilage at the front end.
      • Movements of the arytenoids either bring the vocal folds together (adduct) and close off the airflow from the lungs, or move them apart (abduct) to allow the upward flow of air without obstruction.
      • The spacing between the vocal folds is the glottis.


The Articulation System: The Vocal Tract

  • Airflow through the glottis is further modified inside the vocal tract, which consists of three main cavities: the pharyngeal cavity, the oral cavity, and the nasal cavity (Figure 1.6).
    • The upper surface of the oral cavity contains relatively stationary or passive articulators, including the upper lip, the teeth, the alveolar ridge, the hard palate, the soft palate (also called the velum), and the uvula.
    • The lower lip and the tongue are the main mobile or active articulators on the lower surface of the vocal tract.
  • Different parts of the tongue are involved in speech production, so the tongue is divided into areas: the tip, blade, front, mid, body, back, and root.
  • Active articulators move toward passive articulators to form varying degrees of constriction, which shape the airflow before it leaves the vocal tract as distinct speech sounds.




  • Voicing occurs when the air from the lungs pushes the closed vocal folds apart, causing them to vibrate.
    • Sounds produced with vocal fold vibration are voiced.
    • In contrast, voiceless sounds are those made without vibration of the vocal folds.

Place of articulation



Manner of articulation: stops, trills, fricatives, lateral fricatives, approximants.


Monophthong: A vowel produced with the same tongue position throughout the syllable that is perceived as having a single quality such as the <ee> vowel in English ‘bee’.

Diphthong: A vowel produced with tongue movement from one vowel to another within the same syllable, thus being perceived as having two qualities, such as the <uy> vowel in English ‘buy’.

Oral vowels: Vowels produced without velum lowering or nasalization.

Nasal vowels: Vowels produced with air flowing through both the nasal and the oral cavities to signal a meaning contrast with oral vowels, as in French <bon> ‘good’ vs. <beau> ‘beautiful’.

Module 2: Acoustics of Consonants and Vowels

Fundamental Frequency (F0): The lowest frequency component in a complex periodic sound.
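
As a minimal sketch of this definition, the snippet below builds a complex periodic sound from three sine components and checks that the waveform repeats once per cycle of its lowest component. The 200 Hz value, the amplitudes, the sample rate, and the helper name `complex_tone` are illustrative assumptions, not values from the course material.

```python
import math

def complex_tone(freqs_hz, amps, sample_rate, duration_s):
    """Sum of sine components, returned as a list of samples."""
    n = int(round(sample_rate * duration_s))
    return [
        sum(a * math.sin(2 * math.pi * f * i / sample_rate)
            for f, a in zip(freqs_hz, amps))
        for i in range(n)
    ]

# Assumed example values: a 200 Hz fundamental plus its 2nd and 3rd harmonics.
F0_HZ = 200.0
components_hz = [F0_HZ * k for k in (1, 2, 3)]   # 200, 400, 600 Hz
amplitudes = [1.0, 0.5, 0.25]
SAMPLE_RATE = 8000

signal = complex_tone(components_hz, amplitudes, SAMPLE_RATE, 0.02)

# F0 is the lowest component; one F0 cycle spans this many samples.
lowest_component_hz = min(components_hz)
period_samples = int(round(SAMPLE_RATE / lowest_component_hz))

# The whole complex wave repeats after one F0 period (up to rounding error).
max_diff = max(
    abs(signal[i] - signal[i + period_samples])
    for i in range(len(signal) - period_samples)
)
print(lowest_component_hz, period_samples, max_diff)
```

The higher components complete a whole number of cycles within each F0 period, which is why the summed waveform repeats at the rate of the lowest component.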

Resonance: The condition whereby vibrations of a physical body such as an object or an air column are amplified by an outside force of the same natural frequency (sympathetic vibration).
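
One way to picture resonance numerically is the textbook model of a damped, driven oscillator, whose steady-state amplitude is largest when the driving frequency is close to the system's natural frequency. This model, the function name `steady_state_amplitude`, and all parameter values below are assumptions chosen for illustration; the notes themselves do not specify a model.

```python
import math

def steady_state_amplitude(drive_hz, natural_hz, damping=50.0, force_per_mass=1.0):
    """Steady-state amplitude of a damped oscillator driven by a sinusoidal force.

    Uses A = (F/m) / sqrt((w0^2 - w^2)^2 + (damping * w)^2),
    with w and w0 as angular frequencies in rad/s.
    """
    w = 2 * math.pi * drive_hz
    w0 = 2 * math.pi * natural_hz
    return force_per_mass / math.sqrt((w0 ** 2 - w ** 2) ** 2 + (damping * w) ** 2)

# Hypothetical resonator with a 500 Hz natural frequency.
NATURAL_HZ = 500.0
drive_freqs_hz = [100.0, 300.0, 500.0, 700.0, 900.0]

responses = {f: steady_state_amplitude(f, NATURAL_HZ) for f in drive_freqs_hz}
strongest_drive_hz = max(responses, key=responses.get)

for f in drive_freqs_hz:
    print(f, responses[f])
print("largest response at", strongest_drive_hz, "Hz")
```

Among the driving frequencies tried, the response is strongest when the drive matches the natural frequency, which is the amplification the definition describes.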

Module 3: Digital Speech Signals


Module 4: Source-Filter Model

Q: Typically, telephones do not transmit any frequencies below 300 Hz. What is the consequence for the listener’s perception of pitch when the speaker has a fundamental frequency of 200 Hz?

A: The perceived pitch will still correspond to 200 Hz, the fundamental frequency of the original speech. As we saw in class, even if a high-pass filter removes the frequency band that includes F0, we can (somewhat surprisingly) still perceive the original pitch. This is because the harmonic structure is still in the signal: the remaining harmonics are multiples of F0 (400 Hz, 600 Hz, 800 Hz, and so on) and are spaced 200 Hz apart, and the brain can use this spacing to reconstruct F0. (Module 4 lecture discussion; this is a stretch question!)
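
The effect can be sketched numerically. The snippet below synthesizes only the harmonics of a 200 Hz voice that an idealized 300 Hz high-pass channel would pass, then estimates the repetition rate of the result with a simple autocorrelation search. Autocorrelation is used here only as a convenient stand-in for a pitch estimator and is not a claim about how the auditory system works; the sample rate, the number of harmonics, the equal amplitudes, and the 75 to 400 Hz search range are all assumptions.

```python
import math

SAMPLE_RATE = 8000
F0_HZ = 200.0
CUTOFF_HZ = 300.0   # idealized telephone high-pass cutoff

# First five harmonics of the voice; the channel drops everything below the cutoff.
all_harmonics_hz = [F0_HZ * k for k in range(1, 6)]               # 200 ... 1000 Hz
passed_hz = [f for f in all_harmonics_hz if f >= CUTOFF_HZ]       # 400 ... 1000 Hz

# 50 ms of the transmitted signal: note that no 200 Hz component is present.
n_samples = int(round(SAMPLE_RATE * 0.05))
signal = [
    sum(math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f in passed_hz)
    for i in range(n_samples)
]

def autocorrelation(x, lag):
    """Unnormalized autocorrelation of x at the given lag (in samples)."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

# Search lags corresponding to an assumed pitch range of 75 to 400 Hz.
min_lag = SAMPLE_RATE // 400
max_lag = SAMPLE_RATE // 75
best_lag = max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(signal, lag))
estimated_f0_hz = SAMPLE_RATE / best_lag

print(passed_hz, best_lag, estimated_f0_hz)
```

Even though the lowest transmitted component is 400 Hz, the waveform still repeats every 1/200 s because the surviving harmonics are spaced 200 Hz apart, so the estimated repetition rate matches the missing fundamental.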

Translate + Edit: YangSier (Homepage)
