Multimedia Networking: From Theory to Practice

The human vocal and auditory organs form one of the most useful and complex communication systems in the animal kingdom. All speech (voice) sounds are produced by air blown from the lungs through the vocal cords (also called the vocal folds), which act as a valve between the lungs and the vocal tract. After passing the vocal cords, the air continues through the vocal tract toward the oral cavity and finally radiates out from the lips (see Figure 2.1). The vocal tract changes its shape relatively slowly, on time scales of 10 ms to 100 ms, in order to produce different sounds [1] [2].
Depending on whether the vocal cords vibrate (open and close) as air blows over them, speech signals can be roughly divided into two types: voiced speech and unvoiced speech. On the one hand, voiced speech, such as vowels, exhibits a semi-periodic waveform (with a time-varying period related to the pitch); this semi-periodic behavior is caused by the up-and-down valve movement of the vocal folds (see Figure 2.2(a)). As the voiced excitation passes through the vocal tract, the tract acts as a resonant cavity whose resonances produce large peaks in the resulting speech spectrum. These peaks are known as formants (see Figure 2.2(b)).
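The two properties of voiced speech described above (periodicity at the pitch and spectral peaks at the vocal-tract resonances) can be illustrated with a small source-filter simulation. This is a minimal sketch under stated assumptions, not the book's method: it idealizes the vocal-cord excitation as a strictly periodic impulse train at an assumed pitch of 100 Hz (real voiced speech is only semi-periodic), assumes an 8 kHz sampling rate, and models the vocal tract as a cascade of two-pole resonators at hypothetical formant frequencies and bandwidths. Pitch is recovered with a simple autocorrelation peak search, a common textbook technique swapped in here for illustration. The names `synthesize_voiced`, `estimate_pitch`, `magnitude_at`, and the `FORMANTS` values are all hypothetical.

```python
import math

FS = 8000  # sampling rate in Hz (assumed)
F0 = 100   # pitch of the idealized excitation in Hz (assumed constant)
# Hypothetical (formant frequency, bandwidth) pairs in Hz, roughly vowel-like.
FORMANTS = [(700, 130), (1220, 70), (2600, 160)]


def impulse_train(f0_hz, fs_hz, n_samples):
    """Idealized voiced excitation: one pulse per pitch period (1 / f0)."""
    period = round(fs_hz / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(x, freq_hz, bw_hz, fs_hz):
    """Two-pole IIR resonator modelling one vocal-tract resonance (formant).

    y[n] = x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2], where the pole radius r
    is set by the bandwidth and the pole angle theta by the center frequency.
    """
    r = math.exp(-math.pi * bw_hz / fs_hz)
    theta = 2.0 * math.pi * freq_hz / fs_hz
    b1, b2 = 2.0 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn + b1 * y1 + b2 * y2
        y.append(yn)
        y1, y2 = yn, y1
    return y


def synthesize_voiced(n_samples):
    """Source-filter sketch: pulse-train source passed through cascaded resonators."""
    signal = impulse_train(F0, FS, n_samples)
    for freq, bw in FORMANTS:
        signal = resonator(signal, freq, bw, FS)
    return signal


def magnitude_at(x, freq_hz, fs_hz):
    """Magnitude of the DFT of x evaluated at a single frequency."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(v * math.cos(w * n) for n, v in enumerate(x))
    im = sum(v * math.sin(w * n) for n, v in enumerate(x))
    return math.hypot(re, im)


def estimate_pitch(x, fs_hz, f0_min=50.0, f0_max=400.0):
    """Return fs / lag for the autocorrelation peak among plausible pitch lags."""
    lo, hi = int(fs_hz / f0_max), int(fs_hz / f0_min)

    def acf(lag):
        return sum(a * b for a, b in zip(x, x[lag:]))

    best_lag = max(range(lo, hi + 1), key=acf)
    return fs_hz / best_lag


if __name__ == "__main__":
    # Drop the onset transient; the remaining 4000 samples span 50 pitch periods.
    frame = synthesize_voiced(4800)[800:]
    print("estimated pitch (Hz):", estimate_pitch(frame, FS))
    for f in (700, 750, 1700):
        print(f"|X({f} Hz)| = {magnitude_at(frame, f, FS):.3f}")
```

Running it should report a pitch of 100 Hz, a large spectral magnitude at the 700 Hz harmonic (at the first assumed formant), a much smaller one at the 1700 Hz harmonic (between resonances), and essentially zero at 750 Hz, which is not a multiple of the pitch. In other words, energy appears only at harmonics of the pitch, and the formant resonances shape how large each harmonic is.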
On the other hand, the hiss-like fricative or explosive sounds of unvoiced speech, such as s,...