Applied Speech and Audio Processing: With MATLAB Examples

As mentioned at the beginning of Chapter 1, audio samples need to be quantised in some way during the conversion from analogue quantities to representations on a computer. In effect, the quantisation process reduces the amount of information stored: the fewer bits of quantisation, the less audio information is captured.
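The effect of bit depth can be illustrated with a minimal MATLAB sketch. It is not taken from the text: the test tone, the sample rate, the bit depths and the simple uniform rounding quantiser are all assumptions chosen for illustration. The signal-to-noise ratio should fall by roughly 6 dB for each bit removed.

```matlab
% Illustrative sketch only: quantise a test tone to several bit depths
% and measure how much of the original signal is lost at each depth.
fs = 8000;                       % sample rate in Hz (assumed value)
t  = (0:fs-1)/fs;                % one second of sample times
x  = 0.9*sin(2*pi*440*t);        % 440 Hz test tone, kept inside [-1, 1)

bits   = [16 8 4];               % bit depths to compare (assumed values)
snr_db = zeros(size(bits));      % signal-to-quantisation-noise ratio, in dB

for k = 1:numel(bits)
    step = 2/(2^bits(k));        % step size when 2^bits levels span [-1, 1)
    xq   = step*round(x/step);   % uniform quantisation to the nearest level
    err  = x - xq;               % quantisation error
    snr_db(k) = 10*log10(sum(x.^2)/sum(err.^2));
    fprintf('%2d bits: SNR = %.1f dB\n', bits(k), snr_db(k));
end
```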
Most real-world systems are bandwidth (rate) or size constrained, such as an MP3 player only being able to store 1 Gbyte of audio. To get the most out of such a device, it is necessary to reduce the number of bits required to store the audio without compromising quality too much. This is generally known as audio compression. A large part of handling speech in communication systems lies in determining how to reduce the number of bits stored or conveyed, whilst maximising quality or intelligibility.
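As a rough worked example of the storage constraint, the following MATLAB sketch estimates how much audio fits in 1 Gbyte. The figures are assumptions rather than values from the text: 1 Gbyte is taken as 10^9 bytes, the uncompressed audio is assumed to be 16-bit stereo at 44.1 kHz, and 128 kbit/s is a hypothetical compressed bit rate.

```matlab
% Illustrative sketch only: playing time that fits in a fixed amount of storage.
capacity_bytes = 1e9;            % 1 Gbyte taken as 10^9 bytes (assumption)

fs    = 44100;                   % sample rate in Hz (assumed)
nbits = 16;                      % bits per sample (assumed)
nchan = 2;                       % stereo (assumed)

pcm_bytes_per_sec = fs*(nbits/8)*nchan;               % 176400 bytes per second
pcm_minutes = capacity_bytes/pcm_bytes_per_sec/60;    % uncompressed playing time

comp_bitrate = 128e3;            % hypothetical compressed rate, bits per second
comp_hours   = capacity_bytes*8/comp_bitrate/3600;    % compressed playing time

fprintf('Uncompressed: %.1f minutes\n', pcm_minutes);
fprintf('Compressed:   %.1f hours\n', comp_hours);
```

Under these assumptions the device holds about 94 minutes of uncompressed audio, against about 17 hours at the compressed rate.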
Pulse code modulation (PCM) is the format delivered by most analogue-to-digital converters (ADCs) and the format of choice for representing audio on a computer. The sound is stored as a vector of samples, with each sample usually (but not always) represented as a single 16-bit value. Each sample is related in some way to the analogue amplitude of the audio wave travelling through the air, and the time between samples is set by the sample rate. This is shown in Figure 5.1, where a waveform is time-sliced and the average amplitude in each time slice is encoded along the bottom. These values form the sample...