Videoconferencing and Videotelephony: Technology and Standards, Second Edition

One important component of any audiovisual terminal for videoconferencing, videophone, and general multimedia applications is the speech coder. Indeed, it is generally agreed that the audio component is more critical than the video portion of the transmission. This chapter has three parts: (1) the attributes of speech coding, (2) a review of currently available speech coders from the ITU, and (3) a brief overview of MPEG audio coders. Echo cancellation, although not standardized, is a very important part of any audio conferencing system. The basics of echo cancellation are treated in Chapter 2.
Speech coders have four attributes: bit rate, quality, complexity, and delay. For a given application, some of these attributes are predetermined, while trade-offs can be made among the others. For example, quality can usually be improved by increasing bit rate or complexity and sometimes by increasing delay. The following subsections discuss each attribute, with particular attention to low bit-rate speech coding.
Public-switched telephone network (PSTN) video telephones are expected to operate at bit rates up to at least 33.6 Kbps. The higher the bit rate, the better the video quality. It is therefore desirable that the speech coder use as little of the total bit rate as possible. Prior to Recommendations G.729 and G.723.1, ITU-T speech coding recommendations existed only for bit rates of 16 Kbps and higher. Rates lower than 9.6 Kbps have been used for digital cellular telephones, secure...