4.3 Elements of Quality of Speech Synthesizers
In previous chapters, speech technologies and their associated issues were viewed from the perspective of quality design. Individual systems were mainly treated as black boxes. Design objectives were set and illustrated by examples, and the influence of prerequisites and conditions on the feasibility of reaching these goals was discussed.
This chapter takes a glass box approach to speech technology systems, using speech synthesis as an example. Part of this is a detailed description of system input, architecture, and output.
4.3.1 The System Input
In general, the term speech synthesis is understood as the transformation of information coded other than acoustically into acoustic speech signals. The terminology in the literature is ambiguous: the term speech synthesis is frequently used synonymously with synthesis-by-concept, text-to-speech conversion, or voice generation techniques. Only when synthesis is clearly coupled with a further restrictive human performance (e.g. in dialog, transmission, or translation systems) is this part of the system clearly delimited and named by an internal border (cf. [22], [185], [259], [260]).
In this book the term synthesis is synonymous with speech device. Synthesis is used as a hypernym when it is not necessary to differentiate between systems of varying performance. Where a differentiation has to be made, it follows the different system inputs:
- translated text-to-speech synthesis
- dialog-to-speech synthesis
- concept-to-speech synthesis
- text-to-speech synthesis
- phoneme-to-speech synthesis
- manually operated synthesis
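The differentiation by system input listed above can be pictured as a small data structure. The following is a minimal sketch in Python, not part of the original text: the names SynthesisInput, Synthesizer, and group_by_input, as well as the example systems, are hypothetical and serve only to illustrate how systems might be grouped under the hypernym synthesis according to their input.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class SynthesisInput(Enum):
    """The six system inputs used in the text to differentiate systems."""
    TRANSLATED_TEXT = "translated text-to-speech synthesis"
    DIALOG = "dialog-to-speech synthesis"
    CONCEPT = "concept-to-speech synthesis"
    TEXT = "text-to-speech synthesis"
    PHONEME = "phoneme-to-speech synthesis"
    MANUAL = "manually operated synthesis"


@dataclass(frozen=True)
class Synthesizer:
    """A synthesis system, described only by a name and its input type."""
    name: str                     # hypothetical system name
    system_input: SynthesisInput  # which kind of input the system accepts


def group_by_input(systems):
    """Group system names by the kind of input they accept."""
    groups = defaultdict(list)
    for system in systems:
        groups[system.system_input].append(system.name)
    return dict(groups)


# Hypothetical example systems, invented for illustration only.
systems = [
    Synthesizer("reader_a", SynthesisInput.TEXT),
    Synthesizer("reader_b", SynthesisInput.TEXT),
    Synthesizer("info_kiosk", SynthesisInput.DIALOG),
]
groups = group_by_input(systems)

for kind, names in groups.items():
    print(f"{kind.value}: {', '.join(names)}")
```

When no differentiation is needed, all entries can simply be treated as Synthesizer objects; the enumeration matters only where the input type is relevant.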
It is easy to recognize that there are considerable differences between the system complexity of...