Voice and Speech Quality Perception: Assessment and Evaluation

In information and communication engineering, phonetics, linguistics, and communication science, many research projects deal directly or indirectly with specific aspects of speech synthesis. We have already mentioned voice generation techniques. This section presents the other internal functional entities that are necessary if written text is to be transformed into speech.
Of course, speech quality assessment covers not only the assessment of synthesized speech sounds but also the examination of how well the individual functional entities of a synthesis system perform, e.g. graph(eme)-to-phoneme conversion, stress assignment, and text and sentence analysis. To do this, the quality expert must have at least a rudimentary knowledge of how the various processes work. Because not all of these activities are based on one system, the individual components of different systems are not necessarily interchangeable or compatible. If they were, it would be relatively simple to compare directly the advantages and disadvantages of the components to be assessed in the conversion process. Moreover, the quality of an individual component often becomes apparent only after several functional units have been executed, which makes it much more difficult to detect poor performance or outright failure clearly. As far as synthesis techniques are concerned, it is useful to differentiate between the elements of synthesis and the types of synthesis:
elements of synthesis: phones, diphones, demi-syllables, syllables, words, sequences of words, phrases, sentences, paragraphs, texts
types of synthesis: either directly based on the signal form (so-called time-domain synthesis, e.g. PSOLA) or the signal form is...