Voice and Speech Quality Perception: Assessment and Evaluation

In speech synthesis, engineers aim to develop a system which, like a human speaker, can convey information, make announcements, or read texts out loud. Historically, the starting point for constructing a speech synthesis system was an atomistic idea of the essence of speech and the aim of copying it by technical means. However, it soon became apparent that natural speech consists of far more than the concatenation of single speech sounds, syllables, or words. Speech does not comprise sounds that act as single events, like letters of the alphabet which are put together to form words and sentences. Speech may consist of a tightly knit network of speech sounds, but in addition there is the matter of structure and meaning. Understanding speech always involves guessing and predicting words in a complex interaction between eyes and ears, memories, expectations, experiences, and feelings. Listeners' brains mobilize all their faculties to make sense of what their partners are saying. Hearing does not end with the ear. In the course of developing speech synthesis systems, this fact became clearly audible.
In the infancy of speech synthesis, the main interest was to copy man's ability to speak using technical means. The goal was to build a functioning model of the human speech organs in order to gain more knowledge of how human speech production works.
It began by observing how speech is produced:...