Text-to-Speech Synthesis

Chapter 8: Pronunciation

We now turn to the problem of how to convert the discrete, linguistic, word-based representation generated by the text-analysis system into a continuous acoustic waveform. One of the primary difficulties in this task stems from the fact that the two representations are so different in nature. The linguistic description is discrete, the same for each speaker for a given accent, compact and minimal. By contrast, the acoustic waveform is continuous, is massively redundant, and varies considerably even between utterances with the same pronunciation from the same speaker. To help with the complexity of this transformation, we break the problem down into a number of components. The first of these components, pronunciation, is the subject of this chapter. While specifics vary, this can be thought of as a system that takes the word-based linguistic representation and generates a phonemic or phonetic description of what is to be spoken by the subsequent waveform-synthesis component. In generating this representation, we make use of a lexicon, to find the pronunciations of words we know and can store, and a grapheme-to-phoneme [1] (G2P) algorithm, to guess the pronunciations of words we don't know or can't store. After doing this we may find that simply concatenating the pronunciations for the words in the lexicon is not enough; words interact in a number of ways and so a certain amount of post-lexical processing is required. Finally, there is considerable choice in terms of how exactly we should specify the pronunciations for...
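The three stages described above (lexicon lookup, a G2P fallback for unknown words, and post-lexical processing) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexicon, the ARPAbet-like phoneme symbols, the one-letter-per-sound table standing in for a real G2P algorithm, and the single post-lexical rule (pronouncing "the" as "dh iy" before a vowel) are all hypothetical and are not taken from the chapter.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> phonemes (ARPAbet-like symbols, illustration only).
LEXICON: Dict[str, List[str]] = {
    "the": ["dh", "ax"],
    "cat": ["k", "ae", "t"],
    "sat": ["s", "ae", "t"],
}

# Hypothetical, deliberately naive letter-to-sound table standing in for a real
# G2P algorithm. "x" expands to two phonemes, written here as "k+s".
_RULES = (
    "a:ae b:b c:k d:d e:eh f:f g:g h:hh i:ih j:jh k:k l:l m:m n:n "
    "o:aa p:p q:k r:r s:s t:t u:ah v:v w:w x:k+s y:y z:z"
)
LETTER_TO_SOUND: Dict[str, List[str]] = {
    rule.split(":")[0]: rule.split(":")[1].split("+") for rule in _RULES.split()
}

# Vowel phonemes used by the illustrative post-lexical rule below.
VOWELS = {"ae", "eh", "ih", "aa", "ah", "ax", "iy"}


def guess_pronunciation(word: str) -> List[str]:
    """Fallback for unknown words: map each letter to a sound, skipping unknown characters."""
    phones: List[str] = []
    for ch in word.lower():
        phones.extend(LETTER_TO_SOUND.get(ch, []))
    return phones


def lookup(word: str) -> List[str]:
    """Use the lexicon when the word is known, otherwise guess."""
    key = word.lower()
    if key in LEXICON:
        return list(LEXICON[key])
    return guess_pronunciation(key)


def post_lexical(words: List[str], prons: List[List[str]]) -> List[List[str]]:
    """Apply one illustrative cross-word rule: 'the' becomes 'dh iy' before a vowel."""
    out = [list(p) for p in prons]
    for i in range(len(words) - 1):
        nxt = out[i + 1]
        if words[i].lower() == "the" and nxt and nxt[0] in VOWELS:
            out[i] = ["dh", "iy"]
    return out


def pronounce(words: List[str]) -> List[List[str]]:
    """Word sequence -> per-word phoneme sequences, with post-lexical adjustment."""
    return post_lexical(words, [lookup(w) for w in words])


print(pronounce(["the", "cat"]))  # [['dh', 'ax'], ['k', 'ae', 't']]
print(pronounce(["the", "ox"]))   # [['dh', 'iy'], ['aa', 'k', 's']]
```

In the second call, "ox" is missing from the toy lexicon, so its pronunciation is guessed letter by letter, and the guessed initial vowel then triggers the post-lexical change to "the". A real system would replace the letter table with a context-sensitive or trained G2P model; the point here is only the ordering of the stages.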
