Text-to-Speech Synthesis

Chapter 17: Further Issues

This chapter covers a number of final topics, which have been left until last because they cut across many of the issues raised in the previous chapters.

17.1 Databases

Data-driven techniques have come to dominate nearly every aspect of text-to-speech in recent years. In addition to being affected by the algorithms themselves, the overall performance of a system is increasingly dominated by the quality of the databases that are used for training. In this section, we therefore examine the issues in database design, collection, labelling and use.

All algorithms are to some extent data-driven; even hand-written rules use some data, either explicitly or in a mental representation wherein the developer can imagine examples and how they should be dealt with. The difference between hand-written rules and data-driven techniques lies not in whether data are used, but in how they are used. Most data-driven techniques have an automatic training algorithm, so that they can be trained on the data without the need for human intervention.

17.1.1 Unit-selection databases

Unit selection is arguably the most data-driven technique because little or no processing is performed on the data; rather, they are simply analysed, cut up and recombined in different sequences. As with other database techniques, the issue of coverage is vital, but in addition we have further issues concerning the actual recordings.
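To make the notion of coverage concrete, the following is a minimal sketch of how one might measure it for a candidate recording script. It assumes, purely for illustration, that the unit is the diphone (an adjacent pair of phones) and that the target set is every ordered pair over a small phone inventory; the function names, the toy inventory and the toy utterances are all hypothetical and are not taken from any particular system.

```python
def diphones(phones):
    """Return the adjacent phone pairs (diphones) in one utterance."""
    return list(zip(phones, phones[1:]))


def diphone_coverage(utterances, inventory):
    """Fraction of all ordered phone pairs over `inventory` that occur
    at least once in the given utterances (an illustrative measure)."""
    wanted = {(a, b) for a in inventory for b in inventory}
    seen = set()
    for phones in utterances:
        seen.update(d for d in diphones(phones) if d in wanted)
    return len(seen) / len(wanted)


# Toy data (assumed): a three-phone inventory and two "recorded" utterances.
inventory = ["a", "b", "k"]
utterances = [["k", "a", "b"], ["b", "a", "k", "a"]]
coverage = diphone_coverage(utterances, inventory)
print(f"diphone coverage: {coverage:.2f}")
```

In a real language many phone pairs never occur, so a denominator built from all ordered pairs understates the useful coverage; a practical measure would count only the units that are actually attested or required.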

There is no firm agreement on how big a unit-selection system needs to be, but it is clear that, all other things being equal, the larger the better. As...
