Text-to-Speech Synthesis

16.2 Features

16.2.1 Base types

The first feature we will examine is the base type, that is, the type of unit we will use in the synthesiser. The base type chosen in second-generation systems was often the diphone, since diphones often produced good joins. In unit selection, the greater variability in the units means that we can't always rely on diphones joining well, so the reasons for using diphones are somewhat less convincing. Indeed, a survey of the literature shows that almost every possible kind of base type has been used. In the following list we describe each type by its most common name [2], cite some systems that use this base type, and give some indication of the number of distinct types of each kind, where we assume that we have N unique phones and M unique syllables in our pronunciation system.

  • frames: Individual frames of speech, which can be combined in any order [204].

  • states: Parts of phones, often determined by the alignment of HMM states [138, 140].

  • half-phones: Units that are half the size of a phone. Each half-phone either extends from the start boundary of a phone to its mid point (which can be defined in a number of ways), or from that mid point to the end boundary of the phone. There are 2N different half-phone types [315].

  • diphones: These units extend from the mid point of one...
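As a concrete illustration of the half-phone base type, the following minimal sketch assumes that a phone-level alignment is already available (for example from HMM forced alignment) and defines the mid point simply as the temporal centre of each phone, which is only one of the possible definitions mentioned above. The `Phone` and `HalfPhone` types, the `_L`/`_R` label suffixes, the function names and the toy timings are all hypothetical, introduced here purely for illustration.

```python
from typing import List, NamedTuple


class Phone(NamedTuple):
    """One phone from a (hypothetical) phone-level alignment; times in ms."""
    label: str
    start: float
    end: float


class HalfPhone(NamedTuple):
    """A half-phone unit; the "_L"/"_R" label suffix is an illustrative convention."""
    label: str
    start: float
    end: float


def split_into_half_phones(alignment: List[Phone]) -> List[HalfPhone]:
    """Split every aligned phone into a left half and a right half.

    Assumption: the mid point is the temporal centre of the phone.
    Other definitions (e.g. a point of spectral stability) are equally possible.
    """
    units: List[HalfPhone] = []
    for phone in alignment:
        mid = (phone.start + phone.end) / 2.0
        # Left half: from the phone's start boundary to its mid point.
        units.append(HalfPhone(phone.label + "_L", phone.start, mid))
        # Right half: from the mid point to the phone's end boundary.
        units.append(HalfPhone(phone.label + "_R", mid, phone.end))
    return units


def half_phone_types(phone_set: List[str]) -> List[str]:
    """Enumerate half-phone types: each of the N phones has two halves, giving 2N."""
    return [f"{ph}_{side}" for ph in phone_set for side in ("L", "R")]


if __name__ == "__main__":
    # Toy alignment of the word "cat" (times in milliseconds, invented for illustration).
    cat = [Phone("k", 0.0, 80.0), Phone("ae", 80.0, 200.0), Phone("t", 200.0, 260.0)]
    for unit in split_into_half_phones(cat):
        print(unit)
    print(len(half_phone_types(["k", "ae", "t"])))  # 2N with N = 3, so 6
```

Taking the temporal centre keeps the sketch simple; a system that defines the mid point differently would only need to change how `mid` is computed, while the 2N count of half-phone types is unaffected.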
