16.2: Features

16.2.1 Base types

The first feature we will examine is the base type, that is, the type of unit we will use in the synthesiser. The base type chosen in second-generation systems was often the diphone, since diphones often produced good joins. In unit selection, the greater variability in the units means that we can't always rely on diphones joining well, so the reasons for using diphones are somewhat less convincing. Indeed, from a survey of the literature, we see that almost every possible kind of base type has been used. In the following list we describe each type by its most common name, cite some systems that use this base type, and give some indication of the number of each type, where we assume that we have N unique phones and M unique syllables in our pronunciation system.

frames: Individual frames of speech, which can be combined in any order [204].
states: Parts of phones, often determined by the alignment of HMM states [138, 140].
half-phones: Units that are half the size of a phone. They either extend from the phone boundary to a mid point (which can be defined in a number of ways) or extend from this mid point to the end of the phone. There are 2N different half-phone types [315].
diphones: These units extend from the mid point of one...
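
The half-phone count quoted in the list can be checked with a small sketch. The function name half_phone_types and the four-phone toy inventory are assumptions made for illustration and do not come from the text; the only fact taken from the list is that each phone yields two half-phone types, one from the phone boundary to the mid point and one from the mid point to the end of the phone.

```python
def half_phone_types(phones):
    """Enumerate half-phone types for a phone inventory (illustrative helper).

    Each unique phone contributes two types: a left half (phone boundary
    to mid point) and a right half (mid point to end of phone).
    """
    unique = sorted(set(phones))  # N unique phones
    return [(p, half) for p in unique for half in ("left", "right")]


# Hypothetical toy inventory, not taken from the text.
toy_phones = ["a", "k", "s", "t"]
types = half_phone_types(toy_phones)

# The number of types is 2N, where N is the number of unique phones.
print(len(types), "half-phone types for", len(set(toy_phones)), "phones")
```

With a realistic inventory the same enumeration applies; only the value of N changes.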

Text-to-Speech Synthesis
