Text-to-Speech Synthesis

We now turn to the design of the target function. The job of the target function is to assess the suitability of a unit. It can be formulated as a function that, given a specification item and a unit, returns a distance or cost. Other formulations are possible: for instance, we could simply rank the candidates (giving their order but not the spacing between one candidate and the next), or use a generative probabilistic model to calculate the probability that specification s_t generates unit u_i. Here, however, we limit ourselves to the distance formulation. Applied in the most general sense, this function yields a ranked list of all the units in the database, each with a score, cost or distance. In practice, though, we usually eliminate from consideration any unit that does not match the base type of the specification. In other words, if our specification is a /n-iy/ unit, we consider only those units in the database which have this base type. We term the set of all units that match the base type the full set of candidates. The size of this set varies considerably, from only one or two units for rare cases (e.g. /zh-uw/) to thousands or tens of thousands for common units such as /s-ax/ and /ax-ng/; an average number, however, might be 500. Since the total...
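
The candidate selection described above (keep only the units whose base type matches the specification, then score each one with a distance) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names Spec, Unit, full_candidates, target_cost and rank_candidates are hypothetical, the feature names are invented, and the cost is a toy count of mismatched features rather than the target function developed in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical specification item: a base type plus (name, value) features."""
    base_type: str
    features: tuple = ()


@dataclass(frozen=True)
class Unit:
    """Hypothetical database unit with the same kind of description."""
    unit_id: int
    base_type: str
    features: tuple = ()


def full_candidates(spec, database):
    """Return the full set of candidates: units whose base type matches."""
    return [u for u in database if u.base_type == spec.base_type]


def target_cost(spec, unit):
    """Toy distance (an assumption): count the specification features
    that the unit fails to match. Lower is better; 0.0 is a perfect match."""
    unit_features = dict(unit.features)
    mismatches = sum(
        1 for name, value in spec.features if unit_features.get(name) != value
    )
    return float(mismatches)


def rank_candidates(spec, database):
    """Score every candidate and return (cost, unit) pairs, cheapest first."""
    scored = [(target_cost(spec, u), u) for u in full_candidates(spec, database)]
    return sorted(scored, key=lambda pair: pair[0])


# Invented example data: three /n-iy/ units and one /s-ax/ unit.
database = [
    Unit(0, "n-iy", (("stress", True), ("phrase_final", False))),
    Unit(1, "n-iy", (("stress", False), ("phrase_final", True))),
    Unit(2, "s-ax", (("stress", True), ("phrase_final", False))),
    Unit(3, "n-iy", (("stress", True), ("phrase_final", True))),
]
spec = Spec("n-iy", (("stress", True), ("phrase_final", False)))

ranked = rank_candidates(spec, database)
print([(u.unit_id, cost) for cost, u in ranked])
# prints [(0, 0.0), (3, 1.0), (1, 2.0)]
```

The /s-ax/ unit never receives a cost, because the base-type filter removes it before any distance is computed. This is why restricting attention to the full set of candidates keeps the scoring work proportional to the size of that set rather than to the size of the whole database.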