Chapter 4: Text Segmentation and Organisation

The next three chapters of the book deal with how to extract linguistic information from the text input. This chapter covers pre-processing issues, such as how to find whole sentences in running text and how to handle markup or control information in the text. Chapter 5 describes the main processes of text analysis itself, such as how to resolve homograph ambiguity. Finally, Chapter 6 describes how to predict prosody from an often impoverished text input. In many ways this subject is similar to text analysis. There is an important difference, however: while we can view text analysis as a decoding problem with clear right and wrong answers, prosody prediction has no strict right and wrong, since we are attempting to determine prosody from an underspecified input.

4.1 Overview of the Problem

The job of the text-analysis system is to take arbitrary text as input and convert it into a form more suitable for subsequent linguistic processing. This can be thought of as an operation whereby we try to bring order to the often daunting range of effects present in raw text. If we consider

(12) Write a cheque from acc 3949293 (code 84-15-56), for $114.34, sign it and take it down to 1134 St Andrews Dr, or else!!!!

we can see that this is full of characters, symbols and numbers, all of which have to be interpreted and spoken correctly.
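As a rough illustration of why example (12) is hard, the sketch below splits it into tokens and assigns each a surface class with regular expressions. It is a minimal sketch under stated assumptions: the class names (CURRENCY, CODE, NUMBER, WORD, PUNCT), the patterns, and the function `classify_tokens` are hypothetical and chosen only for this example. They are not a scheme described in this chapter.

```python
import re

# Hypothetical surface classes; order matters, because at each position the
# regex engine tries the alternatives left to right and takes the first match.
TOKEN_PATTERNS = [
    ("CURRENCY", r"\$\d+(?:\.\d{2})?"),  # e.g. "$114.34"
    ("CODE", r"\d{2}-\d{2}-\d{2}"),      # e.g. a bank code such as "84-15-56"
    ("NUMBER", r"\d+"),                  # any other run of digits
    ("WORD", r"[A-Za-z]+"),              # alphabetic words and abbreviations
    ("PUNCT", r"[^\sA-Za-z0-9]+"),       # symbol runs such as "!!!!" stay together
]

TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def classify_tokens(text):
    """Return (class, token) pairs for every token found in the raw text."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    example = ("Write a cheque from acc 3949293 (code 84-15-56), for $114.34, "
               "sign it and take it down to 1134 St Andrews Dr, or else!!!!")
    for cls, tok in classify_tokens(example):
        if cls != "WORD":
            print(cls, tok)
```

Even this crude pass shows the limits of surface patterns. Both 3949293 and 1134 land in the same NUMBER class, yet a reader would probably say the account number digit by digit and the house number as "eleven thirty-four". Deciding how each token should be spoken needs more context than its shape alone provides.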

In view of our communication model (Chapter 2), we can more...

Text-to-Speech Synthesis

TABLE OF CONTENTS

Chapter 4: Text Segmentation and Organisation

4.1 Overview of the Problem

Contact Preferences

This is embarrasing...

Customize Your GlobalSpec Experience

Select Your Free Newsletters

Industry Newsletters

Select Your Free Product Alerts

This is embarrasing...