Chapter 7: Character String Data

Overview

Characters are represented internally in the computer as bits, and there are several different systems for doing this. Almost all schemes use fixed-length bit strings. The three most common ones in the computer trade are ASCII, EBCDIC, and Unicode.

ASCII (American Standard Code for Information Interchange) is standardized internationally as ISO 646. It is a 7-bit code that is normally stored one character per byte (8 bits), and it is most popular on smaller machines.
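As a small illustration of the fixed-length bit strings mentioned above, the following Python sketch prints the bit pattern of a few ASCII characters. The helper name `ascii_bits` is hypothetical and exists only for this example.

```python
def ascii_bits(ch: str) -> str:
    """Return the bit pattern of one ASCII character, padded to a full byte."""
    # encode() raises UnicodeEncodeError if ch is not an ASCII character
    code = ch.encode("ascii")[0]
    return format(code, "08b")


if __name__ == "__main__":
    for ch in "Az0 ":
        # show the character, its numeric code, and its 8-bit storage form
        print(repr(ch), ord(ch), ascii_bits(ch))
```

Every ASCII code is below 128, so the leading bit of each stored byte is always zero.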

EBCDIC (Extended Binary Coded Decimal Interchange Code) was developed by IBM by expanding the old Hollerith punch card codes. It also uses a byte (8 bits) for each character. EBCDIC is fading out of use in favor of ASCII and Unicode.

These two code sets are what is usually meant by CHAR and VARCHAR data in database products because they are what the hardware is built to handle. There is a difference in both the characters represented and their collation order, which can cause problems when moving data from one to the other.
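The collation difference can be seen by sorting the same strings by their encoded byte values under each code set. This is a minimal sketch, not a description of how any database product sorts: it assumes Python's `cp037` codec (one of several IBM EBCDIC code pages) as the stand-in for EBCDIC, and the function name `collate` is hypothetical.

```python
def collate(words, codec):
    """Sort words by their raw byte values in the given character encoding."""
    return sorted(words, key=lambda w: w.encode(codec))


words = ["apple", "Zebra", "42"]

# ASCII places digits before uppercase letters before lowercase letters.
ascii_order = collate(words, "ascii")

# EBCDIC (code page 037 here) places lowercase before uppercase before digits.
ebcdic_order = collate(words, "cp037")

if __name__ == "__main__":
    print("ASCII :", ascii_order)
    print("EBCDIC:", ebcdic_order)
```

A query with an ORDER BY clause on such a column could therefore return rows in a different sequence after the data is moved from one kind of machine to the other.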

7.1 National Character Sets

Unicode is what is usually meant by the NATIONAL CHARACTER and NATIONAL CHARACTER VARYING datatypes in SQL-92. This standard represents not only alphabets but also syllabaries and ideograms.

An alphabet is a system of characters in which each symbol has a single sound associated with it. The most common alphabets in use today are Roman, Greek, Arabic, and Cyrillic.

A syllabary is a system of characters in which each symbol has a single syllable associated with it.
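To make the distinction concrete, the sketch below looks up a few characters in Python's standard `unicodedata` module and reports each one's Unicode code point, official name, and size in bytes when encoded as UTF-8. The sample characters are my own choice for illustration: three letters from alphabets named above, and two Japanese kana as an assumed example of a syllabary. The helper `describe` is hypothetical.

```python
import unicodedata

# Illustrative samples only; the kana are an assumed example of a syllabary.
samples = {
    "alphabet": ["A", "α", "Я"],
    "syllabary": ["か", "カ"],
}


def describe(ch: str) -> str:
    """Return the code point, Unicode name, and UTF-8 byte length of ch."""
    size = len(ch.encode("utf-8"))
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({size} bytes in UTF-8)"


if __name__ == "__main__":
    for system, chars in samples.items():
        for ch in chars:
            print(system, ch, describe(ch))
```

None of the non-Roman characters here fits in the single byte that ASCII and EBCDIC assume, which is one reason they are handled by separate datatypes.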

Joe Celko's Data and Databases: Concepts in Practice
