Joe Celko's Data and Databases: Concepts in Practice
By Joe Celko
5.4 Hashing Functions
Hashing functions take a very different approach to finding data. These functions take the values of the column(s) used in the search and convert them into a number that maps to an address in physical data storage. This mapping is handled with an internal data structure called a hash table (not to be confused with a database table!), which holds the actual disk address.
The disk drive can then be positioned directly to that physical address and read the data. Since doing calculations inside the main processor is many orders of magnitude faster than reading a disk drive, hashing is the fastest possible access method for a single row in a table. The trade-off is "speed for space," as it often is in computing.
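The lookup path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product implements hashing: the disk is simulated with an in-memory byte array, records are fixed-length, CRC-32 stands in for the hashing function, and the names hash_key, store, and fetch are hypothetical.

```python
import zlib

RECORD_SIZE = 32   # assumed fixed record length, in bytes
NUM_BUCKETS = 8    # assumed size of the hash table

disk = bytearray()                             # stand-in for physical storage
hash_table = [[] for _ in range(NUM_BUCKETS)]  # bucket -> list of (key, address)


def hash_key(key: str) -> int:
    """Convert a search key into a bucket number (CRC-32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def store(key: str, row: str) -> int:
    """Append a row to the simulated disk and record its address."""
    address = len(disk)
    # Pad with spaces (or truncate) to the fixed record length.
    # ASCII rows are assumed so truncation cannot split a character.
    disk.extend(row.encode("ascii").ljust(RECORD_SIZE)[:RECORD_SIZE])
    hash_table[hash_key(key)].append((key, address))
    return address


def fetch(key: str):
    """Hash the key, look up the address, and read that one record."""
    for stored_key, address in hash_table[hash_key(key)]:
        if stored_key == key:
            record = disk[address:address + RECORD_SIZE]
            return record.decode("ascii").rstrip()
    return None


store("Celko", "Celko, Joe")
store("Codd", "Codd, E. F.")
print(fetch("Celko"))
print(fetch("Date"))
```

The extra memory held by hash_table is the "space" side of the trade-off: it is kept so that fetch can compute an address and go straight to one record instead of scanning the whole file.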
Unify's Unify SQL and Centura's (née Gupta) SQLBase products both have a CREATE HASH command. Teradata uses proprietary hashing algorithms to handle large sets of data. But even outside the world of computers, hashing is not as exotic as you might think. If you place a telephone order with Sears or J. C. Penney, they will ask you for the last two digits of your telephone number. That is their hashing function for converting your name into a two-digit number (if you have no telephone, then they use 00).
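The telephone-order scheme can be written down as a tiny hashing function. The sketch below is a guess at the mechanics, not the stores' actual system: the function name phone_hash, the 100 order bins, and the sample customers and numbers are all hypothetical.

```python
from typing import Optional

NUM_BINS = 100  # two decimal digits give bins 00 through 99

order_bins = {n: [] for n in range(NUM_BINS)}


def phone_hash(phone_number: Optional[str]) -> int:
    """Return the last two digits of a phone number, or 0 if there is none."""
    if not phone_number:
        return 0  # customers without a telephone go to bin 00
    digits = [ch for ch in phone_number if ch in "0123456789"]
    if len(digits) < 2:
        return 0  # assumption: treat unusable numbers like "no telephone"
    return int(digits[-2] + digits[-1])


def place_order(customer: str, phone_number: Optional[str]) -> int:
    """File the customer's order in the bin chosen by the hash."""
    bin_number = phone_hash(phone_number)
    order_bins[bin_number].append(customer)
    return bin_number


# Hypothetical customers and numbers, for illustration only.
print(place_order("A. Smith", "555-0142"))
print(place_order("B. Jones", "(212) 555-0107"))
print(place_order("C. Brown", None))
```

A clerk who later needs one customer's order only has to search a single bin rather than all of the orders, which is the same idea the database uses on disk.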
The trick in a CREATE HASH command is how good the hashing algorithm is. A good hashing algorithm will minimize collisions (also called "hash clash"), where two different input values hash to...
Copyright Morgan Kaufmann Publishers 1999 under license agreement with Books24x7