Physical Database Design: The Database Professional's Guide to Exploiting Indexes, Views, Storage, and More

The mid-1990s saw the rise of servers with multiple CPUs, known as multiprocessors or symmetric multiprocessors (SMP). These servers included between two and eight CPUs. Typically, only one of these CPUs can access memory at a time. To limit the serialization of memory access, CPU caches became increasingly important on multiprocessors, and increasingly sophisticated algorithms were used to avoid cache misses, because access to RAM (memory not in the CPU cache) is increasingly the weak link in performance. The serialization of memory access on a symmetric multiprocessor aggravates this problem significantly. As a result, SMPs rarely have more than eight CPUs, and the next generation of multiprocessors, called nonuniform memory access (NUMA), was developed in the mid- to late 1990s.
NUMA processors divide CPUs into groups called quads, each with its own local bus and RAM. Because each quad has its own local RAM and bus, memory access and bus traffic are no longer necessarily balanced, hence the term nonuniform memory access. NUMA architectures can scale to much larger numbers of CPUs, often 32 or 64 units. However, maintaining memory consistency (cache coherence) between the quads becomes complex, and NUMA systems remain an expensive option. While NUMA systems can scale to a larger number of CPUs than regular SMPs, they cannot scale as large as shared-nothing or Grid systems (discussed in Chapter 6).
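The scaling argument above can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from any vendor's design: the function names, the 1-cycle cache hit, the 100-cycle RAM access, and the 0.2% miss rate are all hypothetical values chosen for illustration. It assumes that every cache miss occupies a single serialized memory bus for the full RAM access time, and, for the NUMA case, that every access stays within its own quad, so cache-coherence traffic between quads is ignored entirely.

```python
def smp_speedup(cpus, miss_rate, hit_cycles=1.0, ram_cycles=100.0):
    """Toy speedup of `cpus` processors sharing one serialized memory bus.

    All parameters are illustrative assumptions, not measured values.
    """
    # Average bus time one memory reference consumes (only misses use the bus).
    bus_cycles_per_access = miss_rate * ram_cycles
    if bus_cycles_per_access == 0:
        # No misses: the bus is never a bottleneck in this model.
        return float(cpus)
    # Average time one CPU spends per reference when it has the bus to itself.
    avg_cycles_per_access = hit_cycles + bus_cycles_per_access
    # The bus saturates once this many CPUs keep it busy all the time.
    bus_limit = avg_cycles_per_access / bus_cycles_per_access
    return min(float(cpus), bus_limit)


def numa_speedup(quads, cpus_per_quad, miss_rate, **kwargs):
    """Toy NUMA speedup: each quad has its own bus and RAM.

    Assumes all accesses are local to the quad and ignores coherence cost.
    """
    return quads * smp_speedup(cpus_per_quad, miss_rate, **kwargs)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(n, "CPUs on one bus:", round(smp_speedup(n, 0.002), 2))
    print("8 quads x 4 CPUs:", round(numa_speedup(8, 4, 0.002), 2))
```

Under these assumed numbers the single-bus machine stops gaining at about six CPUs, while eight quads of four CPUs keep scaling because each quad's bus stays below saturation. Real NUMA systems fall short of this ideal precisely because of the cross-quad coherence traffic the model leaves out.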