The Software Optimization Cookbook: High-Performance Recipes for IA-32 Platforms, Second Edition

The optimization concepts discussed in this book apply to all processors in the Intel IA-32 family. A few optimizations, however, require specific knowledge of the features and cache architecture of a particular Intel architecture processor. Some issues can be fixed only in assembly language, while others can be dealt with in a high-level language. This chapter compares the major Intel IA-32 and Intel EM64T architectures and provides tips on the optimizations that are specific to each.
The IA-32 Intel architecture started with the Intel386 microprocessor in 1985. Optimizing specifically for the Intel386 relied primarily on carefully selecting the best assembly language instructions and ordering them efficiently. Because this work was tedious and time-consuming, only the most demanding applications received hand-coded assembly language attention. A few years later, the Intel486 processor was introduced, and optimizing for it also relied on assembly language. But in 1993, optimizations for the Pentium processor really got things cooking. An algorithm written to follow a set of instruction pairing rules could, in some cases, run twice as fast. Compilers and optimization tools were provided to help with these optimizations and to analyze instruction sequences so that the maximum performance was obtained. Newer processors and tools have since shifted the focus away from the specific order of instructions and toward high-level concepts, such as organizing data for efficient memory access, using SIMD instructions, multithreading, and reducing data dependencies.
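As a small illustration of one of these high-level concepts, reducing data dependencies, the following C sketch sums an array in two ways. It is not taken from the book: the function names `sum_single` and `sum_four_accumulators` are hypothetical, and the choice of four accumulators is an assumption made only for the example. The first version forms one long chain in which every addition waits on the previous one; the second splits the work across four independent accumulators so that a processor able to execute several instructions at once has independent work available.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: each addition depends on the previous value of `sum`,
   so the additions form a single serial dependency chain. */
int64_t sum_single(const int32_t *data, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/* Hypothetical variant: four accumulators that do not depend on one
   another, giving four shorter chains instead of one long chain.
   Integer addition is associative, so the result is unchanged. */
int64_t sum_four_accumulators(const int32_t *data, size_t n)
{
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }

    /* Remainder loop: handle the last 0 to 3 elements. */
    for (; i < n; i++) {
        s0 += data[i];
    }

    /* Combine the partial sums once, at the end. */
    return (s0 + s1) + (s2 + s3);
}
```

Whether the second version is actually faster depends on the processor and the compiler, which may already perform a similar transformation on the first version, so any gain should be confirmed by measurement.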
Optimizations specific to processor architecture fall into the...