Programming with Intel Extended Memory 64 Technology: Migrating Software for Optimal 64-bit Performance

Optimize blindly. Tune first, measure later.
Mitchell N. Charity, "Principles of Computer System Misdesign"
Intel has an obvious interest in making sure software runs fast on its processors. Because of this, the company has a large cadre of optimization engineers who work with independent software vendors (ISVs) and other developers to tune their products to make the best use of the underlying silicon. The expertise built up over many years of working through codebases, ferreting out ways to improve performance, led Intel to design an optimization methodology that could be applied effectively to almost every software project. Since its formulation, the methodology has been applied and refined across numerous software packages, such that it now represents Intel's best-known method for optimizing code. This chapter examines this approach and shows its application to a straightforward program.
The methodology employs what Intel calls a top-down, closed-loop model. The top-down aspect does not refer to the top-down concept familiar to programmers from structured programming. That concept suggests code should be designed from a mainline that branches off into successively smaller building blocks. These building blocks are driven by a code hierarchy that inevitably can be traced back to successively higher levels. That is a logical, structured design.
In optimization work, top and bottom depend not on a logical relationship but on a purely practical dimension: you first optimize the elements whose performance affects the performance profile of other elements. For example, you would make sure your memory allocations were optimized before...