Programming Itanium-based Systems: Developing High Performance Applications for Intel's New Architecture

Although the Itanium processor and its associated compilers are built for speed, it still takes some planning and practice to get top performance out of application code. In fact, it is easy to sacrifice a lot of performance accidentally by overlooking how compilers interpret high-level code and compiler switches. This chapter looks at ways to optimize and tune code for the Itanium processor.
The definitions of optimization and tuning overlap somewhat. Generally speaking, optimization starts with good code and algorithm design, whereas tuning begins with code that has already been written, run, and tested for performance, with the aim of improving that performance. Tuning sometimes leads to code changes that are not obvious from a theoretical point of view. It relies more on empirical measurement, and often identifies sections of code that are good candidates for further optimization. If optimization is considered prevention, then tuning is akin to cure. Most of this chapter focuses on the theory of optimization; it ends by discussing the state of tuning tools available for the Itanium processor.