Programming Itanium-based Systems: Developing High Performance Applications for Intel's New Architecture

It is not the purpose of this chapter to give a general, processor-independent review of the principles of optimization, but it is worth noting how some of the classic principles still apply to the Itanium processor, sometimes with unusual twists.
The first line of defense in code optimization is the proper choice of algorithm. An unoptimized, untuned good algorithm will often beat a well-optimized, well-tuned bad (or at least worse) algorithm. Often it is not so much a matter of one algorithm being good and another bad, but rather of an algorithm being appropriate for the range of inputs and the resource constraints of a given problem. Not all code is time-critical, and sometimes a bubble sort will serve as well as a quicksort; sometimes a radix sort works best.
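As a minimal sketch of matching the algorithm to the range of inputs, the C fragment below uses a simple bubble sort for small arrays and falls back to the standard library's `qsort` for larger ones. The function names (`sort_ints`, `bubble_sort`, `compare_ints`) and the cutoff `SMALL_INPUT_LIMIT` are hypothetical and chosen only for illustration; a sensible cutoff, if one exists at all, has to be found by measurement on the target system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cutoff, not a measured value: tune it by profiling. */
#define SMALL_INPUT_LIMIT 16

/* Comparison callback for qsort; avoids overflow from plain subtraction. */
static int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);
}

/* Simple bubble sort with an early exit once a pass makes no swaps. */
static void bubble_sort(int *values, size_t count)
{
    size_t pass, i;

    if (count < 2)
        return;
    for (pass = 0; pass < count - 1; pass++) {
        int swapped = 0;
        for (i = 0; i + 1 < count - pass; i++) {
            if (values[i] > values[i + 1]) {
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
                swapped = 1;
            }
        }
        if (!swapped)
            break;
    }
}

/* Pick the algorithm from the input size (illustrative policy only). */
void sort_ints(int *values, size_t count)
{
    assert(values != NULL || count == 0);
    if (count <= SMALL_INPUT_LIMIT)
        bubble_sort(values, count);
    else
        qsort(values, count, sizeof values[0], compare_ints);
}
```

A caller simply invokes `sort_ints(array, length)`; the choice of algorithm stays hidden behind one entry point, so the policy can be revised after profiling without touching the call sites.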
In general, the rules that govern algorithm performance on most computers apply similarly on Itanium systems, and the first pass at an efficient algorithm on an Itanium system may look about the same as on any other system. But as with any other computing environment, rethinking or reconsidering the basic algorithm is a fundamental part of optimization.
Many optimization problems boil down to making key inner loops run as fast as possible. The following loop contains some loop-invariant code, that is, code whose result does not change from one iteration to the next:
for (n = 0; n < 1000; n++) { multiplier = a * b + c; ...
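For a loop of this shape, hoisting the invariant computation out of the loop can be sketched as follows. The loop body is elided in the fragment above, so the use of `multiplier` here (scaling the loop index into an array) is an assumption, as are the names `scale_naive`, `scale_hoisted`, `results_match`, and `COUNT`. Many optimizing compilers perform this transformation on their own, so this is an illustration of the idea rather than a required manual step.

```c
#include <assert.h>

#define COUNT 1000  /* same trip count as the loop in the text */

/* Original shape: a * b + c is recomputed on every iteration.
   The use of multiplier in the body is an assumed stand-in. */
static void scale_naive(long *data, long a, long b, long c)
{
    long n, multiplier;

    for (n = 0; n < COUNT; n++) {
        multiplier = a * b + c;
        data[n] = n * multiplier;
    }
}

/* Hoisted shape: the invariant expression is computed once,
   before the loop is entered. */
static void scale_hoisted(long *data, long a, long b, long c)
{
    long n;
    const long multiplier = a * b + c;

    for (n = 0; n < COUNT; n++)
        data[n] = n * multiplier;
}

/* Returns 1 if both versions produce identical results, else 0. */
int results_match(long a, long b, long c)
{
    static long naive[COUNT], hoisted[COUNT];
    long n;

    scale_naive(naive, a, b, c);
    scale_hoisted(hoisted, a, b, c);
    for (n = 0; n < COUNT; n++) {
        if (naive[n] != hoisted[n])
            return 0;
    }
    /* Spot check against the closed form n * (a * b + c). */
    assert(hoisted[COUNT - 1] == (COUNT - 1) * (a * b + c));
    return 1;
}
```

Both versions compute the same values; the hoisted one performs the multiply and add once instead of 1000 times.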