Scientific Computing on Itanium-Based Systems

The Itanium architecture was designed as a foundation for high-performance commercial servers. Its floating-point performance also makes it suitable for supercomputer-level technical workstations. To realize the potential of the architecture, however, the performance programmer or compiler writer should understand how the underlying microarchitecture operates. This chapter describes, from a software perspective, some of the Itanium processor features that are relevant to assembly language programming, compilation, and performance tuning, and that are closely tied to the processor microarchitecture. For more detailed information, see the Intel Itanium 2 Processor Reference Manual for Software Development and Optimization [7].
The Itanium and Itanium 2 processors are products of the Explicitly Parallel Instruction Computing (EPIC) design philosophy, which aims to maximize performance through close cooperation between software and hardware. This is achieved through advanced features that enhance instruction-level parallelism and through massive hardware resources for parallel execution [12]. The compiler or the assembly language programmer can control several architectural features: branch hints, explicit parallelism, register rotation, predication, speculation, and memory hints. The hardware resources used in each case to exploit the software-hardware synergy are represented in Figure 6.1.
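Of the features listed above, predication is perhaps the easiest to illustrate in a high-level language. The C sketch below is only an analogy, not Itanium code: it contrasts a loop containing a data-dependent branch with an if-converted form in which the comparison result is kept as a value and used to enable or suppress the addition. The function names and the variable p (standing in for a predicate register) are hypothetical and chosen for illustration. On an Itanium processor the compiler would typically express the same idea with a compare that sets a predicate register, followed by an add qualified by that predicate.

```c
#include <stddef.h>

/* Branching form: each iteration contains a data-dependent branch
 * that the hardware has to predict. */
long sum_positive_branch(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0)
            sum += a[i];
    }
    return sum;
}

/* If-converted form: the comparison yields a value p (1 or 0) that
 * plays the role of a predicate. The add always executes, but it
 * contributes a[i] only when p is 1. This emulates predication in
 * portable C; it is not how Itanium encodes predicates. */
long sum_positive_predicated(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        long p = a[i] > 0;   /* 1 if the condition holds, else 0 */
        sum += a[i] & -p;    /* -p is all ones or zero, so it acts as a mask */
    }
    return sum;
}
```

Both functions return the same result for any input. The second form has no branch inside the loop body, which is the property that predication provides at the instruction level.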
To translate the EPIC concepts into reality, the Itanium processor uses a 10-stage in-order pipeline [12], designed for maximum throughput. The 10 stages of the pipeline are shown in Figure 6.2. The Itanium 2 processor has an 8-stage pipeline...