Scientific Computing on Itanium-Based Systems

This chapter describes some of the key characteristics and innovations introduced in the Itanium architecture, from the perspective of an application programmer. Instruction groups and bundles, the application registers, and the register stack are presented first. Descriptions of important Itanium architecture features follow, including predication, branching, register rotation, and modulo-scheduled loop support. Special features of the floating-point architecture are described in Chapter 3, and memory access and speculation in Chapter 4. Chapters 2 through 5 should teach the reader enough about the Itanium architecture and its programming to understand the remainder of this book without referring to other sources of information. For detailed coverage of all aspects of Itanium application programming, including complete information on the instruction set, the Intel Itanium Architecture Software Developer's Manual [3] is the recommended source.
The Itanium instruction format encodes parallelism explicitly. The stream of instructions is split into successive groups separated by stop bits, denoted in assembly language by two consecutive semicolons (;;). The instructions within a group are candidates for parallel execution, and must obey the following rules on register and memory dependencies:
Within an instruction group, read-after-write (RAW) and write-after-write (WAW) register dependencies are not allowed. Write-after-read (WAR) dependencies are allowed.
Within an instruction group, RAW, WAW, and WAR memory dependencies are allowed.
For example, a RAW dependency occurs when an instruction has a source operand register or memory location that was the destination of an earlier instruction...
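The register rules above can be illustrated with a small checker. This is only a sketch under simplifying assumptions, not part of any Itanium toolchain: the Instr type and the register_violations function are hypothetical names introduced here, each instruction is modeled simply as a set of registers written and a set of registers read, and any special-case exceptions documented in the architecture manual are ignored. Memory dependencies are not modeled, since all three kinds are allowed within a group.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical model of one instruction: its text plus register sets."""
    text: str
    writes: Set[str]  # destination registers
    reads: Set[str]   # source registers


def register_violations(group: List[Instr]) -> List[Tuple[str, str, int, int]]:
    """Return (kind, register, earlier_index, later_index) for every RAW or
    WAW register dependency inside one instruction group.

    WAR dependencies are permitted within a group, so they are not reported.
    """
    violations = []
    last_writer: Dict[str, int] = {}  # register -> index of latest writer so far
    for i, ins in enumerate(group):
        # RAW: this instruction reads a register written earlier in the group.
        for reg in sorted(ins.reads):
            if reg in last_writer:
                violations.append(("RAW", reg, last_writer[reg], i))
        # WAW: this instruction writes a register written earlier in the group.
        for reg in sorted(ins.writes):
            if reg in last_writer:
                violations.append(("WAW", reg, last_writer[reg], i))
        # Record this instruction's writes only after checking it, so that an
        # instruction reading and writing the same register is not flagged.
        for reg in ins.writes:
            last_writer[reg] = i
    return violations


# Second instruction reads r1, which the first one writes: RAW, not allowed.
raw_group = [
    Instr("add r1 = r2, r3", {"r1"}, {"r2", "r3"}),
    Instr("sub r4 = r1, r5", {"r4"}, {"r1", "r5"}),
]

# Second instruction writes r2, which the first one only reads: WAR, allowed.
war_group = [
    Instr("add r1 = r2, r3", {"r1"}, {"r2", "r3"}),
    Instr("sub r2 = r4, r5", {"r2"}, {"r4", "r5"}),
]

# Both instructions write r1: WAW, not allowed.
waw_group = [
    Instr("add r1 = r2, r3", {"r1"}, {"r2", "r3"}),
    Instr("sub r1 = r4, r5", {"r1"}, {"r4", "r5"}),
]

for name, group in [("RAW", raw_group), ("WAR", war_group), ("WAW", waw_group)]:
    print(name, "example:", register_violations(group))
```

In assembly language, the RAW and WAW examples would be made legal by placing a stop (;;) between the two instructions, so that they fall into different instruction groups; the WAR example needs no stop.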