Programming Itanium-based Systems: Developing High Performance Applications for Intel's New Architecture

Instruction Latencies

To create or verify optimal assembly language code on the Itanium processor, it is important to understand the role of instruction latencies. The latency of an instruction is the time that must elapse between the issue of the instruction and the point at which its results can be used by another instruction. For most simple integer operations, such as add r32=r33,r34, the latency is a single cycle, so the results can be used in the very next set of parallel instructions. This is not generally true for floating-point operations or loads from memory, and integer compare operations present an interesting exception as well.

As previously described, up to six instructions can dispatch in parallel on the Itanium processor. If any source operand of any of those six instructions has not completed its latency waiting period, however, all six instructions are held up until the wait has completed. In well-planned code it is therefore very important to organize instructions so that they will not have to wait for source registers to be ready.
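The stall rule described above can be sketched as a toy model. The Python below is only an illustration, not a description of the actual Itanium pipeline: the names `Instr` and `schedule` are hypothetical, each inner list stands for one set of instructions dispatched together, and the only latencies used are the 1-cycle integer add and the 5-cycle fma mentioned in this section.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical, simplified instruction record for this sketch."""
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers
    latency: int            # cycles from issue until the result is usable


def schedule(groups: List[List[Instr]]) -> List[int]:
    """Return the cycle at which each group issues.

    Assumed rule (from the text): a group of up to six instructions
    issues together, and the whole group waits until every source
    register of every instruction in it is ready.
    """
    ready: Dict[str, int] = {}   # register -> first cycle its value is usable
    issue_cycles: List[int] = []
    cycle = 0
    for group in groups:
        if len(group) > 6:
            raise ValueError("at most six instructions can dispatch together")
        # Latest ready time among all sources in the group; registers
        # never written in this model are treated as ready at cycle 0.
        earliest = max(
            (ready.get(src, 0) for ins in group for src in ins.srcs),
            default=0,
        )
        cycle = max(cycle, earliest)   # the entire group stalls together
        issue_cycles.append(cycle)
        for ins in group:
            ready[ins.dest] = cycle + ins.latency
        cycle += 1                     # the next group issues no earlier than this
    return issue_cycles


demo = [
    # Group 0: an fma (5 cycles) alongside an integer add (1 cycle).
    [Instr("f8", ("f9", "f10", "f11"), 5), Instr("r32", ("r33", "r34"), 1)],
    # Group 1: consumes the add result, which is ready on the next cycle.
    [Instr("r35", ("r32", "r36"), 1)],
    # Group 2: consumes the fma result, so it must wait until cycle 5.
    [Instr("f12", ("f8", "f9", "f10"), 5)],
]

print(schedule(demo))  # [0, 1, 5]
```

In this model the third group issues at cycle 5 rather than cycle 2, so three cycles are lost. Independent work placed between the fma and its consumer would fill those cycles, which is the kind of instruction ordering the paragraph above recommends.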

When the result of an operation is ready to be used on the very next cycle, it is said to exhibit 1-cycle latency. In similar terms, Table 11.1 shows the latencies of some of the more important assembly language instructions.

Table 11.1: Some Key Instruction Latencies

Floating Point               Cycles
multiply-and-add (fma)       5
convert integer...
