Programming with Intel Extended Memory 64 Technology: Migrating Software for Optimal 64-bit Performance

IA-32 Specific Optimizations

The optimizations described in the previous section are implemented at a comparatively low level. However, numerous higher-level optimizations are also available, and they matter to all developers using systems with Intel EM64T. Many of them derive from the underlying IA-32 architecture, are critical to optimal performance, and should become a regular part of your program design and development process. In general, these optimizations aim to avoid processor stalls: situations in which the processor is forced to wait for instructions or data to be retrieved from memory.

The Nature of a Processor Stall

When the processor needs a data item or an instruction that is not found in the L1 cache or execution trace cache, it examines the contents of the L2 cache. If the item is not there either, it is fetched from main memory. To understand why this operation causes so much delay, it is worth looking at the steps involved in retrieving an item from main memory. To keep things simple, let's presume that the processor has had to jump to a location in memory to execute instructions that are part of a function, and that the instructions at this location are in none of the system caches. The processor then performs the following sequence of steps:

  • Searches the execution trace cache and the L2 cache; both searches fail.

  • Initiates a fetch to memory by sending a request on the memory bus. This step is slow because the memory bus handles all...
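The cost of falling through the caches to main memory can be observed from ordinary code. The following is a minimal sketch in C, not taken from the original text: the array size and the function names (`fill_grid`, `sum_row_major`, `sum_col_major`, `check_sums_agree`) are assumptions chosen only for illustration. Both traversals compute the same sum, but the column-major walk jumps a full row ahead on every access, so it is far more likely to miss the caches and wait on memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dimensions, chosen only so the array (4 MB here)
   is likely to exceed the on-chip caches of many processors. */
#define ROWS 1024
#define COLS 1024

static int32_t grid[ROWS][COLS];

/* Fill the grid with a known pattern so both traversals can be compared. */
void fill_grid(void)
{
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = (int32_t)((r + c) & 0xFF);
}

/* Row-major walk: consecutive accesses touch adjacent addresses, so
   most reads are served from a cache line that was already fetched. */
int64_t sum_row_major(void)
{
    int64_t total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major walk: consecutive accesses are COLS * sizeof(int32_t)
   bytes apart, so each read tends to land on a different cache line
   and is more likely to require a fetch from a slower level. */
int64_t sum_col_major(void)
{
    int64_t total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}

/* Sanity check: the two access orders must produce the same result;
   only the memory access pattern differs. */
void check_sums_agree(void)
{
    fill_grid();
    assert(sum_row_major() == sum_col_major());
}
```

To see the effect, time each function separately (for example with `clock()` from `<time.h>`) after calling `fill_grid()`. The size of the gap depends on the cache sizes, the hardware prefetchers, and the compiler settings of the system under test, so no particular ratio should be assumed.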
