Programming Itanium-based Systems: Developing High Performance Applications for Intel's New Architecture

Memory Matters

Up to this point in this chapter, it has largely been taken as a matter of faith that most memory loads can be satisfied by the L1 (Level 1) data cache in two cycles. This is by no means the rule in practice. It is important to keep in mind not only the latencies of the various levels of cache, but also their respective sizes, as shown in Table 11.3.

Table 11.3: Cache Memory Summary

Cache level       Size     Integer latency (cycles)    FP latency (cycles)
L1 instruction    16 KB    -                           -
L1 data           16 KB    2                           -
L2                96 KB    6                           9
L3                4 MB     22                          24
main memory       any      176+                        178+
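
To see why the two-cycle figure cannot be taken for granted, it can help to weight the integer latencies in Table 11.3 by how often each level actually satisfies a load. The following is a minimal sketch of that arithmetic. The hit fractions are hypothetical values chosen only for illustration, not measurements, and main memory is counted at its listed lower bound of 176 cycles.

```python
# Integer load latencies in cycles, taken from Table 11.3.
# Main memory is listed as "176+", so 176 is only a lower bound.
LATENCY_CYCLES = {"L1": 2, "L2": 6, "L3": 22, "memory": 176}


def expected_latency(hit_fractions):
    """Return the average load latency for a given mix of hit levels.

    hit_fractions maps each level name to the fraction of loads that
    are satisfied at that level; the fractions must sum to 1.
    """
    total = sum(hit_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    return sum(LATENCY_CYCLES[level] * share
               for level, share in hit_fractions.items())


# Hypothetical mixes, chosen only to illustrate the arithmetic.
all_l1 = {"L1": 1.0}
mostly_l1 = {"L1": 0.90, "L2": 0.06, "L3": 0.03, "memory": 0.01}

print(expected_latency(all_l1))
print(expected_latency(mostly_l1))
```

With this assumed mix, the average rises from 2 cycles to about 4.58 cycles, even though only 1 percent of loads reach main memory.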

At any given time, only 16 KB of the highest-speed data cache memory is available. It is organized in cache lines of 32 bytes each. It is generally useful to organize data so that if you reference one value in a cache line, you are likely to reference other data in the same cache line in the near future. To the extent that data references are spread out somewhat randomly through memory, the cache can be defeated.
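
The effect of data layout can be made concrete by counting how many 32-byte lines an access pattern touches. The following is a rough sketch, not a cache simulator: it assumes a line-aligned base address and 8-byte values (both hypothetical choices for illustration), and it ignores associativity and replacement policy.

```python
L1_DATA_BYTES = 16 * 1024   # L1 data cache size from Table 11.3
LINE_BYTES = 32             # cache line size given in the text


def lines_touched(num_elements, element_bytes, stride_elements):
    """Count the distinct cache lines touched by a strided walk.

    Access i of the walk reads the value at byte offset
    i * stride_elements * element_bytes from a line-aligned base
    (the aligned base is an assumption of this sketch).
    """
    lines = set()
    for i in range(num_elements):
        offset = i * stride_elements * element_bytes
        lines.add(offset // LINE_BYTES)
    return len(lines)


total_lines = L1_DATA_BYTES // LINE_BYTES
print(total_lines)                  # lines available in the L1 data cache
print(lines_touched(1024, 8, 1))    # 1024 contiguous 8-byte values
print(lines_touched(1024, 8, 4))    # same count, one value per line
```

Under these assumptions the L1 data cache holds 512 lines. Reading 1024 contiguous values touches 256 lines and fits, while reading the same number of values at a stride of 32 bytes touches 1024 lines, twice what the cache can hold.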

Speculative loads allow the compiler to request data far enough in advance of need to mask, in most cases, even the two-cycle latency of the L1 cache. Similarly, in frequently executed loops using register rotation, even slow nine-cycle floating-point loads (the fastest available) can be masked so that they don't slow down computation.
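
The benefit of masking latency in a loop can be estimated with a toy cycle count. The following is a simplified model only: it assumes that one load and one use can issue in every cycle and that each use takes a single cycle. It does not describe the actual Itanium pipeline or how register rotation is encoded, and the function names are hypothetical.

```python
def cycles_without_overlap(iterations, load_latency):
    """Each iteration issues its load, waits out the full latency,
    then spends one cycle using the value."""
    return iterations * (load_latency + 1)


def cycles_with_overlap(iterations, load_latency):
    """Load i issues in cycle i, and its use runs in cycle
    i + load_latency, so loads for later iterations are already in
    flight while earlier values are being used."""
    if iterations == 0:
        return 0
    last_use_cycle = (iterations - 1) + load_latency
    return last_use_cycle + 1


FP_LOAD_LATENCY = 9   # fastest floating-point load, from Table 11.3

print(cycles_without_overlap(1000, FP_LOAD_LATENCY))
print(cycles_with_overlap(1000, FP_LOAD_LATENCY))
```

In this model, 1000 iterations take 10000 cycles when every load is waited on, but 1009 cycles when loads are overlapped, so the nine-cycle latency is paid only once, while the pipeline fills.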

When the compiler senses (as...
