Programming Itanium-based Systems: Developing High Performance Applications for Intel's New Architecture

By Rick Booth

Memory Matters

Up to this point in this chapter, it has largely been taken as a matter of faith that most memory loads can be satisfied by the L1 (Level 1) data cache in two cycles. In practice, this is by no means guaranteed. It is important to keep in mind not only the latencies of the various levels of cache but also their respective sizes, as shown in Table 11.3.

Table 11.3: Cache Memory Summary
Cache level	Size	Integer latency (cycles)	FP latency (cycles)
L1 instruction	16 KB	n/a	n/a
L1 data	16 KB	2	n/a
L2	96 KB	6	9
L3	4 MB	22	24
Main memory	any	176+	178+
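The latencies in Table 11.3 can be combined into an expected load latency if you know, or assume, what fraction of loads each level satisfies. The following C sketch does that arithmetic for integer loads. The hit rates are hypothetical inputs, not measurements, and because main-memory latency is listed as 176+ cycles, the result is a lower bound. The function name is illustrative only.

```c
/* Integer-load latencies from Table 11.3, in cycles.
   MEM_LAT is a lower bound ("176+"). */
enum { L1_LAT = 2, L2_LAT = 6, L3_LAT = 22, MEM_LAT = 176 };

/* Expected integer-load latency for assumed hit rates.
   Hit rates are conditional: h2 is the fraction of L1 misses that
   hit in L2, and h3 is the fraction of L2 misses that hit in L3. */
double expected_int_load_latency(double h1, double h2, double h3)
{
    double p_l1  = h1;                                   /* served by L1 */
    double p_l2  = (1.0 - h1) * h2;                      /* served by L2 */
    double p_l3  = (1.0 - h1) * (1.0 - h2) * h3;         /* served by L3 */
    double p_mem = (1.0 - h1) * (1.0 - h2) * (1.0 - h3); /* main memory */

    return p_l1 * L1_LAT + p_l2 * L2_LAT + p_l3 * L3_LAT + p_mem * MEM_LAT;
}
```

With an assumed 90% L1 hit rate and half of the remaining loads caught by each of L2 and L3, expected_int_load_latency(0.9, 0.5, 0.5) works out to 7.05 cycles, roughly three and a half times the L1 latency. The rare trips to main memory account for most of that.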

At any given time, only 16 KB of the highest-speed data cache memory is available. It is organized in cache lines of 32 bytes each. It is generally useful to organize data so that if you reference one value in a cache line, you are likely to reference other data in the same line again in the near future. To the extent that data references are scattered randomly through memory, the cache can be defeated.
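One way to see the effect of line size is to count how many distinct 32-byte lines a given access pattern touches. The C sketch below assumes a line-aligned array of 8-byte elements, so no element straddles two lines. The function name lines_touched is hypothetical, chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32  /* L1 data cache line size given in the text */

/* Count the distinct cache lines touched when reading `count` elements
   of `elem` bytes each, `stride` elements apart, starting at a
   line-aligned address. Assumes each element fits inside one line.
   Offsets only increase, so a new line is entered whenever the line
   index changes. */
size_t lines_touched(size_t count, size_t elem, size_t stride)
{
    size_t lines = 0;
    size_t last = SIZE_MAX;  /* no line seen yet */

    for (size_t k = 0; k < count; k++) {
        size_t offset = k * stride * elem;  /* byte offset of access k */
        size_t line = offset / LINE_BYTES;  /* which line it falls in */
        if (line != last) {
            lines++;
            last = line;
        }
    }
    return lines;
}
```

Reading 64 consecutive doubles, lines_touched(64, 8, 1), touches 16 lines and uses four values from each. Reading every fourth double, lines_touched(64, 8, 4), touches a new line on every one of the 64 accesses. The same happens when walking down a column of a row-major matrix whose rows are four or more doubles wide.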

Speculative loads usually allow the compiler to request data far enough in advance of need to mask even the two-cycle latency of the L1 cache. Similarly, in frequently executed loops that use register rotation, even the slower nine-cycle floating-point loads (the fastest available for floating-point data) can be masked so that they don't slow down computation.
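Masking latency with register rotation keeps several loads in flight at once. Each load is issued enough iterations ahead of its use that its data has arrived when needed. The number of iterations ahead is the load latency divided by the cycles per iteration, rounded up. The C sketch below imitates this with a small circular buffer standing in for rotating registers. It is only an analogy: a C compiler will not necessarily keep the buffer in registers. The load distance of 9 (a nine-cycle floating-point load in a loop that starts one iteration per cycle) is an assumption for illustration, and the names stages_needed and pipelined_sum are hypothetical.

```c
#include <stddef.h>

/* Iterations ahead a load must be issued so its result arrives in time,
   given load latency and initiation interval (cycles per iteration). */
unsigned stages_needed(unsigned latency, unsigned ii)
{
    return (latency + ii - 1) / ii;  /* ceiling division */
}

#define DIST 9  /* assumed: 9-cycle FP load, one iteration per cycle */

/* Sum an array while keeping DIST loads "in flight" in a circular
   buffer, a C analogue of what rotating registers do in hardware. */
double pipelined_sum(const double *a, size_t n)
{
    double buf[DIST];
    double sum = 0.0;
    size_t i;

    /* Prologue: issue the first DIST loads before any are consumed. */
    for (i = 0; i < DIST && i < n; i++)
        buf[i % DIST] = a[i];

    /* Kernel: consume the value loaded DIST iterations ago (a[i - DIST]),
       then issue the load for iteration i into the freed slot. */
    for (; i < n; i++) {
        sum += buf[i % DIST];
        buf[i % DIST] = a[i];
    }

    /* Epilogue: drain the loads still in flight. */
    size_t start = n > DIST ? n - DIST : 0;
    for (size_t j = start; j < n; j++)
        sum += buf[j % DIST];

    return sum;
}
```

The prologue, kernel, and epilogue loops correspond to the fill, steady-state, and drain phases of a software-pipelined loop. With rotating registers, the hardware renames registers each iteration, so the compiler does not need the explicit modulo indexing shown here. For example, stages_needed(9, 1) gives 9 loads in flight, and stages_needed(9, 2) gives 5.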

When the compiler senses (as...
