Microprocessor Design: A Practical Guide from Design Planning to Manufacturing

Modern processors have used pipelining and increased issue width to execute more instructions in parallel. Microarchitectures have evolved to deal with data and control dependencies, which prevent pipelined processors from reaching their maximum theoretical performance. Reducing instruction latencies is the most straightforward way of easing dependencies. If results are available sooner, fewer independent instructions must be found to keep the processor busy while waiting. Microarchitectures can also be designed to distinguish between true dependencies, where one instruction must use the results of another, and false dependencies, which instruction reordering or better sharing of resources might eliminate. When dependencies can often but not always be eliminated, modern microarchitectures are designed to "guess" by predicting the most common behavior, spending extra time to get the correct result after a wrong guess. Some of the most important microarchitectural concepts of recent processors are:
Cache memories
Cache coherency
Branch prediction
Register renaming
Microinstructions
Reorder, replay, retire
All of these ideas seek to improve IPC and are discussed in more detail in the following sections.
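The difference between true and false dependencies can be made concrete with a small sketch. It uses the standard labels RAW (read after write, a true dependency), WAR (write after read) and WAW (write after write), the last two being false dependencies. It then applies a simplified form of register renaming, one of the techniques listed above, by giving every result a fresh register. The instruction format, the register names and the `Instr`, `dependencies`, `all_dependencies` and `rename` helpers are all hypothetical. The sketch assumes register operands only, ignores memory dependencies, and assumes an unlimited supply of physical registers.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Instr:
    """Hypothetical register-only instruction: dest <- op(srcs)."""
    text: str      # readable form, kept only for printing
    dest: str      # register written
    srcs: tuple    # registers read (immediate operands omitted)

def dependencies(earlier, later):
    """Register dependencies of `later` on `earlier` (in program order)."""
    kinds = []
    if earlier.dest in later.srcs:
        kinds.append("RAW")  # true: later needs earlier's result
    if later.dest in earlier.srcs:
        kinds.append("WAR")  # false: later overwrites a value earlier still reads
    if later.dest == earlier.dest:
        kinds.append("WAW")  # false: both write the same register name
    return kinds

def all_dependencies(program):
    """(i, j, kind) for every dependency of instruction j on an earlier i."""
    return [(i, j, kind)
            for (i, a), (j, b) in combinations(enumerate(program), 2)
            for kind in dependencies(a, b)]

def rename(program):
    """Give every result a fresh physical register (unlimited supply assumed)."""
    current = {}  # architectural name -> physical register with latest value
    renamed = []
    for n, ins in enumerate(program, start=1):
        # Read sources through the mapping *before* updating it for this dest.
        srcs = tuple(current.get(s, s) for s in ins.srcs)
        dest = f"p{n}"
        current[ins.dest] = dest  # later readers of ins.dest now see p{n}
        renamed.append(Instr(ins.text, dest, srcs))
    return renamed

# Hypothetical four-instruction sequence.
PROGRAM = [
    Instr("r1 = r2 + r3", "r1", ("r2", "r3")),
    Instr("r4 = r1 * 2",  "r4", ("r1",)),
    Instr("r2 = r5 + 1",  "r2", ("r5",)),
    Instr("r1 = r6 - r7", "r1", ("r6", "r7")),
]

if __name__ == "__main__":
    print("before renaming:", all_dependencies(PROGRAM))
    print("after renaming: ", all_dependencies(rename(PROGRAM)))
```

Before renaming, the sketch finds one true dependency (the second instruction reads `r1` from the first) and three false ones: two WAR and one WAW. After renaming, only that RAW dependency remains, so the last two instructions could execute before or alongside the first two. Only the true dependency actually limits parallelism.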
Storing recently used values in cache memories to reduce average latency is an idea used over and over again in modern microarchitectures. Instructions, data, virtual memory translations, and branch addresses are all values commonly stored in caches in modern processors. Chapter 2 discussed how multiple levels of cache work together to create a memory hierarchy that has lower average latency than any single level of cache could achieve. Caches are effective at reducing average latency...
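The claim that a multi-level hierarchy can beat any single level of cache follows from the usual average-latency arithmetic. Each level costs its own hit latency, plus its miss rate times the average latency of everything beyond it. The following minimal sketch applies that recurrence. All latencies and miss rates in it are hypothetical, and the `average_latency` helper is illustrative only. It assumes a serial lookup, where a miss is known only after a level's full hit latency. It also assumes that the large cache on its own would miss on the same fraction of accesses as the whole two-level hierarchy.

```python
# All latencies (in cycles) and miss rates below are hypothetical, chosen only
# to illustrate the arithmetic; they are not measurements of any real processor.

MEMORY_LATENCY = 200  # cycles to reach main memory (assumed)

def average_latency(levels, memory_latency):
    """Average access latency of a cache hierarchy, in cycles.

    `levels` lists (hit_latency, local_miss_rate) pairs from the level nearest
    the core outward. A local miss rate is the fraction of accesses *reaching*
    that level that miss there and must go one level further.
    """
    latency = memory_latency
    # Fold from the outermost level inward: each level costs its own lookup
    # latency, plus (on a miss) the average latency of everything beyond it.
    for hit_latency, local_miss_rate in reversed(levels):
        latency = hit_latency + local_miss_rate * latency
    return latency

# Small, fast cache alone: 3-cycle hits, misses on 10% of accesses.
L1_ONLY = [(3, 0.10)]
# Large, slow cache alone: 12-cycle hits, misses on 2% of accesses (assumed
# equal to the two-level hierarchy's global miss rate, 0.10 * 0.20).
L2_ONLY = [(12, 0.02)]
# Both together: the second level sees only first-level misses, 20% of which
# miss again and go to memory.
L1_AND_L2 = [(3, 0.10), (12, 0.20)]

if __name__ == "__main__":
    for name, levels in [("L1 only", L1_ONLY), ("L2 only", L2_ONLY),
                         ("L1 + L2", L1_AND_L2)]:
        print(f"{name}: {average_latency(levels, MEMORY_LATENCY):.1f} cycles")
```

With these assumed numbers, the small cache alone averages 23 cycles and the large cache alone averages 16 cycles. Together they average about 8.2 cycles, because most accesses hit in the fast level while the slow level catches most of the rest before they reach memory.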