Microprocessor Design: A Practical Guide from Design Planning to Manufacturing

Most microarchitectural improvements have focused on exploiting instruction-level parallelism (ILP) within programs. The architecture defines how software should run, and part of that definition is the expectation that programs will execute instructions one at a time. However, many instructions within a program could be executed in parallel or at least overlapped. Microarchitectures that take advantage of this can provide higher performance, but to do so while preserving software compatibility, they must maintain the illusion of linear execution. Pipelining provides higher performance by allowing the execution of different instructions to overlap.
The earliest processors did not have enough transistors to support pipelining. They processed instructions serially, one at a time, exactly as the architecture defined. A very simple processor might break each instruction down into four steps:
Fetch. The next instruction is retrieved from memory.
Decode. The type of operation required is determined.
Execute. The operation is performed.
Write. The instruction results are stored.
All modern processors use clock signals to synchronize their operation both internally and when interacting with external components. The operation of a simple sequential processor allocating one clock cycle per instruction step would appear as shown in Fig. 5-1.
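The timing of such a sequential processor can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real design: the function names sequential_schedule and total_cycles are hypothetical, and the model assumes exactly the four steps listed above with one clock cycle each.

```python
# Illustrative model of a non-pipelined (sequential) processor.
# Assumption: four steps per instruction, one clock cycle per step.
STAGES = ["Fetch", "Decode", "Execute", "Write"]


def sequential_schedule(num_instructions):
    """Return (cycle, instruction, stage) tuples for serial execution.

    Each instruction finishes all four steps before the next one starts,
    so only one part of the processor is busy in any given cycle.
    """
    schedule = []
    cycle = 1
    for instr in range(1, num_instructions + 1):
        for stage in STAGES:
            schedule.append((cycle, instr, stage))
            cycle += 1
    return schedule


def total_cycles(num_instructions):
    """Total clock cycles needed to run the instructions back to back."""
    return num_instructions * len(STAGES)


if __name__ == "__main__":
    for cycle, instr, stage in sequential_schedule(3):
        print(f"cycle {cycle:2d}: instruction {instr} {stage}")
    print("total cycles:", total_cycles(3))
```

Under these assumptions, every instruction costs four cycles, so three instructions take twelve cycles, and the second instruction cannot begin its fetch until cycle 5.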
A pipelined processor improves performance by noting that separate parts of the processor are used to carry out each of the instruction steps (see Fig. 5-2). With some added control logic, it is possible to begin the next instruction as soon as the last instruction has completed the...