Processor Design: System-On-Chip Computing for ASICs and FPGAs

Pipelined processors use hardware load-use interlocks to prevent instructions from executing until the required operands become available. Operands can be fetched from a register or register file, they can be fetched from memory, or they can be generated as results by earlier instructions. (For RISC processors, which have load/store architectures, memory operands are only associated with load and store instructions.) For operands located in the processor's register file, data forwarding and bypassing within the pipeline can avoid the hazards that might invoke a pipeline interlock. However, a memory load typically cannot deliver its data in time for the instruction immediately following the load, because memory-read latency is long compared to register-read latency.
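The effect of a load-use interlock can be illustrated with a small model. The sketch below is not taken from any particular processor: the `Instr` record, the `load_use_stalls` function, and the one-cycle `load_delay` default are all assumptions chosen for illustration. It assumes a single-issue pipeline with full forwarding, so an ALU result is usable by the very next instruction, while a load result arrives `load_delay` cycles later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    op: str                       # e.g. "load", "add", "sub", "nop"
    dest: Optional[str] = None    # register written, if any
    srcs: tuple = ()              # registers read


def load_use_stalls(program, load_delay=1):
    """Count stall cycles a load-use interlock would insert.

    Assumed model: one instruction issues per cycle, ALU results are
    forwarded to the next instruction, and a loaded value is available
    `load_delay` cycles later than an ALU result would be.
    """
    ready = {}    # register -> earliest cycle at which a consumer may issue
    cycle = 0
    stalls = 0
    for ins in program:
        # The instruction may not issue before all of its sources are ready.
        earliest = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        stalls += earliest - cycle
        cycle = earliest
        if ins.dest is not None:
            extra = load_delay if ins.op == "load" else 0
            ready[ins.dest] = cycle + 1 + extra
        cycle += 1
    return stalls


if __name__ == "__main__":
    back_to_back = [
        Instr("load", "r1", ("r2",)),
        Instr("add", "r3", ("r1", "r4")),   # needs r1 right after the load
    ]
    separated = [
        Instr("load", "r1", ("r2",)),
        Instr("sub", "r5", ("r6", "r7")),   # independent work hides the latency
        Instr("add", "r3", ("r1", "r4")),
    ]
    print(load_use_stalls(back_to_back), load_use_stalls(separated))
```

In this model the ALU-to-ALU dependency costs nothing because of forwarding, whereas the instruction that consumes a loaded value immediately after the load is held for one cycle unless unrelated work sits between the two.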
Data-hazard problems can be solved either in hardware, by stalling the pipeline with a load-use interlock, or in software, by having the compiler's instruction scheduler place immediately after each load only instructions that do not need the result of that load. If no such instruction is available in the existing compiled code, the scheduler inserts a NOP, which is guaranteed not to need any data at all.
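The software approach can be sketched as a pass over one straight-line block of code. The following is a minimal illustration under stated assumptions, not a real compiler's scheduler: `Instr`, `independent`, `schedule_load_slots`, and `NOP` are hypothetical names, the load delay is assumed to be exactly one instruction slot, and memory operations are never reordered (a deliberately conservative choice).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    op: str
    dest: Optional[str] = None    # register written, if any
    srcs: tuple = ()              # registers read


NOP = Instr("nop")


def independent(a, b):
    """True if b, originally after a, may be moved ahead of a."""
    if a.dest is not None and a.dest in b.srcs:    # read-after-write
        return False
    if b.dest is not None and b.dest in a.srcs:    # write-after-read
        return False
    if a.dest is not None and a.dest == b.dest:    # write-after-write
        return False
    return True


def schedule_load_slots(block):
    """Fill the slot after each load with independent work, else a NOP.

    Assumes `block` is straight-line code and the load delay is one slot.
    A load at the very end of the block is left alone.
    """
    pending = list(block)
    out = []
    while pending:
        ins = pending.pop(0)
        out.append(ins)
        if ins.op != "load" or not pending or ins.dest not in pending[0].srcs:
            continue
        # The next instruction needs the loaded value: look further down
        # the block for an instruction that can safely move into the slot.
        for j in range(1, len(pending)):
            cand = pending[j]
            if cand.op in ("load", "store"):
                continue    # conservative: keep memory operations in order
            if ins.dest in cand.srcs:
                continue    # candidate also needs the loaded value
            if all(independent(skipped, cand) for skipped in pending[:j]):
                out.append(pending.pop(j))
                break
        else:
            out.append(NOP)    # nothing suitable: pad the slot
    return out


if __name__ == "__main__":
    block = [
        Instr("load", "r1", ("r2",)),
        Instr("add", "r3", ("r1", "r4")),
        Instr("sub", "r5", ("r6", "r7")),
    ]
    print([i.op for i in schedule_load_slots(block)])
```

The dependency checks in `independent` are what limit how far an instruction can be hoisted; when every later instruction depends on something it would have to hop over, the pass falls back to the NOP.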
Data dependencies inherent in all programs limit the amount of instruction reordering a code scheduler can perform. For single-instruction-issue processors, a scheduler inserts independent instructions after multi-cycle instructions (such as loads) to reduce pipeline interlocks. For multiple-instruction-issue processors, a scheduler must identify independent instructions that can be concurrently executed in addition to using instruction scheduling to reduce or...