System-on-Chip Test Architectures: Nanometer Design for Testability

Soft errors are transient single-event upsets (SEUs) caused by various types of radiation. Cosmic radiation has long been regarded as the major source of soft errors, especially in memories [May 1979], and chips used in space applications typically rely on parity or error-correcting codes (ECC) for soft error protection. As circuit features shrink into the nanometer range, the activation energy needed to cause an error is reduced. As a result, terrestrial radiation, such as alpha particles emitted by a chip's own packaging materials, is increasingly causing soft errors as well. This has raised reliability concerns, especially for microprocessors, network processors, high-end routers, and network storage components.
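To make the difference between parity and ECC protection concrete, the following minimal sketch models an SEU as a single flipped bit in a stored word. It is illustrative only: the function names are hypothetical and not taken from any particular memory design, and the 4-bit data width is an assumption chosen for brevity (practical memory ECC typically uses wider SEC-DED codes). Even parity detects the upset but cannot locate it, whereas a Hamming(7,4) single-error-correcting code both locates and corrects it.

```python
from typing import List, Tuple


def even_parity(bits: List[int]) -> int:
    """Return the even-parity bit for a list of 0/1 values."""
    return sum(bits) % 2


def flip_bit(word: List[int], i: int) -> List[int]:
    """Model a single-event upset: return a copy of word with bit i inverted."""
    upset = list(word)
    upset[i] ^= 1
    return upset


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Positions 1..7 are laid out as p1 p2 d1 p4 d2 d3 d4, so each parity
    bit sits at a power-of-two position.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(c: List[int]) -> Tuple[List[int], int]:
    """Return (corrected codeword, syndrome).

    A syndrome of 0 means no error was detected; otherwise it is the
    1-based position of the single flipped bit, which is inverted back.
    """
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # recheck positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # recheck positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # recheck positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return c, syndrome


def hamming74_data(c: List[int]) -> List[int]:
    """Extract the 4 data bits (positions 3, 5, 6, 7) from a codeword."""
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]

    # Parity: the upset is detected, but its location is unknown.
    stored_parity = even_parity(data)
    upset = flip_bit(data, 2)
    print("parity detects upset:", even_parity(upset) != stored_parity)  # True

    # Hamming(7,4): the syndrome points at the flipped bit, so it is corrected.
    corrupted = flip_bit(hamming74_encode(data), 4)  # upset at position 5
    fixed, syndrome = hamming74_correct(corrupted)
    print("syndrome:", syndrome)  # 5
    print("recovered data:", hamming74_data(fixed))  # [1, 0, 1, 1]
```

Because the syndrome encodes the position of the flipped bit directly, correction costs only a comparison and an XOR. This is why SEC codes of this kind are attractive for memories, while parity alone can merely flag the word as corrupted.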
In this section, we first describe the sources of soft errors and the trends in soft error rate (SER). After discussing general fault-tolerance schemes for soft error protection, we present two representative error-resilient processor microarchitectures, DIVA [Austin 1999] and Razor [Ernst 2003] [Ernst 2004], followed by three soft error mitigation methods based on built-in soft-error resilience (BISER) [Mitra 2005] and on circuit-level modifications [Almukhaizim 2006] [Zhou 2006]. DIVA and Razor are used mainly in high-performance processor designs, whereas BISER and the circuit-level modification methods can be applied to any design that requires soft error protection.
Soft errors result from transients induced in a circuit when a radiation particle strikes it. This radiation can be of cosmic origin (produced as stars form and die) or can come from...