The Software Optimization Cookbook: High-Performance Recipes for IA-32 Platforms, Second Edition

Before the Pentium processor, floating-point operations were executed either by a separate floating-point co-processor or by a floating-point emulation software package. Either way, using floating-point numbers just about guaranteed a slow application. Those days are long gone: floating-point performance is now on par with the rest of the processor and is even faster in some cases. However, the same issues that affect all instructions, such as data dependences, available instruction ports, and memory latencies, also affect floating-point operations. Beyond these common problems, you should be aware of a few issues that are specific to floating-point operations: numeric exceptions, precision control, and floating-point-to-integer conversions.
Floating-point operations can be performed with x87 floating-point unit (FPU) instructions, with the packed or scalar floating-point instructions of the Streaming SIMD Extensions (SSE, SSE2, and SSE3), or by direct manipulation of stored floating-point numbers with integer instructions. Each method has its own performance advantages, capabilities, and issues, all of which are discussed in this chapter.
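To make the third method concrete, the following is a minimal sketch of manipulating a stored floating-point number with integer operations. It assumes that `float` is a 32-bit IEEE 754 single-precision value, as it is on IA-32 compilers. The helper names `float_is_negative` and `float_abs_bits` are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the sign bit of x is set, else 0.
   Assumes float is a 32-bit IEEE 754 single-precision value. */
static int float_is_negative(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the raw bit pattern */
    return (int)(bits >> 31);         /* bit 31 is the sign bit */
}

/* Hypothetical helper: absolute value computed by clearing the sign
   bit with an integer AND instead of a floating-point instruction. */
static float float_abs_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;              /* clear bit 31, keep exponent and mantissa */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Because the sign test looks only at the bit pattern, it reports negative zero (`-0.0f`) as negative, whereas the comparison `x < 0.0f` does not. Whether the integer version is actually faster depends on where the value currently lives (memory or a floating-point register), so measure before adopting it.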
The x87 FPU and SSE/SSE2/SSE3 instructions can generate exceptions in response to certain input and calculation conditions. The processor handles an exception by calling a software handler or, if the exception is masked, by continuing with a reasonable default result, such as a denormal number. It is important to detect and eliminate floating-point exceptions because they usually indicate error conditions and almost always hurt performance. Table 11.1 lists all the possible floating-point exceptions.
| Exception | Description |
|---|---|
| Stack Overflow or Underflow | Attempt... |