The Software Optimization Cookbook: High-Performance Recipes for IA-32 Platforms, Second Edition

Single instruction, multiple data (SIMD) technology is an important performance extension to Intel Architecture processors, first introduced with the Intel Pentium processor with MMX technology. Since then, 32-bit Intel Architecture and Intel EM64T processors have continually extended SIMD technology. A typical SIMD instruction achieves higher performance by operating on multiple data elements at the same time, as illustrated in Figure 12.1.
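
To make the idea of operating on multiple data elements at once concrete, the following minimal sketch contrasts a scalar loop with a loop that adds four single-precision values per instruction. It assumes a compiler that provides the SSE intrinsics in `xmmintrin.h`; the function names `add_scalar` and `add_simd` and the macro `SKETCH_HAVE_SSE` are hypothetical names chosen for this illustration and do not come from the text. When SSE is not available at compile time, the sketch falls back to scalar code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: SSE support is detected through common compiler macros. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>          /* SSE intrinsics: __m128, _mm_add_ps, ... */
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

/* Scalar version: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* SIMD version: four additions per instruction when SSE is available. */
void add_simd(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    assert(a && b && out);
#if SKETCH_HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* load 4 floats, any alignment */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* 4 sums in one instruction */
    }
#endif
    for (; i < n; ++i)          /* leftover elements, or all of them without SSE */
        out[i] = a[i] + b[i];
}
```

Both functions produce the same sums; the SIMD version simply needs about one quarter as many add instructions for the bulk of the array. The unaligned load and store intrinsics are used here only to keep the sketch free of alignment requirements.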
Table 12.1 gives a brief history of how SIMD technology grew from 8-byte packed integers in MMX technology to 16-byte packed floating-point numbers and packed integers in the Streaming SIMD Extensions (SSE, SSE2, and SSE3).
| Technology | First Appeared | Description |
|---|---|---|
| MMX technology | Pentium processor with MMX technology | Introduced 8-byte packed integers. |
| SSE | Pentium III processor | Added 16-byte packed single-precision floating-point numbers. |
| SSE2 | Pentium 4 processor | Added 16-byte packed double-precision floating-point numbers and 16-byte packed integers. |
| SSE3 | Pentium 4 processor with Hyper-Threading Technology | Added 13 instructions to SSE2, including horizontal add and subtract operations. |
| SSE3 on Intel EM64T | Intel EM64T processors | Extended number of SIMD registers from 8 to 16. |
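
The register widths in the table determine how many elements one instruction can process. As a rough sketch of that arithmetic, the hypothetical helper `packed_lanes` below divides the register width by the element size; the helper and the two constants are illustrative names, not part of any Intel API.

```c
#include <assert.h>
#include <stddef.h>

/* Register widths taken from the table: MMX is 8 bytes, SSE/SSE2/SSE3 is 16. */
enum { MMX_REGISTER_BYTES = 8, SSE_REGISTER_BYTES = 16 };

/* Number of packed elements that fit in one SIMD register. */
size_t packed_lanes(size_t register_bytes, size_t element_bytes)
{
    /* The element size must divide the register width evenly. */
    assert(element_bytes > 0 && register_bytes % element_bytes == 0);
    return register_bytes / element_bytes;
}
```

For example, `packed_lanes(MMX_REGISTER_BYTES, 2)` gives four 16-bit integers per MMX register, `packed_lanes(SSE_REGISTER_BYTES, 4)` gives four single-precision values per SSE register, and `packed_lanes(SSE_REGISTER_BYTES, 8)` gives two double-precision values per SSE2 register.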
This chapter introduces MMX technology and the Streaming SIMD Extensions and shows you several ways to use SIMD technology to achieve higher performance. You can find detailed explanations of the specific instructions in the IA-32 Intel Architecture Software Developer's Manual, Volumes 1, 2, and 3, listed in "References."
The MMX technology and the Streaming SIMD Extensions exploit