The Software Optimization Cookbook: High-Performance Recipes for IA-32 Platforms, Second Edition

The previous chapter mentioned automatic vectorization as one of the four ways to exploit SIMD technology. This chapter gives you a detailed explanation of how to use automatic vectorization in the Intel C++ and Fortran compilers effectively with a minimum of engineering effort. Readers who are interested in the compiler methodology behind automatic vectorization can read The Software Vectorization Handbook: Applying Multimedia Extensions for Maximum Performance (Bik 2004).
This section summarizes compiler switches commonly used in the context of vectorization for the Streaming SIMD Extensions: SSE, SSE2, and SSE3. Since this summary is by no means exhaustive, please refer to the Intel compiler documentation, listed in "References," for a complete list.
For Windows, you invoke the Intel C++ compiler for IA-32 and Intel EM64T from the command line, as follows.
=> icl [switches] source.c
Similarly, you invoke the Intel Fortran compiler as shown below.
=> ifort [switches] source.f
In both cases, [switches] denotes a list of optional compiler switches. On Linux, the syntax is similar, with the compiler names icc and ifort, respectively. Table 13.1 lists compiler switches that are specific to vectorization.
| Windows | Linux | Semantics |
|---|---|---|
| -QxK or -QaxK | -xK or -axK | generate code for Pentium III processor |
| -QxN or -QaxN | -xN or -axN | generate code for Pentium 4 processor |
| -QxB or -QaxB | -xB or -axB | generate code for Pentium M... |