Parallel Programming in OpenMP

Many typical programs in scientific and engineering application domains spend most of their time executing loops, in particular do loops in Fortran and for loops in C. We can often reduce the execution time of such programs by exploiting loop-level parallelism, that is, by executing iterations of the loops concurrently across multiple processors. In this chapter we focus on the issues that arise in exploiting loop-level parallelism, and how they may be addressed using OpenMP.
OpenMP provides the parallel do directive for specifying that a loop be executed in parallel. Many programs can be parallelized successfully just by applying parallel do directives to the proper loops. This style of fine-grained parallelization is especially useful because it can be applied incrementally: as specific loops are found to be performance bottlenecks, they can be parallelized easily by adding directives and making small, localized changes to the source code. Hence the programmer need not rewrite the entire application just to parallelize a few performance-limiting loops. Because incremental parallelization is such an attractive technique, parallel do is one of the most important and frequently used OpenMP directives.
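As a minimal sketch of what such a localized change can look like, the C fragment below uses parallel for, the C counterpart of the Fortran parallel do directive. The routine name scale_and_add and its arguments are hypothetical, chosen only for illustration. The sketch assumes that every iteration reads and writes only its own array elements, so the iterations can safely run concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example routine: computes y[i] = a * x[i] + y[i]
 * for 0 <= i < n. The serial version is the same code without the
 * pragma line. */
void scale_and_add(int n, float a, const float *x, float *y)
{
    /* Illustrative precondition: non-empty input needs valid arrays. */
    assert(n <= 0 || (x != NULL && y != NULL));

    /* Iteration i touches only x[i] and y[i], so no iteration depends
     * on another. The directive asks OpenMP to divide the iterations
     * among the threads of a team; the loop index i is private to
     * each thread. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The only change relative to the serial loop is the single directive line. When the file is compiled without OpenMP support (for example, without the -fopenmp flag in GCC or Clang), the pragma is ignored and the loop runs serially with the same result, which is part of what makes the incremental approach convenient.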
However, the programmer must choose carefully which loops to parallelize. The parallel version of the program generally must produce the same results as the serial version; in other words, the correctness of the program must be maintained. In addition, to maximize performance the execution time of parallelized loops should be as short as possible, and certainly not longer than...