Accuracy and Stability of Numerical Algorithms, Second Edition

Block algorithms are advantageous for at least two important reasons. First, they work with blocks of data having $b^2$ elements, performing $O(b^3)$ operations. The $O(b)$ ratio of work to storage means that processing elements with an $O(b)$ ratio of computing speed to input/output bandwidth can be tolerated. Second, these algorithms are usually rich in matrix multiplication. This is an advantage because nearly every modern parallel machine is good at matrix multiplication.
-ROBERT S. SCHREIBER, Block Algorithms for Parallel Machines (1988)
It should be realized that, with partial pivoting, any matrix has a triangular factorization. DECOMP actually works faster when zero pivots occur because they mean that the corresponding column is already in triangular form.
-GEORGE E. FORSYTHE, MICHAEL A. MALCOLM, and CLEVE B. MOLER, Computer Methods for Mathematical Computations (1977)
It was quite usual when dealing with very large matrices to perform an iterative process as follows: the original matrix would be read from cards and the reduced matrix punched without more than a single row of the original matrix being kept in store at any one time; then the output hopper of the punch would be transferred to the card reader and the iteration repeated.
-MARTIN CAMPBELL-KELLY, Programming the Pilot ACE (1981)
As we noted in Chapter 9 (Notes and References), Gaussian elimination (GE) comprises three nested loops that can be ordered in six ways, each...