Server Architectures: Multiprocessors, Clusters, Parallel Systems, Web Servers, and Storage Solutions

A system's performance can be expressed in two related dimensions:
Response time
Throughput
From the user's perspective, the response time is the time between issuing a request and receiving the corresponding response. Many factors, from the network to the hardware components involved, affect response time.
Again, from the user's perspective, system throughput is the number of requests that the system usefully services per unit time.
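To make the two definitions concrete, the following is a minimal sketch that computes both quantities from a log of requests. The `Request` record, the `response_time` and `throughput` helpers, and the timestamps are all hypothetical and invented for illustration. It assumes every completed request counts as "usefully serviced" (no failures or retries).

```python
from dataclasses import dataclass


@dataclass
class Request:
    issued: float    # time the user issued the request (seconds)
    answered: float  # time the user received the response (seconds)


def response_time(r: Request) -> float:
    """Response time: delay between issuing a request and receiving its response."""
    return r.answered - r.issued


def throughput(requests: list[Request], start: float, end: float) -> float:
    """Throughput: requests completed within [start, end) per second.
    Assumes every completed request was serviced usefully."""
    completed = sum(1 for r in requests if start <= r.answered < end)
    return completed / (end - start)


# Hypothetical measurements, invented for illustration.
log = [
    Request(0.0, 0.2),
    Request(0.5, 0.8),
    Request(1.0, 1.1),
    Request(1.5, 2.5),
]

times = [response_time(r) for r in log]
print(f"mean response time: {sum(times) / len(times):.3f} s")
print(f"throughput: {throughput(log, 0.0, 2.0):.2f} req/s")
```

Note that the two measures answer different questions: the last request has the longest response time, yet it does not contribute to the throughput of the first two-second window because its response arrives after the window closes.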
The two dimensions are not independent: as the request rate increases, response time lengthens, because the requests contend for the same system resources. This is illustrated in Figure 7.1.
The data for this graph was obtained by increasing the number of requests per second until the system saturated.
The system illustrated here is simply an example of a Web server whose rate of incoming requests has been increased up to the saturation point. As we shall see in this chapter, every system has a bottleneck: the component whose throughput limits the system's capabilities. As the curve shows (its shape is typical), response time climbs quickly to unacceptable levels as the system approaches its saturation point.
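The shape of such a curve can be approximated with a simple queueing model. The sketch below is an assumption, not the method used to produce Figure 7.1. It first identifies the bottleneck as the component with the largest per-request service demand, whose inverse bounds the system's throughput. It then models that component alone as an M/M/1 queue, a common textbook model in which mean response time is S / (1 - λS) for service time S and arrival rate λ. The component names and service demands are hypothetical, and ignoring the non-bottleneck components is a deliberate simplification.

```python
def bottleneck(demands: dict[str, float]) -> tuple[str, float]:
    """Return the component with the largest per-request service demand
    and the maximum system throughput it permits (requests per second)."""
    name = max(demands, key=demands.get)
    return name, 1.0 / demands[name]


def mm1_response_time(rate: float, service_time: float) -> float:
    """Mean response time of an M/M/1 queue: S / (1 - rate * S).
    Only defined below saturation, i.e. when utilization rate * S < 1."""
    utilization = rate * service_time
    if utilization >= 1.0:
        raise ValueError("at or beyond saturation: response time is unbounded")
    return service_time / (1.0 - utilization)


# Hypothetical per-request service demands in seconds (not from the text).
demands = {"cpu": 0.004, "disk": 0.010, "network": 0.002}
name, x_max = bottleneck(demands)
print(f"bottleneck: {name}, max throughput: {x_max:.0f} req/s")

# Sweep the request rate toward saturation, as in a load test.
for fraction in (0.1, 0.5, 0.8, 0.9, 0.95, 0.99):
    rate = fraction * x_max
    r = mm1_response_time(rate, demands[name])
    print(f"{rate:6.1f} req/s -> {r * 1000:7.1f} ms")
```

Under these assumptions, response time doubles between 50% and 75% of the maximum rate and grows without bound as the rate approaches the bottleneck's limit. This matches the qualitative behavior described above: adding load near saturation buys little throughput at a steep cost in response time.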
System users are very sensitive to response time, not only in its absolute value but also in its variability. Large variations in response time for similar requests are a source of deep annoyance for users. In a transactional system, the goal is to provide essentially uniform response times.
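Because the mean alone hides variability, a measurement report usually pairs it with a spread statistic. The following minimal sketch uses two invented samples (hypothetical data, in milliseconds) that share the same mean response time but differ sharply in population standard deviation and worst case. The `summarize` helper is illustrative, built on Python's standard `statistics` module.

```python
import statistics


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Mean, population standard deviation and worst case of response times."""
    return {
        "mean": statistics.mean(samples_ms),
        "stdev": statistics.pstdev(samples_ms),
        "max": max(samples_ms),
    }


# Two hypothetical services with the same mean response time (invented data).
steady = [95, 100, 105, 100, 100]
erratic = [20, 30, 40, 60, 350]

for label, samples in (("steady", steady), ("erratic", erratic)):
    s = summarize(samples)
    print(f"{label}: mean={s['mean']:.0f} ms "
          f"stdev={s['stdev']:.1f} ms max={s['max']:.0f} ms")
```

Both services average 100 ms, but a user of the second one will occasionally wait more than three times as long for a similar request, which is exactly the kind of inconsistency a transactional system tries to avoid.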
To...