Tru64 UNIX Troubleshooting: Diagnosing and Correcting System Problems

When you have eliminated the impossible, whatever remains, however improbable, must be the truth.
Sherlock Holmes
Every computer system, no matter how reliable, well-designed, or well-managed, will inevitably encounter problems. When a problem occurs, it is up to the system administrator to identify and correct the problem and then, in most cases, to provide explanations to users or management. This chapter provides a basic set of troubleshooting principles and techniques that apply to any problem situation. These principles and techniques lay a foundation for the Tru64 UNIX troubleshooting specifics in the remainder of the book.
For the most part, troubleshooting computer problems has never been a well-defined subject. College classes and training courses provide a great deal of information on the theory and operation of computers, but they tend to focus on what happens when the system works, not when it breaks. Some computer manuals provide troubleshooting information, but it is usually narrow in scope, describing how to respond to particular errors. Such specific information is useful and necessary (in fact, it forms the basis for the later chapters in this book), but it doesn't address the process of troubleshooting itself.
In our experience as technical support providers, we've found that troubleshooting ability seems to be independent of academic achievement, computer skills, or experience as a system administrator. One of the interesting things about training new support personnel is finding that some people just seem to get it. They have an intuitive knack for getting to...