From Tru64 UNIX Troubleshooting: Diagnosing and Correcting System Problems
5.3 System Hangs
System hangs are among the most frustrating problems a system administrator faces. Unlike a system panic, a hang leaves little or no immediate evidence to guide troubleshooting. The focus shifts to determining what the system is doing, why it is not responding as expected, and what course of action will resolve the problem. In most cases, it is not possible to recover cleanly from a hang, or even to gather much information while the system is hung. A reboot is usually required to clear the hung condition. Unfortunately, rebooting destroys the "live" evidence on the system, which makes troubleshooting more difficult. Some of that evidence can be preserved by forcing a crash dump before rebooting. The resulting crash dump files contain the contents of physical memory at the time of the hang, providing a snapshot of system conditions at that instant.
Not all hangs are created equal. From a user's point of view, a hang can be any condition in which the system does not respond. The cause could be an unresponsive application, a network connection problem, or a system that is truly hung (i.e., not executing at all). Users tend to be an impatient lot, so a system or application that is merely responding very slowly may be reported as hung. Therefore, the first step in troubleshooting a hang is to determine its severity, a process discussed in section 5.3.1.
Most hangs are not cleanly...