Tru64 UNIX Troubleshooting: Diagnosing and Correcting System Problems

5.2 System Crashes

Most system crashes are caused by kernel panics. A panic occurs when the UNIX kernel detects a severe software or hardware error and deliberately brings the system down rather than continuing to operate in an unsafe manner. It is also possible for a system to crash without panicking. This kind of crash is almost invariably due to hardware or environmental issues, such as power or temperature problems. (In rare cases, a system may crash due to a kernel panic that doesn't leave any traces of the panic; an example of this type of problem was discussed in section 2.2.4.)

In general terms, crashes can be divided into three major classes:

  1. Kernel panics that produce crash dumps

  2. Kernel panics that don't produce crash dumps

  3. Non-panic crashes
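As a rough illustration, the three classes above can be told apart by two observations: whether any evidence of a panic was recorded, and whether a crash dump was written. The sketch below is only a minimal model of that decision. The names `CrashClass` and `classify_crash` and both boolean inputs are hypothetical and are not part of Tru64 UNIX or any of its tools.

```python
from enum import Enum


class CrashClass(Enum):
    """The three crash classes listed above (names are illustrative)."""
    PANIC_WITH_DUMP = 1
    PANIC_WITHOUT_DUMP = 2
    NON_PANIC = 3


def classify_crash(panic_evidence: bool, dump_found: bool) -> CrashClass:
    """Map two observations onto one of the three crash classes.

    panic_evidence: a panic message was found (for example on the console
                    or in the system logs).
    dump_found:     a crash dump was written for this crash.
    """
    if dump_found:
        # Assumption: a crash dump is treated as evidence of class 1.
        return CrashClass.PANIC_WITH_DUMP
    if panic_evidence:
        return CrashClass.PANIC_WITHOUT_DUMP
    # No panic evidence and no dump. Note that the rare panic that leaves
    # no traces would be indistinguishable from a non-panic crash here.
    return CrashClass.NON_PANIC
```

For example, `classify_crash(panic_evidence=True, dump_found=False)` yields `CrashClass.PANIC_WITHOUT_DUMP`, which points toward the second class of troubleshooting techniques.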

These three classes require different troubleshooting techniques. Before getting into these, we'll discuss how Tru64 UNIX crash dumps are created.

5.2.1 Crash Dump Creation

When the kernel encounters a severe problem that causes it to panic, it first writes a panic message to the system console, the system message file, and the binary error log. The panic routine then stops all running processes and calls a kernel routine named "dumpsys" to dump the contents of physical memory to disk, specifically to one or more of the active swap devices. (The dumpsys routine can also be invoked by entering the console command "CRASH"; in this way, a forced crash dump can be created when a system is hung.) The dumpsys routine locates the...
