TruCluster Server Handbook

Part XI: Appendix

Appendix A: TruCluster Server Troubleshooting
Appendix B: Resources

Here, we take a look at problems we've seen and offer advice on what to do if you see them.

A.1 Troubleshooting

A.1.1 System/Cluster is Hung (or Appears Hung)

  • Are there CNX messages indicating lost quorum? (Check /var/adm/messages.)

    If so, follow the suggestions in Chapter 17 for restoring quorum.

  • Do the members respond to ping(8)?

    If not, the member(s) may be hung or suspended: force a crash (see section A.1.2).

  • Try logging in as root at the console.

    If you can, check resources (memory, CPU, I/O, CFS) to see if something is swamped. Also check event logs and the console log.
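
    The first two checks above can be sketched as a small shell script. This is only an illustration: the function names are hypothetical, the pattern assumes that quorum-related messages contain both "CNX" and "quorum", and the "-c 1" ping option is an assumption that you should confirm against ping(8) on your system.

```shell
#!/bin/sh
# Sketch only: the message pattern and the ping flags are assumptions.

# Print log lines that mention both CNX and quorum.
# Defaults to /var/adm/messages if no file is given.
cnx_quorum_lines() {
    logfile=${1:-/var/adm/messages}
    grep -i 'CNX' "$logfile" | grep -i 'quorum'
}

# Return 0 if the host answers a single ping, non-zero otherwise.
# "-c 1" is an assumption; check ping(8) on your system.
member_responds() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# Report quorum messages, then ping each member name given as an argument.
triage() {
    logfile=$1
    shift
    if cnx_quorum_lines "$logfile"; then
        echo "Quorum-related CNX messages found: see Chapter 17."
    fi
    for member in "$@"; do
        if member_responds "$member"; then
            echo "$member: responds to ping"
        else
            echo "$member: no response; may be hung or suspended"
        fi
    done
}
```

    For example, "triage /var/adm/messages member1 member2" (with your own member names in place of the hypothetical member1 and member2) prints any matching log lines and one status line per member.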

A.1.2 System/Cluster is Hung II

If a single member is truly hung (it doesn't respond to ping, you can't log in at the console, no interactive processes respond, and so on), you probably need to force a crash on that member, particularly since a hung member can also degrade the performance or responsiveness of the rest of the cluster. To do this:

  1. Use the dumpsys(8) command on each responding member to copy a snapshot of memory to a dump file. By default, the dumpsys command writes the dump to /var/adm/crash, which is a CDSL to /cluster/members/{memb}/adm/crash.

    # dumpsys
    Saving /var/adm/crash/vmzcore.0

  2. Use clu_quorum to make sure the cluster will not lose quorum when you halt the hung member. (See Chapter 17.)

  3. Crash the hung member by manually halting it and entering "crash" at the console prompt.

    >>>...
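
The quorum check in step 2 comes down to simple arithmetic once you have read the vote counts from clu_quorum. The sketch below assumes the rule given in the TruCluster documentation, quorum votes = round_down((expected votes + 2) / 2), and assumes that the current vote count still includes the hung member's vote. The function names are hypothetical, and you supply the numbers by hand.

```shell
#!/bin/sh
# Sketch only: assumes quorum votes = round_down((expected votes + 2) / 2).

# quorum_votes EXPECTED
# Print the number of votes needed for quorum.
# Shell integer division truncates, which gives the round-down.
quorum_votes() {
    echo $(( ($1 + 2) / 2 ))
}

# keeps_quorum EXPECTED CURRENT MEMBER_VOTES
# Return 0 if the cluster still has quorum after losing MEMBER_VOTES votes,
# where CURRENT is assumed to include the votes of the member being halted.
keeps_quorum() {
    needed=$(quorum_votes "$1")
    remaining=$(( $2 - $3 ))
    [ "$remaining" -ge "$needed" ]
}
```

For example, with three one-vote members and no quorum disk, "keeps_quorum 3 3 1" succeeds, because two votes remain and two are needed. With two one-vote members and no quorum disk, "keeps_quorum 2 2 1" fails, so halting either member would cost the cluster its quorum.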
