Why do I receive "BUG: soft lockup" kernel messages?

Problem:

Error messages similar to the following appear in the kernel message log:

kernel: BUG: soft lockup - CPU#0 stuck for 10s! [events/0:50]
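
To see how often the message has been logged, a small helper along these lines may be useful. This is only a sketch: the function name count_soft_lockups is hypothetical, and the default log path /var/log/messages is an assumption that may not match your syslog configuration.

```shell
#!/bin/sh
# Hypothetical helper: count soft lockup reports in a kernel log file.
# The default path is an assumption; pass another file as the first argument.
count_soft_lockups() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'BUG: soft lockup' "${1:-/var/log/messages}"
}
```

Running count_soft_lockups with no argument reads the assumed default log. Pass a rotated log file as the argument to check older entries.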

Details:

The CentOS/RHEL kernel has a default softlockup threshold of 10 seconds. This can sometimes be too low if the system is very busy with I/O.

We have also seen these messages on idle systems with a large core count. In this case the cause may be a kernel bug that is triggered when a CPU wakes up from an extended idle period.

The following Red Hat knowledge base article has some additional information:

http://kbase.redhat.com/faq/docs/DOC-17358

Solution:

The problem can often be worked around by increasing the softlockup threshold to a larger value. For example:

echo 120 > /proc/sys/kernel/softlockup_thresh
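
The same change can be wrapped in a small script that checks the tunable is present and writable before touching it, and shows the value before and after. This is a minimal sketch, not a definitive implementation: the function name set_softlockup_thresh and the PROC_FILE override are hypothetical, and it assumes the running kernel exposes the tunable at the path used above.

```shell
#!/bin/sh
# Path of the tunable from this article. PROC_FILE is a hypothetical override,
# useful for trying the script against an ordinary file first.
PROC_FILE="${PROC_FILE:-/proc/sys/kernel/softlockup_thresh}"

# Hypothetical helper: write a new threshold (in seconds) and report the change.
set_softlockup_thresh() {
    new_value="$1"
    if [ ! -w "$PROC_FILE" ]; then
        echo "cannot write $PROC_FILE (not root, or tunable not present on this kernel)" >&2
        return 1
    fi
    echo "previous: $(cat "$PROC_FILE")"
    echo "$new_value" > "$PROC_FILE"
    echo "current: $(cat "$PROC_FILE")"
}
```

Run as root, set_softlockup_thresh 120 applies the value from the example above. A value written under /proc is lost at reboot. To keep it, the corresponding sysctl key, kernel.softlockup_thresh = 120, can be added to /etc/sysctl.conf, again assuming the tunable exists on the running kernel.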