Friday, 9 October 2015

LMHB Process Trace Files in Oracle RAC - Potential Database Terminator.

During a routine performance analysis of an Exadata system, I came across trace files named dbname_lmhb_number.trc (for example, testdb_lmhb_129875.trc) being generated in the alert log location. They contained the following lines:


LMD0 (ospid: 95477) has not moved for 74 sec (123453423.14444321115)
kjfmGCR_HBCheckAll: LMD0 (ospid: 95477) has status 2
: waiting for event 'ges remote message' for 0 secs with wait_id 123.

These messages repeat throughout the trace file, with only the wait_id changing from one occurrence to the next.
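Because the same message repeats many times, it can help to summarize a trace file instead of reading it line by line. The following is a minimal Python sketch, not an Oracle-supplied tool: it assumes the "has not moved for N sec" line always has the shape shown in the excerpt above, and the function name longest_stalls is purely illustrative.

```python
import re

# Pattern based only on the trace excerpt shown above; other releases
# may format this line differently (assumption).
STALL_RE = re.compile(
    r"^(?P<proc>\w+) \(ospid: (?P<pid>\d+)\) has not moved for (?P<secs>\d+) sec"
)


def longest_stalls(lines):
    """Return {(process, ospid): longest reported stall in seconds}."""
    worst = {}
    for line in lines:
        match = STALL_RE.match(line.strip())
        if match:
            key = (match.group("proc"), int(match.group("pid")))
            secs = int(match.group("secs"))
            # Keep only the largest stall seen for each process.
            worst[key] = max(worst.get(key, 0), secs)
    return worst


# Sample input copied from the trace excerpt in this post.
sample = [
    "LMD0 (ospid: 95477) has not moved for 74 sec (123453423.14444321115)",
    "kjfmGCR_HBCheckAll: LMD0 (ospid: 95477) has status 2",
    ": waiting for event 'ges remote message' for 0 secs with wait_id 123.",
]

print(longest_stalls(sample))  # {('LMD0', 95477): 74}
```

To run it against a real trace file, pass the open file object in place of the sample list, for example longest_stalls(open("testdb_lmhb_129875.trc")).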

The kjfmGCR_HBCheckAll call in the output above appears to be related to the heartbeat check in RAC.

What Exactly Is the LMHB Process?

LMHB is the Global Cache/Enqueue Service Heartbeat Monitor.
It monitors the heartbeat of the LMON, LMD, and LMSn processes to ensure they are running normally without blocking or spinning.

Cause of these waits: identified as Bug 13718279. Affected version: 11.2.0.3.

Effect of this bug: the DB instance is terminated due to ORA-29770 in RAC.
Solution

Apply patch 13718279 or set the hidden init parameter _gc_defer_time=3.
Fixed in version 11.2.0.4.

Reference - MOS DOC ID - 1440112.1

