The internal or ambient temperature around one or more nodes has exceeded the allowable thresholds.
Only the front panel sensors measure ambient temperature. If you receive an event indicating that the front panel temperature is out of specification, the temperature in your data center might need to be adjusted.
If a node is subjected to high temperatures for an extended period of time, the CPU is throttled and the node goes into read-only mode to help prevent potential data loss due to component failure. If the node temperature reaches critical levels, the node might shut down entirely.
Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent steps.
- (HD400 only) Make sure that the drive drawer is properly shut by sliding it out and re-closing it firmly but carefully.
- Review the temperature statistics for the affected sensor, which are included in the event. If the temperature is consistently elevated, the likely cause is a high ambient temperature in the data center. Address any changes in the cluster environment, such as air conditioning outages.
- Verify that air flow within the rack, and through the front and rear panel vents of the node, is not obstructed in any way.
- Make sure that the faceplate on the affected node is installed, properly seated, and undamaged. In some cases, removing and re-seating the faceplate will resolve this issue.
- Run the isi_hw_status command. Review the output to determine whether there is a slow or failed fan that was not otherwise reported.
- Check for high CPU and disk usage in the node. High usage can contribute to high temperatures within the node.
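The distinction between a consistently elevated temperature and an occasional spike can be sketched as a simple classification over a sensor's readings. This is a minimal, hypothetical illustration, not a OneFS tool. It assumes that you have already copied the readings from the event into a list of Celsius values. The threshold and the 80% fraction are placeholders, not OneFS thresholds. The result is only an aid to reading the statistics and does not replace the steps above.

```python
from typing import Sequence


def classify_readings(readings_c: Sequence[float],
                      threshold_c: float,
                      min_fraction: float = 0.8) -> str:
    """Classify one sensor's temperature readings (degrees Celsius).

    threshold_c and min_fraction are hypothetical parameters chosen for
    illustration; they are not values defined by OneFS.
    """
    if not readings_c:
        raise ValueError("no readings to classify")
    # Count the readings above the (hypothetical) threshold.
    over = sum(1 for r in readings_c if r > threshold_c)
    fraction = over / len(readings_c)
    if fraction >= min_fraction:
        # Most readings are high: points to a sustained, likely ambient, cause.
        return "consistently elevated"
    if over > 0:
        # Only some readings are high: an intermittent excursion.
        return "intermittent"
    return "within threshold"


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real event.
    print(classify_readings([36.0, 37.0, 38.0, 36.0, 34.0], threshold_c=35.0))
```

With the sample data, four of five readings exceed 35.0, so the sketch reports "consistently elevated". Counting the fraction of high readings, rather than averaging them, keeps a single large spike from looking like a sustained problem.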
If the steps above do not clear this event, the subsystem that monitors hardware health (such as temperature and fan speeds) might have encountered a problem. This event can occur intermittently without harm to the system, and you can safely quiet it unless the issue persists.
If the issue persists, gather logs, and then contact EMC Isilon Technical Support for additional troubleshooting. For instructions, see Gathering cluster logs.