Recently I got a call from a customer he was not able to log in to his ESX 5.5 hosts anymore trough ssh, and could not vMotion VM’s anymore. It seemed like the ssh daemon died and trying to start it again did not work.
I was able to log on to one of the hosts (DL380 G8) and have a look at the vmkernel.log file.
In the log file I saw a line that read:
WARNING: Heap: 3058: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed. caller: 0x41802a2ca2fd
Google brought me to VMware KB article 2085618 with the title “ESXi host cannot initiate vMotion or enable services and reports the error: Heap globalCartel-1 already at its maximum size.Cannot expand” which sounded exactly like our problem, and seems to be caused by a memory leak in the hp-ams service.
And that’s where the fun started ….