I recently encountered an issue where vMotions on a host would fail, the host would disconnect from vCenter, and other strange errors appeared.
This was an HP host installed with the HP ISO image, but I am not sure whether that is the cause of this issue.
When investigating the logs on the host, I noticed that /var on the ramdisk was full.
Running vdf -h showed 0% available space for /var on the ramdisk.
Looking in /var/log, I noticed that all log files were symlinks to /scratch except for the EMU directory, where an Emulex process appeared to be filling up a log file.
After I removed the log file /var/log/EMU/mili/mili2d.log and restarted hostd, space on /var in the ramdisk was freed up, but the log file /var/log/EMU/mili/mili2d.log came back and started filling up again.
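To see which log files actually live on the ramdisk instead of being redirected to /scratch, you can list the regular files under /var/log that are not symlinks, largest first. This is a minimal sketch, assuming a Python interpreter is available in the ESXi shell; the function name `local_log_files` is hypothetical, and the script only reads file metadata, it does not delete anything.

```python
import os
import sys


def local_log_files(root="/var/log"):
    """Return (size_bytes, path) for regular files under root that are not
    symlinks, largest first. Symlinked files (for example ones pointing to
    /scratch) are skipped, so only files stored under root itself remain."""
    results = []
    # os.walk does not descend into symlinked directories by default
    # (followlinks=False), so redirected directories are ignored as well.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # points elsewhere, uses no space under root
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # file disappeared while walking
            results.append((size, path))
    results.sort(reverse=True)
    return results


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/log"
    # Show the ten largest files that really occupy space under root.
    for size, path in local_log_files(root)[:10]:
        print("%12d  %s" % (size, path))
```

On a host in the state described above, the Emulex log under /var/log/EMU would be expected to show up at the top of this list, while the symlinked logs do not appear at all.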
Googling, I found a suggestion to remove the Emulex VIBs when not using an Emulex HBA, but these hosts did have Emulex HBAs.
After some more research I found a fix that did not need a reboot or maintenance mode (which is great, since vMotion had stopped working on these hosts):
I had an issue at a customer where I was not able to use the in-context log viewing in vROps to view the logs for the ESXi servers in Log Insight. The customer was using vROps 6.6.1 and Log Insight 4.5.1.
The first part of the solution was to use FQDNs on the ESXi hosts. Only short hostnames were configured on the ESXi hosts, which probably prevented Log Insight from matching the logs it received from the ESXi hosts to the registered hosts it learned from vCenter. Because of this, all hosts were missing the vmw_vr_ops_id metadata, which vROps passes to Log Insight to find the logs for the correct host.
After fixing this, one host still had no vmw_vr_ops_id metadata.
It seems that, for whatever reason, matching the ESXi hostname to the name used to register the host in vCenter is case sensitive. After I changed the case of the hostname on the ESXi server, the match was made, the metadata was added, and the in-context log search worked. Probably a bug.
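Both causes (short hostnames and case differences) can be spotted up front by comparing the name each ESXi host reports with the name it was registered with in vCenter. This is a minimal sketch with hypothetical input lists and a hypothetical function name; in practice you would fill the lists from `esxcli system hostname get` on each host and from the vCenter inventory. It does not reproduce Log Insight's own matching logic, it only flags the two mismatches described above.

```python
def check_host_names(esxi_names, vcenter_names):
    """Compare ESXi-reported hostnames with vCenter registration names.

    Returns a dict mapping each problematic ESXi name to a description.
    Names that match a vCenter name exactly (case-sensitive) are omitted."""
    exact = set(vcenter_names)
    # Lower-cased lookup to detect names that differ only in case.
    folded = dict((name.lower(), name) for name in vcenter_names)
    problems = {}
    for name in esxi_names:
        if name in exact:
            continue  # exact match: nothing to report
        if "." not in name:
            problems[name] = "short hostname, no FQDN configured"
        elif name.lower() in folded:
            problems[name] = "case differs from vCenter name %s" % folded[name.lower()]
        else:
            problems[name] = "no matching host registered in vCenter"
    return problems


if __name__ == "__main__":
    # Hypothetical example data.
    esxi = ["esx01", "ESX02.lab.local", "esx03.lab.local"]
    vcenter = ["esx01.lab.local", "esx02.lab.local", "esx03.lab.local"]
    for host, problem in sorted(check_host_names(esxi, vcenter).items()):
        print("%s -> %s" % (host, problem))
```

The exact comparison is deliberately case-sensitive, mirroring the behaviour observed here, so a host like ESX02.lab.local is reported even though a case-insensitive comparison would consider it a match.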
Just after vSphere 6.5 was released, I decided to upgrade my lab to 6.5. Most of the upgrade went pretty smoothly, but two of my three distributed switches refused to upgrade. Googling for a solution did not help much, probably because the product had been released just a day before 🙂 When I tried to upgrade, I got a message that the vDS configuration could not be read. I also noticed I was not able to upgrade these switches to enhanced LACP.
I did find some KB articles regarding wrong vCenter database entries for LACP after previous upgrades, so I had a feeling this was related to LACP (which I do not use) …
When I tried to upgrade my lab environment from vCenter 6.0 with an external PSC to vCenter 6.5, I ran into an annoying issue. I tried to upgrade my PSC, but the installer was not able to determine the version of my current PSC. It assumed it was 5.5 and asked me to confirm this, which, of course, I did not. There was no way to tell it that it was really 6.0 …
When vSphere 6 was released, I decided to delete the RTM versions of my external Platform Services Controller and vCenter Server appliances and replace them with the GA versions.
Installing the PSC went fine, but when installing the vCenter appliance, I was not able to register it to the PSC. I kept getting the message “Invalid credentials” every time I entered the SSO administrator password. I redeployed the PSC several times, using different passwords, but had no luck registering the VCSA.
Recently I got a call from a customer who was no longer able to log in to his ESXi 5.5 hosts through SSH and could no longer vMotion VMs. It seemed like the SSH daemon had died, and trying to start it again did not work.
I was able to log on to one of the hosts (DL380 G8) and have a look at the vmkernel.log file.
In the log file I saw a line that read: WARNING: Heap: 3058: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed. caller: 0x41802a2ca2fd
Google brought me to VMware KB article 2085618, titled “ESXi host cannot initiate vMotion or enable services and reports the error: Heap globalCartel-1 already at its maximum size. Cannot expand”, which sounded exactly like our problem. The issue appears to be caused by a memory leak in the hp-ams service.
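To get a feel for how often these allocation failures occur, and on which heap, you can count the matching warnings in vmkernel.log. This is a minimal sketch that assumes every failure has the same shape as the line quoted above (other ESXi builds may word it differently); the default path /var/log/vmkernel.log and the function name `count_heap_failures` are assumptions for illustration.

```python
import io
import re
import sys
from collections import Counter

# Pattern modelled on the warning quoted above, e.g.
# "WARNING: Heap: 3058: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed."
HEAP_FAIL = re.compile(
    r"Heap_Align\((?P<heap>[^,]+), (?P<req>\d+)/(?P<size>\d+) bytes, "
    r"(?P<align>\d+) align\) failed"
)


def count_heap_failures(lines):
    """Count Heap_Align failures per heap name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = HEAP_FAIL.search(line)
        if match:
            counts[match.group("heap")] += 1
    return counts


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/vmkernel.log"
    try:
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
        with io.open(path, encoding="utf-8", errors="replace") as log:
            counts = count_heap_failures(log)
    except (IOError, OSError) as exc:
        sys.stderr.write("cannot read %s: %s\n" % (path, exc))
    else:
        for heap, n in counts.most_common():
            print("%-24s %d failed allocations" % (heap, n))
```

A steadily growing count for globalCartel-1 would fit the memory leak described in the KB article, but the script only counts log lines; it does not diagnose the cause.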
After reading the blog article by Erik Bussink on how to use the VSAN Observer software on a Windows vCenter Server, I quickly got annoyed by having to start the Ruby rvc script and VSAN Observer manually every time.
I created a little batch file that can be put on your desktop to launch VSAN Observer.
Just enter your credentials, vCenter Server name, datacenter name and cluster name, make sure the path to the rvc directory is correct, and off you go.
After you have started VSAN Observer, just connect to it via port 8010 on your vCenter Server, or add an exception for port 8010 in your Windows firewall to access VSAN Observer remotely, as described in Erik’s blog.
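If you prefer a script you can read and adapt, the same idea can be sketched in Python: build the RVC command line that runs `vsan.observer` with its web server for one cluster. This is a sketch under several assumptions, not the batch file described above: it assumes rvc.bat accepts a `-c` option to run a command after connecting, that a `user@host` target makes RVC prompt for the password, and that the cluster sits at the RVC inventory path `/<vCenter>/<Datacenter>/computers/<Cluster>`. The rvc.bat path and all names below are placeholders.

```python
import subprocess
import sys

# Placeholder values (hypothetical): adjust to your environment.
RVC_BAT = r"C:\Program Files\VMware\Infrastructure\VirtualCenter Server\support\rvc\rvc.bat"
USER = "administrator@vsphere.local"
VCENTER = "vcenter.lab.local"
DATACENTER = "Datacenter"
CLUSTER = "Cluster"


def observer_command(rvc_bat, user, vcenter, datacenter, cluster):
    """Build the argument list that starts VSAN Observer through RVC.

    Assumptions: rvc.bat runs the command given with '-c' after logging in,
    and the cluster is at /<vcenter>/<datacenter>/computers/<cluster>."""
    cluster_path = "/%s/%s/computers/%s" % (vcenter, datacenter, cluster)
    observer = "vsan.observer %s --run-webserver --force" % cluster_path
    # No password in the target; RVC is assumed to prompt for it.
    return [rvc_bat, "-c", observer, "%s@%s" % (user, vcenter)]


if __name__ == "__main__":
    cmd = observer_command(RVC_BAT, USER, VCENTER, DATACENTER, CLUSTER)
    if "--run" in sys.argv:
        # VSAN Observer then serves its web UI on port 8010 of the vCenter Server.
        subprocess.call(cmd)
    else:
        # Default: only print the Windows command line so it can be checked first.
        print(subprocess.list2cmdline(cmd))
```

Printing the command by default makes it easy to verify the flags and inventory path against your RVC version before passing `--run` to actually start VSAN Observer.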