I had an issue at a customer where I was not able to use the in-context log viewing in vROps to view the logs for the ESXi servers in Log Insight. We were using vROps 6.6.1 and Log Insight 4.5.1.
The first part of the solution was to use FQDNs on the ESXi hosts. Only short hostnames were configured on the ESXi hosts, which probably prevented Log Insight from matching the logs it received from the ESXi hosts to the hosts it learned about from vCenter. Because of this, all hosts were missing the vmw_vr_ops_id metadata, which vROps passes to Log Insight to find the logs for the correct host.
After fixing this, one host still had no vmw_vr_ops_id metadata.
It seems that, for whatever reason, matching the ESXi hostname to the name used to register the host in vCenter is case sensitive. After I changed the case of the hostname on the ESXi server, the match was made, the metadata was added, and the in-context log search worked … Probably a bug …
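Both naming problems described above (short hostnames instead of FQDNs, and a comparison that turned out to be case sensitive) can be checked for before you start digging in Log Insight. Below is a minimal Python sketch, assuming you have already exported the name each host was registered with in vCenter and the hostname configured on the ESXi host itself into two dictionaries keyed by the same (hypothetical) host key. The function name is hypothetical, and it only mimics the matching behavior observed here; it is not Log Insight's actual matching logic.

```python
def check_host_names(registered, configured):
    """Flag hosts whose ESXi hostname may not match the vCenter registration.

    registered: dict of host key -> name used to register the host in vCenter
    configured: dict of host key -> hostname configured on the ESXi host
    Both inputs are assumptions; collect them however suits your environment.
    Returns a list of (host_key, problem) tuples in the order of `registered`.
    """
    problems = []
    for key, vc_name in registered.items():
        esx_name = configured.get(key)
        if esx_name is None:
            problems.append((key, "no configured hostname found"))
        elif "." not in esx_name:
            # A short hostname cannot equal an FQDN registration.
            problems.append((key, "short hostname, not an FQDN"))
        elif esx_name != vc_name and esx_name.lower() == vc_name.lower():
            # Would match case-insensitively, but the match observed here
            # behaved as case sensitive.
            problems.append((key, "names differ only in case"))
        elif esx_name != vc_name:
            problems.append((key, "names differ"))
    return problems
```

For example, with registered names esx01.lab.local, esx02.lab.local and ESX03.lab.local, and configured hostnames esx01, esx02.lab.local and esx03.lab.local, the first host is reported as a short hostname and the third as differing only in case.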
Just after vSphere 6.5 was released, I decided to upgrade my lab to 6.5. Most of the upgrade went pretty smoothly, but two of my three distributed switches refused to upgrade. Googling for a solution did not help much, probably because the product had been released just a day before 🙂 When I tried to upgrade, I got a message that the vDS configuration could not be read. I also noticed I was not able to upgrade these switches to enhanced LACP.
I did find some KB articles about incorrect vCenter database entries for LACP after previous upgrades, so I had a feeling this was related to LACP (which I do not use) …
When I tried to upgrade my lab environment from vCenter 6.0 with an external PSC to vCenter 6.5, I ran into an annoying issue. I tried to upgrade my PSC, but the installer was not able to determine the version of my current PSC. It assumed it was 5.5 and asked me to confirm this, which of course I did not. There was no way to tell it that it was really 6.0 …
Recently I got a call from a customer who could no longer log in to his ESXi 5.5 hosts through SSH and could no longer vMotion VMs. It seemed the SSH daemon had died, and trying to start it again did not work.
I was able to log on to one of the hosts (DL380 G8) and have a look at the vmkernel.log file.
In the log file I saw a line that read:
WARNING: Heap: 3058: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed. caller: 0x41802a2ca2fd
Google brought me to VMware KB article 2085618, titled “ESXi host cannot initiate vMotion or enable services and reports the error: Heap globalCartel-1 already at its maximum size. Cannot expand”. That sounded exactly like our problem, and it seems to be caused by a memory leak in the hp-ams service.
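If you suspect the same problem, one quick check is to scan a copy of vmkernel.log for failed Heap_Align calls on the globalCartel-1 heap. This is a minimal Python sketch; the regular expression is modeled only on the single warning line quoted above and may need adjusting for other ESXi builds, and find_heap_failures is a hypothetical helper name.

```python
import re

# Pattern based on the one warning line quoted in this post, e.g.
# "WARNING: Heap: 3058: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed."
# Other ESXi builds may word the message differently (assumption).
HEAP_ALIGN_RE = re.compile(
    r"WARNING: Heap: \d+: Heap_Align\((?P<heap>[\w-]+), [^)]*\) failed"
)


def find_heap_failures(lines, heap="globalCartel-1"):
    """Return (line_number, line) for each failed Heap_Align on `heap`.

    `lines` can be any iterable of strings, such as an open log file.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = HEAP_ALIGN_RE.search(line)
        if match and match.group("heap") == heap:
            hits.append((number, line.rstrip("\n")))
    return hits
```

Pass it an open file object for a vmkernel.log copied off the host; a steadily growing number of hits is consistent with the heap exhaustion described in the KB article.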
And that’s where the fun started ….
IMPORTANT UPDATE AT THE END OF THE ARTICLE
In the lab I use to test and play with numerous VMware solutions, I have several nested ESXi servers running. Nested ESXi servers are ESXi servers running as VMs. This is not a supported configuration, but it lets me test and play around with software without having to rebuild my physical lab environment all the time.
So first, a little about the setup of my nested ESXi servers.
The VMs for my nested ESXi servers have 4 NICs.
The first NIC connects to “vESXi Trunk”. This is a port group on my physical ESXi hosts, configured on a vDS with VLAN type “VLAN Trunking”, so my nested ESXi hosts get all VLANs.
I use this VLAN trunk to present my management network and my VM networks to my nested ESXi servers.
I also have a NIC that connects to my vMotion network, and two NICs that connect to my iSCSI networks. I use two subnets and two VLANs for my iSCSI connections.
In my physical setup I use jumbo frames in these networks, and I did the same in my nested ESXi hosts, and it worked perfectly … Until I upgraded my nested ESXi hosts to vSphere 5.5 …
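A common way to verify that jumbo frames really pass end to end on a VMkernel interface is vmkping with the don't-fragment flag (-d) and a payload size (-s) equal to the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header, so 8972 bytes for an MTU of 9000. The small Python helper below only does that arithmetic and builds the command string; the function names, the vmk interface name and the target IP are hypothetical placeholders, not values from this lab.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_df_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload <= 0:
        raise ValueError(f"MTU {mtu} is too small")
    return payload


def vmkping_command(vmk, target_ip, mtu=9000):
    """Build a vmkping command line (vmk and target_ip are placeholders)."""
    return f"vmkping -I {vmk} -d -s {max_df_payload(mtu)} {target_ip}"
```

For example, vmkping_command("vmk1", "192.168.10.12") returns "vmkping -I vmk1 -d -s 8972 192.168.10.12". If that ping fails while the same command with -s 1472 succeeds, something in the path is not passing jumbo frames.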