Issue with jumbo frames after upgrading nested ESXi servers in the lab to 5.5 and fix

IMPORTANT UPDATE AT THE END OF THE ARTICLE

In the lab I use to test and play with numerous VMware solutions, I have several nested ESXi servers running. Nested ESXi servers are ESXi servers running as VMs. This is not a supported configuration, but it does help me test and play around with software without having to rebuild my physical lab environment all the time.

So first, a little on the setup of my nested ESXi servers.

The VMs for my nested ESXi servers have 4 NICs.

The first NIC connects to “vESXi Trunk”. This is a port group on my physical ESXi hosts that is configured on a vDS with VLAN type “VLAN Trunking”, so I get all VLANs in my nested ESXi host.

I use this VLAN trunk to present my management network and my VM networks to my nested ESXi servers.

I also have a NIC that connects to my vMotion network, and two NICs that connect to my iSCSI networks. I use two subnets and two VLANs for my iSCSI connections.

In my physical setup I use jumbo frames in these networks, and I did the same in my nested ESXi hosts. It worked perfectly … until I upgraded my nested ESXi hosts to vSphere 5.5 …
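A common way to check whether jumbo frames actually pass end to end is a ping with the don't-fragment bit set and a payload sized to the MTU. The snippet below is a minimal sketch of that arithmetic, assuming IPv4 without header options and a 9000-byte MTU. The target address is a placeholder, the helper names are made up for illustration, and this is not necessarily how the issue described in the article was diagnosed.

```python
# Sketch: work out the largest ICMP payload that fits in one frame for a given MTU.
# Assumes plain IPv4 (20-byte header, no options) and an 8-byte ICMP echo header.

IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_ping_payload(mtu: int) -> int:
    """Return the largest ICMP echo payload that fits in a single packet of size `mtu`."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP echo request")
    return mtu - overhead


def vmkping_command(mtu: int, target: str) -> list[str]:
    """Build a vmkping invocation: -d sets don't-fragment, -s sets the payload size."""
    return ["vmkping", "-d", "-s", str(max_ping_payload(mtu)), target]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-range placeholder, not an address from the article.
    print(" ".join(vmkping_command(9000, "192.0.2.10")))
```

If a ping built this way fails while a default-sized ping succeeds, some hop in the path is probably not passing the larger frames.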

Connectivity issue when upgrading Dell R620 to ESXi 5.1 build 914609

When building a couple of new ESX hosts based on Dell R620 systems, I used the Dell customized ISO VMware-VMvisor-Installer-5.1.0-799733.x86_64-Dell_Customized_RecoveryCD_A01.iso to install ESXi.

Those Dell systems had 4 Broadcom NICs (2 x 1Gb + 2 x 10Gb) and 2 Intel 10Gb NICs.

The install went fine, and I decided to upgrade to the latest patches using esxcli, since the hosts had no access to vCenter. All went fine until after the reboot. I noticed all Broadcom NICs were missing from my hosts, most likely due to a driver issue, so it was time to investigate.

Serious performance impact on high IO VM with multiple snapshots

Recently I ran into a situation where a customer suffered severe performance issues on a virtualized SQL server. In the SQL server we noticed high CPU utilization, but the underlying ESX hosts only showed relatively low CPU utilization for this VM.

Debugging the VM performance issue with esxtop showed very high co-stop (%CSTP) values.

According to the vSphere Monitoring and Performance guide, %CSTP is:

Percentage of time a resource pool spends in a ready, co-deschedule state.
NOTE You might see this statistic displayed, but it is intended for VMware use only.

Funny how VMware states that this metric is intended for VMware use only 🙂

vCenter 5.1 upgrade removes permissions in vCenter in non AD environment

While upgrading vCenter to 5.1 in an environment where we used local authentication on the vCenter server, we were in for a little surprise.

The original vCenter server had a lot of custom roles and user permissions defined on all kinds of objects in vCenter.

When we did the upgrade, we decided to install the SSO server on a separate server. After the vCenter upgrade, when vCenter was registered with the SSO server, we suddenly received a message that users and groups were not found on the SSO server. That kind of made sense: even though we had recreated the users and groups on the SSO server, they had different security IDs. What we did not expect is that the upgrade process would remove all non-existing users and groups from the vCenter database, effectively removing all permissions from vCenter …

ESX hosts not registering on EMC VNX (and fix)

While working on an upgrade to vSphere 5.0U1 in a Cisco UCS environment where the ESX hosts boot from SAN, I noticed one of the hosts was not registered correctly on the EMC VNX, as it showed up as unmanaged. Because the ESX hosts boot from SAN, a host has to be registered before it can auto-register, and once it had been registered manually, the host was not able to update the registration.

Error 29107 when upgrading to vCenter 5.1 (and fix)

When I tried to upgrade my vCenter 5.0U1 Server to 5.1, all seemed to go well, up until the moment vCenter tried to register with SSO.

I received the error message “Error 29107. The service or solution user is already registered. Check Vm_ssoreg.log in system temporary folder for details”.

I checked this log, but it did not really point me in the right direction.

Then I found a post in the 5.1 beta archive that said the unique identifier for a service to register with SSO is the Common Name from its certificate.

Issue with vShield Edge devices due to full root filesystem

In a vSphere environment I am working on, we use VMware vShield Edge for firewalling, NAT, and terminating VPNs for customers.

On several occasions we were not able to make config changes to some of our VSE devices when we tried to publish the changes we made from within vShield Manager. Whenever we tried to publish the changes, we received an error message in vShield Manager saying it could not reach the vShield Edge device we were trying to configure.

Next to that, we noticed a lot of errors in the vShield Manager System Events tab for this specific Edge device regarding “Multiple heartbeats missed from appliance”.

Another thing we noticed was that VMware Tools for this specific VSE device did not seem to be running.

We decided to open a case with VMware and were told this is a known issue with the version of vShield we are running (5.0.1) and that it will be fixed in a future version. (It is not fixed in version 5.0.2, which was released recently.)

Update on LUN connectivity issues with Storage vMotion on EMC VNX when using VAAI

A while ago I posted an article on the LUN connectivity issues we experienced with Storage vMotion on EMC VNX when using VAAI.

Today I received an e-mail from EMC saying they are able to reproduce our issues in their lab, which is an important step toward getting these issues resolved, since we can only do limited tests in our production environment. Great news to start the weekend. I will update again when I get more details on this.

Challenges when upgrading environment with EMC CX4 to vSphere 5 and mixed CX4/VNX environment

Yesterday my good friend Gabrie van Zanten from Gabes Virtual World asked the following question on Twitter:

My first reaction was “Why would Gabe want to disable VAAI on a per array basis in the first place?”, so I asked.

His answer was pretty simple and straightforward. He was working on an environment where ESX5 hosts had both EMC CX4s and VNXes connected, and VAAI was not supported on vSphere 5 for CX4, so he had to disable VAAI for the CX4s and wanted to leave it on for the VNXes.

Update Manager broken after import Cisco VEM extension

Today I was working on upgrading some hosts in a vSphere 5 environment that is using Cisco Nexus 1000V virtual switches. I imported the extension bundle in Update Manager, created a baseline, and scanned the hosts. After a couple of seconds, I got a message in vCenter telling me the scan failed:

Scan entity <hostname> Host cannot download files from VMware vSphere Update Manager patch store. Check the network connectivity and firewall setup, and check esxupdate logs for details.