The ramdisk ‘var’ is full on ESXi host and fix (without maintenance mode or reboot)

I recently encountered an issue where vMotions on a host would fail, the host would disconnect from vCenter, and some other strange errors showed up.

This was an HP host installed with the HP ISO image, but I am not sure whether that is the cause of this issue.

When investigating the logs on the host I noticed that /var on the ramdisk was full.

When issuing vdf -h, the available space for /var on the ramdisk was 0%.
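
As a rough illustration of that check, the sketch below scans captured vdf -h output for ramdisks that are (nearly) full. The sample text and its column layout are assumptions made for illustration, not output copied from the affected host, and the function name and threshold are hypothetical.

```python
def full_ramdisks(vdf_output, threshold=90):
    """Return (name, use_percent) for ramdisks at or above the threshold.

    Assumes the ramdisk table starts with a header line whose first
    column is "Ramdisk" and that each row has one field ending in '%'.
    """
    hits = []
    in_ramdisk_table = False
    for line in vdf_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Ramdisk":
            # Header row of the ramdisk table; rows follow.
            in_ramdisk_table = True
            continue
        if not in_ramdisk_table:
            continue
        # Pick the first column that looks like a percentage.
        percent = next((f for f in fields[1:] if f.endswith("%")), None)
        if percent is None:
            continue
        try:
            used = int(percent.rstrip("%"))
        except ValueError:
            continue
        if used >= threshold:
            hits.append((fields[0], used))
    return hits


# Illustrative sample only (assumed layout, made-up numbers).
SAMPLE = """\
Ramdisk                   Size      Used Available Use% Mounted on
root                       32M        3M       28M  10% --
etc                        28M      300K       27M   1% --
var                        48M       48M        0B 100% --
tmp                       192M        8K      191M   0% --
"""

print(full_ramdisks(SAMPLE))
```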

Looking in /var/log I noticed all log files were symlinks to /scratch except for the EMU directory, where some Emulex process seemed to fill up a log file …
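
A minimal sketch of how such a file could be spotted: walk a log directory, skip everything that is a symlink, and list the remaining regular files by size. The function name is hypothetical, and this is only one way to do it, not the method I used on the host.

```python
import os


def real_files_by_size(root):
    """Return (size, path) for files under root that are NOT symlinks,
    largest first. Symlinked directories are not followed."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Symlinked log files live elsewhere (e.g. on /scratch).
                continue
            try:
                size = os.lstat(path).st_size
            except OSError:
                # File vanished or is unreadable; ignore it.
                continue
            results.append((size, path))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    # Show the ten largest non-symlink files under /var/log.
    for size, path in real_files_by_size("/var/log")[:10]:
        print("{:>12}  {}".format(size, path))
```

Anything that shows up near the top of this list and keeps growing is a candidate for filling the ramdisk.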

After removing the log file /var/log/EMU/mili/mili2d.log and restarting hostd, space was freed up on /var in the ramdisk, but the log file /var/log/EMU/mili/mili2d.log returned and started filling up again.

Googling, I found a suggestion to remove the Emulex VIBs when not using an Emulex HBA, but these hosts did have Emulex HBAs.

After some more research I found a fix that did not need a reboot or maintenance mode (which is great, since vMotion had stopped working on these hosts):

Continue reading

Fixing problems with in-context log viewing from vROps to Log Insight

I had an issue at a customer where I was not able to use the in-context log viewing in vROps to view the logs for the ESXi servers in Log Insight. This was with vROps 6.6.1 and Log Insight 4.5.1.

The first part of the solution was to use FQDNs on the ESXi hosts. Only short hostnames were configured on the ESXi hosts, which probably prevented Log Insight from matching the logs it received from the ESXi hosts to the registered hosts it learned from vCenter. Because of this, all hosts were missing the vmw_vr_ops_id metadata, and vROps passes this metadata to Log Insight to find the logs for the correct host.

After fixing this, one host still had no vmw_vr_ops_id metadata.

It seems that, for whatever reason, matching the ESXi hostname to the name used to register the host in vCenter is case sensitive. After changing the case of the hostname on the ESXi server, the match was made, the metadata was added, and the in-context log search worked … Probably a bug …
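
To find hosts that could be hit by this, a small comparison like the one below can help. It is only a sketch of the idea (it is not how Log Insight does its matching), and the function name and hostnames are made up for the example.

```python
def find_case_mismatches(esxi_names, vcenter_names):
    """Return (esxi_name, vcenter_name) pairs that are equal when
    compared case-insensitively but differ in an exact comparison."""
    registered_by_lower = {name.lower(): name for name in vcenter_names}
    mismatches = []
    for name in esxi_names:
        registered = registered_by_lower.get(name.lower())
        if registered is not None and registered != name:
            mismatches.append((name, registered))
    return mismatches


# Hypothetical hostnames: the first host differs only in case.
esxi = ["ESX01.lab.example", "esx02.lab.example"]
vcenter = ["esx01.lab.example", "esx02.lab.example"]
print(find_case_mismatches(esxi, vcenter))
```

Any pair this returns is a host where an exact, case-sensitive match would fail even though the names look the same.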

vSphere Distributed Switch refused to upgrade to version 6.5

Just after vSphere 6.5 was released I decided to upgrade my lab to 6.5. Most of the upgrade went pretty smoothly, but two of my three distributed switches refused to upgrade. Googling for a solution did not help much, probably because the product had been released just a day before 🙂 When I tried to upgrade, I got a message that the vDS config could not be read. I also noticed I was not able to upgrade these switches to enhanced LACP.

I did find some KB articles about wrong vCenter database entries for LACP after previous upgrades, so I had a feeling this was related to LACP (which I do not use) … Continue reading

vCenter 6.5 upgrade did not recognise vSphere 6.0 Platform Service Controller version

When I tried to upgrade my lab environment from vCenter 6.0 with external PSC to vCenter 6.5, I ran into an annoying issue. I tried to upgrade my PSC, but the installer was not able to determine the version of my current PSC. It assumed it was 5.5 and I had to confirm this, which, of course, I did not. There was no way to tell it that it was really 6.0 …

Continue reading

VMworld 2016 Second Keynote

The second keynote is usually more tech savvy and packed with demos (which used to be live but are replaced by recorded demos nowadays).

The keynote starts with Sanjay Poonen, General Manager of End-User Computing.

Sanjay starts talking about the transformation of education, healthcare, and low tech branches like tea estates, digitally transforming the end user. Continue reading

VMworld 2016 Opening Keynote

I was fortunate enough to attend VMworld again this year. This is my take on the Monday morning Keynote.

Pat Gelsinger, VMware’s CEO, kicks off, limping a little due to a broken foot.

21 Alumni Elite are present, who have attended all VMworlds (all US VMworlds, to be more specific). They will all get lifetime access to VMworld for themselves and their spouses.

Let’s see if VMworld Europe Alumni Elite will get the same privileges … Continue reading

Invalid credentials message when registering vCenter Server with external Platform Service Controller in vSphere 6

When vSphere 6 was released, I decided to delete the RTM versions of my external Platform Services Controller and vCenter Server appliances and replace them with the GA versions.

Installing the PSC went fine, but when installing the vCenter appliance, I was not able to register it to the PSC. I kept getting the message “Invalid credentials” every time I entered the SSO administrator password. I redeployed the PSC several times, using different passwords, but had no luck registering the VCSA.

Continue reading

Nasty HP software bug hits vSphere 5.1 and 5.5 and helpful info to fix this

Recently I got a call from a customer who was no longer able to log in to his ESXi 5.5 hosts through SSH and could no longer vMotion VMs. It seemed like the SSH daemon had died, and trying to start it again did not work.
I was able to log on to one of the hosts (DL380 G8) and have a look at the vmkernel.log file.
In the log file I saw a line that read:
WARNING: Heap: 3058: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed. caller: 0x41802a2ca2fd
Google brought me to VMware KB article 2085618 with the title “ESXi host cannot initiate vMotion or enable services and reports the error: Heap globalCartel-1 already at its maximum size.Cannot expand” which sounded exactly like our problem, and seems to be caused by a memory leak in the hp-ams service.
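
To see how often a host is hitting this, the sketch below counts Heap_Align failures per heap name in vmkernel.log lines. The regular expression is written against the single log line quoted above, so treat the pattern as an assumption about the format; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed"
# and captures the heap name (the first argument).
HEAP_FAIL = re.compile(r"Heap_Align\((?P<heap>[^,]+),.*?\) failed")


def count_heap_failures(lines):
    """Count Heap_Align failures per heap name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = HEAP_FAIL.search(line)
        if match:
            counts[match.group("heap")] += 1
    return counts


# The log line from this post, plus an unrelated line for contrast.
sample_lines = [
    "WARNING: Heap: 3058: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) "
    "failed. caller: 0x41802a2ca2fd",
    "some unrelated vmkernel message",
]
print(count_heap_failures(sample_lines))
```

On a host, the lines would come from opening the vmkernel.log file and passing the file object to count_heap_failures.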

And that’s where the fun started ….

Continue reading

Add RecoverPoint 4.1 with SRM RecoverPoint SRA 2.2 fails with error “SRA command ‘discoverArrays’ failed” UPDATED

During the installation and configuration of an SRM solution for a customer, based on EMC RecoverPoint 4.1, I ran into an interesting issue.

When I tried to add the RecoverPoint Clusters on both sites using the RecoverPoint SRA 2.2 I received the following error message:

Error

Continue reading

Batch file to start VSAN Observer on Windows vCenter Server

After reading the blog article by Erik Bussink on how to use the VSAN Observer software on a Windows vCenter Server, I quickly got annoyed by having to start the Ruby rvc script and the VSAN Observer manually every time.

I created a little batch file that can be put on your desktop to launch the VSAN Observer.

Just enter your credentials, vCenter Server name, datacenter name and cluster name, make sure the path to the rvc directory is correct, and off you go.

After you have started VSAN Observer, just connect to it via port 8010 on your vCenter Server, or add an exception for port 8010 in your Windows firewall to access VSAN Observer remotely, as described in Erik’s blog.

Enjoy

VSAN.Observer.bat