LUN connectivity issues with Storage vMotion on EMC VNX when using VAAI

For the last few weeks I have been working on some serious issues in an environment running vSphere 5 with an EMC VNX storage array. Everything seemed to run fine, but whenever we started a Storage vMotion, we saw all kinds of strange errors that we were not expecting at all.

We saw messages about write-quiesced VMFS volumes, we lost paths, and in some cases the Storage vMotions did not complete at all.

During these Storage vMotions we noticed that datastore latency peaked at more than 5 seconds on the source and destination LUNs.

Unexplained LUN trespasses on EMC VNX explained …

Recently I saw some unexplained LUN trespasses on an EMC VNX that is used in a vSphere 5 environment where we use VAAI.

Since we use pools on the VNX, it is advised to keep a LUN on the owning SP, to prevent unnecessary traffic over the internal bus between SPA and SPB. EMC says:

Avoid trespassing pool LUNs. Trespassing the pool LUNs to another SP may adversely affect performance. After a pool LUN trespass, a pool LUNs private information remains under control of the original owning SP. This will cause the trespassed LUNs I/Os to continue to be handled by the original owning SP. When this happens both SPs being used in handling the I/Os. Involving both SPs in an I/O increases the time used to complete an I/O.

Syslog stops working after upgrade to ESXi 5.0 Update 1

After upgrading an environment from ESXi 5.0 to ESXi 5.0 Update 1, I noticed that syslog had stopped working. Because ESXi by default does not keep log messages across reboots, you must either specify a syslog server to collect the log files or set the log directory to a shared datastore.

In this environment I used syslog to send all messages from the ESXi hosts to a central syslog server.

When checking the syslog server after the upgrade, I noticed that no log messages were arriving anymore, and confirmed that messages had still been arriving just prior to the upgrade.
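When messages stop arriving like this, it can help to first rule out the collector itself. The following is a minimal sketch, not something taken from the original troubleshooting: it sends a single test message over UDP from any machine that has Python. The function names, the `test-client` hostname, the `syslog-test` tag and the assumption that the collector listens on UDP port 514 are all illustrative. The message is a simplified syslog-style line without a timestamp, which a collector may or may not accept as-is.

```python
import socket


def build_syslog_message(text, hostname="test-client", facility=1, severity=6):
    """Build a simplified syslog-style line: <PRI>HOSTNAME TAG: text.

    hostname and the 'syslog-test' tag are hypothetical placeholders.
    The timestamp field is deliberately left out to keep the sketch short.
    """
    priority = facility * 8 + severity  # PRI value = facility * 8 + severity
    return f"<{priority}>{hostname} syslog-test: {text}".encode("utf-8")


def send_test_message(server, port=514, text="collector reachability test"):
    """Send one UDP datagram to the collector and return the bytes sent.

    UDP gives no delivery confirmation, so a successful return only means
    the datagram left this machine; check the collector's log for the line.
    """
    payload = build_syslog_message(text)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
    return payload
```

Calling `send_test_message("<address of your syslog server>")` and then looking for the `syslog-test` line on the collector shows whether the server still accepts messages at all. If it does, the problem is more likely on the host side.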