VMware on NFS timeout tuning

So - you’re using NFS for VMware ESX? Nice. But what happens if your NFS server goes down or the network connection between ESX and the NFS server breaks? Failure handling in VMware is quite different from “pure” NFS. 

An NFS mount using the “hard” option on a UNIX host will block indefinitely until the NFS server is up again or the network connection is re-established. Even hours later it will continue exactly at the point where the problem began. At least in theory, because any kind of server will accumulate blocked processes and the load will rise until the machine is dead. But in practice a maildir-based mail server with the mailstore on NFS can easily tolerate 2-3 minutes of NFS downtime even if the load spikes to 300-400.

What is different in VMware ESX? ESX virtual machines are not aware of the NFS filesystem underneath. ESX accurately emulates the behaviour of a SCSI device, and if NFS is gone, all I/O requests in the affected virtual machines will stall.

Problems with stalled I/O requests can also pop up on real hardware, so SCSI device drivers include a timeout mechanism which aborts an I/O operation. This is also true in ESX virtual machines: many Linux distributions (e.g. SLES) have a default timeout value of 60s. If the NFS filesystem is not back within a minute, you’ll experience I/O errors due to timeouts inside each virtual machine.

Why is this bad?

Usually it means running applications are interrupted in a way they do not expect: you may be forced to reboot the virtual machine or even perform additional recovery steps like filesystem checks. Multiply this by the number of VMs affected and you’ll see a worst-case scenario and hours on the telephone with your customers.

Redundant storage and redundant network connections are used to address this kind of problem (which can of course also happen with FC or iSCSI storage). Still, it might be necessary to fine-tune timeout values, as switching between redundant components often takes some time.

For example, a typical NetApp filer (let’s say a FAS 3050 cluster with 168 disks and ONTAP 7.2.2) needs approximately 20s for a takeover and around 40s for a giveback, such as during a “rolling” software upgrade. In this case the default timeout of 60s is sufficient, but there are failures which can cause NFS to narrowly miss the deadline. NetApp recommends setting the SCSI timeout inside virtual machines to 190s.

Timeout-Tuning on Linux … 

In the 2.6 kernel, udev rules (usually in /etc/udev/rules.d) can be used to automatically set /sys/block/<device>/device/timeout to 190.
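To try the value before writing a udev rule, the timeout can also be set at runtime. The following is only a minimal sketch: it assumes the timeout file sits at /sys/block/<device>/device/timeout and that the disks of interest are named sd*. The function name and its parameters are made up for this illustration. Values written this way are lost on reboot, which is why a udev rule is the better permanent home.

```python
from pathlib import Path


def set_scsi_timeouts(seconds=190, sysfs_root="/sys/block", pattern="sd*"):
    """Write `seconds` to <sysfs_root>/<dev>/device/timeout for each matching disk.

    Returns a dict mapping device name -> (old value, new value).
    The sysfs layout and the sd* naming are assumptions, not guarantees.
    """
    changed = {}
    for dev in sorted(Path(sysfs_root).glob(pattern)):
        timeout_file = dev / "device" / "timeout"
        if not timeout_file.is_file():
            # Not a SCSI-style disk (or a different sysfs layout): skip it.
            continue
        old = timeout_file.read_text().strip()
        timeout_file.write_text(f"{seconds}\n")
        changed[dev.name] = (old, str(seconds))
    return changed
```

Run as root inside the virtual machine, calling set_scsi_timeouts() and printing the result shows which disks were changed and what their previous timeout was.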

… and Windows

Set the registry value TimeOutValue under the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk to 190 (decimal).
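If you have many Windows VMs, importing a .reg file is one way to roll this out. The sketch below only generates the text of such a file. The function name is hypothetical, and the point to watch is that .reg files store DWORD values in hexadecimal, so 190 decimal becomes 000000be.

```python
DISK_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk"


def disk_timeout_reg(seconds=190):
    """Return the text of a .reg file that sets the disk TimeOutValue."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("TimeOutValue must fit into a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{DISK_KEY}]",
        # DWORD values are written as eight hex digits in .reg files.
        f'"TimeOutValue"=dword:{seconds:08x}',
        "",
    ]
    return "\n".join(lines)
```

Write the returned text to a file such as disk-timeout.reg and import it with administrator rights. Check the resulting value in regedit afterwards, and make sure it is displayed as decimal 190.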

If you have access to NOW (NetApp on the Web), check knowledgebase article 37986 for more information.

Your personal experiences with timeout tuning? Comments welcome (moderated).