VMware on NFS datastores: sequential performance

We recently tested the sequential performance of a virtual disk in an ESX virtual machine whose datastore resides on NFS. The ESX servers have a separate VMkernel Gigabit Ethernet interface that is used exclusively for NFS. On the storage side, a Netapp FAS6070 connected via 10 GigE exports a volume placed on a RAID-DP aggregate of 2×16 disks (300 GB, 10k FC).

The server is physically a Sun 4150 with additional GigE interfaces. The VM was configured with 4 virtual CPUs and a 25 GB virtual disk (placed on the NFS datastore), and was deployed from an openSUSE 10.3 template.

On the virtual disk I created an XFS file system and then ran iozone with one thread, the write/rewrite and read/reread tests, 1 MB records and a 10 GB file (iozone -t1 -i0 -i1 -r1m -s10g). The results are pretty good:

write:   106970 kB/s
rewrite: 108379 kB/s
read:    103768 kB/s
reread:  109418 kB/s
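To see how close these figures are to the wire, here is a back-of-the-envelope calculation. It is a sketch under stated assumptions, not part of the measurement: iozone's kB is taken to be 1024 bytes, the NFS link is assumed to use a standard 1500-byte MTU (no jumbo frames) with TCP timestamps enabled, and NFS/RPC header overhead is ignored, so the real ceiling is slightly lower still.

```python
# Back-of-the-envelope check: how close are the iozone results to the limit of
# a single GigE link? Assumptions (not taken from the measurement itself):
#   - iozone's "kB" means 1024 bytes
#   - standard 1500-byte MTU, TCP timestamps enabled
#   - NFS/RPC header overhead is ignored, so the real ceiling is slightly lower

GIGE_BITS_PER_SEC = 1_000_000_000
KIB = 1024

MTU = 1500
# On-wire cost of one full frame: MTU + Ethernet header (14) + FCS (4)
# + preamble/SFD (8) + inter-frame gap (12)
WIRE_BYTES_PER_FRAME = MTU + 14 + 4 + 8 + 12      # 1538
# TCP payload per frame: MTU - IPv4 header (20) - TCP header (20)
# - TCP timestamp option padded to 12 bytes
TCP_PAYLOAD_PER_FRAME = MTU - 20 - 20 - 12        # 1448

# Raw line rate and TCP payload ceiling, both in KiB/s
line_rate_kib_s = GIGE_BITS_PER_SEC / 8 / KIB
tcp_ceiling_kib_s = line_rate_kib_s * TCP_PAYLOAD_PER_FRAME / WIRE_BYTES_PER_FRAME

# iozone results from the post, in kB/s
results_kib_s = {
    "write": 106970,
    "rewrite": 108379,
    "read": 103768,
    "reread": 109418,
}

if __name__ == "__main__":
    print(f"raw GigE rate:     {line_rate_kib_s:9.0f} KiB/s")
    print(f"TCP payload limit: {tcp_ceiling_kib_s:9.0f} KiB/s")
    for test, rate in results_kib_s.items():
        share = 100 * rate / tcp_ceiling_kib_s
        print(f"{test:8s} {rate:9d} KiB/s = {share:5.1f}% of the TCP limit")
```

Under these assumptions the TCP payload ceiling of one GigE link is roughly 114,900 KiB/s, and all four iozone results land between about 90 and 95 percent of it, which leaves little headroom once NFS overhead is added.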

In this setup the single GigE connection of the ESX host is the bottleneck, but I'm quite satisfied with these numbers. I used two knobs to tune performance in this benchmark:

options nfs.tcp.recvwindowsize 64240

on the filer raises write throughput from 75 to 100 MB/s (Netapp's default seems pretty low). On the Linux VM side, you can increase readahead via /sys/block/<your virtual disk device>/queue/read_ahead_kb (I used 2048 kB instead of the default 128 kB). This helps a lot with the read numbers.
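Why would a larger receive window on the filer help writes in particular? In TCP the receiver's advertised window bounds how much unacknowledged data the sender may have in flight, so a single connection can move at most roughly window / RTT. For writes the filer is the receiver, so its window is the one that matters. The sketch below puts numbers on this; the round-trip times are hypothetical examples, not values measured in this setup. It also checks that 64240 is exactly 44 × 1460 (a whole number of standard Ethernet TCP segments) and stays below 65,535, so it can be advertised without TCP window scaling.

```python
# Single-connection TCP throughput is bounded by window / RTT, because the
# sender must pause once a full receive window of data is unacknowledged.
# The RTT values used below are hypothetical, not measurements.

GIGE_BITS_PER_SEC = 1_000_000_000
RECV_WINDOW = 64240   # nfs.tcp.recvwindowsize value from the post
ETHERNET_MSS = 1460   # TCP payload per 1500-byte frame without TCP options


def max_throughput_bytes_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one TCP connection's throughput for a given window."""
    return window_bytes / rtt_s


def bandwidth_delay_product(link_bits_s: float, rtt_s: float) -> float:
    """Window (in bytes) needed to keep a link of this speed busy at this RTT."""
    return link_bits_s / 8 * rtt_s


if __name__ == "__main__":
    segments = RECV_WINDOW // ETHERNET_MSS
    print(f"{RECV_WINDOW} bytes = {segments} x {ETHERNET_MSS}-byte segments")
    for rtt_ms in (0.2, 0.5, 1.0):  # hypothetical LAN round-trip times
        rtt_s = rtt_ms / 1000
        cap_mb_s = max_throughput_bytes_s(RECV_WINDOW, rtt_s) / 1e6
        needed = bandwidth_delay_product(GIGE_BITS_PER_SEC, rtt_s)
        print(f"RTT {rtt_ms} ms: window caps at {cap_mb_s:6.1f} MB/s; "
              f"GigE needs a {needed:,.0f}-byte window")
```

With these example RTTs, a 64240-byte window still covers GigE at half a millisecond but caps a connection at about 64 MB/s once the RTT reaches one millisecond; a smaller window hits that limit at an even lower RTT.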
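The readahead change can also be scripted. Below is a minimal Python sketch that reads and writes the sysfs file mentioned above; the device name `sdb` and the helper function names are hypothetical, the write needs root on a real system, and the value is lost on reboot, so it would have to be reapplied from a boot script or udev rule.

```python
from pathlib import Path


def read_ahead_path(device: str, sysfs_root: str = "/sys") -> Path:
    """Path of the readahead knob for a block device (kernel name, e.g. "sdb")."""
    return Path(sysfs_root) / "block" / device / "queue" / "read_ahead_kb"


def get_read_ahead_kb(device: str, sysfs_root: str = "/sys") -> int:
    """Current readahead size in KiB."""
    return int(read_ahead_path(device, sysfs_root).read_text().strip())


def set_read_ahead_kb(device: str, kb: int, sysfs_root: str = "/sys") -> None:
    """Set readahead in KiB (needs root on a real /sys; not persistent)."""
    if kb < 0:
        raise ValueError("read_ahead_kb must be >= 0")
    read_ahead_path(device, sysfs_root).write_text(f"{kb}\n")


if __name__ == "__main__":
    # "sdb" is a hypothetical name for the virtual disk; check yours first.
    dev = "sdb"
    print("before:", get_read_ahead_kb(dev))
    set_read_ahead_kb(dev, 2048)  # the value used in the post
    print("after: ", get_read_ahead_kb(dev))
```

`blockdev --setra 4096 /dev/sdb` should have the same effect, since `blockdev` counts readahead in 512-byte sectors (4096 × 512 bytes = 2048 KiB).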

I'm really looking forward to Neterion's X3100 10 GigE cards: we'll put a couple of them into our VMware servers, and then the current bottleneck (the GigE interface) will go away. The simplicity of VMware with NFS datastores is really amazing (particularly with larger numbers of VMware servers), and I haven't even told you about Netapp's A-SIS deduplication yet :-)

Edit: there is also a post on NFS performance inside a virtual machine

What numbers do you see in your environment? Comments welcome.

Netapp’s on-board disk diagnostics

If you ever wondered what Netapp learned from studying the causes of disk failures, here is how current filers handle disk problems. Last week a disk failed on one of our storage systems, and I pasted a couple of lines from the log files to show what happened.
