Netapp filesystem deduplication and VMware, Part 3

What happened before: Part 1 of this series presented a technology overview of deduplication, and Part 2 showed a hands-on example using a Netapp filer and VMware datastores.

(Not so) Thin provisioning

When you use NFS datastores, your virtual disks in ESX are thin provisioned. If you create a 50 GB virtual disk, it does not use the whole 50 GB at once. Instead, a sparse file is created that grows as data is written to the virtual disk.

Unfortunately, the “thin” property is lost when you move the disk to another datastore with Storage VMotion or create a new virtual machine from a template in the datastore. In that case your “thin” 50 GB disk occupies the full 50 GB immediately.
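To make the thin/thick difference concrete, here is a minimal sketch on an ordinary Linux filesystem (an assumption: it does not touch ESX, NFS or the filer). It creates a sparse file by truncating it to its full size, then copies it by writing every byte, zeros included, which is one simple way a copy can lose its “thin” property. The helper names `make_sparse`, `naive_copy` and `allocated_bytes` are hypothetical, and `st_blocks` is assumed to count 512-byte units, as it does on Linux.

```python
import os
import tempfile

MIB = 1024 * 1024


def allocated_bytes(path: str) -> int:
    """Bytes actually allocated on disk (assumes st_blocks counts 512-byte units, as on Linux)."""
    return os.stat(path).st_blocks * 512


def make_sparse(path: str, size: int) -> None:
    """Create a file with apparent size `size` but no data blocks written yet (a 'thin' file)."""
    with open(path, "wb") as f:
        f.truncate(size)


def naive_copy(src: str, dst: str, chunk: int = MIB) -> None:
    """Copy by reading and writing every chunk, so the holes are written out as real zeros."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        thin = os.path.join(d, "thin.img")
        thick = os.path.join(d, "thick.img")
        make_sparse(thin, 50 * MIB)
        naive_copy(thin, thick)
        # Print name, apparent size and allocated size for both files.
        for p in (thin, thick):
            print(os.path.basename(p), os.path.getsize(p), allocated_bytes(p))
```

On most Linux filesystems both files report the same 50 MiB apparent size, but only the copy is fully allocated. Filesystems that compress or detect zero blocks may behave differently.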

Deduplication helps

Filesystem deduplication helps here: the filer writes the 50 GB virtual disk normally but deduplicates it at night. The trick is to schedule your snapshots correctly, because any un-deduplicated data captured in a snapshot stays frozen until that snapshot is deleted.

Rule: deduplicate first - do backup snapshots later!

Of course this depends on your data protection strategy. We usually take one consistent snapshot (VMware snapshot, then Netapp snapshot) per datastore in the early morning, so deduplication can run during the night before it. An example schedule for a datastore looks like this:

  • Deduplication daily @ 2 am
  • Consistent VMware/Netapp snapshots daily @ 4 am, keep 7
  • Normal Netapp snapshots: every 4 hours, keep 6

If you do it the other way round, your deduplication savings will be smaller, because your long-lived backup snapshots keep the un-deduplicated data around.
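The effect of the ordering can be sketched with a toy block model. This is a hypothetical simplification, not how WAFL or ASIS actually track blocks: files map to physical block ids, a snapshot freezes the set of blocks the active filesystem references, and deduplication remaps identical blocks but can only free blocks that no snapshot still holds. The `Volume` class, `scenario` function and block counts are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy volume: files point at physical block ids; each block stores some content."""
    content: dict[int, str] = field(default_factory=dict)      # block id -> data
    files: dict[str, list[int]] = field(default_factory=dict)  # file name -> block ids
    snapshots: list[set[int]] = field(default_factory=list)    # frozen block-id sets
    next_id: int = 0

    def write(self, name: str, data: list[str]) -> None:
        """Write a file; every chunk gets a fresh physical block (no inline dedup)."""
        ids = []
        for chunk in data:
            self.content[self.next_id] = chunk
            ids.append(self.next_id)
            self.next_id += 1
        self.files[name] = ids

    def snapshot(self) -> None:
        """Freeze the set of blocks currently referenced by the active filesystem."""
        self.snapshots.append({b for ids in self.files.values() for b in ids})

    def deduplicate(self) -> None:
        """Point identical blocks at one canonical block, then free what nobody references."""
        canonical: dict[str, int] = {}
        for name, ids in self.files.items():
            self.files[name] = [canonical.setdefault(self.content[b], b) for b in ids]
        live = {b for ids in self.files.values() for b in ids}
        for snap in self.snapshots:
            live |= snap  # blocks held by a snapshot cannot be freed
        self.content = {b: c for b, c in self.content.items() if b in live}

    def used_blocks(self) -> int:
        return len(self.content)


def scenario(dedup_first: bool) -> int:
    """Clone a 100-block template as a full copy, then run dedup and snapshot in either order."""
    vol = Volume()
    template = [f"os-block-{i}" for i in range(100)]
    vol.write("template.vmdk", template)
    vol.deduplicate()                   # earlier nightly runs already processed the template
    vol.write("clone.vmdk", template)   # "thick" copy: 100 new physical blocks
    if dedup_first:
        vol.deduplicate()               # e.g. 2 am
        vol.snapshot()                  # e.g. 4 am
    else:
        vol.snapshot()
        vol.deduplicate()
    return vol.used_blocks()


if __name__ == "__main__":
    print("dedup first:   ", scenario(dedup_first=True))
    print("snapshot first:", scenario(dedup_first=False))
```

In this model, deduplicating first leaves 100 blocks in use, while snapshotting first leaves 200, because the snapshot pins the clone's original blocks until it is deleted.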

Update: here is how to estimate potential savings.

What are your best tricks here? Comments welcome (moderated).

5 comments

#1 Jon on 05.08.08 at 19:10

You say that if you clone a VM you lose thin provisioning, and that to solve this “Filesystem deduplication will help”. That’s right for cloning, but what if you have done an sVMotion to another array/storage system? The VM will then be on the other storage array in thick format, and dedupe will not help. Am I wrong? Maybe I just don’t understand the thing…

#2 Christoph on 05.08.08 at 19:19

Deduplication helps if you use Storage VMotion to migrate to another datastore *on a deduplicated Netapp volume*. We use different datastores for testing, staging and production (reasons: different QoS settings are possible, production snapshots are not filled up with test data, and so on…).

Of course, from VMware’s point of view the migrated VM is “thick”, but on the layer below, ASIS deduplicates the redundancies.

If you migrate to another kind of NFS datastore or to a standard VMFS volume, there will of course be no savings at all. So yes, any savings are tied to Netapp storage.

#3 Jon on 05.09.08 at 9:12

So could we say the following?
a) If you CLONE the VM within the same volume, the deduplication trick works very well (the source is nearly identical to the target).
b) If you use Storage VMotion to migrate to another datastore (on a deduplicated Netapp volume), the deduplication trick will probably work well (if there are other VMs that contain similar data).
c) If you use a NetApp filer with SnapRestore, you can use it to restore the vmdk and -flat.vmdk, edit the vmdk to reference the new -flat.vmdk, and add these disks to a new VM created without disks.
d) In all cases (cloning or migrating to the same or a different datastore) you can use the vmkfstools -i -d thin options to clone, then delete the old files and rename the cloned thin vmdks to the original names…

Congratulations on your blog!

#4 Christoph on 05.09.08 at 10:32

An additional remark: we have separate admins for VMware and storage so the nice thing about deduplication on filesystem level is that the VMware admins do not need to care about the storage or “special” commands to keep things efficient. They just work “as usual” (e.g. using the VMware GUI) and we still save lots of space :-)

#5 Jon on 05.13.08 at 12:52

That is true. It has its benefits, but it has its drawbacks too, depending on the environment.
VMware admins provision VMs based on the space left in the volumes. Storage admins apply deduplication and thin provisioning, and the resulting changes are immediately visible to the VMware admins, who keep provisioning VMs because there is space. In the end, the real space consumption of each VM depends on the server admins, application admins or users.
So we are “overprovisioning VMs on underprovisioned storage”. And WE (storage, VMware or server admins) don’t control the growth of the VMs…

I think that NFS+VMware has more benefits than drawbacks, but in a production environment we have to keep an eye on the whole thing.
