Netapp filesystem deduplication and VMware, Part 3

What happened before: Part 1 of this series presented a technology overview of deduplication; Part 2 showed a hands-on example using a Netapp filer and VMware datastores.

(Not so) Thin provisioning

When using NFS, your virtual disks in ESX are thin provisioned. That means that if you create a 50 GB virtual disk, it will not use the whole 50 GB at once. Instead, a sparse file is created which grows as you write data to the virtual disk.
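To make the sparse file idea concrete, here is a minimal Python sketch. It is not specific to ESX or Netapp: it only assumes a POSIX system whose filesystem supports sparse files, and the helper name sparse_file_usage is made up for this illustration. It uses 50 MiB as a stand-in for the 50 GB disk.

```python
import os
import tempfile

def sparse_file_usage(apparent_size, payload=b"x" * 4096):
    """Create a sparse file and return (apparent_bytes, allocated_bytes).

    Hypothetical helper for illustration; assumes a POSIX system
    (st_blocks is not available on Windows).
    """
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        f.truncate(apparent_size)  # set the file size without writing data
        f.write(payload)           # write a small amount of real data at offset 0
    try:
        st = os.stat(path)
        # st_blocks is counted in 512-byte units on POSIX systems
        return st.st_size, st.st_blocks * 512
    finally:
        os.remove(path)

# 50 MiB stands in for the 50 GB virtual disk
apparent, allocated = sparse_file_usage(50 * 1024 * 1024)
print("apparent size:", apparent, "bytes")
print("allocated    :", allocated, "bytes")
```

On a filesystem with sparse file support the allocated figure should stay far below the apparent size; on one without it, both numbers will be roughly the same, which is what a "thick" disk looks like.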

Unfortunately, the “thin” property is lost when you VMotion your disk to another datastore or create a new virtual machine from a template in the datastore. In this case your “thin” 50 GB disk will occupy the full 50 GB immediately.

Deduplication helps

Filesystem deduplication will help here: the filer will write the 50 GB virtual disk normally but deduplicate it at night. The trick is to schedule your snapshots correctly, because any un-deduplicated data captured in a snapshot remains frozen until that snapshot is deleted.

Rule: deduplicate first - do backup snapshots later!

Of course this depends on your data protection strategy - we usually take one consistent snapshot (VMware snapshot, then Netapp snapshot) per datastore in the morning, so deduplication can run during the night before it. An example schedule for a datastore looks like this:

  • Deduplication daily @ 2 am
  • Consistent VMware/Netapp snapshots daily @ 4 am, keep 7
  • Normal Netapp snapshots: every 4 hours, keep 6

If you do it the other way round, your deduplication savings will be smaller, as you’ll keep un-deduplicated data captured inside your long-lasting backup snapshots.
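The ordering rule can be sanity-checked with a few lines of Python. This is only a sketch of the reasoning, not a Netapp tool: the SnapshotJob class and its fields are invented for this example, the hours for the 4-hourly snapshots (0, 4, 8, ...) are an assumption since the schedule above does not fix them, and it assumes deduplication finishes before the next long-lived snapshot is taken.

```python
from dataclasses import dataclass

@dataclass
class SnapshotJob:
    """Hypothetical description of one snapshot schedule."""
    name: str
    hours: tuple  # hours of the day (0-23) at which the snapshot is taken
    keep: int     # number of snapshots retained

    def retention_hours(self):
        # Assumes the snapshots are evenly spaced over the day.
        return 24 / len(self.hours) * self.keep

    def hours_after(self, dedup_hour):
        # Shortest wait from the dedup start to the next snapshot of this job.
        return min((h - dedup_hour) % 24 for h in self.hours)

DEDUP_HOUR = 2  # deduplication daily @ 2 am
jobs = [
    SnapshotJob("consistent", (4,), 7),               # daily @ 4 am, keep 7
    SnapshotJob("normal", (0, 4, 8, 12, 16, 20), 6),  # every 4 hours, keep 6
]

# The longest-lived snapshot is the one that should follow deduplication closely.
longest = max(jobs, key=lambda j: j.retention_hours())
print(longest.name, longest.retention_hours(), longest.hours_after(DEDUP_HOUR))
```

For the example schedule this picks the daily consistent snapshot as the longest-lived one (kept for about a week) and shows it is taken two hours after deduplication starts. The 4-hourly snapshots still capture un-deduplicated data during the day, but they expire within a day, so the space they pin is released quickly.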

Update: here is how to estimate potential savings.

What are your best tricks here? Comments welcome (moderated).