Entries from April 2008 ↓

An introduction to Ontap GX

If you have never heard of Ontap GX you might want to first read Netapp’s Technical Report 3468 and have a look at Mike Eisler’s excellent presentation and paper from the FAST ’07 conference. You can find them on Mike’s personal website. There are also additional answers to questions from the session on the blog.

In a nutshell, Ontap GX combines multiple file servers into a single, clustered system with a single name space. This is very similar to mounting a volume in UNIX, but in GX different volumes can reside on different servers. You can ask any server about any volume and it will either fetch your data itself or internally redirect the request to the server hosting the volume.
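To make the "ask any server" idea a bit more concrete, here is a toy sketch of the lookup. It is not how GX is implemented internally; the volume table, node names and function names are all made up for illustration.

```python
# Toy model of a single name space spread over several servers.
# VOLUME_LOCATION, owning_node, handle_request and the node names are
# hypothetical; they only illustrate the idea, not Ontap GX internals.

VOLUME_LOCATION = {
    "/projects": "node1",
    "/home": "node2",
    "/scratch": "node3",
}


def owning_node(path):
    """Return the node hosting the volume with the longest matching mount point."""
    matches = [
        mount for mount in VOLUME_LOCATION
        if path == mount or path.startswith(mount + "/")
    ]
    if not matches:
        raise KeyError(f"no volume mounted for {path}")
    return VOLUME_LOCATION[max(matches, key=len)]


def handle_request(receiving_node, path):
    """Serve the request locally or forward it to the node that owns the volume."""
    owner = owning_node(path)
    if owner == receiving_node:
        return f"{receiving_node} serves {path} locally"
    return f"{receiving_node} forwards {path} to {owner}"


if __name__ == "__main__":
    print(handle_request("node1", "/projects/report.txt"))
    print(handle_request("node1", "/home/bob/notes.txt"))
```

The client only ever talks to the node it mounted from; whether the data is local or forwarded is decided inside the cluster.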

The concept itself is not really new - in fact it’s quite old: the Andrew File System (AFS) introduced this concept almost 15 years ago. AFS is still the only production-quality, secure filesystem with a real world-wide, global namespace. Unfortunately AFS relies on client software to locate volumes, and this software has to be installed on every client. In Ontap GX standard NFS and CIFS protocols are used, and as all the magic happens inside the cluster, clients are not aware of what is really happening behind the scenes.

What is the advantage of a single name space across multiple file servers? You can, for example, migrate volumes between physical machines without disrupting client access, and this can improve the utilization of your storage. For example, if you add new file servers and storage, existing data can be moved to the new hardware transparently. You can also mix different filer models (add them in pairs) and disk types (FC/SATA). The goal is to be able to scale both capacity and performance without adding too much administration overhead or creating new islands of storage.

In the basic scenario a single volume still resides on a single filer. If you need larger volumes or better performance you can use a striped volume which distributes data across multiple filers. Normal and striped volumes can be mixed in the single namespace.
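As a rough illustration of what "distributes data across multiple filers" can mean, here is a minimal round-robin striping sketch. The stripe size, the round-robin placement and all names are assumptions made for this example; the actual layout GX uses is not described here.

```python
# Minimal round-robin striping sketch.
# STRIPE_SIZE, node_for_offset and the filer names are hypothetical and
# are not taken from Ontap GX.

STRIPE_SIZE = 1024 * 1024  # assumed stripe width of 1 MiB


def node_for_offset(offset, nodes, stripe_size=STRIPE_SIZE):
    """Return the node holding the stripe that contains the given byte offset."""
    stripe_index = offset // stripe_size
    return nodes[stripe_index % len(nodes)]


if __name__ == "__main__":
    filers = ["filer-a", "filer-b", "filer-c"]
    for offset in (0, STRIPE_SIZE, 2 * STRIPE_SIZE, 3 * STRIPE_SIZE):
        print(offset, node_for_offset(offset, filers))
```

With a layout like this a large sequential read touches every filer in turn, which is where the extra performance of a striped volume would come from.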

Of course this is not really news, as other companies like Bluearc or Isilon offer similar capabilities: at Bluearc it’s called Cluster Name Space, but there striping across multiple cluster nodes is not possible. In contrast, Isilon clusters always stripe everything across multiple nodes. With Ontap GX you have both options and can use them depending on workload type and data size.

At the moment many features of Netapp’s “Ontap Classic” aka Ontap 7 are not available in GX (for example iSCSI, quality of service (Flexshare) and many more), but both platforms will be integrated in the future. We have two different GX clusters on site (8 nodes and 2 nodes, 70 + 12 TB), and in future posts I’ll try to show a couple of practical hands-on examples of how stuff works on GX and what is different compared to “normal” filers and AFS (yes, we also use AFS).

Of course I’d love to hear from other GX users out there - please comment :-)

Netapp filesystem deduplication and VMware, Part 3

What happened before: part 1 of this series presented a technology overview of deduplication, part 2 showed a hands-on example using a Netapp filer and VMware datastores.

(Not so) Thin provisioning

When using NFS your virtual disks in ESX are thin provisioned - that means that if you create a 50 GB virtual disk it will not use the whole 50 GB at once. Instead a sparse file is produced which grows when you write data to the virtual disk.
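If you have never looked at a sparse file up close, the following sketch creates one on a local POSIX filesystem and compares its apparent size with the space actually allocated. It uses 50 MiB instead of 50 GB to keep it quick; the file name and helper names are made up, and the result depends on your filesystem supporting sparse files.

```python
# Sketch: create a sparse file and compare apparent vs. allocated size.
# make_sparse_file and sizes are hypothetical helper names.
# Assumes a POSIX system (st_blocks is not available on Windows).
import os
import tempfile


def make_sparse_file(path, size_bytes):
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # st_blocks counts allocated 512-byte blocks.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        disk = os.path.join(tmp, "disk.img")
        make_sparse_file(disk, 50 * 1024 * 1024)
        apparent, allocated = sizes(disk)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On a filesystem with sparse file support the allocated size stays far below the apparent size until data is actually written.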

Unfortunately the “thin” property is lost when you VMotion your disk to another datastore or create a new virtual machine from a template in the datastore. In this case your “thin” 50 GB disk will immediately occupy the full 50 GB.

Deduplication helps

Filesystem deduplication will help here: the filer will write the 50 GB virtual disk normally but deduplicate it at night. The trick here is to schedule your snapshots right, as any un-deduplicated data captured in a snapshot will remain frozen until that snapshot is deleted.

Rule: deduplicate first - do backup snapshots later!

Of course this depends on your data protection strategy - we usually take one consistent snapshot (VMware snapshot, then Netapp snapshot) per datastore in the morning so deduplication can run around midnight. An example schedule for a datastore looks like this:

  • Deduplication daily @ 2 am
  • Consistent VMware/Netapp snapshots daily @ 4 am, keep 7
  • Normal Netapp snapshots: every 4 hours, keep 6

If you do it the other way round your deduplication savings will be smaller as you’ll keep around un-deduplicated data captured inside your long-lasting backup snapshots.
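A very simplified model of why the order matters: a snapshot pins the blocks as they exist when it is taken, so duplicates captured before deduplication stay on disk for as long as the snapshot lives. The function name and the 10 GB "unique data" figure are invented for illustration, not measured values.

```python
# Toy model: space held on disk while a backup snapshot exists.
# space_in_use_gb and the example numbers are hypothetical.

def space_in_use_gb(written_gb, unique_gb, snapshot_before_dedup):
    """Return the space occupied while the backup snapshot is kept."""
    if snapshot_before_dedup:
        # The snapshot froze every block as written, duplicates included.
        return written_gb
    # Deduplication ran first, so the snapshot only references unique blocks.
    return unique_gb


if __name__ == "__main__":
    # A fully written 50 GB virtual disk, assumed to contain 10 GB of unique data.
    print("snapshot first:", space_in_use_gb(50, 10, snapshot_before_dedup=True), "GB")
    print("dedup first:   ", space_in_use_gb(50, 10, snapshot_before_dedup=False), "GB")
```

Real filers track this per block and per snapshot, so treat the numbers as a picture of the effect rather than a sizing tool.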

Update: here is how to estimate potential savings.

What are your best tricks here? Comments welcome (moderated).