Survivable UC – Avaya Aura and Nutanix Data Protection

I wanted to share a bit of cool “value add” today, as my sales and marketing guys would call it. This is just one of the things for Avaya Aura and UC in general that a Nutanix deployment can bring to the table.

Nutanix has the concept of Protection Domains and Metro Availability that have been covered in pretty great detail by some other Nutanix bloggers. Check out detailed articles here by Andre Leibovici, and here by Magnus Andersson for in depth info and configuration on Metro Availability.

Non-redundant Applications

In an Avaya Aura environment, most machines will be protected from failure at the application level. A hot standby VM will be running to take over operation in the event of primary machine failure such as with Session Manager and Communication Manager. In the following example we see that System Manager, AES, and a number of other service don’t have a hot standby. This might be because it’s too expensive resource wise, licensing wise, or the application demands don’t call for it.

1000-user_topology

If multiple Nutanix clusters are in place, we actually have two ways to protect these VMs at the Nutanix level.

Nutanix Protection Domains

First, let’s look at Protection Domains. With a Protection Domain, we configure a NDFS (Nutanix Distributed Filesystem) level snapshot that happens at a configurable interval. This snapshot is intelligently (with deduplication) replicated to another Nutanix cluster. It’s different than a vSphere snapshot because the Virtual Machine has no knowledge that a snapshot took place and no VMDK fragmentation is required. None of the standard warnings and drawbacks of running with snapshots apply here. This is a Nutanix metadata operation that can happen almost instantly.

We pick individual VMs to be part of the Protection Domain and replicate these to one or more sites.

In the event of a failure of a site or cluster, the VM can be restored at another site, because all of the files that make up the Virtual Machine (excluding memory) are preserved on the second Nutanix cluster.

ProtectionDomain

 

Nutanix Metro Availability

But I hear you saying, “Jason that’s great, but a snapshot taken at intervals is too slow. I can’t possibly miss any transactions. My UC servers are the most important thing in my Data Center. I need my replication interval to be ZERO.” This is where Metro Availability comes in.

Metro Availability is a synchronous write operation that happens between two Nutanix clusters. The requirements are:

  1. A new Nutanix container must be created for the Metro Availability protected machines.
  2. RTT latency between clusters must be less than 5 milliseconds (about 400 kilometers)

Since this write is synchronous, all disk write activity on a Metro Availability protected VM must be completed on both the local and the remote cluster before it’s acknowledged. This means all data writes are guaranteed to be protected in real time. The real-world limitation here is that every bit of distance between clusters adds latency to writes. If your application isn’t write-heavy you may be able to hit the max RTT limit without noticing any issues. If your application does nothing but write constantly to disk, 400km may need to be re-evaluated. Most UC machines are generally not disk intensive though. Lucky you!

MetroAvailability

In the previous image we have two Nutanix clusters separated by a metro ethernet link. The standalone applications like System Manager, Utility Services, Web License Manager, and Virtual Application Manager are being protected with Metro Availability.

In the even of Data Center 1 failure, all of the redundant applications will already be running in Data Center 2. The administrator can then either manually (or through a detection script) start the non-redundant VMs using the synchronous copies residing in Data Center 2.

Summary

Avaya Aura Applications are highly resilient and often provide the ability for multiple copies of each app to run simultaneously in different locations, but not all Aura apps work this way. With Nutanix and virtualization, administrators have even more flexibility to protect the non-redundant Aura apps using Protection Domains and Metro Availability.

These features present a consumer-friendly GUI for ease of operation, and also expose APIs so the whole process can be automated into an orchestration suite. These Nutanix features can provide peace of mind and real operational survivability on what would otherwise be very bad days for UC admins. Nutanix allows you to spend more time delivering service and less time scrambling to recover.