Nutanix AHV Best Practices Guide

In my last blog post I talked about networking with Open vSwitch in the Nutanix hypervisor, AHV. Today I’m happy to announce the continuation of that initial post – the Nutanix AHV Best Practices Guide.

Nutanix introduced AHV, a hypervisor based on the open source Linux KVM. A new Nutanix node comes installed with AHV by default, with no additional licensing required. It’s a full-featured virtualization solution that is ready to run VMs right out of the box. ESXi and Hyper-V are still great on Nutanix, but AHV deserves serious consideration because it has a lot to offer, with all of KVM’s rough edges rounded off.

Part of introducing a new hypervisor is describing all of the features, and then recommending some best practices for those features. In this blog post I wanted to give you a taste of the doc with some choice snippets to show you what this Best Practice Guide and AHV are all about.

Take a look at Magnus Andersson’s excellent blog post on terminology for some more detailed background on terms.

Acropolis Overview

Acropolis (one word) is the name of the overall project encompassing multiple hypervisors, the distributed storage fabric, and the app mobility fabric. The goal of the Acropolis project is to provide seamless, invisible infrastructure whether your VMs live in AWS, Hyper-V, ESXi, or AHV. The sister project, Prism, provides the user interface for managing it all via GUI, CLI, or REST API.

AHV Overview

AHV is based on the open source KVM hypervisor, but is enhanced by all the other components of the Acropolis project. Conceptually, AHV has access to the Distributed Storage Fabric for storage, and the App Mobility Fabric powers the management plane for VM operations like scheduling, high availability, and live migration.


The same familiar Nutanix architecture exists, with a network of Controller Virtual Machines providing storage access to VMs. The CVM takes direct control of the underlying disks (SSD and HDD) with PCI passthrough, and exposes these disks to AHV via iSCSI (the blue dotted VM I/O line). The management layer is spread across all Nutanix nodes in the CVMs, using the same web-scale principles as the storage layer. This means a highly available VM management layer exists by default. No single point of failure anymore! No additional work to set up VM management redundancy – it just works that way.

AHV Networking Overview

Networking in AHV is provided by an Open vSwitch (OVS) instance running on each AHV host. The best practices guide has a comprehensive overview of the different components inside OVS and how they’re used. I’ll share a teaser diagram of the default network configuration after installation on a single AHV node.

AHV Networking Best Practices

Bridges, Bonds, and Ports – oh my. What you really want to know is "How do I plug this thing into my switches, set up my VLANs, and get the best possible load balancing?" You’re in luck, because the Best Practices Guide covers the most common scenarios for creating different virtual switches and configuring load balancing.

Here’s a closer look at one possible networking configuration, where the 10 gigabit and 1 gigabit adapters have been connected to separate OVS bridges. With this design, User VM2 can connect to multiple physically separate networks, which allows things like virtual firewalls.

After separating network traffic, the next thing is load balancing. Here’s a look at another possible load balancing method, the OVS balance-slb bond mode. Not only does the BPG provide the configuration for this, but also the rationale. Maybe fault tolerance is important to you. Maybe an active-active configuration with LACP is important. The BPG covers the configuration and the best way to achieve your goals.
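To give a flavor of what that looks like, here’s a minimal sketch of switching the default bond over to balance-slb from the AHV host shell. The bridge and bond names (br0, bond0) match the defaults shown later in these posts; check the BPG for the exact, supported procedure before touching a production host.

# Put the default bond into balance-slb and rebalance source MACs every 30 seconds
ovs-vsctl set port bond0 bond_mode=balance-slb
ovs-vsctl set port bond0 other_config:bond-rebalance-interval=30000
# Verify the bond mode and the state of each member interface
ovs-appctl bond/show bond0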

For information on VLAN configuration, check out the Best Practices Guide.

Other AHV Best Practices

This BPG isn’t just about networking. The standard features you expect from a hypervisor are all covered.

  • VM Deployment
    • Leverage the fantastic aCLI, GUI, or REST API to deploy or clone VMs (see the aCLI sketch just after this list).
  • VM Data Protection
    • Back up VMs with local or remote snapshots.
  • VM High Availability
    • During physical host failure, ensure that VMs are started elsewhere in the cluster.
  • Live Migration
    • Move running VMs around in the cluster.
  • CPU, Memory, and Disk Configuration
    • Add the right resources to machines as needed.
  • Resource Oversubscription
    • Rules for fitting the most VMs onto a running cluster for max efficiency.
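To give a taste of the aCLI mentioned in the VM Deployment bullet above, here’s a rough sketch of creating and cloning a VM from the CVM command line. I’m writing the parameter names from memory, so treat them as assumptions and check the aCLI reference for the current syntax.

# Create a small VM, give it a NIC, and power it on (parameter names assumed)
acli vm.create myvm num_vcpus=2 memory=4G
acli vm.nic_create myvm network=VM-Network
acli vm.on myvm
# Clone an existing VM
acli vm.clone myvm-clone clone_from_vm=myvm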

Take a look at the AHV Best Practice Guide for information on all of these features and more. With this BPG in hand you can be up and running with AHV in your datacenter and get the most out of all the new features Nutanix has added.

Networking Exploration in Nutanix AHV

Nutanix recently released the AHV hypervisor, which means I get a new piece of technology to learn! Before I started this blog post I had no idea how Open vSwitch worked or what KVM and QEMU were all about.

Since I come from a networking background originally, I drilled down into the Open vSwitch and KVM portion of the Nutanix solution. Here’s what I learned! Remember my disclaimer – I didn’t know anything about this before I started the blog. If I’ve got something a bit wrong, feel free to comment and I’ll be happy to update or correct it.

KVM Host Configuration

AHV is built on the Linux KVM hypervisor, so I figured that’s a great place to start. I read the Nutanix Bible by Steve Poitras and saw this diagram on networking.

Networking diagram inside the Acropolis KVM Host
AHV OvS Networking

The CVM has two interfaces connecting to the hypervisor. One interface plugs into the Open vSwitch and the other goes to "internal". I wasn’t sure what that meant. Looking through the hypervisor host config, though, I saw the following interfaces:

[root@DRM-3060-G4-1-1 ~]# ifconfig 
br0 Link encap:Ethernet HWaddr 0C:C4:7A:58:91:50 
  inet addr:10.59.31.77 Bcast:10.59.31.255 Mask:255.255.254.0
eth0 Link encap:Ethernet HWaddr 0C:C4:7A:3B:1C:8C 
eth1 Link encap:Ethernet HWaddr 0C:C4:7A:3B:1C:8D 
eth2 Link encap:Ethernet HWaddr 0C:C4:7A:58:91:50 
eth2.32 Link encap:Ethernet HWaddr 0C:C4:7A:58:91:50 
eth3 Link encap:Ethernet HWaddr 0C:C4:7A:58:91:51 
eth3.32 Link encap:Ethernet HWaddr 0C:C4:7A:58:91:51 
lo Link encap:Local Loopback 
virbr0 Link encap:Ethernet HWaddr 52:54:00:74:F9:B0 
  inet addr:192.168.5.1 Bcast:192.168.5.255 Mask:255.255.255.0
vnet0 Link encap:Ethernet HWaddr FE:54:00:9C:D8:CD 
vnet1 Link encap:Ethernet HWaddr FE:54:00:BE:99:B3

The next place I went was routing with netstat -r to see which interfaces were used for each next hop destination.

[root@DRM-3060-G4-1-1 ~]# netstat -r 
Kernel IP routing table
Destination  Gateway Genmask       Flags MSS Window irtt Iface
192.168.5.0  *       255.255.255.0 U     0 0 0           virbr0
10.59.30.0   *       255.255.254.0 U     0 0 0           br0
link-local   *       255.255.0.0   U     0 0 0           eth0
link-local   *       255.255.0.0   U     0 0 0           eth1
link-local   *       255.255.0.0   U     0 0 0           eth2
link-local   *       255.255.0.0   U     0 0 0           eth3
link-local   *       255.255.0.0   U     0 0 0           br0
default      10.59.30.1 0.0.0.0    UG    0 0 0           br0

I omitted a lot of text just to be concise here. We can see there are two interfaces with IPs, br0 and virbr0. Let’s start with virbr0, which is that internal interface. You can tell because it has the 192.168 private IP used for CVM-to-hypervisor communication. It turns out virbr0 is a local Linux bridge, not an Open vSwitch controlled device:

[root@DRM-3060-G4-1-1 ~]# brctl show virbr0
bridge name bridge id         STP enabled interfaces
virbr0      8000.52540074f9b0 no          virbr0-nic
                                          vnet1

This bridge virbr0 has the vnet1 interface headed up to the internal adapter of the CVM – so THIS is where the CVM internal interface terminates.

That’s one side of the story – the next part is Open vSwitch:

[root@DRM-3060-G4-1-1 ~]# ovs-vsctl show
be65c814-5d7c-46ab-bfb1-7b2bea19d954
 Bridge "br0"
  Port "tap345"
    tag: 32
    Interface "tap345"
  Port "vnet0"
    Interface "vnet0"
  Port "br0"
    Interface "br0"
      type: internal
  Port "bond0"
    Interface "eth2"
    Interface "eth3"
  Port "br0-dhcp"
    Interface "br0-dhcp"
      type: vxlan
      options: {key="1", remote_ip="10.59.30.82"}
  Port "br0-arp"
    Interface "br0-arp"
      type: vxlan
      options: {key="1", remote_ip="192.168.5.2"}
ovs_version: "2.1.3"

OVS has a bridge called br0. The CVM’s vnet0 is a port on this bridge, and so is bond0 (the bond of the two 10GbE interfaces, eth2 and eth3). We also see a special "type: internal" port – this is the one with the IP address assigned to it, the external-facing IP of the AHV / KVM hypervisor host.

[root@DRM-3060-G4-1-1 network-scripts]# cat ifcfg-br0 
DEVICE=br0
DEVICE_TYPE=ovs
TYPE=OVSIntPort
NM_CONTROLLED=no
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.59.31.77
NETMASK=255.255.254.0
GATEWAY=10.59.30.1
OVSREQUIRES="eth3 eth2 eth1 eth0"

In addition to the CVM, external, and internal interfaces, we see a tap345 interface tagged with VLAN 32. This matches the tagged interfaces from the ifconfig output above, eth2.32 and eth3.32, and it will be used for a VM that has a network interface in VLAN 32.
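If you want to poke at these pieces yourself, a few read-only Open vSwitch commands are handy. These are standard OVS commands rather than anything Nutanix specific, so they should be safe to run on a lab host:

# List the bridges and the ports attached to br0
ovs-vsctl list-br
ovs-vsctl list-ports br0
# Show the VLAN tag (and everything else) configured on the tap interface
ovs-vsctl list port tap345
# Check which bond mode is in use and the state of each member
ovs-appctl bond/show bond0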

Finally, we come to the IP Address Management (IPAM) interfaces, br0-arp and br0-dhcp. Steve mentions VXLAN and here’s where we see those concepts. OVS can either intercept and respond to DHCP traffic, or just let it through. If we allow OVS to intercept the traffic, Acropolis and Prism become the point of control for handing out IP addresses to VMs as they boot. Very cool!
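For reference, it’s the "managed" (IPAM-enabled) networks that trigger this DHCP interception. Below is a rough aCLI sketch of creating one; the net.create parameters are my assumption from memory, so verify them against the current aCLI documentation.

# Create a VLAN 32 network managed by Acropolis IPAM (parameter names assumed)
acli net.create vlan32-managed vlan=32 ip_config=10.1.32.1/24
# Define a pool of addresses to hand out within that subnet
acli net.add_dhcp_pool vlan32-managed start=10.1.32.100 end=10.1.32.200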

Now let’s take a look at the config parameters passed to the running CVM. Right now this box has ONLY the CVM running on it, so there’s only one instance of qemu-kvm running.

[root@DRM-3060-G4-1-1 ~]# ps -ef | grep qemu
qemu 9250 1 61 Jun26 ? 1-21:13:21 /usr/libexec/qemu-kvm -name NTNX-DRM-3060-G4-1-1-CVM 
-S -enable-fips -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,mem-merge=off -cpu host,+kvm_pv_eoi -m 24576 -realtime mlock=on -smp 8,sockets=8,cores=1,threads=1 
-uuid 1323cbbc-a20d-d66a-563e-ca7a8609cb73 -no-user-config -nodefaults 
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/NTNX-DRM-3060-G4-1-1-CVM.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control 
-rtc base=utc -no-shutdown -boot menu=off,strict=on 
-kernel /var/lib/libvirt/NTNX-CVM/bzImage -initrd /var/lib/libvirt/NTNX-CVM/initrd 
-append init=/svmboot quiet console=ttyS0,115200n8 
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 
-netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:9c:d8:cd,bus=pci.0,addr=0x3 
-netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:be:99:b3,bus=pci.0,addr=0x4 
-chardev file,id=charserial0,path=/tmp/NTNX.serial.out.0 
-device isa-serial,chardev=charserial0,id=serial0 
-vnc 127.0.0.1:0 -vga cirrus 
-device pci-assign,configfd=24,host=01:00.0,id=hostdev0,bus=pci.0,addr=0x5,rombar=0 
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

Maybe a better way to look at the CVM details is via the XML configuration:

[root@DRM-3060-G4-1-1 ~]# cat /etc/libvirt/qemu/NTNX-DRM-3060-G4-1-1-CVM.xml
<domain type='kvm'>
 <name>NTNX-DRM-3060-G4-1-1-CVM</name>
 <uuid>1323cbbc-a20d-d66a-563e-ca7a8609cb73</uuid>
 <memory unit='KiB'>25165824</memory>
 <currentMemory unit='KiB'>25165824</currentMemory>
 <memoryBacking>
 <nosharepages/>
 <locked/>
 </memoryBacking>
 <vcpu placement='static'>8</vcpu>
 <os>
 <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
 <kernel>/var/lib/libvirt/NTNX-CVM/bzImage</kernel>
 <initrd>/var/lib/libvirt/NTNX-CVM/initrd</initrd>
 <cmdline>init=/svmboot quiet console=ttyS0,115200n8</cmdline>
 <boot dev='hd'/>
 <bootmenu enable='no'/>
 </os>
 <features>
 <acpi/>
 <apic eoi='on'/>
 <pae/>
 </features>
 <cpu mode='host-passthrough'>
 </cpu>
 <clock offset='utc'/>
 <on_poweroff>destroy</on_poweroff>
 <on_reboot>restart</on_reboot>
 <on_crash>restart</on_crash>
 <devices>
 <emulator>/usr/libexec/qemu-kvm</emulator>
 <controller type='usb' index='0'>
 <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
 </controller>
 <controller type='ide' index='0'>
 <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
 </controller>
 <controller type='pci' index='0' model='pci-root'/>

##Here is the CVM interface being linked to the br0 OvS interface
 <interface type='bridge'>
 <mac address='52:54:00:9c:d8:cd'/>
 <source bridge='br0'/>
 <virtualport type='openvswitch'>
 <parameters interfaceid='7a0a4887-b8cd-4f02-960d-cca5c1ca73cc'/>
 </virtualport>
 <model type='virtio'/>
 <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
 </interface>

##Here is the CVM interface going to the local link bridge.
 <interface type='network'>
 <mac address='52:54:00:be:99:b3'/>
 <source network='NTNX-Local-Network'/>
 <model type='virtio'/>
 <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
 </interface>

 <serial type='file'>
 <source path='/tmp/NTNX.serial.out.0'/>
 <target port='0'/>
 </serial>
 <console type='file'>
 <source path='/tmp/NTNX.serial.out.0'/>
 <target type='serial' port='0'/>
 </console>
 <input type='mouse' bus='ps2'/>
 <input type='keyboard' bus='ps2'/>
 <graphics type='vnc' port='-1' autoport='yes' listen='127.0.0.1'>
 <listen type='address' address='127.0.0.1'/>
 </graphics>
 <video>
 <model type='cirrus' vram='16384' heads='1'/>
 <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
 </video>
 <hostdev mode='subsystem' type='pci' managed='yes'>
 <source>
 <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
 </source>
 <rom bar='off'/>
 <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
 </hostdev>
 <memballoon model='virtio'>
 <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
 </memballoon>
 </devices>
</domain>

We saw a new reference in that XML, NTNX-Local-Network. If we ask virsh about the defined networks, we see the following:

[root@DRM-3060-G4-1-1 ~]# virsh net-list --all 
 Name               State  Autostart Persistent
----------------------------------------------------------
 NTNX-Local-Network active yes       yes
 VM-Network         active yes       yes

If we look in the /root/ directory there are definitions for these:

[root@DRM-3060-G4-1-1 ~]# cat net-NTNX-Local-Network.xml 
<network connections='1'>
 <name>NTNX-Local-Network</name>
 <bridge name='virbr0' stp='off' delay='0' />
 <ip address='192.168.5.1' netmask='255.255.255.0'>
 </ip>
</network>

[root@DRM-3060-G4-1-1 ~]# cat net-VM-Network.xml 
<network connections='1'>
 <name>VM-Network</name>
 <forward mode='bridge'/>
 <bridge name='br0' />
 <virtualport type='openvswitch'/>
 <portgroup name='VM-Network' default='yes'>
 </portgroup>
</network>

These two pieces of information tie everything together neatly for us. The internal network given to the CVM is the Linux virbr0 bridge. The external network given to the CVM is the OVS bridge br0.
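If you’d rather not rely on the XML files sitting in /root/, virsh can dump the same definitions straight from libvirt, which is a nice sanity check that the files match what’s actually active:

# Dump the live network definitions from libvirt
virsh net-dumpxml NTNX-Local-Network
virsh net-dumpxml VM-Network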

Now I think I finally understand that image presented at the beginning!

CVM Guest Configuration

Now that we understand the KVM/AHV host configuration, let’s take a look inside the CVM guest. This should be a little easier.

nutanix@NTNX-15SM60140129-A-CVM:10.59.30.77:~$ netstat -r
Kernel IP routing table
Destination Gateway Genmask         Flags MSS Window irtt Iface
192.168.5.0 *       255.255.255.128 U     0 0 0           eth1
192.168.5.0 *       255.255.255.0   U     0 0 0           eth1
10.59.30.0  *       255.255.254.0   U     0 0 0           eth0
link-local  *       255.255.0.0     U     0 0 0           eth0
link-local  *       255.255.0.0     U     0 0 0           eth1
default     10.59.30.1 0.0.0.0      UG    0 0 0           eth0

The routing table shows the internal and external networks, and just two network adapters. The eth1 adapter also shows up as eth1:1 – note the colon rather than the dot, which means it’s an interface alias (a second IP address on eth1) rather than a VLAN subinterface like the .32 ones we saw on the host. I’ll keep it in mind in case I come across something later on.

nutanix@NTNX-15SM60140129-A-CVM:10.59.30.77:~$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 52:54:00:9C:D8:CD 
  inet addr:10.59.30.77 Bcast:10.59.31.255 Mask:255.255.254.0

eth1 Link encap:Ethernet HWaddr 52:54:00:BE:99:B3 
  inet addr:192.168.5.2 Bcast:192.168.5.127 Mask:255.255.255.128

eth1:1 Link encap:Ethernet HWaddr 52:54:00:BE:99:B3 
  inet addr:192.168.5.254 Bcast:192.168.5.255 Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

lo Link encap:Local Loopback 
  inet addr:127.0.0.1 Mask:255.0.0.0

That’s it – just two simple interfaces in the CVM. One for internal traffic to the hypervisor directly, another for receiving any external requests from remote CVMs, the management APIs, and all of the other magic that the CVM performs!
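If you want to confirm that eth1:1 really is just an extra address on eth1 and not a separate device, the iproute2 view makes it obvious, because both addresses show up under the same interface:

# Both the 192.168.5.2/25 primary and the 192.168.5.254/24 alias appear on eth1
ip addr show eth1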

This concludes our walkthrough of networking inside a Nutanix AHV machine. I hope you learned as much as I did going through these items! Please comment or reach out to me directly if you have any questions.

Nutanix .NEXT Announcement – Acropolis and KVM

I’m happy to see that Nutanix has officially announced their upcoming strategic direction at the .NEXT conference. Using Nutanix Acropolis, KVM, and Prism – data center administrators now have the ability to truly make infrastructure invisible.

What Is It?

To read more about the specific details, take a look at Andre Leibovici’s post here, then come back. It has great pictures and lists of features; you’ll like it.

Key advantages for me as a UC administrator:

  1. Linux KVM as a fully featured and consumer friendly hypervisor
  2. Nutanix Prism and Acropolis presenting a seamless management interface for VMs regardless of the underlying hypervisor
  3. Management interfaces designed with Nutanix web-scale principles such as distributed-everything, shared-nothing architecture in mind
  4. Simple migration of existing VMs into a Nutanix XCP (Xtreme Computing Platform) environment

I see an exciting future for enterprises that want to virtualize but don’t want to get locked into a particular hypervisor. Real choice is now available to put workloads on the hypervisor that makes the most sense.

Combined with the ability to scale compute and storage effortlessly, administrators can stop worrying about infrastructure and start planning for what truly matters: Unified Communications applications 😉 I might be a little biased there, but it’s applications that drive business productivity, not compute and storage infrastructure.

Your compute, storage, and now even your hypervisor can be seen as a commodity that’s just available to applications.

What Does It Mean For My UC?

Test and development virtual environments can be virtualized and managed without paying for hypervisor licenses.

Production environments that support Linux KVM can be migrated with a few clicks.

Your VM management infrastructure becomes more resilient and reliable, with one-click upgrades for BIOS, firmware, hypervisor, and storage software across the entire infrastructure stack.

Less time spent managing infrastructure and more time spent working on UC.

But I Can’t Use KVM or Don’t Want To

Nutanix still supports using VMware vSphere or Microsoft Hyper-V and the same flexible storage and compute layer is still available. Infrastructure is still invisible, but in these cases VM Management will be performed through the corresponding VMware or MS tools.

Some UC vendors, such as Microsoft, already support multiple hypervisors. MS Lync (Skype for Business) is supported on any hypervisor listed in the SVVP program, for example. In the past, Avaya supported the Aura "System Platform", which was based on XenServer.

I expect the UC marketplace to open up and support alternative hypervisors in the future. Customer demand can drive vendor behavior, like it did with Cisco’s support for specs-based virtualization of UC.

What NEXT?

Give Nutanix a try for your environment with the free Nutanix Community Edition. See if you can save on test or development VM environments at first. Think about what happens if you can truly separate your applications from the infrastructure stack. Where is the best place for those apps to run? If you already have a Nutanix Environment, then investigate standing up a cluster with Acropolis and KVM.

If you’re at .NEXT, stop by the Avaya booth and talk with Steven Given about the work already done to verify interoperability between Avaya’s Software Defined Datacenter and Nutanix Software Defined Storage.

Shaping up with FitBit

Breaking Point

I used to say “I’d never run unless someone was chasing me.” Well I’ve finally reached that point.

I woke up one morning this past winter and realized I was going to be woefully unprepared for an upcoming snowboarding trip. My belly would throw off my balance, causing me to bounce and roll down the entire mountain. I’d be dead tired just halfway down my first run. If I screwed up enough I could actually be dead.

It could happen because my mental image of myself no longer matched up with the image in the mirror (to say nothing of the scale). Somehow I’d lost touch with my actual self. Dirt biking, mountain biking, snowboarding: all of these things were no longer in easy reach. I saw myself as a fit 20-something rather than the fat, bald 30-something I was.

On top of the life and death safety issues, it would also be a huge waste of money to go on a big week-long snowboarding vacation (where the whole point is peak physical activity) to be stuck taking half days or resting on my ass most of the time.

Let’s be pragmatic here. Death is terrible, sure, but wasting money and vacation time is unacceptable.

The Solution

Thanks to my friend Heather I bought a FitBit Charge HR and the Aria scale. Tech toys! Things I can hook up to my WiFi!

The plan was to use these as motivators to be a little more active. Take the stairs more often, walk a little bit more, have friendly competition, see how much I ACTUALLY weigh each day.

The plan worked a lot better than expected!

weight-loss_2013-05


Three things really combined to help me out.

1. Tracking

Tracking my progress was amazing. At first there were absolutely no weight loss changes, but even on the very first day you can see how many steps you take and how many flights of stairs you climb. A pretty graph is drawn of every stat you can imagine. You can challenge yourself to do more of everything!

2. Competition

Not only can you challenge yourself, you can see how many steps your friends are getting. Friend some folks, challenge them, beat them. Be beaten yourself by friends who constantly seem to be more active than you!! It’s a huge motivator to see that other people are along for the ride. It’s satisfying to look at the charts and see you’re at the top, but it’s still encouraging when you don’t make the top and your friends, with busier lives than you’ll ever have, are still killing it.

3. PROGRESS!!

That chart above is probably the number one long term motivator. Without the scale and without weight-loss-goal progress I don’t know if I could have kept up the activity day after day, month after month. Sometimes you want to come home from work and take a nap. How do you fight that urge to nap and instead go walk a few miles? Knowing that you’re being tracked, that you set a goal, that you’re competing, and that you’re working toward progress: these things are HUGE for keeping up with the work.

Running

How did FitBit tracking steps and losing weight turn into running?

This is a problem primarily of metrics. The metrics you track are the metrics you’ll improve. FitBit tracks many things, but mainly steps. 10,000 steps is the default step goal per day.

I started this whole process walking, but the weather wasn’t always great, so it would be treadmill time for me 🙁 I’d listen to podcasts and audio books but getting all the steps in could take a LONG time at 4 miles per hour. Who has an hour to walk on a treadmill?

The Thought Process

Week 2: What’s faster than walking? Jogging a little? Crank that speed up. Wow – the steps just fly by!

Week 4: Holy shit! I wonder how fast I could finish 10,000 steps?

Week 6: Hmm, I wonder what the best time is that I could run a 5k?

Week 10: Wow, the weather is nice – let’s go outside.

Week 12: That run club that meets at the brewery I like seems cool, and I have friends that go. It’s only 4 miles, I’ll take a rest or two.

Week 13: It’s only 4 miles, I bet I could do it with just one break.

Week 15: It’s only 4 miles. I bet I could just run the whole thing.

Today: Man, work is taking so long today. I can’t wait to get home and run.

So You Think You Can Run?

Now I guess I’m a runner. I have a desire to get out there and run. I look forward to it. I run with friends. I run alone. I do it for me.

Not sure where this is going next. Hopefully the running persists. I doubt I’ll enter any races, but then again I said I’d never run.

I’ve been wrong before. Maybe I was being chased all along and it just took me a while to realize it.

Vienna Avaya Technology Forum

Part of my role on the Nutanix Performance and Solutions team is to “evangelize” the technology and tell the world about all the great work we’re doing writing documents, testing products and solutions, and assisting with customer engagements. The physical manifestation of that is me sitting in an airport typing up this blog post, on my way to the Avaya Technology Forum in Vienna, Austria.


Nutanix will have a booth and I’ll be doing demos of the product interface and reaching out to Avaya communications and networking customers. I’ll be joined by members of the local Nutanix team to help share the duties. I’m looking forward to meeting more of the international Nutanix team!

The Nutanix Virtual Computing Platform is a great fit for Avaya customers looking to virtualize their communications infrastructure running Avaya Aura or IP Office. Nutanix also simplifies the compute and storage side of the data center for those leveraging Avaya Fabric Connect to simplify the network stack.

Imagine being able to scale your compute and storage seamlessly with auto discovery. Imagine one click upgrades of the entire compute and storage ecosystem (INCLUDING THE HYPERVISOR!). More importantly, imagine all the time you’ll have to work on the applications that really matter.

IP Office Reference Architecture

Avaya Aura Reference Architecture

Stop by the Nutanix booth in the Solutions Zone at the Hilton Vienna on May 5th – 8th if you’re in the area!

Nutanix Avaya Aura Reference Architecture

I’m happy to announce that the Reference Architecture for Avaya Aura on Nutanix has been completed!

Aura is a Unified Communications platform with a lot of different components. All of these pieces can now be deployed in VMware vSphere thanks to the Avaya Aura Virtualized Environment and Customer Experience Virtualized Environment initiatives at Avaya. These projects bring together different Aura apps and produce virtualization guides and OVA templates for each product.

The Nutanix Reference Architecture above goes through the most common Virtualized Environment components and breaks down the rules, requirements, and best practices for running on Nutanix.

I’m happy that this document serves as an excellent reference for the administrator in charge of virtualizing Aura. Right now the information in the Avaya docs is spread all over the place. Having a unifying reference source is pretty helpful to any Nutanix administrator sitting there thinking "How do I virtualize this again?" and even helpful to Avaya admins thinking "Where is that doc?"

Aura Components

The core components I address are as follows:

  • Call Control: Aura Session Manager and Communication Manager
  • Voice Mail and Messaging: Aura Messaging
  • Presence: Aura Presence Services
  • Configuration Management: System Manager
  • 3rd Party Integration: Application Enablement Services

There are many additional components not covered directly in the guide, but I’ve included links to these where appropriate.

Planning and Design

Much like other applications on Nutanix, Aura designers and architects need to answer these questions about each Aura VM:

  • How many vCPUs does this VM use and reserve (core count / MHz)?
  • How much RAM does this VM use and reserve (GB)?
  • How much storage space does this VM use (GB)?
  • What sort of IOPS are generated / required during peak hours?
  • Are there any other special requirements?

The Nutanix Avaya Aura Reference Architecture doc attempts to address all of these questions.

Here’s an example of the information for Avaya Aura Communication Manager Duplex:

cm-duplex-reqs

Put this individual machine information together with a sample layout. Your layout may vary based on the Aura design. Work closely with the Avaya Aura design team to figure out what components are required and what size those components need to be.

1000-user_layout

Once we know how many VMs and what their specs are, we can figure out the resource utilization of the end system:

1000-user-reqs

With all this information together, the right Nutanix virtualization platform can be chosen. You can pick the system with the right CPU core count, the right amount of RAM, and the storage capacity and performance to provide an exceptional end-user experience.

Your Aura design will certainly differ from the one listed above, but the processes laid out in the guide can help plan for a system of any size with any number of components.

If you have questions feel free to leave a comment, or head over to next.nutanix.com forums and visit the Workloads & Applications > Unified Communications section.


Survivable UC – Avaya Aura and Nutanix Data Protection

I wanted to share a bit of cool "value add" today, as my sales and marketing guys would call it. This is just one of the things a Nutanix deployment can bring to the table for Avaya Aura, and for UC in general.

Nutanix has the concepts of Protection Domains and Metro Availability, which have been covered in great detail by other Nutanix bloggers. Check out detailed articles here by Andre Leibovici, and here by Magnus Andersson for in-depth info and configuration on Metro Availability.

Non-redundant Applications

In an Avaya Aura environment, most machines will be protected from failure at the application level. A hot standby VM will be running to take over operation in the event of a primary machine failure, as with Session Manager and Communication Manager. In the following example we see that System Manager, AES, and a number of other services don’t have a hot standby. This might be because a standby is too expensive in terms of resources or licensing, or because the application demands don’t call for it.

1000-user_topology

If multiple Nutanix clusters are in place, we actually have two ways to protect these VMs at the Nutanix level.

Nutanix Protection Domains

First, let’s look at Protection Domains. With a Protection Domain, we configure an NDFS (Nutanix Distributed Filesystem) level snapshot that happens at a configurable interval. This snapshot is intelligently (with deduplication) replicated to another Nutanix cluster. It’s different from a vSphere snapshot because the virtual machine has no knowledge that a snapshot took place and no chain of delta VMDKs is created. None of the standard warnings and drawbacks of running with snapshots apply here. This is a Nutanix metadata operation that can happen almost instantly.

We pick individual VMs to be part of the Protection Domain and replicate these to one or more sites.

In the event of a failure of a site or cluster, the VM can be restored at another site, because all of the files that make up the Virtual Machine (excluding memory) are preserved on the second Nutanix cluster.
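As a rough illustration of how lightweight this is to set up, here’s a sketch using ncli from a CVM. I’m writing the protection-domain subcommands and parameters from memory, so treat them as assumptions and lean on Prism or the ncli help output for the real syntax.

# Create a protection domain and protect a couple of non-redundant Aura VMs (names and flags assumed)
ncli protection-domain create name=aura-standalone
ncli protection-domain protect name=aura-standalone vm-names=SystemManager,AES
# The snapshot schedule and the remote replication target can then be set in Prism (or with further ncli calls)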

ProtectionDomain


Nutanix Metro Availability

But I hear you saying, “Jason that’s great, but a snapshot taken at intervals is too slow. I can’t possibly miss any transactions. My UC servers are the most important thing in my Data Center. I need my replication interval to be ZERO.” This is where Metro Availability comes in.

Metro Availability is a synchronous write operation that happens between two Nutanix clusters. The requirements are:

  1. A new Nutanix container must be created for the Metro Availability protected machines.
  2. RTT latency between clusters must be less than 5 milliseconds (about 400 kilometers)

Since this write is synchronous, all disk write activity on a Metro Availability protected VM must be completed on both the local and the remote cluster before it’s acknowledged. This means all data writes are guaranteed to be protected in real time. The real-world limitation here is that every bit of distance between clusters adds latency to writes. If your application isn’t write-heavy you may be able to hit the max RTT limit without noticing any issues. If your application does nothing but write constantly to disk, 400 km may need to be re-evaluated. Most UC machines are generally not disk intensive, though. Lucky you!
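For a back-of-the-envelope check on that 400 kilometer figure: light in fiber travels at roughly 200 kilometers per millisecond, so 400 km is about 2 ms each way, or roughly 4 ms of round-trip propagation delay before any switching and serialization overhead is added. That’s why the practical distance limit sits just under the 5 ms RTT requirement.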

MetroAvailability

In the previous image we have two Nutanix clusters separated by a metro Ethernet link. The standalone applications like System Manager, Utility Services, Web License Manager, and Virtual Application Manager are being protected with Metro Availability.

In the event of a Data Center 1 failure, all of the redundant applications will already be running in Data Center 2. The administrator can then start the non-redundant VMs, either manually or through a detection script, using the synchronous copies residing in Data Center 2.

Summary

Avaya Aura Applications are highly resilient and often provide the ability for multiple copies of each app to run simultaneously in different locations, but not all Aura apps work this way. With Nutanix and virtualization, administrators have even more flexibility to protect the non-redundant Aura apps using Protection Domains and Metro Availability.

These features present a consumer-friendly GUI for ease of operation, and also expose APIs so the whole process can be automated into an orchestration suite. These Nutanix features can provide peace of mind and real operational survivability on what would otherwise be very bad days for UC admins. Nutanix allows you to spend more time delivering service and less time scrambling to recover.


Virtualized Avaya Aura on Nutanix – In Progress

Explaining the Nutanix Distributed Filesystem

The Avaya Technology Forum in Orlando was a great success! Thanks to everyone who attended and showed interest in Nutanix by stopping at the booth. I met a lot of interested potential customers and partners and was also able to learn more about what people are virtualizing these days. There is nothing quite like asking people directly “What virtualization projects do you have coming up?”

Explaining the Nutanix Distributed Filesystem

After talking about Nutanix and what I do on the Solutions team, some key themes I heard repeated by attendees were:

“Wow, that’s really cool technology!”

and

“When will you have a document for Avaya Aura?”

The response to the first one is easy. Yeah, I think it’s really cool technology too. Nutanix will allow you to compress a traditional three tier architecture into just a few rack units. It gives you the benefits of locally attached fast flash storage AND the benefits of a shared storage pool. Customers can use this to save money, improve performance, and focus on their applications instead of their infrastructure. After you compress you also have the ability to scale up the number of nodes in the Nutanix cluster with no hard limit in place. Performance grows directly with cluster growth.

The second question is actually why I’m writing this blog today. When will the reference architecture for Avaya Aura on Nutanix be completed?

I’m in the research phase now because Avaya Aura is a monster of an application. It’s actually a set of dozens of different systems that all work together. Each system will have its own requirements for virtualization. Part of getting a reference architecture or best practices guide right is figuring out what each individual component requires to succeed.

Let’s give an example by looking at the Avaya Aura Virtual Environment overview doc. This is the list of the different OVAs that are available:

Avaya Aura® applications for VMware
• Avaya Aura® Communication Manager
• Avaya Aura® Session Manager
• Avaya Aura® System Manager
• Avaya Aura® Presence Services
• Avaya Aura® Application Enablement Services
• Avaya Aura® Agile Communication Environment (ACE)
• Avaya Aura® Messaging
• Communication Manager Messaging
• Avaya Virtual Application Manager
• Avaya Aura® Utility Services
• WebLM
• Secure Access Link
• Session Border Controller for Enterprise
• Avaya Aura Conferencing

Avaya Call Center on VMware (OVA files)
• Avaya Aura® Call Center Elite
• Elite Multichannel Feature Pack
• Avaya Aura® Experience Portal
• Call Management System

Each of the applications listed above is a separate OVA file available from Avaya. Each application has its own sizing, configuration, and redundancy guides. To deploy an Aura solution you can use some, or all of these components.

An Aura document on Nutanix is in the works, but it’s going to be a lot of WORK. I plan on focusing on just the core components at first and a few sample deployments to cover the majority of cases.

I’ve read every single Avaya Virtual Environment document and now just need to compile this information into an easy to digest Nutanix-centric format. In the meantime if you have Avaya Aura questions on Nutanix feel free to reach out to me @bbbburns

The great thing so far is that I don’t see any potential roadblocks to deploying Aura on Nutanix. In fact, at the ATF we performed a demo Aura deployment on a single Nutanix 3460 block (4 nodes). We demonstrated Nutanix node failure and Aura call survivability of the active calls and video conferences.

Part of the challenge of deploying any virtual application, especially real-time applications, is that low latency is KING. This was repeated over and over by all the Avaya Aura experts at the conference. Aura doesn’t use storage very heavily, but since it’s a real-time app the performance had better be there when the app asks for it. All the war stories around virtualizing Aura dealt with oversubscribed hosts, oversubscribed storage, or contention for resources.

Deploying Aura on Nutanix is going to eliminate these concerns! Aura apps will ALWAYS have fast storage access. There will never be any contention because our architecture precludes it. I’m excited to work on projects like this because I know customers are going to save HUGE amounts of money while also gaining performance and reliability.

We really will change your approach to the data center.

Nutanix and The 2015 Avaya Technology Forum

I’m at the 2015 Avaya Technology Forum with Nutanix to talk about Avaya Unified Communications on the Nutanix platform. Stop by the Nutanix and CRI booth to see the Nutanix gear in action. Nutanix 3460 and 1450 nodes will be powering all the demos you see for Avaya Aura and other applications!

I’ve been testing with the helpful engineers at Avaya to do two important things:

  1. Ensure Avaya Unified Communications applications run flawlessly on Nutanix.
  2. Test the Nutanix Distributed File System (NDFS) performance and operation on top of Avaya Fabric Connect.

The result of all this work is being presented here at the Avaya Technology Forum in sunny Orlando. The Avaya colleagues I’ve been working with are from the Boston area (and Canada), so I imagine coming down here to find 81 degrees and sunshine is a welcome change!

The first item I want to bring to your attention is the Nutanix Avaya Unified Communications Solution Brief. This is a high level piece to show the overall benefits of combining Nutanix and Avaya Unified Communications. Nutanix makes the data center admin’s life easier by eliminating silos between UC and other data center apps, bringing scalable compute and storage to the masses, cutting down on management time, providing blindingly fast I/O performance, and tying it all together with high availability baked in.

Fig36-Phase2

Whether you’re running Avaya IP Office, a full blown contact center with Avaya Aura, or something in between, the Nutanix platform brings web-scale technologies to these virtual applications. To top it off – Avaya Fabric Connect technologies allow the data center admin to provision highly resilient, low-latency, high-throughput network backbones without the drawbacks of traditional spanning tree architectures.

Nutanix performs hyper-convergence at the storage and compute layer using a software defined Controller Virtual Machine. Find out more here at the Nutanix Bible to see how Nutanix ties together the disks of many nodes to form a resilient, distributed, high-performance compute and storage cluster.

Avaya brings Software Defined Networking and Virtualization with Avaya Fabric Connect.

These two technologies together save time and money in the datacenter, while also providing blazing performance.

Fig21-ProtectionDomains

Check back for updates during the conference. I’ll be sharing a Reference Architecture for Avaya IP Office Server Edition running on Nutanix. In the future you’ll also see a Reference Architecture for Avaya Aura on Nutanix.

Find me at the conference by tweeting @bbbburns or stopping by the Nutanix and CRI booth.

Nutanix and UC – Part 4: VM Placement and System Sizing

In the last blog post I talked about sizing individual VMs. Today we’ll look at placing UC VMs onto a Nutanix node (an ESXi host) and coming up with overall system sizing.

First I’d like to announce the publication of my document for Virtualizing Cisco UC on Nutanix. Readers of the blog will recognize the content and the diagrams 😉 I’ve combined all of this information for publication and delivery to customers and partners planning to deploy Cisco Unified Communications.

Next, let’s look at placing Cisco UC VMs to size a Nutanix system. Once you have a count of all the VMs needed and their individual sizes you can spread them around on paper to see how much hardware rack and stack is in your future. With Nutanix you’ll have a lot less work ahead of you than with any other solution! Use all the methods documented in the previous posts to size the individual VMs.

There are a few options for VM placement. I used Omnigraffle on my Mac to create diagrams like the one you see here, but Visio or MS Excel will work just as well. The “Hypervisor CPU Cores” represent the space available on a single Nutanix node. I didn’t specify ESXi, Hyper-V, or KVM directly because Nutanix can support all three hypervisors.

In a Nutanix block you can have up to 4 nodes in a 2 RU device. Below we see a single 16 core node. New Nutanix models will be released in the future with different core counts, roughly keeping pace with Intel’s releases of new hardware. Size your core count based on what’s available on the Nutanix hardware platform page.

*EDIT on 2015-10-23* Nutanix switched to a “Configure To Order” model and now many more processor core options are available, from 2×8 core all the way up to 2×18 core. This provides a lot of flexibility for sizing UC solutions.

Cisco UC VM Layout

Take some space and reserve it for the Nutanix Controller Virtual Machine. Exactly how much space to reserve really depends on the expected IO load. The CVM will reserve four vCPUs at a minimum. Looking at the CVM properties in vSphere, you can see it actually has eight vCPUs provisioned, which is why the shaded area exists. These four vCPUs that exist in a limbo state (provisioned but not reserved) can be used by any application that doesn’t mind CPU oversubscription.

Unfortunately, Cisco UC and most other UC applications don’t allow oversubscription, so we have to chop off eight vCPUs right at the start to abide by Cisco’s requirements. Don’t worry though: four of these vCPUs are not lost entirely. Make good use of them by putting a DHCP server there, or DNS, or a Domain Controller. Put a Linux SFTP backup server there if you like for handling incoming application backups from Cisco UC. Mine bitcoins. These cores are yours, you have options!

If you know that a Nutanix node is going to push SERIOUS IO traffic because you’ve read the IOPS requirements and see that you’ll need many many thousands of IOPS for the VMs, bump up the number of vCPUs that you leave for the Nutanix CVM. Under heavy load, the multi-threaded process will use all available vCPUs to handle IO requests. Under normal load the four reserved vCPUs will be plenty.

If you’re unsure of the IO load a machine will generate, fear not! The Nutanix Prism interface shows detailed stats per virtual machine. You can get an idea of what a VM is doing just by watching the Prism page for that VM. Below we see a VM that exhibits a spike in IOPS over a period of time.

Prism stats
Prism VM Stats


Along with Excel, Omnigraffle, and Visio, tools exist on the Cisco website to do VM placement. I like to use the UC Placement Tool just because it’s simple. A custom CVM entry can be created that uses eight vCPUs (or four), and then the Cisco UC VMs can be selected from existing templates.

Cisco VM Placement Tool
Cisco VM Placement Tool

This tool is extremely helpful because the sizes of the various Cisco UC components are embedded in the templates, as shown above. The IOPS calculation of this tool isn’t really there yet in the templates. It’s an exercise for the reader (or user) to fill out the expected IOPS of each virtual machine. This info can be cobbled together from the various Cisco wiki pages or from information gathered via the Nutanix Prism page.

Nutanix also makes a sizing tool that can be used to size a Nutanix cluster once the specs of the virtual workload are known.  Check out this nu.school video to get an idea of how the Nutanix Sizer works:

https://www.youtube.com/watch?v=Vyy2n45wE2I

When sizing UC servers, we’ll use the "Server Virtualization" workload type. This means for each VM type (CUCM, CUC, CER) we’ll specify the number of vCPUs, amount of RAM, size of disk, and expected IOPS. Once this information is entered, a Nutanix system (along with a node count) will be chosen. This can be checked against the sizing calculations above to ensure the right size system is selected. Here we size 11 CUCM virtual machines. Each VM has 2 vCPUs, 6 GB RAM, 110 GB storage, and an average of 40 IOPS (taken from the Cisco DocWiki).

CUCM Custom Workload
Creating a custom workload for CUCM
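As a quick sanity check on what the Sizer should come back with, the raw totals for those 11 CUCM VMs work out to 22 vCPUs, 66 GB of RAM, roughly 1.2 TB of storage (11 x 110 GB), and about 440 IOPS, before accounting for CVM overhead and N+1 failover headroom.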

Cisco UC is a unique case because the processors in the Nutanix 1000 series nodes do not currently meet the hardware processor requirements that Cisco specifies. This means the 1000 series nodes aren’t appropriate for Cisco UC, but all other node types are. We’re going to maximize for CPU cores because of Cisco’s 1:1 core:vCPU mapping. With most Cisco UC virtual machines we won’t run into any storage size or storage performance limitations on the Nutanix system. The primary driver of sizing will be the number of free cores!

Maximizing for available storage space or other factors due to other workloads (like MS SQL or Exchange for instance) may lead to selection of a different node type. Nodes in a cluster can be many different types and can be mixed together in the same cluster. A cluster will often contain several storage heavy nodes for VMs with large storage requirements.

Summary and Next Steps

We’ve covered an overview of Cisco UC and Nutanix, and how to size individual UC VMs and place them on a Nutanix system. With this information it’s possible to design a complete Cisco UC solution powered by the Nutanix platform.

Assets from both Cisco and Nutanix can be leveraged to build a completely supported UC solution that takes up less rack space, power, and cooling. It’ll be simpler to set up because there are fewer components. It’ll be simpler to manage for the same reason AND because of the slick web front end to the combined compute and storage components. It’ll be more secure because federal STIG requirements are built into the product as easy-to-manage configuration settings (applied by running a security script). One-click upgrades for the entire compute and storage infrastructure mean admins will be spending more time on the slopes or drinking beer and less on weekend change windows. That’s something I can get behind!

To learn more about Nutanix I recommend reading through the Nutanix Bible by Steve Poitras. It’s a wealth of great information on how the technology under the hood works. The nu.school YouTube channel also has some excellent whiteboard videos that I highly recommend.

Feel free to reach out to me on Twitter @bbbburns for follow up, or comment here on the blog.