Cloud Pod Architecture with F5 LTM or NSX

In 2014, VMware announced a new feature in Horizon 6 called Cloud Pod Architecture. It lets you run Horizon-based VDI/Remote Apps from multiple data centers and scale beyond 10,000 desktops per Horizon implementation (by setting up multiple PODs with up to 10,000 desktops each). To use this feature, it is a best practice to set up load balancing as well, and in the case of geographically separated data centers, a load balancer that also supports global traffic management. F5 offers such a module alongside its Local Traffic Manager (LTM). But what about NSX, the network virtualization software that includes a load balancer in the solution? Can you use that in conjunction with Cloud Pod Architecture (CPA) as well? In this post I will explain the functionality of CPA and why/where you would want to add a load balancer with specific functions.

What is Cloud Pod Architecture?

Well, first watch these three short instruction videos so you know what it is and how it should work.

To get CPA to work in a normal/common practice way, you need some prerequisites to be in place:

  • A minimum of 2 PODs
  • Horizon 6 +
  • Redundant hardware in each POD (hosts, networking, storage, etc)
  • Local load balancing
  • Global load balancing
  • Session persistence

And all of this is based on the fact that you use Horizon 6+ in combination with Security Servers for external connections, which are paired to Connection Servers.

Take a look at the following figure.

Cloud Pod Architecture F5
Figure 1: CPA with GTM

The top-level Virtual IP (VIP) is used (with an FQDN) for every external device to connect to an available desktop. F5’s Global Traffic Manager (GTM) is used to automatically balance the load between the two geographically separated PODs. F5’s Access Policy Manager (APM) is used to provide session persistence when users disconnect and reconnect their session, so they aren’t sent to the wrong POD and given a new desktop while a disconnected session still exists in the other POD. Both PODs have F5 Local Traffic Manager modules to locally load balance the Security Servers for external connections. Connection Servers can also be placed behind an F5 LTM to balance connections for internal users.
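
To make the persistence requirement a bit more tangible, here is a minimal Python sketch of the idea (this is not F5 APM configuration; the POD names and the session table are made up for the example): a reconnecting user must land on the POD that still holds the disconnected session, while a new user can simply go to the least-loaded POD.

```python
# Conceptual sketch only: not F5 APM configuration, just an illustration of
# the persistence problem. Pod names and the session table are assumptions.

sessions = {}  # username -> name of the pod that holds the (possibly disconnected) session

def pick_pod(username, pods):
    """Return the pod a reconnecting or new user should land on."""
    if username in sessions:
        # Reconnect: send the user back to the pod that still holds the session,
        # instead of handing out a second desktop in the other pod.
        return sessions[username]
    # New session: let the global load balancer pick the least-loaded pod.
    pod = min(pods, key=lambda p: p["active_sessions"])
    sessions[username] = pod["name"]
    pod["active_sessions"] += 1
    return pod["name"]

pods = [{"name": "POD-A", "active_sessions": 120},
        {"name": "POD-B", "active_sessions": 80}]
print(pick_pod("jdoe", pods))  # first logon -> least-loaded pod (POD-B)
print(pick_pod("jdoe", pods))  # reconnect  -> POD-B again, not a new desktop
```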

Take a look at this blog post for more information about F5 and Cloud Pod Architecture:

Since Horizon 6, a lot has happened. CPA got more mature, the Unified Access Gateway (UAG) appliance was born, and Horizon 7 entered the game with a new protocol called Blast Extreme. Oh, and don’t forget NSX, which we see more and more at customers to implement micro-segmentation at the virtual desktop level. Really cool stuff.

So what has changed in the architecture?

Take a look at the following figure.

CPA Before and After
Figure 2: Load balancing of a single POD

In this picture we see a single POD. On the left side it is configured with LTM load balancing and traditional Security Servers. Each Security Server is directly paired with a Connection Server. On the right side, LTM load balancing is used in combination with Horizon UAGs (APs in the figure). UAGs aren’t linked to a single Connection Server, but to an FQDN. And that’s where the fun part starts: that FQDN could be either a single Connection Server or a couple of Connection Servers that sit behind an LTM load balancer as well.

Because the UAGs aren’t paired to a Connection Server, the Connection Servers can be used for both internal and external connections (while dedicated Connection Servers were needed for internal connections before the AP was born).

The next great thing about Horizon 7 is the Blast Extreme protocol, which gives the user a great user experience (almost comparable to PCoIP), but over HTTPS instead of a dedicated TCP/IP port. This is an important part when looking at other possibilities for load balancing.
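
As a small illustration (purely a sketch; the hostname is a placeholder for your external VIP/FQDN), the client-side requirement boils down to being able to reach the VIP on TCP 443, whereas PCoIP would additionally need port 4172 opened and balanced:

```python
# Sketch only; "desktops.example.com" is a placeholder for your external VIP/FQDN.
import socket

def port_open(host, port, timeout=3):
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Blast Extreme rides on HTTPS, so reaching 443 on the VIP is enough;
# PCoIP would additionally require 4172 (TCP/UDP) to be opened and balanced.
print(port_open("desktops.example.com", 443))
```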

So how to use these new features with CPA?

No real new documentation has been released around CPA and these new features, but a lot of new possibilities arose when they came out. And that includes new possibilities for CPA. F5 BIG-IPs are common practice with CPA, but are they still necessary? The answer (of course) is: it depends. When all the right factors are in place, it is possible to use only LTM, or maybe even NSX with its load balancer.

To use only LTM, the following prerequisites must be met:

  • A minimum of 2 PODs
  • Horizon 7 (although 6.2 might also work in certain situations)
  • Redundant hardware in each POD (hosts, networking, storage, etc)
  • A stretched VLAN (over both data centers) in which the load balancers can be connected (based on VXLAN or Cisco’s OTV)
  • F5 LTM with an external VIP and an internal VIP (or another LB that supports the required features; take a look at this post by Mark Benson for more info)
  • A stretched management cluster
  • Horizon 7 UAGs
  • Blast Extreme (so no PCoIP)

Take a look at the following figure.

Cloud Pod Architecture LTM
Figure 3: CPA with only LTM

As you can see in figure 3, the top-level load balancers are placed in a stretched layer 2 DMZ network. You can choose either active/passive or active/active load balancing. In my case I have chosen active/passive to consolidate all incoming connections and avoid more cross-site connections.

For an assumed maximum of 2,000 connected sessions, we deployed 2 active UAGs (APs in the figure) and one passive UAG. The load balancer is set up with a pool containing the 3 UAGs, where UAG1 and UAG2 have a higher priority. In case DC 1 fails, UAG3 will accept disconnected sessions and redirect them to their desktop. UAG1 and UAG2 will be restarted by HA in DC 2, and within a couple of minutes all UAGs will run from DC 2. Make sure to add DRS anti-affinity rules in combination with DRS host groups. Also make sure to have Storage DRS enabled with datastore clusters and, again, anti-affinity rules.
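
To show the behaviour this design relies on, below is a rough Python model of a priority-based pool (this is not F5 configuration syntax; the member names and priorities are assumptions matching the figure): only the highest-priority members that are up receive traffic, and when DC 1 fails, traffic falls back to UAG3 in DC 2.

```python
# Rough Python model of the priority-based pool described above. This is not
# F5 configuration syntax; member names and priorities are assumptions that
# match the figure.

pool = [
    {"name": "UAG1", "dc": "DC1", "priority": 10, "up": True},
    {"name": "UAG2", "dc": "DC1", "priority": 10, "up": True},
    {"name": "UAG3", "dc": "DC2", "priority": 5,  "up": True},
]

def available_members(pool):
    """Only members of the highest priority group that is still up get traffic."""
    up = [m for m in pool if m["up"]]
    if not up:
        return []
    top = max(m["priority"] for m in up)
    return [m for m in up if m["priority"] == top]

print([m["name"] for m in available_members(pool)])  # ['UAG1', 'UAG2']

# DC 1 fails: UAG1 and UAG2 are marked down, traffic falls back to UAG3 in DC 2.
for m in pool:
    if m["dc"] == "DC1":
        m["up"] = False
print([m["name"] for m in available_members(pool)])  # ['UAG3']
```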

As mentioned earlier, Security Servers are paired with Connection Servers on a 1-to-1 basis. UAGs don’t do that: a UAG is linked to a namespace. And as you can see in figure 3, all UAGs point to a VIP that is set on a secondary layer of load balancers containing a pool of Connection Servers. This means that the UAGs will tunnel an incoming connection to the VIP, which in turn balances the connection to a single Connection Server in the pool. The Connection Server will first check in its own POD, or if needed in the other POD(s), whether there is an available (or existing) desktop. If there is a desktop available, the incoming connection will be tunneled to the desktop and the user is happy.
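
That brokering decision can be summarized in a small conceptual sketch (this is not Horizon code; the POD names, session table and capacity numbers are assumptions made up for the example): an existing session wins, otherwise a new desktop is handed out, preferably from the local POD.

```python
# Conceptual sketch of the brokering decision described above; this is not
# Horizon code. Pod names, the session table and the capacity numbers are
# assumptions made up for the example.

pods = {
    "POD1": {"disconnected_sessions": {"jdoe": "desktop-042"}, "free_desktops": 15},
    "POD2": {"disconnected_sessions": {}, "free_desktops": 40},
}

def broker(username, local_pod):
    """Prefer an existing session, then a free desktop, checking the local pod first."""
    order = [local_pod] + [p for p in pods if p != local_pod]
    # 1. Reconnect to an existing (disconnected) session anywhere in the federation.
    for pod in order:
        if username in pods[pod]["disconnected_sessions"]:
            return pod, pods[pod]["disconnected_sessions"][username]
    # 2. Otherwise hand out a new desktop, starting in the local pod.
    for pod in order:
        if pods[pod]["free_desktops"] > 0:
            pods[pod]["free_desktops"] -= 1
            return pod, "new-desktop"
    raise RuntimeError("no desktop capacity available in any pod")

print(broker("jdoe", "POD2"))    # ('POD1', 'desktop-042'): existing session wins
print(broker("asmith", "POD2"))  # ('POD2', 'new-desktop'): local pod preferred
```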

In this case, no APM module is needed because all sessions are virtually tunneled from one data center, and those sessions are known as long as they exist in either POD.

Please keep in mind that in the above situation only HTTPS (Blast Extreme) traffic is load balanced and tunneled.

The following figure outlines the steps that are described in the above section.

CPA Flow
Figure 4: Cloud Pod Architecture Flow
  1. A user requests a session and is pointed to the external VIP of the UAGs (APs in the figure).
  2. The active LTM LB sends the session to one of the UAGs in the pool.
  3. The APs forward the session to the VIP of the internal URL.
  4. The LTM LB sends the session to one of the Horizon Connection Servers.
  5. The Connection Server returns a desktop to the user from either of the PODs.

How would this setup work in case of a DC failure?

In case of a DC failure, certain steps are taken automatically so that traffic is rerouted and availability of the service is retained.

First of all, if the secondary DC is down, only active connections to desktops in that DC will fail. If a user reconnects to Horizon, they will receive a desktop in the active DC.

But what happens if the primary DC fails? The following figures outline the steps that are automatically taken.

Step 1: The primary DC fails.
Cloud Pod Architecture DC Failure
Figure 5: The primary DC fails

If the primary DC fails, the passive load balancer becomes active and automatically picks up new sessions. The existing sessions on the load balancer in DC 1 are killed. The UAG (AP in the figure) in DC 2 directs the connections to the newly active load balancer in front of the Connection Servers, which sends them to an available one.

Step 2: vSphere HA boots up AP’s.
Cloud Pod Architecture DC Failure
Figure 6: vSphere HA boots AP’s in DC 2

As the management cluster is stretched, UAGs that were running in DC 1 are now booted in DC 2. They will automatically pick up new connections as soon as the load balancer notices that they are online.
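
That “noticing” is simply the load balancer’s health monitor doing its work. Below is a minimal sketch of such an HTTPS probe; the /favicon.ico path and the hostnames are assumptions for the example, so follow whatever monitor your load balancer’s documentation prescribes for the UAG.

```python
# Minimal sketch of the kind of HTTPS health probe a load balancer runs against
# the UAGs so that re-booted appliances are put back in the pool automatically.
# The /favicon.ico path and the hostnames are assumptions; use whatever monitor
# your load balancer documentation prescribes.
import ssl
import urllib.request

def uag_is_healthy(host, timeout=5):
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # lab sketch only; verify certificates in production
    try:
        with urllib.request.urlopen(f"https://{host}/favicon.ico",
                                    timeout=timeout, context=ctx) as resp:
            return resp.status == 200
    except OSError:
        return False

for uag in ["uag1.example.com", "uag2.example.com", "uag3.example.com"]:
    print(uag, "up" if uag_is_healthy(uag) else "down")
```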

Step 3: Primary DC restores.
Cloud Pod Architecture DC Failure
Figure 7: Eventually DC 1 restores again

When the primary DC becomes available again, the POD in that DC boots up and will accept incoming connections from the active load balancer in the secondary DC.

How would NSX fit in this situation?

Take a look at the following white paper: NSX-EUC Design Guide – v1.0.1.pdf

As you can see, NSX contains a load balancing function as well. And because only HTTPS traffic is tunneled and load balanced, there are more load balancers that could fulfill this requirement.

If you would like to know more about how to configure NSX load balancing for the Unified Access Gateways, please check out this post by Pascal van de Bor.

With this blog post I hope I have given you an idea of how to achieve Cloud Pod Architecture based on a single pair of active/passive or active/active load balancers.

If you have any questions, please let me know.

Johan van Amersfoort