In part 1 and part 2 of the User Experience Troubleshoot Deep Dive Series I focussed on explaining what metrics to look at and how to monitor them. In the upcoming parts I would like to dive deeper into the metrics themselves. Part 3 is dedicated to protocol metrics for PCoIP and Blast Extreme/BEAT.
Why not RDP, you might ask? RDP doesn’t offer much that can be monitored and tuned, so I won’t focus on it in this post.
First of all, let’s start with the different protocols.
PCoIP (PC Over IP)
PCoIP was introduced in 2009 in version 4.0 of VMware View. It is a licensed protocol owned by Teradici. The protocol can be encoded/decoded in both hardware and software. Hardware encoding/decoding is mainly used with graphically heavy applications, by leveraging Teradici Apex cards at the ESXi host level and offloading chips at the client level. Many manufacturers like Dell, HP and Fujitsu build Thin Clients and Zero Clients that contain those chips. For a long time, PCoIP was the way to go when you needed graphically enhanced desktops or applications in a VDI solution.
PCoIP can send lossless images to an endpoint, which means that no image information is discarded by compression. Compare it to MP3 versus WAV: a WAV file is not compressed, so what you hear is an (almost) exact dump of CD-quality audio. The downside is that the protocol will use more bandwidth than you might expect and, in certain cases, all the bandwidth it can get. So proper tuning of the protocol is an absolute recommendation!
Blast Extreme Adaptive Transport (BEAT)
The protocol formerly known as Blast and later Blast Extreme. In the fall of 2013, VMware shipped the first version of Blast to customers. With Horizon View 5.2 Feature Pack 1, it became possible to add the protocol to the connection server and connect from an endpoint by using a web browser. Blast uses standardized encoding schemes for video (JPG/PNG and H.264) and audio (Opus). Unlike proprietary encoding schemes, these standard formats are supported in a wide variety of browsers and devices. In 2014, Blast became one of the standard protocols in Horizon 6, and in 2016 it even became a preferred protocol when it shipped with Horizon 7. Today, it seems to be the primary protocol that VMware is actively developing, and it offers a similar or better user experience while being more efficient on endpoints and handling network traffic (and latency) somewhat better. Salim Abiezzi wrote a great article on Blast Extreme and here you can read more on what improvements were made in Blast Extreme to become BEAT.
As BEAT uses standard encoding schemes and natively supports web browsers, it has an advantage over PCoIP: no separate client is needed to run desktops/applications. But, of course, this also has downsides. Browsers don’t have that many options for integrating with the operating system on an endpoint. So when looking for the ultimate user experience, HTML5 might not be your primary choice. But, as mentioned, BEAT is much more network efficient than PCoIP. Especially with bandwidth constraints or latency issues, BEAT would be the way to go as it uses UDP instead of TCP when latency increases.
Either way, the protocols need to be tuned and monitored because no situation is the same. The following sections outline the different protocol options and the corresponding metrics that can be monitored.
Encoded Frame rate
One of the first things that you want to know, and possibly tune, is the number of frames per second that are transferred from the virtual desktop to your endpoint, also called the frame rate. The ideal frame rate depends entirely on the type of application that runs inside the virtual desktop. If a user runs Microsoft Word or Excel, the number of changing frames per second (FPS) is fairly small (2 – 8), depending on how fast the user is typing. But when a YouTube video is watched, this can easily be around 25 – 30 frames per second or even more. One of the easiest ways to tune the connection protocol is to scale down the frame rate, because the average user won’t notice a scaled-down frame rate when working. Of course, there are exceptions, so you want to design these settings per use case.
In vRealize Operations, the encoded frame rates can be viewed by opening session metrics. Every session has an Encoded Frame Rate metric, regardless of whether you are using PCoIP or BEAT.
In an ideal situation, you want the value of this metric to be as low as possible, because every frame that is received by the endpoint has a certain size. The bigger the frame and the higher the frame rate, the more bandwidth you might need to satisfy the end user’s UX requirement. And if your latency is too high, a higher frame rate might have a negative impact on the UX.
As a good starting point, the frame rate for normal office users could be set to 15 FPS. Users that require a higher frame rate because of video playback could start out with 25 FPS.
In both cases, if the FPS is set higher, a user will hardly notice the difference. A lower FPS could also be possible, but tune and test this together with the end user.
There are exceptions though. If your user needs a graphically intensive application, the FPS might need to be higher. Again, tuning and testing is always required.
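The starting points above can be written down as a small lookup and used to review the encoded frame rate samples of a session. This is a minimal sketch in Python: the use-case labels, the function names and the idea of feeding it a list of exported samples are assumptions made for illustration, not something vRealize Operations or Horizon provides in this form. Graphically intensive use cases are deliberately left out because no starting value is given for them.

```python
from statistics import mean

# Starting-point frame rate caps from the text above.
# The use-case labels are hypothetical names chosen for this sketch.
STARTING_FPS = {
    "office": 15,  # normal office users (Word, Excel, ...)
    "video": 25,   # users that need video playback
}


def starting_fps(use_case: str) -> int:
    """Return the starting frame rate cap for a use case."""
    if use_case not in STARTING_FPS:
        raise ValueError(
            f"no starting point for use case {use_case!r}; tune and test with the end user"
        )
    return STARTING_FPS[use_case]


def review_session(samples_fps: list[float], use_case: str) -> dict:
    """Summarise encoded frame rate samples against the starting cap.

    `samples_fps` is assumed to be a list of Encoded Frame Rate values
    exported from your monitoring tool for one session.
    """
    cap = starting_fps(use_case)
    peak = max(samples_fps)
    return {
        "cap": cap,
        "average": mean(samples_fps),
        "peak": peak,
        # A session that reaches its cap may be limited by it.
        "hits_cap": peak >= cap,
    }


if __name__ == "__main__":
    print(review_session([4, 6, 8, 5], "office"))
```

A session that never comes close to its cap is a candidate for a lower setting, while one that regularly hits the cap is worth testing together with the end user.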
The size of every frame multiplied by the FPS gives you an idea of what the throughput could be. But when more data is sent through the protocol by features like USB redirection or clipboard redirection, the throughput can be even higher. The throughput per session completely depends on the use case, and there is no good or bad value. If your users need a certain amount of bandwidth to have a good UX, so be it. But there are some things to take into account when tuning these settings and designing your infra.
- Like the FPS, you want to keep the throughput as low as possible so a little bit of latency is still acceptable.
- The internet connection in your datacenter might have enough bandwidth, but the location where your users work might not.
- Avoid using redirected USB devices. Redirect through the appropriate channels instead. Example: you can attach a webcam or scanner and redirect it to the virtual desktop as a USB device. In that case the images are sent to the virtual desktop uncompressed. HD webcams could easily have a throughput of 50 Mbit/s. When using TWAIN or AV redirection, this could be compressed to 500 Kbit/s. Quite a difference, right?
- As explained earlier, only turn on the “Build to lossless” option if you really need it. Enabling this option will send the screen data to the endpoint without lossy compression. In some cases this is really helpful or even a requirement (like with X-ray images), but in most cases it is not necessary for office or even media use.
- Most users nowadays work on a corporate wifi network. Keeping the throughput low also benefits these users, as most wifi access points have a maximum throughput that is shared among all connected clients. The more clients, the more saturated the network might become. A high throughput might impact your UX in a negative way when you share a corporate wifi network with a lot of other clients.
- On average, 250 Kbit/s might be enough for most office users if they aren’t using redirected devices. Video playback will require more bandwidth, but this depends on the quality of the video. 60 FPS 4K video requires more bandwidth than 25 FPS 720p video.
In vRealize Operations, the throughput can be viewed by opening session metrics. Every session has transmit and received bandwidth and throughput metrics, regardless of whether you are using PCoIP or BEAT.
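The frame size times FPS rule of thumb, and the effect of per-session throughput on a shared link, can be sketched as a small calculation. This is only a rough estimate in Python under stated assumptions: the average frame size, the 20 Mbit/s site link and the 80% usable fraction are made-up example values, and the function names are hypothetical. The 250, 500 and 50,000 Kbit/s figures are the ones mentioned in the list above.

```python
import math


def estimate_throughput_kbit(avg_frame_kbit: float, fps: float,
                             extra_kbit: float = 0.0) -> float:
    """Rough session throughput in Kbit/s: frame size x FPS.

    `extra_kbit` stands for additional traffic such as USB or
    clipboard redirection (an assumed, separately measured value).
    """
    return avg_frame_kbit * fps + extra_kbit


def sessions_per_link(link_kbit: float, per_session_kbit: float,
                      usable_fraction: float = 0.8) -> int:
    """How many sessions fit on a link, keeping some headroom.

    `usable_fraction` is an assumption for this sketch: the share of
    the link you are willing to fill with session traffic.
    """
    if per_session_kbit <= 0:
        raise ValueError("per_session_kbit must be positive")
    return math.floor(link_kbit * usable_fraction / per_session_kbit)


if __name__ == "__main__":
    # Assumed example: 10 Kbit average frame at 15 FPS.
    print(estimate_throughput_kbit(10, 15))    # 150.0
    # Assumed 20 Mbit/s branch office link.
    print(sessions_per_link(20_000, 250))      # 64 office sessions
    print(sessions_per_link(20_000, 500))      # 32 sessions with a compressed webcam stream
    print(sessions_per_link(20_000, 50_000))   # 0: one uncompressed HD webcam does not fit
```

The last line shows why USB-redirected webcams are worth avoiding: with the figures from the list, a single uncompressed stream is already more than the whole example link can carry.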
Round-trip latency isn’t a metric that is tuneable. But certain values give you a good idea of what is happening with the end user. Again, the lower the latency, the better the UX. But this is where protocol differences could be noticeable. BEAT (by default) is designed to handle high latencies (up to 300 ms) and still offer a good UX. Of course, this UX depends on what the user is doing. If the user is working in a spreadsheet or a document, a higher latency might not be noticeable. But when streaming high-quality video with a high FPS, the video might look laggy. In general, you would like to keep the round-trip latency below 50 ms. This will give you a good UX in most use cases. And again, there are some exceptions. Graphically intensive applications might become laggy as soon as 50 ms latency is reached, but for normal office users and normal videos this will be sufficient.
In vRealize Operations, the round-trip latency can be viewed by opening session metrics. Every session has a round-trip latency metric, regardless of whether you are using PCoIP or BEAT.
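The latency guidance above can be turned into a simple check for flagging sessions. This is a minimal sketch in Python: the 50 ms and 300 ms thresholds come from the text, while the three labels and the function name are assumptions made for this example, not categories defined by Horizon or vRealize Operations.

```python
GOOD_RTT_MS = 50             # target from the text: stay below 50 ms
BEAT_TOLERABLE_RTT_MS = 300  # BEAT is designed to cope with up to 300 ms


def classify_round_trip_latency(rtt_ms: float) -> str:
    """Map a round-trip latency sample to a rough UX label.

    The labels are hypothetical:
      "good"     - below 50 ms, fine for most use cases
      "degraded" - 50 to 300 ms, office work may still be fine,
                   video and graphics may look laggy
      "poor"     - above 300 ms
    """
    if rtt_ms < 0:
        raise ValueError("latency cannot be negative")
    if rtt_ms < GOOD_RTT_MS:
        return "good"
    if rtt_ms <= BEAT_TOLERABLE_RTT_MS:
        return "degraded"
    return "poor"


if __name__ == "__main__":
    for sample in (20, 50, 180, 450):
        print(sample, classify_round_trip_latency(sample))
```

What counts as acceptable in the middle band still depends on the use case, so treat the label as a trigger to look at what the user is doing rather than as a verdict.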
Something you really want to avoid is network packets being dropped. A dropped packet is a packet that is sent, but never gets delivered at the other end. UDP traffic has the advantage over TCP that it is more or less one-way traffic. With TCP, every packet that is sent requires an acknowledgement from the receiving end to the sending end, so you can be certain that traffic is consistent. With video transfers, however, it might not be an issue if certain packets are dropped. In general, you need to keep the dropped packets below 1%. The challenging thing is that, from end to end, a lot of network-related components can exist that could impact your traffic. So be sure to monitor your network hardware for performance and dropped packets. Monitoring both your desktops and your hosts for dropped packets will tell you whether these components cause issues.
In vRealize Operations, the packet loss can be viewed by opening session metrics. Every session has a packet loss metric, regardless of whether you are using PCoIP or BEAT. Also, the vNIC of a virtual desktop has a packet loss metric, and so does the physical NIC of a host.
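If you only have raw sent and received counters from a NIC or network device, the packet loss percentage and the 1% guideline can be checked as follows. This is a minimal sketch in Python: the function names and the counter inputs are assumptions for illustration, and the session metrics in vRealize Operations already report packet loss for you.

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Percentage of packets that were sent but never delivered."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return (sent - received) / sent * 100


def loss_acceptable(sent: int, received: int,
                    threshold_percent: float = 1.0) -> bool:
    """True when loss stays below the threshold (1% in the text)."""
    return packet_loss_percent(sent, received) < threshold_percent


if __name__ == "__main__":
    # 50 of 10,000 packets lost: within the guideline.
    print(packet_loss_percent(10_000, 9_950), loss_acceptable(10_000, 9_950))
    # 200 of 10,000 packets lost: above the guideline.
    print(packet_loss_percent(10_000, 9_800), loss_acceptable(10_000, 9_800))
```

Running the same check on the session, the vNIC and the physical NIC counters helps narrow down which hop is dropping the packets.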
I hope this gave you some insight into what to look for when users are complaining about a bad user experience. In the next post I will dive into CPU and RAM metrics.