User Experience Troubleshooting Deep Dive, Part 1: How to start?

Many customers I visit struggle with their User Experience. Quite often they have the tools to actually monitor it, but they have trouble interpreting the right information. There are many solutions on the market that can monitor the User Experience, but configuring them to show the right information can be hard. And where should you start? What information is useful? On a regular basis I get questions about which KPIs and metrics should be monitored. And how should you interpret these metrics? This blog series is dedicated to helping you monitor some useful KPIs and possibly improve your User Experience.

As vRealize Operations for View is the most common tool I see at customers, I will use it as the monitoring solution in one of the next posts in the series. This first post is dedicated to giving you some general information on metrics.

So how should you get started?

Well, let’s first explain what User Experience (UX) actually is. Traditionally, users worked on their local desktop and ran applications that were installed on the endpoint. If they had a powerful PC, that usually meant their applications ran smoothly, and their UX was positive. If they had a negative UX, 9 out of 10 times we expanded or replaced the endpoint hardware and the problem was solved.

When you move applications and desktops to the data center, you can create the best UX ever, but many more dependencies exist that can have a negative impact on the UX. All of these dependencies (or in our case KPIs/metrics) need to be monitored so you are aware of both the dependencies you can actually control (like data center compute hardware and storage) and the external dependencies that you can’t (such as the end user’s internet connection).

The most challenging part of monitoring the UX isn’t selecting a solution or creating a dashboard for it. The challenge that a lot of customers face is knowing what to look for in case of UX issues. So let’s talk about those dependencies (and call them KPIs and metrics) first.

In case of a negative UX, end users first complain that their “desktop is slow and they need more CPUs and RAM!”. The obvious move would be to give the user a new virtual desktop with better specs, but most of the time that isn’t the solution.

You want to avoid situations like these. So proactive monitoring on User Experience is essential!

KPIs and Metrics

To get a better understanding of how to solve UX issues, let’s focus on KPIs and metrics.

UX depends on a great variety of parameters. The following metrics are a good place to start, as they are the ones I use most.

CPU – Usage in %
Shows the total CPU usage in the VM as a percentage. Useful when you want to know what the overall usage in a desktop is.

CPU – Usage in MHz
Shows the CPU usage based on the actual clock speed. This can be useful if certain applications aren’t able to use multiple threads; in that case you could see a single thread consuming a complete core.

CPU – Ready times
If you have CPU contention, the ready times can be very high. CPU contention means that the virtual CPUs need to wait in line before a physical CPU can handle their calculations.
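In vCenter’s real-time charts, CPU ready is reported as a summation in milliseconds per sample interval, so it helps to convert it into a percentage. A minimal sketch of that conversion, assuming the default 20-second real-time interval (the ~5% rule of thumb is a common starting point, not an official limit):

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms) into a percentage,
    averaged across the desktop's vCPUs."""
    return ready_ms / (interval_s * 1000 * vcpus) * 100

# 2,000 ms of ready time in a 20 s sample on a 2-vCPU desktop
# works out to 5% ready time per vCPU.
print(cpu_ready_percent(2000, vcpus=2))
```

Anything sustained above a few percent per vCPU is usually worth investigating before adding more vCPUs, which can actually make ready times worse.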

RAM – Usage
It says what it does: how much RAM a virtual desktop is using.

RAM – Ballooning
Something you need to avoid at all times. It means that an ESXi host is running out of memory and uses the balloon driver to reclaim RAM from a virtual desktop.

RAM – Swap out
Again, something you want to avoid. If a virtual desktop’s memory is being swapped out, the ESXi host is under such severe memory pressure that it is writing the desktop’s RAM to disk.
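A simple way to watch both memory counters is to flag any nonzero value. A sketch, with hypothetical counter values in kilobytes:

```python
def memory_pressure_flags(ballooned_kb: int, swapped_kb: int) -> list:
    """Return human-readable warnings when the host is reclaiming
    memory from a virtual desktop. Any nonzero value matters."""
    flags = []
    if ballooned_kb > 0:
        flags.append("ballooning active: host is reclaiming guest RAM")
    if swapped_kb > 0:
        flags.append("swapping active: host is writing guest RAM to disk")
    return flags

# A healthy desktop returns no flags.
print(memory_pressure_flags(0, 0))        # []
print(memory_pressure_flags(4096, 1024))  # both warnings
```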

Protocol – PCoIP/Blast – Roundtrip latency
The total latency between the end user’s endpoint (such as a tablet or a laptop) and the virtual desktop, as a round trip: it is measured from the endpoint to the virtual desktop and back.
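On a dashboard, a rough classification is often easier to read than raw numbers. A sketch with illustrative thresholds (the 150 ms upper bound is a common rule of thumb for interactive sessions, not official guidance):

```python
def classify_rtt(latency_ms: float) -> str:
    """Bucket display-protocol round-trip latency for a dashboard.
    Thresholds are illustrative; tune them to your user base."""
    if latency_ms < 50:
        return "good"
    if latency_ms < 150:
        return "acceptable"
    return "poor"

print(classify_rtt(35))   # good
print(classify_rtt(220))  # poor
```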

Protocol – PCoIP/Blast – Frame rate
The number of frames transferred from the virtual desktop to the endpoint. The more frames, the more data is transferred to the endpoint.

Disk – Latency
The latency between the virtual desktop and the datastore. The lower the latency, the quicker an I/O request can be handled. It needs to be measured for both reads and writes.

Disk – Read IOPS
The number of I/O read requests handled by the storage device per second. Rule of thumb: the higher this number, the better the performance.

Disk – Write IOPS
The number of I/O write requests handled by the storage device per second. Rule of thumb: the higher this number, the better the performance.

Disk – Free capacity
Also says what it does. Free capacity should be measured per disk.

OS – Logon time
The total amount of time a user needs to get through the complete logon process (including profile load, logon scripts, shell load, etc.). If this number is too high, more in-depth metrics can be examined.
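When the total is too high, the useful question is which phase dominates. A sketch with hypothetical phase timings (the phase names and numbers are illustrative, not taken from a specific monitoring tool):

```python
# Hypothetical logon phase durations in seconds.
phases = {
    "profile load": 6.2,
    "group policy": 4.1,
    "logon scripts": 2.5,
    "shell load": 3.0,
}

def slowest_phases(timings: dict, top: int = 2) -> list:
    """Return the phases that contribute most to total logon time."""
    return sorted(timings, key=timings.get, reverse=True)[:top]

total = sum(phases.values())
print(round(total, 1))         # 15.8
print(slowest_phases(phases))  # ['profile load', 'group policy']
```

In this example, profile load dominates, which would point towards profile optimization rather than giving the desktop more resources.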

OS – Uptime
Also says what it does: the total amount of time a desktop OS has been running since the last reboot.

Network – throughput
The total amount of data transferred by the virtual desktop’s network card.

Network – transmitted data
The amount of data transmitted by the virtual desktop’s network card.

Network – received data
The amount of data received by the virtual desktop’s network card.
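One way to pull the metrics above together is a small table of thresholds and a check that lists every breach. A minimal sketch; the metric names and limits here are illustrative and don’t map one-to-one to vRealize Operations counter names:

```python
# Illustrative limits; tune them for your own environment.
THRESHOLDS = {
    "cpu_ready_pct":   5.0,    # per-vCPU CPU ready time
    "disk_latency_ms": 20.0,   # read or write latency
    "rtt_ms":          150.0,  # protocol round-trip latency
    "logon_time_s":    30.0,   # total logon time
}

def breaches(sample: dict) -> list:
    """Return the metrics in `sample` that exceed their threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if sample.get(name, 0) > limit]

sample = {"cpu_ready_pct": 7.5, "disk_latency_ms": 4.0,
          "rtt_ms": 40.0, "logon_time_s": 55.0}
print(breaches(sample))  # ['cpu_ready_pct', 'logon_time_s']
```

A check like this turns a wall of counters into a short list of suspects, which is exactly what you want when a user calls in with a “slow desktop”.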

This is just a brief overview. In the next few posts, I will dive deeper into the different metrics, including some best practices around thresholds.

So now you have an idea of the types of metrics that can be used. In the next post I will explain in more detail how these metrics can be monitored with vRealize Operations, including some information on dashboards.

Continue to part 2: Monitoring with vROps

Johan van Amersfoort