Wednesday, March 22, 2017

Did You Know #6 - Using custom metrics groups in vROps for troubleshooting


Welcome back to the Did You Know Series on vRealize Operations Manager. As I mentioned in the first part of this series, the goal here is to unearth the Best Kept Secrets of vRealize Operations Manager.

These could be features, functionalities, use cases, integrations, APIs, or any tips and tricks that make day-to-day operations of the SDDC easier and more fun with vROps!

Today I will talk about how vROps can make troubleshooting easier. Troubleshooting, as we all know, is not a skill; it is a methodology, and it is fair to say that each one of us has a different troubleshooting style. With vRealize Operations you can troubleshoot issues the way you like. Some prefer OOTB dashboards, some like to create their own personalized views, and some prefer to jump into what Iwan and I call God Mode, a.k.a. the All Metrics view in the product.

The All Metrics view can easily become complex, as it shows every metric associated with an object type. In vRealize Operations Manager 6.4, the product gives you a few OOTB custom metric groups that list the common metrics around CPU, memory and disk. These groups are fixed, and they might not fit every need. If you are on vROps 6.4, click on a VM object, then click on All Metrics, and you will see this:


Apart from All Metrics and All Properties for the virtual machine, you can see 5 custom categories, each listing specific metrics useful for troubleshooting. As soon as I double-click on CPU, all the key metrics in the CPU metric group show up in one shot in the right pane:



While this is a cool feature that makes troubleshooting really simple, there was one use case it could not solve. If, as an admin, I wanted a metric group focused on key metrics of my own choice, there was no way to create one in vROps 6.4. With the arrival of vROps 6.5, this feature is now available: you can create your own custom metric groups with the metrics you like to use for troubleshooting.

I will show you how to create one custom group at virtual machine level:

1- Click on any virtual machine in your environment.

2- Click on All Metrics.

3- Click on the blue wheel and click on the Add Group option.

4- Provide a name. I will call it "VM KPIs".

5- Now you can drag and drop any metric from the All Metrics group into the newly created metric group:



Here are a few KPIs I added in my vROps, as they let me troubleshoot in God Mode with a single click.....

VM KPIs

CPU | Demand %
CPU | Usage %
CPU | CPU Contention %
CPU | CO-Stop %
CPU | Ready %

Memory | Usage %
Memory | Contention %
Memory | Balloon %
Memory | Swap In (KB)
Memory | Compressed (KB)

Virtual Disk | Aggregate of all instances | Commands Per Second
Virtual Disk | Aggregate of all instances | Total Latency
Disk Space | Snapshot | Virtual Machine Used (GB)
Guest File System Stats | Total Guest File System Free (GB)

Network I/O | Aggregate of all instances | Packets Dropped %

Host System KPIs

CPU|CPU Contention (%)
CPU|Demand (%)
Memory|Contention (%)
Memory|Total Capacity (KB)
Memory|Consumed (KB)
Memory|Usage (%)
Network I/O|Aggregate of all instances|Packets Dropped (%)

Cluster KPIs

CPU|CPU Contention (%)
CPU|Demand (%)
CPU|Max VM CPU Contention (%)
Memory|Balloon (KB)
Memory|Contention (%)
Memory|Max VM memory Contention (%)
Memory|Usage (%)
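Conceptually, a custom metric group is just a named, ordered subset of the metrics an object type already exposes under All Metrics. The Python sketch below models that idea; `MetricGroup`, its methods and the `available` set are hypothetical illustrations, not the vROps API, and the product stores and renders these groups in its own way.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetricGroup:
    """Hypothetical model of a custom metric group (not the vROps API)."""
    name: str
    object_type: str
    metrics: list[str] = field(default_factory=list)

    def add(self, metric: str, available: set[str]) -> None:
        # Like dragging a metric out of All Metrics: only metrics the object
        # type actually exposes can be added, and duplicates are ignored.
        if metric not in available:
            raise KeyError(f"{metric!r} is not exposed by {self.object_type}")
        if metric not in self.metrics:
            self.metrics.append(metric)

    def view(self, readings: dict[str, float]) -> dict[str, float | None]:
        # Return only this group's metrics, in group order; metrics with no
        # reading in the input map to None.
        return {m: readings.get(m) for m in self.metrics}


# Usage sketch: "available" stands in for a host's All Metrics list.
available = {"CPU|CPU Contention (%)", "CPU|Demand (%)", "Memory|Contention (%)"}
host_kpis = MetricGroup("Host System KPIs", "Host System")
for key in ("CPU|CPU Contention (%)", "CPU|Demand (%)", "Memory|Contention (%)"):
    host_kpis.add(key, available)
print(host_kpis.metrics)
```

Keeping the group as an ordered list rather than a set mirrors the idea of a curated KPI list, where the order you pick is the order you want to read the metrics in.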



Go configure your vROps with the metrics you like and make troubleshooting an easy and fun process...

And yeah. Keep sharing!!




Sunday, March 19, 2017

vROps Webinar 2017 - Announcing Part 1 : What's New with vROps 6.5


Welcome to the vRealize Operations Manager Webinar Series 2017. After the huge success of the series in 2016, we took a break to enjoy that success, and now we are back with full vigor for 2017. We are charged up to give you some more dope on vRealize Operations Manager this year.

The format will be the same as last year. We will start by talking about a topic and then jump into a live environment to see what happens when the rubber hits the road...

To begin the series, we will look at the latest release of vROps to see the enhancements VMware has made to the product and how customers can operationalize these features to make their operations simple and effective.


Here are the details: 👇👇👇

Session Title: Part 1: What's New with vROps 6.5
Date: Tuesday, 28th March 2017
Time: 1:00 PM to 2:00 PM Pacific Time
Speakers: Sunny Dua, Simon Eady


See you at the Webinar!!  👋👋👋👋


Looking back at vRealize Operations Webinar Series 2016

The series started back in 2016, when Simon & I decided that we needed to share the work we were doing with our customers in the field. The idea was to help individuals like us, as well as customers, gain insight into how they can improve their operations using the features vROps has to offer. The key purpose was to show how the product solves specific customer use cases, along with deep dives into architectures, commonly used features, tips and tricks and more.

We recorded a total of 12 episodes with 15 hours' worth of material. The episodes have had more than 10,000 views on YouTube, the work was highly appreciated, and audiences have asked for more this year....

Before we start with the 2017 series, I wanted to quickly share all the material with you so that you can continue to learn....

Special thanks to Simon and the other guest speakers for making this happen!!


Hope you enjoyed the series..... Looking forward to producing more content in 2017.



Friday, March 17, 2017

Performance over Power : Make the right choice.

Power management is not a new topic when it comes to hypervisors. We all know that one of the by-products of virtualization is "POWER SAVINGS". Even before you start realizing the other benefits of virtualization, lower power bills are the first OpEx saving, and they make the return on investment in virtualization speak for itself.

The reason behind writing this article is to make customers aware that, since you have already saved a lot through virtualization, you should not cut corners by trying to save even more by scaling down the CPU frequency of an ESXi server. For that matter, this applies to every hypervisor in the industry. Once we consolidate tens of physical servers on a single hypervisor, we already save a lot on power, so there is no reason to throttle the CPU of a hypervisor to save a little more.

One can argue: if I can save more power by using BIOS and hypervisor features to throttle down the CPU frequency, then why not? The answer lies in the trade-off, which in this case is CPU performance. While we all know that this throttling is dynamic and changes automatically on demand, the gap between when demand is made and when the resource becomes available leads to contention. Basic applications might not be impacted by this contention, but there will always be applications, and the VMs underneath them, that are unhappy with the latency this throttling introduces. In layman's terms, this results in performance issues that are absolutely uncalled for.

I know I am not talking about something unique, and every vSphere admin / architect is aware of why "High Performance" power management is critical. Even so, I can assure you that there are a number of myths around how power management should be configured for a hypervisor such as ESXi. Another reason for highlighting this issue is that vRealize Operations Manager does a great job of tracking the latency I described earlier. vROps reports this latency as CPU Contention %: the percent of time the virtual machine is unable to run because it is contending for access to the physical CPUs.

If you dissect that definition, there could be a number of reasons why the VM cannot get what it wants; one of them is the efficiency lost to processor frequency scaling, a.k.a. power management savings.
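To make the "percent of time" definition concrete, it can be turned into simple arithmetic. This is a simplified, single-vCPU reading that assumes a 5-minute (300-second) collection interval; it is not how vROps internally derives CPU Contention %, which may account for factors this sketch ignores. The function names are hypothetical.

```python
def contention_percent(waiting_s: float, interval_s: float = 300.0) -> float:
    """Share of a collection interval a VM spent ready to run but waiting
    for a physical CPU, as a percentage (simplified, single-vCPU view)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if not 0.0 <= waiting_s <= interval_s:
        raise ValueError("waiting_s must lie within the interval")
    return 100.0 * waiting_s / interval_s


def waiting_seconds(contention_pct: float, interval_s: float = 300.0) -> float:
    """Inverse: seconds of waiting per interval implied by a contention %."""
    if not 0.0 <= contention_pct <= 100.0:
        raise ValueError("contention_pct must be between 0 and 100")
    return interval_s * contention_pct / 100.0


# Translate two contention levels into waiting time per assumed 300 s interval.
for pct in (10.0, 27.0):
    print(f"{pct:.0f}% -> {waiting_seconds(pct):.0f} s of every 300 s")
```

Under these assumptions, 27% contention means a VM spends roughly 81 seconds of every 5 minutes waiting instead of running, which is why even modest-looking percentages can make an application feel sluggish.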

The scope of this post is power management, so I will not delve into the other conditions for now. Before I give you the exact power management settings, I would like to share a real-world example. The CPU contention faced by an application was extremely high due to incorrect power management settings. Once we changed them to a mode that disabled the throttling and made sure the CPU was available all the time and never snoozed, the contention dropped drastically and the application was humming along without any performance bottlenecks. Thanks to vROps, we could identify the issue and solve it within a matter of minutes 💪😁

In the metric chart below, a virtual machine faces CPU contention in the range of 10% to 27% during November and December. This was when the application was reported to be sluggish and performing badly. In fact, the application facing the issue is an in-memory database with an analytics engine (it is actually the vROps node itself).

The application eventually went into a state where it stopped collecting data as well, hence the gap from December 27th to January 8th. This is when things got out of hand and we decided to take action to reduce this contention.



As I explained before, CPU contention could be due to a number of factors. These include high overcommitment, too many VMs on a host, large virtual machines (crossing NUMA boundaries) and CPU throttling due to power management. Since we knew that this vROps node was the only VM running on that ESXi host, we immediately checked the power management settings on vSphere (hypervisor) and in the BIOS (hardware).

Yes, you need to check both and ensure they are set correctly for you to have continuous CPU availability. The correct settings would be:


➦ On vSphere set Power Management to "High Performance"

➦ In BIOS set Power Management to "OS Controlled" (requires restart of ESXi)

You can see from the metric chart below that we are plotting the power management settings of the ESXi hosts (where this VM was running) at both the hardware level (Power Management Technology) and the vSphere level (Power Management Policy), before and after the change.



Once the above change was made, the CPU Contention % experienced by that virtual machine dropped drastically, and we had a well-performing application and happy users. The metric chart below shows the effect on the latency experienced by the VM after the change.



This is a simple yet very powerful example of how power management settings play a big role in getting the best performance out of your virtual environment. I recommend that you act immediately to ensure that your environment is not suffering from this issue and that your virtual machines are getting what they are supposed to get from a CPU standpoint. Remember that poor CPU performance has a cascading effect on memory and I/O buses, hence it is important to fix this as soon as possible.

If you have vROps, it is very simple to visualize the current settings across your environment and to track them as a compliance metric going forward, ensuring that any new ESXi host added to your environment provides best-in-class CPU resources to serve your virtual machines.

If you are on vROps 6.4 or later, you can simply list the ESXi host properties in a view to see what the power management settings are. If they are not correct, you now know that you have a task in hand 😁
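Once those two properties are listed in a view, the compliance check itself is simple. The Python sketch below flags hosts whose settings differ from the ones recommended in this post; `HostPowerSettings`, `find_non_compliant` and the idea of feeding it rows taken from such a view are assumptions for illustration. The exact strings vROps reports for Power Management Policy and Power Management Technology may differ from the labels used here, so the expected values are parameters you should adjust to match what your view actually shows.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HostPowerSettings:
    """One row of a hypothetical host view listing both power properties."""
    host: str
    policy: str      # vSphere level: Power Management Policy
    technology: str  # hardware level: Power Management Technology


def _norm(value: str) -> str:
    # Collapse whitespace and ignore case so cosmetic differences don't count.
    return " ".join(value.split()).lower()


def find_non_compliant(
    rows: list[HostPowerSettings],
    expected_policy: str = "High Performance",   # assumed display label
    expected_technology: str = "OS Controlled",  # assumed display label
) -> dict[str, list[str]]:
    """Return {host: [issues]} for every host that deviates from the expected values."""
    findings: dict[str, list[str]] = {}
    for row in rows:
        issues = []
        if _norm(row.policy) != _norm(expected_policy):
            issues.append(f"policy is {row.policy!r}, expected {expected_policy!r}")
        if _norm(row.technology) != _norm(expected_technology):
            issues.append(f"technology is {row.technology!r}, expected {expected_technology!r}")
        if issues:
            findings[row.host] = issues
    return findings


# Usage sketch with made-up host names and values.
rows = [
    HostPowerSettings("esx-01", "High Performance", "OS Controlled"),
    HostPowerSettings("esx-02", "Balanced", "OS Controlled"),
]
for host, issues in find_non_compliant(rows).items():
    print(host, "->", "; ".join(issues))
```

Checking both levels separately matters because, as described above, a host can be correct in vSphere and still be throttled by the BIOS, or vice versa.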

Hope this helps..... 👊👊👊








Thursday, March 9, 2017

What is Fast... A recent interview with David Davis!

Working from VMware headquarters definitely has more benefits than great Indian food and rubbing shoulders with the Who's Who of the virtualization and cloud industry :-)

On a busy Monday afternoon, I was told that David Davis and his team from ActualTech Media were at HQ, speaking with the Cloud Management Business folks about VMware's vRealize product line. David, as we all know, is a great inspiration and a mentor for many who want to learn about VMware and the technologies in the surrounding ecosystem. Like many others, I have followed him very closely and have always appreciated the work he has done for the vCommunity. We spoke at length about the work I am doing around vRealize Operations Manager, blogging, the webinar series etc. Later we decided to record an interview to give viewers of ActualTech Media and vXpress a quick look at where we are going!!

He had interviewed me once before, along with Iwan Rahabok, during VMworld 2016, but that interview never got published due to technical difficulties. Iwan was missed during this interview, but I had him covered, I guess :-)

Here is the recording of the interview. It's just 15 minutes, and I am sure it will be worth your time!!







Wednesday, March 8, 2017

Using End Point Operations agent with vRealize Operations Manager

End Point Operations (EPOps), as a feature of vRealize Operations Manager, is an area where I get a lot of questions. During a workshop with a customer today, I was asked about a number of things that rang a bell in my head. I spent some time with the customer demystifying a lot of apprehensions around EPOps, and the conversation ended in a meaningful action plan.


Agreement is rare in the IT world (especially in operations), so I think it is worth sharing the findings with others who might be looking for similar answers. With the slow departure of Hyperic and End Point Operations becoming mainstream, it is important that customers with use cases for monitoring beyond the hypervisor layer (which vROps covers for you OOTB) have clear visibility into where End Point Operations stands.

It is important to understand that Operations Management as a process is not limited to a tool or feature. If you walk into any small or medium-sized customer environment, you will find a number of tools solving point problems. I am not saying that vROps can solve all the problems of the world, but I do see customers leaning towards tool consolidation and considering vROps as the mainstream platform that can reduce complexity while covering most of the use cases they have around Operations Management.



While I can go on and on about operations, the scope of this post is End Point Operations. The use cases are pretty specific, but they can go DEEP and WIDE as we move from the hypervisor layer into the zone of operating systems, middleware and applications. If you have been in IT operations, you will quickly realize the complexity you face once you look at all of this data in silos using multiple tools. Hence, if a tool can join all the dots to give you the complete picture, from the physical hardware up to the response time of a transaction running on an OLTP platform, ALL IN ONE PLACE, it is an easy sell.

The statement I just made is also known as the "HUNKY DORY WORLD". If only it were this simple, many of us would not have a job :-)

End Point Operations, in its current form and in the time to come, is trying to solve exactly this problem. With the EPOps solution integrated into the vROps platform, the unified view is no longer a distant dream. Having said that, I have to be brutally honest: this UNIFIED VIEW needs a great amount of expertise to be built right. With the evolution of EPOps, I am glad to share that VMware has built and delivered a number of solutions, and at this pace of delivery I am sure there are more on the roadmap (if only I were allowed to share the roadmap without an NDA ;-) ).

I must call out that the plugins I am about to list are the ones made by VMware. Blue Medora, a very close partner of VMware, has delivered a number of management packs and plugins of unmatched quality, and they are worth exploring as well.



Here is the list of available vROps plugins for End Point Operations. These will help you monitor applications and their related services.

I highly recommend trying these in a Test/Dev environment to see if they meet your use cases before deploying them in Production. You can use the links to download the plugins and the related documentation for more details:-


Some best practices around installing management packs & EP Ops:-

  • Ensure that you test them in a Dev environment. This includes the metrics, relationships, new alerts and dashboards that the pack adds OOTB.
  • Once you have the POC done and content finalized, install the pack in Production.
  • Disable all the OOTB alerts and enable only what you need, customised to meet specific use cases. It is always good practice to disable all the alerts and clone whichever OOTB alerts you need, so that major product upgrades do not impact the alerts you chose to keep.
  • Export any unwanted dashboards introduced by the management pack (for a backup) and delete them. This will help avoid clutter.
  • Be selective when deploying EP Ops, in line with the sizing done for your vROps deployment. If you plan to go beyond the capacity of your vROps cluster, expand it using the Sizing calculator.

With this I will close this article. Hopefully the information above will tickle your brain to see how you can deliver the unified view you always wished for.

Share your thoughts in the comments section and as always, Share the Post.. You never know who needs it!!




Friday, March 3, 2017

vRealize Operations Manager 6.5 is out. Operationalize with Confidence!

VMware just announced the release of vRealize Operations Manager 6.5. By delivering a new version with valuable features every few months, VMware lets customers quickly adopt the tool to ease the operations of the Software Defined Datacenter.

We all know that operations is not easy. It is as complex as the human brain. Every individual dealing with day-to-day operations has different views on how they want to present and interpret data. While some are big fans of vROps dashboards, others prefer notifications that alert them to predicted and actual failures in the software defined datacenter. I use the term SDDC because I believe the architecture supports both on-premise and off-premise environments. In other words, while your SOPs could differ between operating private and public clouds, the operations framework largely remains the same. Yes, you might be dealing with a completely different set of objects and metrics when you are running on Amazon, but the generic human behavior towards operating an SDDC remains unchanged.

While I should be sitting with the AWESOME developers and Product Managers of the product to celebrate this release, I am here in India working with a customer, along with my buddy Iwan "e1" Rahabok, to operationalize their Software Defined Datacenter. I am not complaining though, since I learn the most when I get requirements from a customer and we get into a heated discussion around operations management. The outcomes can be quite satisfying at the end of the day!!

So without further ado, let's see what this release has to offer:


What's New?

vRealize Operations Manager 6.5 focuses on enhancing product scalability limits and troubleshooting capabilities.
  • Additional monitoring capabilities
    • Adds ability to increase memory and increase the scope of monitoring within the same environment.
    • Enables you to monitor larger environments with the same footprint through platform optimization.
    • The XL node size enables you to monitor more objects and process more metrics.

  • Automatic upgrade of Endpoint Operations Agents:
    • The new Endpoint Operations Agent upgrade bundle allows you to automatically upgrade the agent through the vRealize Operations Manager user interface.

  • Enhanced troubleshooting capabilities:
    • Quickly correlates logs and metrics in context for any monitored object using Log Insight within vRealize Operations Manager.
    • Lets you create custom metric groups that enable you to focus on the most relevant metrics.

  • Improved collaboration:
    • Simplifies export and import of custom groups and eases dashboard sharing between different vRealize Operations Manager installations.
    • Understands Private Cloud costs and Public Cloud spends by accessing vRealize Business for Cloud from within vRealize Operations Manager.

General improvements
  • Removes scale limitations from Predictive DRS (pDRS). Predictive DRS enables vRealize Operations to provide long-term forecast metrics to augment the short-term placement criteria of the vSphere Distributed Resource Scheduler (DRS).

  • Default values for the following global settings have changed:
    • For Deleted Objects, it has changed from 360 hours to 168 hours
    • For Action History, it has changed from 90 days to 30 days
    • For Symptoms/Alerts, it has changed from 90 days to 45 days
    These default settings are effective only for new vRealize Operations 6.5 installations.

  • Allows you to disable coloring in scoreboard widget.

  • The update process has been optimized. The update process might be up to 40% faster, depending on the size of the environment and the number of objects being monitored.


Here are some useful links to get you going: