Don’t ignore that alarm!
Ignore an alarm? Why would someone do that? Answer: because some tools send too many!
To avoid getting overloaded with meaningless alarms, it is important to implement best practices. The first best practice is to implement a software solution that is intelligent. It should:
- Understand the limitations of your hardware
- Take into consideration your particular workload
- Let you know that you are heading for a problem before the problem begins
- Eliminate useless alarms
If you have followed this first best practice, congratulations! You are headed in the right direction.
Now you need to know what other best practices you should follow to get the most out of your software investment.
The following is a list of best practices that you should be aware of when implementing a software solution that provides automated alarms.
Focus on Your Lead Metrics
Not all metrics are equal. IntelliMagic has sifted through dozens of vendor metrics and created many new ones. Our goal is to highlight the performance information that is most valuable in preventing issues. Metrics can be categorized as Lead metrics and Lag metrics. Lead metrics are the best predictors of problems coming your way. Some tools tend to focus on Lag metrics such as response time. But by the time a response time alarm is sent out, your applications are already feeling the pain! IntelliMagic Vision guides you to focus on Lead metrics such as front-end adapter and back-end device utilization. When one of these Lead metrics crosses a warning threshold and you are notified, you have time to take corrective action before the applications suffer. Understanding the difference between Lead and Lag metrics is key to preventing issues, and IntelliMagic makes this easy.
Another best practice is to look at ratings. A rating is a number that indicates how significant the issue behind an alarm is. IntelliMagic Vision produces ratings for many performance metrics, both lag and lead. The IntelliMagic Vision rating scale goes from 0.0 to 3.0. The further a metric exceeds its threshold, the higher its rating will be. To gain a deeper understanding of the intensity of an alarm, pay attention to its numerical rating. This will also allow you to prioritize your work. When you have two critical production systems, one generating an alarm with a rating of 2.4 and the other an alarm with a rating of 0.8, you will want to give your attention to the system rated 2.4 first.
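As a small illustration of prioritizing by rating, the sketch below sorts a handful of alarms so that the highest rating comes first. The `Alarm` class, the system names, and the field names are hypothetical and are not part of IntelliMagic Vision. Only the 0.0 to 3.0 scale and the 2.4 and 0.8 example values come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    system: str    # hypothetical system name
    metric: str    # hypothetical metric label
    rating: float  # 0.0 to 3.0, per the scale described above


def prioritize(alarms):
    """Return alarms ordered from highest to lowest rating."""
    for a in alarms:
        if not 0.0 <= a.rating <= 3.0:
            raise ValueError(f"rating out of range for {a.system}: {a.rating}")
    return sorted(alarms, key=lambda a: a.rating, reverse=True)


alarms = [
    Alarm("prod-B", "back-end device utilization", 0.8),
    Alarm("prod-A", "front-end adapter utilization", 2.4),
]
work_queue = prioritize(alarms)
for alarm in work_queue:
    print(f"{alarm.rating:.1f}  {alarm.system}  {alarm.metric}")
```

In this made-up example the system with the 2.4 rating lands at the top of the work queue, which matches the ordering suggested in the paragraph above.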
Maintaining a balanced load over the available hardware components is the key to getting the maximum value out of your storage hardware investment. An imbalance left unchecked quickly becomes a performance bottleneck and hot spot.
Charts that show imbalances at a glance will enable you to keep your resources balanced as the load on your storage systems increases over time. One example is a balance chart for the Front End Adapters. Operations teams often provision storage based purely on how much capacity is requested. It is also important to base these decisions on the application IO performance profile. IntelliMagic’s Front End Adapter balance charts make it easy to see which adapters are under-utilized and therefore where the next provisioning should take place.
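The idea behind a balance chart can be sketched in a few lines: given a utilization figure per adapter, find the busiest and least busy adapters and the spread between them. This is only an illustration under assumed inputs. The adapter names, the utilization numbers, and the `balance_report` function are hypothetical, and this is not how IntelliMagic Vision computes its charts.

```python
from statistics import mean


def balance_report(utilization):
    """Summarize adapter balance from a {adapter_name: utilization_pct} mapping."""
    if not utilization:
        raise ValueError("no adapters supplied")
    avg = mean(utilization.values())
    busiest = max(utilization, key=utilization.get)
    least_busy = min(utilization, key=utilization.get)
    return {
        "average_pct": avg,
        "busiest": busiest,
        # The least-utilized adapter is the natural candidate for new load.
        "next_to_provision": least_busy,
        "spread_pct": utilization[busiest] - utilization[least_busy],
    }


# Illustrative numbers only, not measurements from a real system.
front_end_adapters = {"FA-1": 72.0, "FA-2": 35.0, "FA-3": 18.0, "FA-4": 55.0}
report = balance_report(front_end_adapters)
print(report)
```

A large spread between the busiest and least busy adapter is a sign that the next workload should go to the adapter named in `next_to_provision` rather than wherever capacity happens to be free.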
It is important to know your environment. This involves being familiar with the trends in your workloads. There are two important things to consider. First, how are your data and workload changing over time? What is the trend in IOPS, MB/s, response times, and storage capacity? Second, how are the lead measures changing over time? Several IntelliMagic Vision features allow you to perform both types of trending. Trending your data is a best practice for understanding your environment.
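One simple way to put a number on a trend is to fit a straight line through equally spaced samples and see how quickly a lead metric is approaching its warning threshold. The sketch below uses an ordinary least-squares fit. The weekly values, the 60 percent threshold, and the function names are assumptions made for illustration, and a straight line is only a rough model of real workload growth.

```python
def linear_trend(values):
    """Least-squares slope and intercept for equally spaced samples."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def weeks_until(values, threshold):
    """Weeks until the fitted line reaches threshold, or None if not rising."""
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    last_fitted = intercept + slope * (len(values) - 1)
    return max(0.0, (threshold - last_fitted) / slope)


# Hypothetical weekly averages of a lead metric (utilization in percent).
weekly_utilization = [40.0, 42.0, 44.0, 46.0, 48.0]
slope, _ = linear_trend(weekly_utilization)
remaining = weeks_until(weekly_utilization, 60.0)
print(f"growth per week: {slope:.1f} points, weeks to threshold: {remaining:.1f}")
```

With these invented samples the metric grows two points per week, so the fitted line reaches a 60 percent threshold about six weeks after the last sample.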
Knowing your environment requires a regular review of how it is doing. It’s important to have a storage performance solution that will reliably collect and interpret measurement data all day, every day. And it’s important to have a process in place to review trends and daily exception reports of the lead indicators, rather than wait for alarms or lag indicators to highlight performance hot spots that have already developed.
IntelliMagic Vision comes with default thresholds. These default settings are based on our many years of storage performance modeling experience. Some of the thresholds are hardware-based: they take into account the configuration and architecture of your storage systems and reflect the performance that can be expected given the particular characteristics of your application workloads, such as IO size or cache hit percentage. Other thresholds are designed for you to adjust to your needs and environment, for instance to reflect SLAs or your expectations for certain “loved applications.” These adjustable thresholds should be customized, regularly reviewed, and tuned.
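The split between defaults and adjustable thresholds can be pictured as a default table with per-application overrides layered on top. The sketch below is a generic illustration only. The metric names, the threshold values, the application names, and the functions are all hypothetical and do not reflect IntelliMagic Vision's actual configuration format or default settings.

```python
# Hypothetical default warning thresholds (values invented for illustration).
DEFAULT_THRESHOLDS = {
    "front_end_adapter_utilization_pct": 60.0,
    "back_end_device_utilization_pct": 70.0,
    "response_time_ms": 5.0,
}

# Hypothetical per-application overrides, for example to reflect a stricter SLA.
APPLICATION_OVERRIDES = {
    "billing": {"response_time_ms": 2.0},
}


def thresholds_for(application):
    """Defaults with any application-specific overrides applied on top."""
    merged = dict(DEFAULT_THRESHOLDS)
    merged.update(APPLICATION_OVERRIDES.get(application, {}))
    return merged


def exceeded(application, measurements):
    """Names of measured metrics that are above the application's thresholds."""
    limits = thresholds_for(application)
    return sorted(
        name for name, value in measurements.items()
        if name in limits and value > limits[name]
    )


sample = {"response_time_ms": 3.1, "front_end_adapter_utilization_pct": 41.0}
print(exceeded("billing", sample))    # stricter override applies
print(exceeded("reporting", sample))  # falls back to the defaults
```

Keeping overrides separate from the defaults makes the regular review easier: the override table is the short list of settings someone chose deliberately and should revisit.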
IntelliMagic experts are always excited to help you understand best practices for your environment. If you have any questions about how to implement these or if you are interested in learning more about IntelliMagic solutions, please contact us.