When looking ahead at 2020, there’s a lot for z/OS performance analysts to consider. Mainframe transaction volumes continue to grow for most sites, and the size and complexity of the z/OS environment are increasing, yet the number of deep z/OS infrastructure performance experts continues to shrink.
Without the right z/OS performance monitoring practices in place, these challenges can lead to excessive application downtime, lost revenue, customer frustration, burnout, and more. Even setting aside the direct financial costs, the true cost of downtime makes it clear why preventing service disruptions is the number one priority for most mainframe performance teams.
Narrowing down priorities for the year can be difficult, but these three goals top most lists:
- Ensure Optimal Application Performance and Availability (Zero Downtime)
- Reduce Mainframe Costs
- Improve Staff Efficiency and Resolve the Skills Gap
Utilizing the right strategic and technical plan for your performance monitoring can mean the difference between achieving these goals in 2020 or missing them.
Goal 1) Ensure Optimal Application Performance and Availability
When z/OS technical experts are looking to manage and monitor their z/OS infrastructure performance, it’s logical to seek out a real-time performance monitor. After all, doesn’t it make sense to want to know about a service disruption or application downtime as soon as it occurs?
When the only options are either knowing right away or after end-users report the issue, then yes, knowing right away is the better option. But knowledge about upcoming disruptions that can be prevented before they affect the end-user is much more valuable.
AIOps solutions use built-in expert knowledge about the hardware and a site’s specific workloads to identify potential issues before they ever impact application availability.
The image above, taken from IntelliMagic Vision for z/OS, shows an exception table of all warnings and exceptions for the Coupling Facility, including a prioritized rating of each issue with built-in recommendations.
This report required no coding or report building. The AIOps engine automatically identified and rated each of the performance warnings and exceptions, detailed them, and placed them in this high-level report for the expert to analyze and act on. Better yet, each of these reports comes with extensive built-in drill-downs that make root cause analysis far more efficient and pain-free.
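To make the idea of a rated exception table concrete, here is a minimal sketch in Python. It is not how IntelliMagic Vision works internally. It only illustrates rating metrics against warning and exception thresholds and sorting the worst issues to the top. The metric names, sample values, and threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    warning: float    # level at which a metric deserves attention
    exception: float  # level at which a metric is considered a real problem


def rate(value: float, t: Threshold) -> float:
    """Return 0.0 at or below the warning level, rising linearly to 1.0 at the exception level."""
    if value <= t.warning:
        return 0.0
    return min(1.0, (value - t.warning) / (t.exception - t.warning))


def exception_table(samples: dict, thresholds: dict) -> list:
    """Build (metric, status, rating) rows for metrics that crossed the warning level, worst first."""
    rows = []
    for metric, values in samples.items():
        worst = max(rate(v, thresholds[metric]) for v in values)
        if worst > 0:
            status = "exception" if worst >= 1.0 else "warning"
            rows.append((metric, status, round(worst, 2)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical Coupling Facility interval data and thresholds, for illustration only.
samples = {
    "cf_processor_util_pct": [22, 31, 28],
    "cf_sync_service_time_us": [9, 14, 26],
    "cf_subchannel_busy_pct": [12, 18, 15],
}
thresholds = {
    "cf_processor_util_pct": Threshold(warning=30, exception=50),
    "cf_sync_service_time_us": Threshold(warning=10, exception=25),
    "cf_subchannel_busy_pct": Threshold(warning=20, exception=30),
}

for row in exception_table(samples, thresholds):
    print(row)
```

The warning band is the important part. A metric that has crossed the warning level but not yet the exception level is the kind of upcoming disruption that can be addressed before end-users notice.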
This leads us to our first best practice.
Best Practice #1 Utilize a Predictive Analytics Solution to Eliminate Disruptions
Goal 2) Reduce Mainframe Costs
When it comes to reducing mainframe costs, usually the first culprit is MLC (Monthly License Charges), and rightly so. MLC consumes up to 30% of some mainframe budgets and, if left unchecked, can quickly grow out of control.
There are many options available to reduce a site’s MLC, from capping and MLC-specific solutions to simple tuning activities that just require the right visibility and knowledge of the critical areas.
From a z/OS performance management perspective, the real key is to lower costs without negatively impacting performance – and those two rarely cooperate with each other.
Our resident MLC expert Todd Havekost has written extensively on the effect that processor cache has on MLC costs. Add to the mix that Tailored Fit Pricing has now “Thrown the R4HA Out the Window,” as our expert John Baker puts it (R4HA being the rolling four-hour average), and finding the right way to lower costs just keeps getting more complicated.
Rather than having multiple tools or solutions to optimize or reduce MLC costs and another tool to monitor your z/OS performance, save yourself the money and additional headache and look for an end-to-end performance solution that provides built-in MLC visibility.
The three images above taken from IntelliMagic Vision for z/OS represent visibility that is critical to understanding what is causing the peaks that drive MLC costs.
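For sites still billed on the rolling four-hour average (R4HA) of MSU consumption, the peak R4HA interval is the one to understand. The sketch below, in Python, finds that peak window and shows how much each workload contributed to it. It assumes you have already extracted per-interval MSU figures per workload. The workload names, the numbers, and the `peak_window` helper are hypothetical, and the default 5-minute interval length is an assumption about your data.

```python
def rolling_average(values, window):
    """Mean of each full window; result[i] covers values[i : i + window]."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def peak_window(msu_by_workload, interval_minutes=5):
    """Return (start_index, peak_r4ha, per-workload average MSU inside the peak window)."""
    window = 240 // interval_minutes  # number of intervals in four hours
    series = list(msu_by_workload.values())
    total = [sum(interval) for interval in zip(*series)]  # total MSU per interval
    averages = rolling_average(total, window)
    if not averages:
        raise ValueError("need at least four hours of data")
    start = max(range(len(averages)), key=averages.__getitem__)
    contributions = {
        name: sum(values[start:start + window]) / window
        for name, values in msu_by_workload.items()
    }
    return start, averages[start], contributions


# Hypothetical hourly MSU figures, so interval_minutes=60 gives a 4-interval window.
msu = {
    "online": [60, 120, 200, 260, 60, 60],
    "batch": [40, 80, 100, 140, 40, 40],
}
start, peak, contributions = peak_window(msu, interval_minutes=60)
print(start, peak, contributions)
```

Breaking the peak down by workload is what turns a billing number into a tuning target: it shows which work would have to move or shrink for the peak to come down.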
Best Practice #2 Use a z/OS Monitoring Tool that Lets You Manage Performance AND Lower Costs
But MLC costs are not the only line item that can be lowered with the right visibility. Others include:
- Eliminating emergency hardware purchases
- Purchasing the right amount of hardware
- Purchasing the right amount of flash storage
- Rebalancing workloads and applications
- Avoiding costly mainframe outsourcing mistakes
Clear visibility into your z/OS infrastructure allows you to monitor its performance, but advanced analytical solutions can also forecast your workload growth, so that you purchase only the storage you actually need rather than playing it safe and overpurchasing.
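As a rough illustration of what forecasting means here, the Python sketch below fits a straight line to monthly storage usage and estimates how many months remain before a given capacity is reached. A linear trend is a deliberate simplification. Real workloads show seasonality and step changes that a production forecast would need to handle. The usage figures and the 140 TB capacity are hypothetical.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_full(used_tb, capacity_tb):
    """Months from the last observation until the trend line reaches capacity (None if not growing)."""
    slope, intercept = fit_line(used_tb)
    if slope <= 0:
        return None
    month_full = (capacity_tb - intercept) / slope
    return max(0.0, month_full - (len(used_tb) - 1))


# Hypothetical month-end usage in TB for the last six months.
used_tb = [100, 104, 108, 112, 116, 120]
print(months_until_full(used_tb, capacity_tb=140))
```

Even a simple trend like this replaces "buy extra to be safe" with a date, which is the point of forecasting: the purchase can be sized and timed against expected growth.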
Best Practice #3 Forecast Workload and Capacity Growth to Eliminate Guesswork from Storage Purchases
Goal 3) Overcome the z/OS Skills Shortage and Improve Staff Efficiency
Overcoming the mainframe performance and capacity skills shortage and improving IT staff efficiency is essential for the long-term viability of mainframe operations. Otherwise, it’s impossible to ensure optimal application performance.
Even with fewer heads to pay for, mainframe costs will likely go up rather than down, due to lost efficiency and the cost of the service disruptions that are likely to follow.
If you’ve attended a SHARE conference in the past 2-3 years, then you know that there are now numerous initiatives underway to try to fill the gap left behind by a retiring workforce of deep performance experts. These initiatives are working to hire and train a new generation of mainframe performance analysts, but the process is slow and does not by itself make skill acquisition any faster.
The fastest way to train the incoming workforce is to equip them with tools that allow them to easily visualize a new environment, pick up on the critical areas, and easily navigate through the data. In his white paper, 5 Key Attributes of an Effective Solution to the z/OS Performance Skills Gap, Todd Havekost writes that such a solution must be:
- Fast and Current
- Visual and Interactive
- Predictive and Contextual
- Versatile with Expanded Applications
- Cloud-based and Collaborative
Having a powerful tool not only trains incoming staff but also makes even the deepest subject matter experts more efficient and effective with their time and energy. By eliminating the need for manual SAS and MXG coding or excessive report creation, and by making the sharing of reports and knowledge quick and easy, experts can instead spend the bulk of their time reducing costs, optimizing performance, and preventing disruptions.
Best Practice #4 Ensure your z/OS Performance Monitor is Easy to Use and Set Up for Quick Skill-Acquisition
Best Practices for Monitoring Your z/OS Performance in 2020
2019 was a watershed year for how RMF (or CMF) and SMF data is used by mainframe performance and capacity teams. The capabilities of, and expectations for, a z/OS performance monitor have shifted dramatically. 2020 will solidify this momentum.
In this blog I covered the following Best Practices for z/OS Performance Monitoring:
- Predictive Analytics > Real-Time Monitor
- Use one solution with visibility into performance & MLC (and other) costs
- Save on unnecessary storage purchases by forecasting workload and capacity growth
- Make IT staff more efficient and easier to train with an easy-to-use performance solution
Our recent webinar, 2020 Vision into z/OS Infrastructure Operation and Availability, also covers additional best practices for ensuring efficient application availability from the z/OS infrastructure.