Brent Phillips

What are the top 5 million things you need to do today to avoid application infrastructure performance problems? Because that’s usually what it comes down to.

In a perfect world and a perfect environment, all IT infrastructure operation issues would be proactively identified and prevented well before any application end user ever felt any production impact. The reality is often much crueler. Too often performance teams are overworked and understaffed, running from one production fire to the next, always trying to solve problems in real time, always in crisis mode.

6 Steps to Proactively Prevent Any Potential Issues

But perhaps there comes a day with no production fire (that you’ve been alerted to), and you want to devote your time to proactively prevent any possible upcoming issues. Great. Now all you have to do (every day) is:

  1. Collect the millions (or billions) of performance, capacity, and configuration data points from the various infrastructure components involved
  2. Segment out from the shared infrastructure resources the portion that represents the specific application in question
  3. Perform some calculations on related metrics to produce valuable new metrics and ensure that you have visibility into root causes of problems, not just symptoms of problems
  4. Evaluate the metrics in the context of the infrastructure that is running them. Check whether any are outside the range that is normal and healthy for the type of workload, for the specific capabilities and capacity of the infrastructure (based on expert knowledge of what it can do), and for the infrastructure's best practices, and check for any misconfigurations or errors
  5. For out-of-range items, compare the metrics in question against other recent periods to see when the changes started
  6. Trace the path from the out-of-range root cause metrics to the symptoms (response times, etc.) that will be negatively affected if the underlying conditions are not proactively addressed

That’s it. That is all you have to do every day in order to be proactive in identifying issues and avoiding them before they become real problems. That, and avoid being bored to death when 99.9% of the metrics are not out of range. Then you only have to determine which of the remaining 1/10 of 1% are the important ones to look at. Easy, right?
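To make the scale of that daily routine concrete, here is a minimal Python sketch of steps 3 through 6 for a single metric. Everything in it is an illustrative assumption rather than how any particular product works: the counter values, the 85% limit, the three-sigma band around a baseline period, and the function names are all hypothetical.

```python
from statistics import mean, stdev

def derive_utilization(busy_ms, interval_ms):
    """Step 3: combine a raw busy-time counter with the interval length
    to get a derived metric (percent busy)."""
    return [100.0 * b / interval_ms for b in busy_ms]

def out_of_range(values, baseline, limit, sigmas=3.0):
    """Step 4: indexes of samples that exceed a hard limit or fall outside
    a band of `sigmas` standard deviations around the baseline mean."""
    mu, sd = mean(baseline), stdev(baseline)
    return [i for i, v in enumerate(values)
            if v > limit or abs(v - mu) > sigmas * sd]

def change_start(flagged, n):
    """Step 5: start of the unbroken run of flagged samples that ends at
    the most recent sample, or None if the latest sample is in range."""
    flagged_set = set(flagged)
    start = None
    for i in range(n - 1, -1, -1):
        if i not in flagged_set:
            break
        start = i
    return start

# Hypothetical busy-time counters (ms per 1000 ms interval).
baseline = derive_utilization([400, 420, 410, 390, 405, 415], 1000)
today = derive_utilization([410, 400, 415, 470, 520, 610, 880], 1000)

LIMIT = 85.0  # assumed upper limit for this component, in percent busy

flagged = out_of_range(today, baseline, LIMIT)
start = change_start(flagged, len(today))

# Step 6: a one-line report tying the root cause metric to its onset.
if start is not None:
    print(f"utilization left its normal band at interval {start}; "
          f"latest value {today[-1]:.1f}% (limit {LIMIT:.0f}%)")
```

This covers one metric on one component. The point of the list above is that the same evaluation has to run across millions of metrics, each with its own notion of what is normal.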

Don’t Steal the Computer’s Job

The question isn’t really whether a human can do the steps listed above. It’s why they should even try. Computers have been developed to do the work that humans should never have to do: process millions upon millions of data points and automatically understand what’s important, how it’s relevant to the data around it, and what the level of severity is.

This is what they are good at, and instead of fighting to keep these jobs for ourselves (while at the same time never having the time or aptitude to get to it), we should devote our time to tasks that humans are built for.

If you wanted to dig a ditch, you wouldn’t use a spoon just because it would keep you busy. Today you wouldn’t even use a shovel! Technology was developed as a tool to enhance our capabilities and make better use of our time – Artificial Intelligence is the next evolution of tools designed to help us.

Taking Advantage of AI for your IT Operations

When utilized correctly, and combined with human expert knowledge, Artificial Intelligence will dramatically simplify and streamline complex processes and tasks for performance and capacity teams. Teams who are encouraged to be proactive will be able to take advantage of the numerous benefits AI offers:

  • Automate data processing (and all 6 tasks from the start of this blog)
  • Identify potential problems predictively
  • Offer recommendations to resolve existing issues
  • Equip current staff to be more productive, and streamline training for new staff
  • Optimize environments for performance and cost savings, and of course,
  • Avoid application infrastructure performance problems

Almost all new technology meets resistance at its outset until it crosses the threshold from “that is never going to happen” to “oh, everyone is already using that.” The alternative may be that you can get it done (eventually) without the assistance of the technology, but what would you be able to do with that extra time? In the meantime, I think there are about a million things that need to be done to prevent the next issue.


This article's author

Brent Phillips
Worldwide IBM Z Performance Evangelist