Brent Phillips - 6 August 2018

What are the top 5 million things you need to do today to avoid application infrastructure performance problems? Because that’s usually what it comes down to.

In a perfect world and a perfect environment, all IT infrastructure operation issues would be proactively identified and prevented well before any application end user ever felt any production impact. The reality is often much crueler. Too often, performance teams are overworked and understaffed, running from one production fire to the next, always trying to solve problems in real time, always in crisis mode.

6 Steps to Proactively Prevent Any Potential Issues

But perhaps there comes a day with no production fire (that you’ve been alerted to), and you want to devote your time to proactively preventing any possible upcoming issues. Great. Now all you have to do (every day) is:

  1. Collect the millions (or billions) of performance, capacity, and configuration data points from the various infrastructure components involved
  2. Segment out from the shared infrastructure resources the portion that represents the specific application in question
  3. Perform some calculations on related metrics to produce valuable new metrics and ensure that you have visibility into root causes of problems, not just symptoms of problems
  4. Evaluate the metrics in the context of the infrastructure running them. Check whether any fall outside what is normal and healthy for the type of workload, for the specific capabilities and capacity of the infrastructure (based on expert knowledge of what it can do), and for its best practices, and check for any misconfigurations or errors
  5. For out-of-range items, compare the metrics in question with other recent periods to see when the changes started
  6. Trace how the out-of-range root-cause metrics propagate, and identify which symptoms (response times, etc.) will be negatively affected if the underlying conditions are not proactively addressed
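To make steps 3 through 6 concrete, here is a minimal Python sketch of what a computer would be doing, assuming steps 1 and 2 have already produced a dict of metric name to value for the application in question. The metric names, the `Range` limits (a stand-in for the expert knowledge and best practices in step 4), and the impact map used for step 6 are all hypothetical; this is an illustration of the idea, not how any particular product implements it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Range:
    """Expected range for a metric (hypothetical stand-in for expert knowledge)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def derive_avg_response_ms(total_busy_ms: float, io_count: int) -> float:
    """Step 3: derive a new metric (average time per I/O) from two raw ones."""
    return total_busy_ms / io_count if io_count else 0.0


def out_of_range(metrics: dict[str, float], expected: dict[str, Range]) -> dict[str, float]:
    """Step 4: keep only the metrics that fall outside their expected range."""
    return {
        name: value
        for name, value in metrics.items()
        if name in expected and not expected[name].contains(value)
    }


def change_start(history: list[float], limit: Range) -> Optional[int]:
    """Step 5: index of the first interval in the latest unbroken out-of-range run,
    or None if the most recent interval is back within range."""
    start: Optional[int] = None
    for i, value in enumerate(history):
        if limit.contains(value):
            start = None  # back in range: any earlier run has ended
        elif start is None:
            start = i  # a new out-of-range run begins here
    return start


def affected_symptoms(root: str, impacts: dict[str, list[str]]) -> list[str]:
    """Step 6: walk a hypothetical 'metric -> what it degrades' map from a root cause."""
    seen = {root}
    stack = [root]
    order: list[str] = []
    while stack:
        node = stack.pop()
        for nxt in impacts.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                stack.append(nxt)
    return order


if __name__ == "__main__":
    metrics = {
        "avg_response_ms": derive_avg_response_ms(5200.0, 1000),
        "cache_hit_pct": 97.0,
    }
    expected = {
        "avg_response_ms": Range(0.0, 3.0),
        "cache_hit_pct": Range(90.0, 100.0),
    }
    print(out_of_range(metrics, expected))  # {'avg_response_ms': 5.2}
    print(change_start([2.1, 2.4, 3.5, 2.8, 3.9, 4.6, 5.2], Range(0.0, 3.0)))  # 4
    impacts = {"avg_response_ms": ["transaction_response_s", "batch_elapsed_min"]}
    print(affected_symptoms("avg_response_ms", impacts))
```

Even in this toy form, each step is mechanical once the expected ranges and dependencies are known; the hard part is encoding that knowledge, which is where human expertise comes in.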

That’s it. That is all you have to do every day to identify issues proactively and avoid them before they become real problems. That, and avoid being bored to death when 99.9% of the metrics are not out of range. Then you only have to determine which of the remaining 1/10 of 1% are the important ones to look at. Easy, right?
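Picking the important fraction of a percent out of everything that was checked is a ranking problem, which is exactly the kind of work a computer does well. A minimal sketch, assuming each metric has a hypothetical expected (low, high) range: score each value by how far it lies outside its range relative to the range width, and keep the few worst. The scoring rule and metric names are illustrative assumptions; a real tool would likely weigh severity with much richer context.

```python
from __future__ import annotations

import heapq


def severity(value: float, low: float, high: float) -> float:
    """Distance outside [low, high], relative to the range width; 0.0 when in range.
    This scoring rule is an illustrative assumption, not a standard."""
    width = (high - low) or 1.0  # guard against a zero-width range
    if value < low:
        return (low - value) / width
    if value > high:
        return (value - high) / width
    return 0.0


def top_issues(
    metrics: dict[str, float],
    expected: dict[str, tuple[float, float]],
    n: int = 3,
) -> list[str]:
    """Return the names of the n most severe out-of-range metrics, worst first."""
    flagged = []
    for name, value in metrics.items():
        if name not in expected:
            continue
        score = severity(value, *expected[name])
        if score > 0:
            flagged.append((score, name))
    return [name for _, name in heapq.nlargest(n, flagged)]


if __name__ == "__main__":
    # 1,000 hypothetical metrics, all in range except two.
    metrics = {f"metric_{i}": 50.0 for i in range(1000)}
    expected = {name: (0.0, 100.0) for name in metrics}
    metrics["metric_17"] = 130.0  # 30% of the range width too high
    metrics["metric_842"] = -5.0  # 5% of the range width too low
    print(top_issues(metrics, expected))  # ['metric_17', 'metric_842']
```

Using `heapq.nlargest` keeps only the top `n` candidates rather than fully sorting every flagged metric, which matters little here but scales better when millions of data points are checked.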

Don’t Steal the Computer’s Job

The question isn’t really whether a human can do the steps listed above. It’s why they should even try. Computers have been developed to do the work that humans should never have to do: process millions upon millions of data points and automatically understand what’s important, how it’s relevant to the data around it, and what the level of severity is.

This is what they are good at, and instead of fighting to keep these jobs for ourselves (while at the same time never having the time or aptitude to get to it), we should devote our time to tasks that humans are built for.

If you wanted to dig a ditch, you wouldn’t use a spoon just because it would keep you busy. Today you wouldn’t even use a shovel! Technology was developed as a tool to enhance our capabilities and make better use of our time – Artificial Intelligence is the next evolution of tools designed to help us.

Taking Advantage of AI for your IT Operations

When utilized correctly, and combined with human expert knowledge, Artificial Intelligence will dramatically simplify and streamline complex processes and tasks for performance and capacity teams. Teams who are encouraged to be proactive will be able to take advantage of the numerous benefits AI offers:

  • Automate data processing (including all 6 steps listed at the start of this blog)
  • Identify potential problems predictively
  • Offer recommendations to resolve existing issues
  • Equip current staff to be more productive, and streamline training for new staff
  • Optimize environments for performance and cost savings, and of course,
  • Avoid application infrastructure performance problems

Almost all new technology meets resistance at its onset, until it crosses the threshold from “that is never going to happen” to “oh, everyone is already using that.” You may well be able to get the work done (eventually) without the assistance of the technology, but what could you do with the time it would save you? In the meantime, I think there are about a million things that need to be done to prevent the next issue.

This article's author

Brent Phillips
Managing Director, Americas