Morgan Oats - 28 May 2021

z/OS performance analysts deal with many complicated issues that make a difficult job harder. One of them is determining whether a specific z/OS workload or utilization change is a one-time anomaly that is unlikely to cause an availability issue, or a change that needs to be monitored or promptly addressed before a problem occurs.

Because the workload and utilization of the z/OS infrastructure components vary over time, it is not straightforward to determine if a specific change can be considered “normal” or if it is an indication of a (future) problem. With IntelliMagic Vision’s latest rollout of z/OS Anomaly Detection, or Change Detection, a performance analyst can easily accomplish this task.

Anomaly Detection allows performance analysts to automatically detect workload and application changes, saving them countless hours of manual labor and intensive scrutiny while trying to determine the significance of changes, such as:

  • When a new application is brought online
  • When a new version of an existing application behaves differently
  • When there is a problem with a started task
  • When important workloads or transactions deviate from the norm

Automatically Detect Significant z/OS Workload Changes

For hundreds of important metrics in the z/OS infrastructure it is now possible to automatically detect significant changes on any selected day, as compared to a reference period of 30 days. This includes important changes in hardware, middleware and other z/OS components.

Anomaly Detection calculates not only the average of every z/OS infrastructure metric but also its standard deviation, and compares the current values with these statistics. This statistical approach helps save CPU time and MSUs by making it quick to see when something new comes online or when a new application version is less efficient.
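To make the comparison concrete, here is a minimal sketch of how a single day's value could be scored against a reference period. It is an illustration only, not IntelliMagic Vision's implementation: the function name `z_score`, the use of the sample standard deviation, and the sample values are all assumptions.

```python
from statistics import mean, stdev


def z_score(selected_value, reference_values):
    """Return how many standard deviations selected_value lies from the reference mean."""
    ref_mean = mean(reference_values)
    # Assumption: sample standard deviation over the reference values.
    ref_std = stdev(reference_values)
    if ref_std == 0:
        raise ValueError("reference period has no variation; z-score is undefined")
    return (selected_value - ref_mean) / ref_std


# Hypothetical daily average I/O rates for a (shortened) reference period.
reference_io_rates = [9.0, 10.0, 11.0]
print(z_score(13.0, reference_io_rates))
```

A positive result means the selected day is above the reference average, and a negative result means it is below.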

And the added benefit is the contextual information available through simple drilldowns.

With this new capability, the question, “Is this workload change normal?” can be investigated and answered with relative ease. For each metric, we can investigate workload changes, compare them to our standard reference period, and easily gauge the significance of each change without the need to manually code or run statistical analysis on each metric.

Figure 1 below shows an example of an Anomaly Detection report for “DSS Changes.”

Figure 1: Disk Storage System Anomaly Detection Table


The table shows how far each metric on a single selected day deviates from a reference period of 30 days. For each metric (e.g. I/O Rate) two values are shown: the measured value for the selected day, and its difference from the reference period expressed in number of standard deviations (std devs), also known as the z-score.

A positive std devs value means the selection period’s average is higher than the reference period’s average, and a negative value means the selection’s average is lower.

  • A value between -2 and +2 indicates that the “I/O Rate” for the selection period is in line with the “I/O Rate” pattern during the reference period.
  • A value that is between -3 and -2 or between +2 and +3 indicates a notable change and is marked as yellow. This may merit an investigation.
  • If the value is -3 or less, or +3 or more, the “I/O Rate” for the selection period is substantially different. These values are marked as red.
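The thresholds above can be sketched as a small classification function. This is an illustrative sketch rather than the product's logic: the name `classify_change` and the label strings are hypothetical, and treating a value of exactly -2 or +2 as yellow is an assumption, since the text does not say which side of that boundary it falls on.

```python
def classify_change(std_devs):
    """Map a std devs value (z-score) to the highlight described in the text."""
    magnitude = abs(std_devs)
    if magnitude >= 3:
        return "red"      # -3 or less, or +3 or more: substantially different
    if magnitude >= 2:
        return "yellow"   # assumption: exactly -2 or +2 is treated as notable
    return "normal"       # in line with the reference period


for value in (1.5, -2.4, 3.0, -3.7):
    print(value, classify_change(value))
```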

As indicated by the red filled cells, the change for I/O Rate, Read and Write Throughput, and Read Hit Percentage for the last DSS is so remarkable that it is wise to investigate.

Investigating Significant z/OS Workload Changes with Drilldowns

As is the case throughout IntelliMagic Vision, to further investigate a warning or exception found in one of the reports or tables, you can drill down with a simple click on the report, line, metric, or exception you wish to investigate.

From Figure 1, we would like to investigate the DSS with the serious changes highlighted. We do this by clicking on the row with the exception to show the local menu. This takes us to Figure 2: a set of small charts representing the different metrics in the Anomaly Detection table.

Figure 2: Drilldown to overview of all metrics for DSS with highlighted changes


We can also further zoom into one of these metrics by clicking on the chart as indicated in Figure 3.

Figure 3: Drilldown to “Reference Period” – Selection of one metric


Figure 3 shows us three curves:

  • The yellow straight line represents the overall average for the whole reference period of 30 days.
  • The blue curve shows the averages per day. Note that you can often see a weekly pattern here.
  • The red curve shows the measured values for each time interval, for instance 15 minutes.
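The relationship between the three curves can be sketched as follows: the per-interval measurements are the raw data, the per-day averages group them by day, and the overall average summarizes the whole reference period. The function name `daily_and_overall_averages`, the `(day, value)` input shape, and the sample data are assumptions for illustration. The overall average is taken here over all interval values, which matches the average of the daily averages only when every day has the same number of intervals.

```python
from collections import defaultdict
from statistics import mean


def daily_and_overall_averages(samples):
    """samples: list of (day, value) pairs, one pair per measurement interval."""
    by_day = defaultdict(list)
    for day, value in samples:
        by_day[day].append(value)
    # Per-day averages (the "blue curve"), ordered by day.
    daily = {day: mean(values) for day, values in sorted(by_day.items())}
    # Overall average across the whole period (the "yellow line").
    overall = mean([value for _, value in samples])
    return daily, overall


# Hypothetical interval measurements (the "red curve") for two days.
samples = [
    ("2021-05-01", 10.0), ("2021-05-01", 20.0),
    ("2021-05-02", 30.0), ("2021-05-02", 50.0),
]
daily, overall = daily_and_overall_averages(samples)
print(daily)
print(overall)
```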

Outside of the drilldown shown above, there are many other options for investigating, comparing, and drilling down into detected changes with a very high or very low z-score.

Answering the Question: “Is this change normal?”

With the addition of z/OS Anomaly Detection to IntelliMagic Vision, performance analysts now have a powerful new tool at their disposal.

For years, IntelliMagic’s built-in expert knowledge has empowered analysts to proactively prevent availability and performance exceptions in their z/OS environments. With the introduction of Anomaly Detection, analysts can combine that proactive approach with near-real-time change detection as soon as the z/OS metrics are loaded.

When it comes to the question, “Is this workload or utilization change normal, or not?” the answer is now just a few clicks away.
