Morgan Oats - 28 May 2020

z/OS performance analysts must deal with a lot of complicated issues that make their difficult job harder. One of those is trying to determine whether a specific z/OS workload or utilization change is a one-time anomaly that likely won’t result in any availability issue, or if it is a change that needs to be monitored or promptly addressed before a problem occurs.

Because the workload and utilization of the z/OS infrastructure components vary over time, it is not straightforward to determine if a specific change can be considered “normal” or if it is an indication of a (future) problem. With IntelliMagic Vision’s latest rollout of Change Detection, a performance analyst can easily accomplish this task.

Change Detection allows performance analysts to automatically detect workload and application changes, saving them countless hours of manual labor and intensive scrutiny while trying to determine the significance of changes, such as:

  • When a new application is brought online
  • When a new version of an existing application behaves differently
  • When there is a problem with a started task
  • When important workloads or transactions deviate from the norm

Automatically Detect Significant z/OS Workload Changes

For hundreds of important metrics in the z/OS infrastructure it is now possible to automatically detect significant changes on any selected day, as compared to a reference period of 30 days. This includes important changes in hardware, middleware and other z/OS components.

Change Detection calculates not only the averages for every z/OS infrastructure metric, but also the standard deviations and compares the current values with these statistics. Statistical approaches help save CPU time and MSU’s by being able to quickly see when something new comes online or when new application versions are less efficient.

Statistical approaches help save CPU time and MSU’s by being able to quickly see when something new comes online or when new application versions are less efficient.

And the added benefit is the contextual information available through simple drilldowns.

With this new capability, the question, “Is this workload change normal?” can be investigated and answered with relative ease. For each metric, we can investigate workload changes, compare them to our standard reference period, and easily gauge the significance of each change without the need to manually code or run statistical analysis on each metric.

Figure 1 below shows an example of a Change Detection report for “DSS Changes.”

Disk Storage System Change Detection Table

Figure 1: Disk Storage System Change Detection Table

 

The table represents the standard deviation of change from a single day compared to a reference period of 30 days. For each metric (e.g. I/O Rate) two values are shown: the measured value for the selected single day and the difference in number of standard deviations (std devs), also known as the z-score, as compared to the reference period.

A positive std devs value means the selection period’s average is higher than the reference period’s average, and a negative value means the selection’s average is lower.

  • A value between -2 and +2 indicates that the “I/O Rate” for the selection period is in line with the “I/O Rate” pattern during the reference period.
  • A value that is between -3 and -2 or between +2 and +3 indicates a notable change and is marked as yellow. This may merit an investigation.
  • If the value is -3 or less, or +3 or more, the “I/O Rate” for the selection period is substantially different. These values are marked as red.

As indicated by the red filled cells, the change for I/O Rate, Read and Write Throughput, and Read Hit Percentage for the last DSS is so remarkable that it is wise to investigate.

Investigating Significant z/OS Workload Changes with Drilldowns

As is the case throughout IntelliMagic Vision, to further investigate a warning or exception found in one of the reports or tables, you can drilldown with a simple click on the report, line, metric, or exception you wish to investigate.

From Figure 1, we would like to investigate the DSS with the serious changes highlighted. We do this by clicking on the row with the exception to show the local menu. This takes us to Figure 2: a set of small charts representing the different metrics in the change detection table.

Drilldown to overview of all metrics for DSS with highlighted changes

Figure 2: Drilldown to overview of all metrics for DSS with highlighted changes

 

We can also further zoom into one of these metrics by clicking on the chart as indicated in Figure 3.

Drilldown to “Reference Period” - Selection of one metric

Figure 3 : Drilldown to “Reference Period” – Selection of one metric

 

Figure 3 shows us three curves:

  • The yellow straight line represents the overall average for the whole reference period of 30 days.
  • The blue curve shows the averages per day. Note that you can often see a weekly pattern here.
  • The red curve shows the measured values for each time interval, for instance 15 minutes.

Outside of the drilldown shown above, there are many other options for investigating, comparing, and drilling down into detected changes with a very high or very low z-score.

Answering the Question: “Is this change normal?”

With the addition of Change Detection to IntelliMagic Vision, performance analysts now have a powerful new tool under their belt.

For years, IntelliMagic’s built-in-expert-knowledge has been empowering analysts to proactively prevent availability and performance exceptions from occurring in their z/OS environments. With the introduction of Change Detection, analysts can combine the proactive with near-real-time change detection as soon as the z/OS metrics are loaded.

When it comes to answering, “Is this workload or utilization change normal, or not?” the answer is now just a few clicks away from being answered.

This article's author

Morgan Oats
Marketing
More from Morgan

Share this blog

z/OS Performance Monitoring Best Practices for 2020

MQ Statistics - Learning From SMF

This article is designed to introduce you to the types of insights that are available through SMF data with a focus on the SMF 115 MQ Statistics data. After reading, you will have a better understanding of how MQ functions.

Related Resources

Blog

IntelliMagic zAcademy: An Online Educational Series for z/OS Professionals

Conferences are canceled, so we're bringing the conference to you! IntelliMagic zAcademy brings the education of your favorite z/OS conference to your new home office. Every week, a new online educational session.

Read more
Blog

4 Benefits of Implementing TS7700’s New Microcode Compression

Implementing the TS7700 microcode-based compression algorithms can have benefits beyond better compression. The CPU algorithms allow host jobs to run faster and improve replication time.

Read more
Video

Easy Dashboard and Report Sharing

Sharing data is important, but sharing analyzed data is vital and much more meaningful. Watch this video to see how you can quickly and easily share any of your important reports and analyzed data with others on your team.

Watch video

Go to Resources