George Dodson - 23 October 2017

This blog originally appeared as an article in Enterprise Executive.

For more than 50 years, computer professionals have worked to make applications run faster and to determine the causes of slow-running applications. In the early days, computer performance work was in some ways easy because electronic components were soldered in place. To understand what was happening at any point in the circuitry, we simply attached a probe and examined the electronic wave information on an oscilloscope.

Eventually, we were able to measure activity at key points in the computer circuitry to determine things like CPU Utilization, Channel Utilization and Input/Output response times. However, this method still had many shortcomings.

First, the number of probes was very small, usually less than 40. Second, this method gave no insight into operating system functions or application operations that might be causing tremendous overhead. And of course, when integrated circuits were developed, the probe points went away.

The Origins of Infrastructure Performance Information

In 1966 I joined an IBM team that was focusing on a better way to conduct benchmarks in what was then named an IBM Systems Center. Customers considering computer upgrades would come to our data center to determine how their programs would operate on newly released hardware. But it was simply not possible to host every customer in this way.

I was a part of a team of three who undertook the task to quantify the amount of work for specific customer jobs in terms of CPU and I/O activity, analyze that information, and then predict results using simulation models. These simulations allowed us to model the impact on service levels of different CPU speeds, workload variations, I/O device speeds, etc.

We first determined the points in the systems software to measure. Then we quickly discovered that the captured data was extremely helpful, becoming even more important than the predictive models we had developed. We had, for the first time, infrastructure performance information that no one else had.

For example, we knew immediately that I/O Channels were busy doing a lot of things besides reading and writing user data. The exact overhead activities to execute commands were below the level of data we were capturing, but it was still amazing to see the overhead contribution to I/O service times.

We were successful in building monitoring and modeling capabilities for both MVT and MFT that were variants of IBM’s OS/360 operating system. The models were also very successful in predicting the impact of making various changes to the environments.

We were tracing systems activity as it happened, and the amount of data being captured was significant. Our team and IBM Systems Engineers used this capability to assist customers in evaluating their systems and applications. We also found a newer way to evaluate computer performance: using software to sample system statistics, first CPU utilization, then I/O and other activity.

At this time, Boole & Babbage developed a product to provide visibility into this data, and the team I was then leading at IBM developed a very similar capability, initially available only to mainframe hardware pre-sales teams. Our team built this capability for multiple versions of operating systems for IBM marketing use first, and later released some versions for customers to use in their own analysis.

MVS, Performance Monitors, and Resource Management Facility (RMF)

By the early 1970s, IBM had developed MVS, which included the Systems Resource Manager. With MVS’ more complex design, it became obvious that a performance monitor would be needed.

The first monitor developed was known as MF/1, which provided basic information such as CPU utilization and some I/O statistics. As MVS continued to grow, the functionality my team had developed for other IBM systems was desired for MVS as well. We worked with the MVS (and later z/OS) developers to add many features of SVS/PT to MF/1, which was named RMF – Resource Management Facility – when it was finally announced as a program product in 1974.

After its announcement as a product, RMF was further expanded with capabilities such as RMF Monitor 2 and RMF Monitor 3, which provide real-time insight into the internal workings of z/OS to help understand and manage the performance of the z/OS infrastructure.

The value of RMF performance measurement data has been proven over the decades: RMF, or the compatible BMC product CMF, is used in every mainframe shop today. Many new record types have been added in recent years as z/OS infrastructure capabilities continue to evolve.

Systems Management Facility (SMF)

A related product – Systems Management Facility, or SMF – was originally created to provide resource usage information for chargeback purposes. SMF captured application usage statistics but could not always capture the entire associated system overhead.

Eventually, SMF and RMF were expanded to capture detailed statistics about all parts of the mainframe workloads and infrastructure operation, including details about third party vendor devices such as storage arrays. RMF and SMF now generate what is likely the most robust and detailed performance and configuration data of any commercial computing environment in the data center.

Performance Tools Expand

As the data sources reporting on the performance of workloads and computer infrastructure grew, different performance tools were created to display and analyze the data. The information in the data was very complex, and the total amount captured was overwhelming, creating challenges in identifying performance problems.

Typically, this requires analysts who have deep insight into the specific infrastructure areas being analyzed and an understanding of how those areas respond to different application workloads. As applications have become more complex and more real-time, with more platforms and components involved, the performance analysis task has also become more difficult.

To deal with the growing amount and complexity of data, and the need for better performance analysis, data warehouses to capture and store the data were introduced in the 80’s and 90’s. Vendors developed many different approaches to capturing the data, normalizing it onto a common time basis, and then creating reports from it to highlight what was happening in the systems and applications.
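The time-alignment step those vendors faced can be illustrated with a minimal sketch. This is not any vendor’s actual implementation – the function name and interval length are illustrative – but it shows the core idea: bucket raw samples recorded at irregular times by different monitors into a common fixed interval, so the resulting series can be joined and compared.

```python
from collections import defaultdict

def align_to_interval(samples, interval_secs=900):
    """Average (timestamp, value) samples into fixed time buckets.

    samples: iterable of (epoch_seconds, value) pairs, possibly
    recorded at irregular moments by different monitors.
    Returns {bucket_start: mean_value}, so series from different
    sources end up on the same time basis (15 minutes by default).
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket = ts - (ts % interval_secs)   # floor to interval start
        buckets[bucket].append(value)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

# Two samples fall inside the first 15-minute interval, one in the next
cpu_samples = [(0, 40.0), (300, 60.0), (900, 80.0)]
print(align_to_interval(cpu_samples))  # {0: 50.0, 900: 80.0}
```

Once every metric stream is keyed by the same bucket boundaries, building a combined report becomes a simple join on the bucket timestamp.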

As noted earlier, RMF and SMF have become the single point of data collection for applications, systems, sub-systems and components such as special processors and storage subsystems. Mining this data, and doing so with a high level of automation, is a significant challenge. The performance database and reporting approaches developed decades ago, still in common use today, simply do not provide the intelligence needed.

The Key to z/OS Performance Management

The key to managing z/OS environments is choosing the necessary RMF and SMF data, then having analysis capabilities that automatically highlight performance issues without spending days analyzing each day’s data. Many products exist to gather and report on this data, and they are widely used. However, finding the cause of performance or availability issues remains very complex.

The key to finding the causes is a way to correlate problem areas without requiring analysts to pore over thousands of pages of graphs or reports.
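A minimal sketch of the kind of statistical screening such analysis performs: compare each metric’s latest interval against its own recent history and surface only the outliers. The metric names, values, and threshold here are entirely illustrative, not drawn from any real product or workload.

```python
import statistics

def flag_anomalies(history, latest, threshold=3.0):
    """Flag metrics whose latest value deviates from their own history.

    history: {metric_name: [values from past intervals]}
    latest:  {metric_name: newest interval's value}
    Returns the metrics whose latest value lies more than `threshold`
    standard deviations from the historical mean -- the candidates an
    analyst should look at first, instead of scanning every report.
    """
    flagged = {}
    for name, past in history.items():
        mean = statistics.mean(past)
        stdev = statistics.stdev(past)
        if stdev > 0 and abs(latest[name] - mean) / stdev > threshold:
            flagged[name] = latest[name]
    return flagged

history = {
    "cpu_busy_pct": [62, 64, 63, 61, 65, 63],   # stable
    "io_resp_ms":   [1.1, 1.2, 1.0, 1.1, 1.2, 1.1],
}
latest = {"cpu_busy_pct": 64, "io_resp_ms": 9.8}  # I/O response spiked
print(flag_anomalies(history, latest))  # {'io_resp_ms': 9.8}
```

Even this toy version shows the payoff: of the two metrics, only the one that actually changed is surfaced, which is the correlation-and-filtering work that otherwise falls on the analyst.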

IntelliMagic’s entry first into the storage subsystem area, and then into evaluating entire systems, has been a significant step forward in automating the extremely time-consuming effort of data analysis.

Today’s systems have become so complex that Artificial Intelligence (AI) is an absolute must. Slow response times or low application throughput can incur significant penalties under service level agreements, and in cloud implementations it can be a significant challenge even to determine the causes of poor performance.

As an IT professional involved in the genesis and evolution of performance measurement data and analysis on the mainframe platform, it has been a pleasure to see the effectiveness of IntelliMagic’s approach to finding what is relevant in the massive amount of data that is generated today.

About George

George Dodson has a long and storied history in mainframe performance and capacity planning, earning wide industry recognition. He retired after 30 years at IBM and went on to lead many consulting efforts at Fortune 100 companies.
