When I was growing up, long car rides were a bit challenging due to our car’s alerting system: smoke, steam, horrible clunking noises, or dead silence. Everything was great until Betsy (my mom always named our car Betsy) did not move anymore. Then we had to get the car to a mechanic who was an expert at making us feel ignorant and who took a lot of our money to fix something that was usually simple.
Then cars started getting better at alerting the operator about simple problems, but you still had to take the car to a mechanic to fix the problem. Today, between YouTube, Google, and Internet forums, you can often find the steps to resolve many of these alerts for a whole lot less money. However, there is still a gap between getting an alert from your car and fixing the issue.
What if your car could alert you to an issue, do an Internet search for you, and send a fix-it video to your smartphone before you could even get to a safe place to check it? That kind of intelligence would be convenient. The same principle applies to alerts you get from your z/OS operating system.
When I started working in Operations, back when we still called it “MVS”, an Operator would see an alert and call me (usually at night). I would sometimes have to drive into the office or call another Systems Programmer, analyze the alert, and act upon it. What if the alert could now automatically perform root cause analysis and send supporting reports to your smartphone?
One of the major problems with alerts in IT is that systems generate so many of them that Operators become desensitized. Projects to “clean up” alerts may end up filtering out necessary ones. I know of one project that set out to improve alerting by reducing the number of alerts, and it created more problems than it solved. Many customers even ask for a single pane of glass to contain alerts and want the alerts to be smarter. This can wind up being a single glass of pain that adds no value if alerts don’t lead to actionable solutions.
What Does that z/OS Alert Do for You?
Alerts from z/OS are common. Some come with software packages, some are part of the operating system, and some are written by Systems Programmers or your Developers.
Perhaps a loved service class that was missing its goal caused a business impact at one point in time. Someone probably wrote some automation to alert the Operator about that. Maybe a batch job missing its window triggers an alert. Paging activity could trigger alerts too. There are many scenarios where an alert causes someone to act and investigate manually.
So when you get an alert, what does that alert do for you? Does it just scratch the surface and require further manual analysis? Whenever you hear or see the word “manually”, read it as a synonym for “expensive”, “difficult”, and “laborious”.
What is meant to help (the alert) and benefit your business almost always results in a lot more work just to understand whether the alert is valid and important, not to mention the additional analysis needed to understand root cause.
Alerts should do more than cause more work. Alerts should be part of the solution and not part of the problem when monitoring and maintaining z/OS. If z/OS or your z/OS automation alerts you to something, it should also get you very close to solving that something.
Do not settle for complicated, poor, or confusing alerts. When you see or hear about an alert, ask the question, “What solution does the alert suggest?”
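To make that question concrete, here is a minimal sketch of what an alert that carries its own next step could look like. It is illustrative only: the `Alert` class, the `check_service_class` function, the threshold of 1.0, and the suggested-action wording are all hypothetical names and values chosen for this example, not part of z/OS or of any particular product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alert:
    """Hypothetical alert record that carries context, not just a message."""
    source: str
    message: str
    evidence: List[str] = field(default_factory=list)
    suggested_action: Optional[str] = None

    def is_actionable(self) -> bool:
        # An alert is only useful if it brings evidence and a next step.
        return bool(self.evidence) and self.suggested_action is not None


def check_service_class(name: str, performance_index: float,
                        threshold: float = 1.0) -> Optional[Alert]:
    """Return an alert when a service class misses its goal.

    Assumption: a performance index above the threshold means the goal
    is being missed. The suggested action is placeholder text.
    """
    if performance_index <= threshold:
        return None
    return Alert(
        source="service-class-monitor",
        message=f"Service class {name} is missing its goal",
        evidence=[
            f"performance index {performance_index:.2f} "
            f"exceeds threshold {threshold:.2f}"
        ],
        suggested_action="Review what delayed this workload in the interval",
    )


if __name__ == "__main__":
    alert = check_service_class("ONLINE", 1.4)
    if alert is not None and alert.is_actionable():
        print(alert.message)
        print(alert.suggested_action)
```

The point of the sketch is the `is_actionable` check: an alert that arrives without evidence or a suggested next step is the kind that creates more manual work.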
Part 2 of this blog, Automating Analysis of z/OS Alerts, discusses how you can automatically take that next step to make alerting even more valuable.