2018 is gearing up to be a watershed year for z/OS performance and capacity professionals.
Industry analysts have been talking for some years now about Artificial Intelligence (AI) and the role it will play in our work. But what that truly means, and the value it offers in day-to-day operations, have not yet been understood or realized by most professionals in this field.
There are many different types of AI, and not all of them are suited to the kind of infrastructure performance and availability health assessment that is no longer feasible for human analysts to perform proactively every day. When properly designed and deployed, however, AI has proven very effective at automating decisions about what all the data means, identifying current or near-term performance problems and their root causes.
The computer can be more effective at this because it is far more efficient than humans at continuously checking application workloads against hundreds or thousands of the most common issues that cause service disruptions on the specific infrastructure components running those workloads.
It can answer questions such as: which z/OS best practices indicate performance risk or problems? Which z/OS components are nearing saturation, have lost redundancy, or are being used inefficiently? This automated application of domain-specific expert knowledge lets human analysts focus on the most important issues and root causes that affect, or will soon affect, the required application service levels.
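To make this concrete, the automated best-practice assessment described above can be sketched as a rule base evaluated against each measurement interval. This is a minimal illustration only: the metric names, thresholds, and messages below are hypothetical examples, not actual z/OS best-practice values or any vendor's product logic.

```python
# Hypothetical sketch of automated, rule-based health assessment.
# Metric names and thresholds are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Rule:
    metric: str       # name of the interval metric being checked
    threshold: float  # best-practice limit for this metric
    message: str      # root-cause hint surfaced to the analyst

# A real expert knowledge base would hold hundreds or thousands of such rules.
RULES = [
    Rule("cpu_busy_pct", 90.0, "CPU nearing saturation; review WLM goals"),
    Rule("paging_rate", 10.0, "High paging; review real storage usage"),
    Rule("chpid_busy_pct", 80.0, "Channel path busy; possible lost redundancy"),
]

def assess(interval_metrics: dict) -> list:
    """Return the best-practice rules exceeded in one measurement interval."""
    findings = []
    for rule in RULES:
        value = interval_metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            findings.append((rule.metric, value, rule.message))
    return findings

for metric, value, message in assess({"cpu_busy_pct": 94.5, "paging_rate": 2.0}):
    print(f"{metric}={value}: {message}")
```

Run continuously over each new interval of RMF/SMF data, a rule base like this surfaces only the exceptions, which is what frees analysts from scanning every report themselves.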
2018 Predictions: AI and the IT Infrastructure
Applying AI techniques to IT infrastructure operations data has already proven effective for quite some time, at least in IntelliMagic solutions. Based on our experience in the market, as well as comments in the press and by industry analysts, we expect 2018 to be a watershed year in terms of mainstream recognition of the benefits of using AI to operate the IT infrastructure for optimal service levels.
In the bigger picture, this modernized, AI-driven analytics approach addresses issues such as:
- Enabling experienced staff to get the answers they need far more quickly
- Training new staff and bringing them up to speed sooner
- Analyzing and interpreting vast amounts of data
- Predictively detecting areas that represent performance risk
- Identifying inefficiency that helps safely reduce costs
- Understanding more quickly the IT operations data sources produced by new infrastructure technology
Closing the Performance and Capacity Skills Gap
Many organizations are hiring new staff to complement their deep z/OS performance and capacity planning experts who are due to retire in the coming years. Yet the required skills take years to develop, and in the meantime the team must deliver continuous availability for the production applications. Solutions whose algorithms embed deep, platform-specific expert knowledge help new staff learn faster what is important, and reveal the sometimes-obscure root causes behind the more easily visible performance problem symptoms.
Augmenting Human RMF/SMF Data Analysis with Artificial Intelligence
Manual, proactive analysis of vast amounts of performance metrics is neither effective nor feasible, given the complexity and scope of the infrastructure and the limited human resources most teams have today. Instead, teams typically dig into the data only after performance issues arise. Inviting AI “to the team” enables the entire team to be more productive, more quickly.
Predictive & Preventative Performance Intelligence
Organizations do not need more reports; they already have more than enough for their staff to look at. What they need is refined intelligence about what is important in all the data and what it means for the performance, capacity, and efficiency of the infrastructure. Reacting to application availability disruptions after the fact is fast becoming too expensive and unreliable, and even real-time monitors are too late to avoid the production problem.
The need for proactively predicting and preventing service disruptions will soon become a fundamental requirement for all organizations – not just the largest financial institutions. Only AI technology that utilizes platform-specific expert domain knowledge can provide effective predictive capabilities with minimal false positives (alerting about unimportant issues) and without false negatives (missing the important problems).
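One way to see why domain knowledge reduces false positives is to gate a purely statistical anomaly signal with an expert-defined significance floor: a value alerts only if it is both statistically unusual and in a range where an expert would consider it to matter. The function and numbers below are an illustrative assumption, not any product's actual algorithm.

```python
# Illustrative sketch: statistical deviation detection gated by domain knowledge.
# A metric alerts only if it is BOTH a statistical outlier versus its history
# AND above an expert-defined floor where the metric starts to matter.
# All names and thresholds are assumptions for illustration.
from statistics import mean, stdev

def predictive_alert(history, current, domain_floor):
    """Alert when `current` is a >3-sigma outlier vs. `history`
    and exceeds the domain-knowledge significance floor."""
    if len(history) < 2:
        return False  # not enough history to judge deviation
    mu, sigma = mean(history), stdev(history)
    statistically_unusual = sigma > 0 and (current - mu) / sigma > 3.0
    domain_significant = current > domain_floor
    return statistically_unusual and domain_significant

# A jump from ~20% to 35% busy is statistically unusual, but an expert knows
# 35% utilization is harmless, so the domain floor suppresses the alert.
history = [20.0, 21.0, 19.5, 20.5, 20.0]
print(predictive_alert(history, 35.0, domain_floor=70.0))  # False: not significant
print(predictive_alert(history, 85.0, domain_floor=70.0))  # True: unusual and significant
```

Without the domain floor, the first case would have been a false positive; without the statistical test, a workload that always runs at 75% busy would alert constantly. Combining both is what the expert-knowledge approach described above provides.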
Reducing Costs without Impacting Performance
Finding ways to reduce ever-rising costs has always been a priority for organizations, but not at the expense of performance and availability. AI-driven analysis can automatically and continuously assess whether common inefficiencies have arisen in the dynamic infrastructure operation.
Keeping up with Modern Technologies
The z/OS infrastructure continues to add to its already rich source of metrics with additional metrics about new technologies such as Pervasive Encryption, data compression, and other features. Properly analyzing these new data sources using antiquated reporting techniques and products requires custom coding and manual interpretation. Consequently, many sites today have significant gaps in visibility into the metrics required to support newer technologies.
Better intelligence means processing and assessing all of these new data types. Representing that information in an easy to understand manner that is flexible and interactive eliminates the need to invest resources to learn, develop, and maintain one’s own custom reports to understand and manage these new infrastructure components.
Moving Ahead with AIOps
The integration of artificial intelligence with IT operations analysis is now being referred to by some in the market as “AIOps”. 2018 is likely to see AIOps emerge on a much larger scale than in previous years. It offers a breakthrough in productivity and effectiveness at a time when human analysts face increased loads: past staff reductions have left fewer people to cope with growing workloads and infrastructure complexity.
View our recorded webinar, 2018 RMF/SMF Analytics – Status & Predictions, where we discussed many of these topics in greater detail and demonstrated ways you can apply modern strategies to current problems you may be facing.