This is the third article in a four-part series on a topic that has the potential to generate significant cost savings but has not received the attention it deserves: processor cache optimization. (Read part one here and part two here.) Without an understanding of the vital role processor cache plays in CPU consumption, and without clear visibility into the key cache metrics in your environment, significant opportunities to reduce CPU consumption and MLC expense may go unrealized.
This article highlights how optimizing physical hardware configurations can substantially improve processor cache efficiency and thus reduce MLC costs. It considers three approaches to maximizing the work executing on Vertical High (VH) logical CPs by increasing the number of physical CPs. Restating a key finding of the first article: work executing on VHs optimizes processor cache effectiveness, because a VH's one-to-one relationship with a physical CP means it will consistently access the same processor cache.
Sub-capacity Processor Models
One way to increase the number of physical CPs is to utilize sub-capacity processor models. These models provide additional physical CPs for the same MSU capacity, which translates into more VHs without additional hardware expense.
If single-engine speed requirements or other considerations in your environment do not require full-capacity models, sub-capacity models can be selected when upgrading to new generation technologies. This adds physical CPs, along with their associated Level 1 and Level 2 processor cache, resulting in more VHs and thus greater processor cache efficiency.
Figure 1. z13 models with similar capacity rating
Figure 1 shows various z13 models with MSU capacity ratings similar to a zEC12-711. Opting for a sub-capacity model in this example would provide seven to fifteen more physical CPs than the full-capacity z13-710, significantly increasing the available processor cache and the workload executing on VHs. Although the rated capacity would be comparable to the z13-710, a sub-capacity model would likely deliver additional realized capacity due to the CPU savings from increased processor cache efficiency.
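To make the selection logic concrete, here is a minimal sketch of choosing, among models that meet a required MSU rating, the one offering the most physical CPs (and thus the most Level 1/Level 2 cache and VHs). The model names, CP counts, and MSU figures below are placeholders for illustration, not actual IBM ratings.

```python
# Hypothetical illustration of sub-capacity model selection.
# All ratings below are placeholder values, not actual IBM figures.
models = [
    {"model": "z13-710", "cps": 10, "msus": 1440},  # full capacity
    {"model": "z13-617", "cps": 17, "msus": 1450},  # sub-capacity (hypothetical)
    {"model": "z13-525", "cps": 25, "msus": 1455},  # sub-capacity (hypothetical)
]

required_msus = 1440

# Among models meeting the MSU requirement, prefer the most physical CPs.
eligible = [m for m in models if m["msus"] >= required_msus]
best = max(eligible, key=lambda m: m["cps"])
print(best["model"])  # the eligible model with the most physical CPs
```

With comparable MSU ratings, the sub-capacity model with the most CPs maximizes processor cache and VH coverage for the same software cost.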
On/Off Capacity on Demand
A second approach to increasing the number of physical CPs with minimal incremental cost is to leverage On/Off Capacity on Demand (CoD). On/Off CoD is an IBM offering that allows sites to enable and disable physical CPs to meet temporary peak business needs.
IBM MLC software expense is determined by the peak rolling four-hour average (R4HA) of MSU consumption for the month, so if monthly peaks occur at predictable intervals, On/Off CoD can be activated during those peak intervals. Deploying this additional capacity creates more physical CPs and thus more VHs. This enables more of the workload to execute on VHs, reducing RNI and thus CPU consumption. And since this occurs precisely at the time of the monthly peak, it translates directly into reduced MLC expense.
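The R4HA mechanic can be sketched as a sliding-window average over MSU samples. This is an illustrative simplification (hourly samples, not IBM's actual SCRT billing logic), but it shows why shaving only the peak window matters so much:

```python
# Illustrative sketch, not IBM's SCRT implementation: find the peak
# rolling four-hour average (R4HA) from hourly MSU consumption samples.
def peak_r4ha(hourly_msus, window=4):
    """Return the highest average over any `window` consecutive hours."""
    if len(hourly_msus) < window:
        raise ValueError("need at least one full window of samples")
    return max(
        sum(hourly_msus[i:i + window]) / window
        for i in range(len(hourly_msus) - window + 1)
    )

# Hypothetical day with a pronounced online peak in hours 3-6.
day = [300, 310, 320, 900, 950, 970, 940, 600, 400, 350, 330, 320]
print(peak_r4ha(day))  # the billing-relevant value is this peak, not the daily mean
```

Because the bill follows the single highest window, reducing CPU consumption during that window (for example, via On/Off CoD adding VHs exactly then) reduces MLC expense directly.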
Deploy Additional Hardware Capacity
A third way to increase the number of physical CPs is to install or deploy additional capacity, leveraging a one-time hardware expense to achieve greater, recurring MLC software savings.
Traditional mainframe capacity planning has been predicated on the assumption that running mainframes at high utilizations is the most cost-effective way to operate. The results of this processor cache analysis and the case study we will consider shortly challenge that approach. They indicate that in many cases significant reductions in MSU consumption and thus MLC software costs may be achieved by operating at lower utilizations due to the increased impact that cache effectiveness has on z13 processors.
Understandably, developing the business case to acquire additional hardware capacity may be challenging. But many sites would not require that business case because they already have a business practice of pre-purchasing additional capacity they do not immediately deploy. (Reasons for this practice include negotiating a volume discount or long-term lease, avoiding the procurement effort involved with frequent acquisitions, or acquiring capacity required for seasonal peaks.) But even these sites tend to deploy that previously acquired capacity in a “just in time” manner, rather than reaping the recurring benefits of MLC software savings by deploying the surplus capacity as soon as it is acquired.
In either case, whether acquiring additional hardware capacity or deploying previously acquired capacity, the framework for re-evaluating this approach is that in most mainframe environments software represents a much larger expense than hardware, with MLC software typically constituting the single largest expense line item. If a one-time hardware acquisition expense achieves software savings which are realized on a recurring basis year-after-year, there may be a strong business case for acquiring or deploying hardware capacity that significantly exceeds the business workload requirements.
One reservation frequently expressed about this approach is that Independent Software Vendor (ISV) licenses are a barrier to deploying surplus capacity, because many ISV contracts are capacity-based rather than usage-based. Capacity-based ISV contracts can severely limit your operational flexibility until they are renegotiated to become usage-based. A proactive initiative to renegotiate any capacity-based ISV contracts can place you in the enviable position of having the flexibility to configure your environment in the most cost-effective manner going forward. (At my previous employer, an initiative to renegotiate all ISV contracts to usage-based terms was successfully completed.)
Use Case: Deploy Hardware Capacity
The following use case shows the potential magnitude of the impact of deploying hardware on processor cache efficiency. The case involves a high RNI workload and hardware capacity that had been previously acquired but not deployed.
Figure 2 is a weekly tracking chart comparing a Production business online workload (red line) with CPU consumption (blue line). For the last several months in a zEC12 environment, these values tracked very closely. After all four processors were upgraded to z13-711 models, CPU consumption far exceeded the business workload.
Figure 2. CPU impact of z13 implementations
The cycle speed of the z13 is 9% slower than that of the zEC12, yet its rated capacity is 10% higher; the z13 relies on improved cache effectiveness to achieve this increased capacity. But this was a "high RNI" workload, one that placed heavy demands on processor cache. (Relative Nest Intensity is a metric reflecting how far into the shared cache and memory hierarchy the processor must go when staging data or instructions not found in Level 1 cache.) The result was a capacity shortfall of 4,000 MIPS versus the zEC12 configuration.
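Back-of-the-envelope arithmetic from these two figures shows how much the z13's rated capacity depends on doing more work per cycle, which is exactly what a high RNI workload struggles to achieve:

```python
# Arithmetic implied by the figures quoted in the text:
# z13 cycle speed is ~9% slower than zEC12, yet rated capacity is ~10% higher.
cycle_speed_ratio = 1 - 0.09  # z13 clock relative to zEC12
capacity_ratio = 1 + 0.10     # z13 rated capacity relative to zEC12

# To deliver the rated capacity at a slower clock, work completed per
# cycle must rise by roughly this factor -- a gain that depends on
# improved cache effectiveness, which a high RNI workload may not realize.
per_cycle_gain = capacity_ratio / cycle_speed_ratio
print(f"{per_cycle_gain - 1:.0%}")  # roughly 21% more work per cycle required
```

A workload that cannot realize that per-cycle improvement falls short of the rated capacity, which is what the 4,000 MIPS shortfall in this case reflects.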
When additional capacity (z13-716 models) was deployed, enabling most of the workload to execute on VHs, MIPS consumption was reduced dramatically (see Figure 3). This resulted in 9,000 MIPS savings versus the prior z13-711 configuration for equivalent business workloads.
Figure 3. CPU impact of z13-716 upgrades
Ultimately all the previously acquired capacity was deployed through upgrades to z13-726 models, resulting in a 13,000 MIPS reduction from the z13-711 configuration (see Figure 4). The primary factor driving this final round of savings was not processor cache efficiency but a reduction in the MSU/CP ratio, which we will now consider.
Figure 4. CPU impact of z13-726 upgrades
MSU/CP Ratio
Another important factor impacting measured CPU consumption that adds to the business case for acquiring or deploying additional hardware relates to the multiprocessing (MP) effect. The MP effect applies to any hardware environment and reflects the fact that adding cores (CPs) increases the overhead required to manage the interactions between physical processors and shared compute resources.
To account for the typical MP effect when running at relatively high utilizations, the MSU/CP ratios in IBM's processor ratings are not linear but decrease significantly as more CPs are added, as Figure 5 shows.
Figure 5. MSU/CP ratios for various z13 models
But if the business workload remains the same when CPs are added, overall processor utilization will decrease, and additional overhead from the MP effect is likely to be negligible. In that case, a lower MSU/CP rating translates directly into lower MSU consumption for the same workload. Figure 6 compares the MSU/CP ratios for the four processor models in the previous use case.
Figure 6. MSU/CP ratios from use case
The orange value shows the 10%+ capacity increase associated with the original z13-711 implementation, an increase the high RNI workload fell far short of achieving. The green value shows the 18% decrease in the MSU/CP rating of the z13-726 relative to the z13-711. Expressed another way, a workload that would generate 1,000 MSUs on a 711 processor would generate only 819 MSUs when run on a 726, solely due to the reduced MSU/CP rating. This is in addition to the CPU savings from improved processor cache efficiency that would also accompany this change.
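The re-rating arithmetic is simple to sketch. The percentage below is taken from the figures quoted in the text (the 1,000 to 819 MSU example implies a decline of about 18.1%); actual MSU/CP ratings come from IBM's published model ratings:

```python
# Sketch of the MSU/CP re-rating arithmetic described in the text.
# 0.181 is derived from the 1,000 -> 819 MSU example; actual ratings
# come from IBM's published figures for each model.
msu_per_cp_decline = 0.181  # z13-726 rating vs. z13-711

def rerated_msus(msus_on_711):
    """MSUs the same workload generates on a 726, from the rating change alone."""
    return msus_on_711 * (1 - msu_per_cp_decline)

print(round(rerated_msus(1000)))  # 819
```

The same multiplier applies to any workload size, so the savings from the rating change scale with consumption, before counting any cache-efficiency gains.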
The outcome is that deploying additional hardware capacity while running at lower utilizations delivers two significant benefits that combine to create even larger savings. First, less CPU is consumed, due to operating efficiencies from more effective use of processor cache (along with other operating system efficiencies that also result from running at lower utilizations). Second, the CPU that is consumed translates into fewer MSUs, due to the decrease in the processor MSU/CP ratings.
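These two benefits compound multiplicatively rather than adding. With an illustrative cache-efficiency saving (the 10% figure below is hypothetical; the 18.1% rating decline is from the use case):

```python
# The two savings compound multiplicatively.
cache_cpu_reduction = 0.10  # hypothetical CPU saved via better cache efficiency
rating_reduction = 0.181    # MSU/CP rating decline from the use case

# Remaining MSUs = (CPU after cache savings) x (new MSU/CP rating factor)
combined = 1 - (1 - cache_cpu_reduction) * (1 - rating_reduction)
print(f"{combined:.1%}")  # combined MSU reduction of roughly 26%
```

Because the effects multiply, even a modest cache-efficiency gain on top of the rating change produces a noticeably larger total MSU reduction than either alone.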
Summary
Processor cache performance plays a more prominent role than ever before in the capacity delivered by z13 processors. And this is a trend that continues with the recently announced z14 processors, due to the engineering challenges associated with increasing cycle speeds.
Since unproductive cycles spent waiting to stage data into L1 cache typically represent one third to one half of overall CPU, it is incumbent on the mainframe performance analyst to have clear visibility into the key metrics and to leverage those metrics to optimize processor cache. This series of articles has presented methods to reduce RNI and CPU consumption, along with case studies confirming the effectiveness of these methods in real-life environments.
Another trend that is very likely to continue is that software expenses will consume a growing percentage of the overall mainframe budget, while hardware costs will represent an ever-smaller percentage. Considering these factors, it is indeed time for a thorough re-examination of the traditional assumption of mainframe capacity planning that running mainframes at high utilizations is the most cost-effective way to operate. A proactive initiative to renegotiate ISV contracts to be usage-based is also advised to position yourself with the flexibility to select hardware configurations that achieve the lowest overall total cost of ownership.
The final article in this series will examine the changes in processor cache design for the recently-announced z14 processor model and the impact of those changes on CPU consumption and MLC costs.
Read part 4, Impact of z14 on Processor Cache and MLC Expenses here.