We work with many large z/OS customers and have seen only one that requires more than a petabyte (PB) of primary disk storage in a single sysplex. Other z/OS environments of that size may exist, but we have not yet seen them (if you are that site, we’d love to hear from you!). The larger environments are at 400-750 TB per sysplex and growing, so they are likely to reach a petabyte requirement soon.
IBM has already stated that the 64K device limitation will not be lifted. Customers requiring more than 64K devices have gotten relief by migrating to larger devices (3390-54 and/or Extended Address Volumes) and by exploiting Multiple Subchannel Sets (MSS) for PAV aliases, Metro Mirror (PPRC) secondary devices and FlashCopy target devices.
The purpose of this blog is to discuss strategies for positioning existing and future technologies to allow for this growth.
Objective: Configure 1 PB of disk storage which is accessible and usable from a single z/OS Image or Sysplex. These are non-replicated base volumes; any requirements for replication will necessitate additional storage.
Reality: You can do this today. For simplicity, let’s forget about binary versus decimal capacity and say that 1 PB is equal to 1,000,000 GB and that 1 GB is the same as 1 DS8000 CKD extent (in fact a CKD extent is exactly 946,005,480 bytes). So for the purpose of this exercise, 1 PB equals 1,000,000 CKD extents. Also, we’ll aim to use no more than 20,000 primary device addresses, to leave 45,535 devices for other purposes (like replication, or tape) if needed.
When planning for petabyte capacity, we need to look at the storage hardware and its usable capacity, the environment’s access density, the 3390 logical volume size selection and the addressing.
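To see how much the 1 GB = 1 extent simplification gives away, here is a minimal Python sketch. It assumes standard 3390 geometry (56,664 bytes per track, 15 tracks per cylinder) and a CKD extent of 1,113 cylinders (one 3390 Model 1); the function names are made up for illustration.

```python
# Standard 3390 geometry; a DS8000 CKD extent is assumed to be 1,113 cylinders.
BYTES_PER_TRACK = 56_664
TRACKS_PER_CYLINDER = 15
CYLINDERS_PER_EXTENT = 1_113


def extent_bytes() -> int:
    """Size of one CKD extent in bytes."""
    return BYTES_PER_TRACK * TRACKS_PER_CYLINDER * CYLINDERS_PER_EXTENT


def extents_for(capacity_bytes: int) -> int:
    """Whole extents needed to hold capacity_bytes (rounded up)."""
    return -(-capacity_bytes // extent_bytes())


ONE_PB_DECIMAL = 10**15  # 1 PB in decimal bytes

print("Bytes per extent:", extent_bytes())
print("Extents for a true decimal PB:", extents_for(ONE_PB_DECIMAL))
```

A true decimal petabyte needs roughly 5 to 6 percent more extents than the 1,000,000 used here, which does not change any of the conclusions that follow.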
Disk Subsystem (DSS) Hardware and Usable Capacity
We are going to compute how many storage systems you would need for one PB of usable, primary storage. A typical z/OS configuration may consist of 300GB small form factor (SFF) drives in a RAID 5 configuration (7+P). Let’s pretend that 300 GB drives truly provide 300 GB (in reality they’re only about 290 GB). A maximum-frame disk subsystem may have room to install 1536 or even 2304 SFF drives (depending on the vendor). But we can’t just multiply 300 by 2304 to get the effective capacity of a full box. First of all, for RAID 5 (7+P) you lose 1/8th of the capacity due to parity. Then you lose some more because of the need for spares, and then even more because of all kinds of overheads. Let’s therefore say a disk system with two thousand 300GB drives will effectively provide about 2000*300*7/8 = 525,000 GB minus overheads and spares. Let us round that down to 500 TB, or half a PB. So with storage systems full of 300GB drives, you might need two or three (depending on the vendor) to get one PB.
The largest disk drive module (DDM) available today in enterprise storage systems is the 4 TB large form factor (LFF) drive. Storage systems often support fewer LFF drives than SFF drives, maybe ‘only’ 768 or 1152. Also, these multi-terabyte drives are configured in RAID 6, with two parities per array. Calculations similar to the above yield about 2 or 3 PB in a single fully loaded storage system, well over the 1 PB that we were looking for.
Most, if not all, z/OS installations can accommodate 1-4 Disk Subsystems in a sysplex configuration. We have customers with 10 or more DSSs installed in a single sysplex (with DSS configurations designed more for performance than capacity, so those are typically not fully loaded with drives).
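The two estimates above can be sketched as a small Python function. The RAID 6 array width of 6+2 and the `overhead_fraction` parameter are assumptions for illustration; the text only states that RAID 6 uses two parities per array and that spares and overheads take "some more".

```python
def usable_tb(drives: int, drive_gb: float, data: int, parity: int,
              overhead_fraction: float = 0.0) -> float:
    """Rough usable capacity in TB after parity, spares and overheads.

    overhead_fraction is a hypothetical catch-all for spares and
    formatting overheads; it is not a vendor-published figure.
    """
    raw_gb = drives * drive_gb
    after_parity_gb = raw_gb * data / (data + parity)
    return after_parity_gb * (1.0 - overhead_fraction) / 1000.0


# 2,000 x 300 GB SFF drives in RAID 5 (7+P), before spares and overheads.
raid5_sff = usable_tb(2000, 300, data=7, parity=1)

# 4 TB LFF drives in RAID 6, assuming 6+2 arrays.
raid6_lff_small = usable_tb(768, 4000, data=6, parity=2)
raid6_lff_large = usable_tb(1152, 4000, data=6, parity=2)

print(f"RAID 5, 2000 x 300 GB: {raid5_sff:.0f} TB")
print(f"RAID 6,  768 x 4 TB:   {raid6_lff_small:.0f} TB")
print(f"RAID 6, 1152 x 4 TB:   {raid6_lff_large:.0f} TB")
```

Setting `overhead_fraction` to around 0.05 brings the RAID 5 figure close to the 500 TB used in the text.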
Access Density is expressed as the expected IO rate across a usable disk capacity. Knowledge of Access Density is critical for capacity planning and for estimating the impact of future growth. The software product IntelliMagic Vision calculates Access Density in IOs per second per GB for each storage system in the configuration. In addition to Access Density, the channel speed and efficiency, the channel group size (typically 4 or 8 CHPIDs), and the number of control units per channel group also play a role in planning our configuration.
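As a minimal sketch of how Access Density feeds a growth estimate, the Python below divides a measured IO rate by usable capacity and then projects the IO rate at a larger capacity. The sample figures are invented for illustration, and the projection assumes the density stays constant as capacity grows, which real workloads may not do.

```python
def access_density(io_per_second: float, usable_gb: float) -> float:
    """Access density in IOs per second per GB of usable capacity."""
    return io_per_second / usable_gb


def projected_io_rate(density: float, target_gb: float) -> float:
    """IO rate to plan for at target_gb, assuming constant access density."""
    return density * target_gb


# Hypothetical storage system: 50,000 IO/s measured across 100 TB usable.
density = access_density(50_000, 100_000)

# Projected IO rate if the same workload profile grows to 1 PB (1,000,000 GB).
io_at_one_pb = projected_io_rate(density, 1_000_000)

print(f"Access density: {density} IO/s per GB")
print(f"Projected IO rate at 1 PB: {io_at_one_pb:,.0f} IO/s")
```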
3390 Volume size selection
3390 Volume size selection is critical to getting to petabyte capacity. It’s not possible to install a petabyte of usable capacity using 3390 model 3s. With our assumption of 1 GB = 1 extent, this would require 333,333 mod 3 devices, far exceeding even the 64K device limit. With 3390 model 9s you’d need 111,111 devices, which is still more than z/OS can address. Model 27s could meet the PB goal under the z/OS 64K device address restriction, but would still exceed our self-imposed limit of 20K devices that saves some addresses for a rainy day. Our best choices to meet the petabyte capacity requirement are either 3390 model 54s or EAVs. Dominant use of these devices is required to meet our goal. There’s no harm in reserving a few smaller capacity devices to meet application requirements; using a 3390 model 54 for a System Residence volume, for example, is likely a large waste of space.
There is no architected value for the number of cylinders for a 3390 model 27, a model 54 or an EAV. There are just minimum and maximum values. I’ve always been of the school that a model 27, model 54 or EAV capacity should be a multiple of a 3390 model 9 (10017 cylinders), and have advocated the use of 30051 cylinders for a model 27 rather than the maximum of 32760, and 60102 cylinders for a model 54 rather than the maximum of 65520. This means that three 3390 model 9s equal a model 27 and six 3390 model 9s equal a model 54. This allows easy calculation of the capacity of the environment and easy pairing of capacity for purposes of replication. Given that first use of these custom 3390s requires device folding, the actual number of cylinders is not critical to the device’s definition and productive use, and I have worked in environments where the maximum values were used without ill effects. Experience has shown that once you start with a number of cylinders for one of these devices, it’s the size you’ll be using from then on (so be happy with your choice, whatever that might be).
The previous paragraph used the term ‘device folding’, which may not be familiar to everyone. Device folding describes the activity of merging the data from multiple volumes onto a single volume. In a z/OS sense, this is largely a logical copy of the data, because it involves changing not only the data’s location but also the catalog information concerning that location. IntelliMagic has a product called IntelliMagic Balance to help with this process by identifying candidate volumes to be folded.
Critical to using the higher capacity devices is the use of Parallel Access Volumes (PAV). If you are presently using 3390-3s and 3390-9s in your environment without suffering from IOSQ time, it’s pretty much guaranteed that you’ll introduce IOSQ time when you fold your present environment onto much larger devices. Exclusive use of HyperPAV is advised for all but the rarest circumstances.
The use of FICON to access these devices imposes the FICON architecture on this exercise. We need to ensure we don’t exceed 256 devices per control unit, and access density is typically the limiting factor for the number of control units per channel group. For the purposes of illustration, we will allocate the LCUs with 64 base addresses in MSS0 and 32 aliases in MSS1.
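The device counts above follow from dividing the 1,000,000 extents by the size of each volume model. A minimal Python sketch, using the 1 GB = 1 extent simplification and taking the address limit as 65,535 (the 20,000 plus 45,535 used earlier); counts are rounded up, so they can come out one higher than the rounded figures in the text.

```python
import math

TOTAL_EXTENTS = 1_000_000   # 1 PB under the 1 GB = 1 extent simplification
DEVICE_LIMIT = 65_535       # 20,000 primary + 45,535 reserved addresses
SELF_IMPOSED_LIMIT = 20_000

# Volume size in extents (approximate GB) per 3390 model.
MODEL_EXTENTS = {"3390-3": 3, "3390-9": 9, "3390-27": 27, "3390-54": 54}


def devices_needed(extents_per_volume: int) -> int:
    """Number of volumes required to reach TOTAL_EXTENTS, rounded up."""
    return math.ceil(TOTAL_EXTENTS / extents_per_volume)


for model, size in MODEL_EXTENTS.items():
    count = devices_needed(size)
    print(f"{model:8} {count:>8,} devices  "
          f"fits 64K: {count <= DEVICE_LIMIT!s:5}  "
          f"fits 20K: {count <= SELF_IMPOSED_LIMIT}")
```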
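As a quick check on the cost of the "multiple of a model 9" convention, this Python sketch compares those cylinder counts with the maximum values quoted above. All numbers come from the paragraph; only the variable names are invented.

```python
MOD9_CYLINDERS = 10_017

# model -> (number of model 9 equivalents, maximum cylinders)
MODELS = {"3390-27": (3, 32_760), "3390-54": (6, 65_520)}

sizes = {}
for model, (mod9_count, max_cylinders) in MODELS.items():
    cylinders = MOD9_CYLINDERS * mod9_count
    unused = max_cylinders - cylinders
    sizes[model] = (cylinders, unused)
    print(f"{model}: {cylinders} cylinders, "
          f"{unused} below the maximum ({unused / max_cylinders:.1%})")
```

Either way the volume gives up a little over 8 percent of its maximum size in exchange for capacity figures that divide evenly into model 9 units.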
- 1,000,000 extents / 54 = 18,519 devices (rounded up to 18,560 devices, a multiple of 64)
- 18,560/64 = 290 LCUs across multiple DSSs.
- 290 LCUs / 16 LCUs per Channel Group = 18.1, rounded up to 19 Channel Groups of 8 CHPIDs each (152 8 Gb CHPIDs)
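The same arithmetic as a minimal Python sketch, so the inputs (volume size, bases and aliases per LCU, LCUs per channel group) can be varied. The function and field names are hypothetical; the default of 16 LCUs per channel group is simply the value used in this illustration, and in practice access density would determine it.

```python
import math


def plan_layout(total_extents: int, extents_per_volume: int,
                bases_per_lcu: int = 64, aliases_per_lcu: int = 32,
                lcus_per_group: int = 16, chpids_per_group: int = 8) -> dict:
    """Derive LCU, device and channel counts for a given volume size."""
    volumes = math.ceil(total_extents / extents_per_volume)
    lcus = math.ceil(volumes / bases_per_lcu)
    groups = math.ceil(lcus / lcus_per_group)
    return {
        "volumes_needed": volumes,
        "lcus": lcus,
        "base_devices": lcus * bases_per_lcu,      # defined in MSS0
        "alias_devices": lcus * aliases_per_lcu,   # defined in MSS1
        "channel_groups": groups,
        "chpids": groups * chpids_per_group,
    }


layout = plan_layout(total_extents=1_000_000, extents_per_volume=54)
for name, value in layout.items():
    print(f"{name:15} {value:>7,}")
```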
Extended Address Volumes presently support volumes 18 times larger than a 3390-54. To rework the above calculation for EAVs, we could keep the 19 channel groups and reduce the number of LCUs per channel group from 16 to 1. This reduces the number of devices to 1216 (64*19). Since these are 1 TB volumes, we are easily in the petabyte range. Within the design of EAVs, volumes can grow even larger, from 1,182,006 to 268,434,453 cylinders ((2**28)-1003). This reflects the maximum cylinder addressing possible from CCHHR once EAV exploits 12 bits of HH in addition to the 16 bits of CC it’s already using. This would be a volume of about 227 TB.
z/OS is petabyte ready and exabyte capable.
Take note of changes in technology that allow for increased access density, like today’s use of Solid State Devices or improvements in FICON bandwidth. These will either increase the capacity that a given set of resources can support or reduce the number of resources required to meet this goal.
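A short Python sketch of the EAV arithmetic, assuming standard 3390 geometry (15 tracks of 56,664 bytes per cylinder). The cylinder counts are the ones quoted above; the "TB" ratio follows the convention of treating a 1,182,006-cylinder volume as 1 TB.

```python
BYTES_PER_CYLINDER = 15 * 56_664      # standard 3390 geometry

MOD54_MAX_CYLINDERS = 65_520
EAV_1TB_CYLINDERS = 1_182_006
EAV_MAX_CYLINDERS = 268_434_453

# Cylinder addressing: 16 bits of CC plus 12 bits borrowed from HH.
CYLINDER_ADDRESS_BITS = 16 + 12
address_space = 2 ** CYLINDER_ADDRESS_BITS

eav_vs_mod54 = EAV_1TB_CYLINDERS / MOD54_MAX_CYLINDERS
max_vs_1tb = EAV_MAX_CYLINDERS / EAV_1TB_CYLINDERS
eav_1tb_bytes = EAV_1TB_CYLINDERS * BYTES_PER_CYLINDER

# 19 channel groups x 1 LCU x 64 base addresses, each a 1 TB EAV.
eav_devices = 64 * 19

print(f"28-bit cylinder address space: {address_space:,}")
print(f"1 TB EAV vs. 3390-54:          {eav_vs_mod54:.1f}x")
print(f"1 TB EAV in bytes:             {eav_1tb_bytes:,}")
print(f"Largest EAV vs. 1 TB EAV:      {max_vs_1tb:.1f}x")
print(f"Capacity of {eav_devices} 1 TB EAVs:   about {eav_devices / 1000:.2f} PB")
```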
In preparing for change I have always found that plans are useless, but planning is indispensable.