When large-scale disk storage systems were originally introduced, a volume, as seen by the operating system, corresponded to a physical disk drive. As storage technology moved forward and capacities increased, z/OS disk storage systems incorporated reliability features such as RAID (Redundant Array of Independent Disks) and performance features such as striping. As a result, the one-to-one relationship between volumes and disk drives broke down. Today, the operating system sees volumes as logical entities composed of storage collected across many physical devices. This has brought many benefits for system performance and availability, but it has also introduced complexity. If you find elevated response time on a logical volume, how do you know which physical drives may be causing it?
Logical Volume Mapping
Storage systems map many logical volumes to other logical constructs, which may be referred to as extent pools, disk groups, or policies. The pools are composed of extents, which are formatted chunks of space spread across many disks or RAID groups, as shown in the picture below.
In these types of implementations, the data for each logical volume is typically spread over many or all physical disks in the pool. Each pool is based on multiple physical disks and may consist of multiple RAID arrays. This means that a single logical volume should not, by itself, cause a single physical disk to become very busy. However, because physical disks can be very large, a large number of logical volumes are mapped to one group. The resulting I/O density on the underlying physical disks may cause problems, especially for workloads with poor cache locality (databases) or high write data rates (batch workloads).
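The mapping described above can be illustrated with a minimal sketch. It assumes a hypothetical layout in which each logical volume has a known number of extents on each RAID array, and it assumes that a volume's I/O is spread evenly across its extents, which is a simplification. The volume names, array names, and I/O rates are invented for illustration and do not come from any real storage system or from IntelliMagic Vision.

```python
from collections import defaultdict

# Hypothetical layout: extents per RAID array for each logical volume.
volume_extents = {
    "VOL001": {"R0": 10, "R1": 10, "R2": 10, "R3": 10},
    "VOL002": {"R0": 5, "R1": 5, "R2": 5, "R3": 5},
    "VOL003": {"R0": 30, "R1": 10},
}

# Hypothetical measured I/O rate per logical volume (I/Os per second).
volume_iops = {"VOL001": 400.0, "VOL002": 200.0, "VOL003": 80.0}


def array_load(extents_by_volume, iops_by_volume):
    """Estimate the I/O rate landing on each RAID array.

    Assumption: a volume's I/O is distributed in proportion to the
    number of extents it has on each array.
    """
    load = defaultdict(float)
    for volume, extents in extents_by_volume.items():
        total_extents = sum(extents.values())
        if total_extents == 0:
            continue  # volume has no allocated space
        rate = iops_by_volume.get(volume, 0.0)
        for array, count in extents.items():
            load[array] += rate * count / total_extents
    return dict(load)


estimated = array_load(volume_extents, volume_iops)
for array in sorted(estimated):
    print(f"{array}: {estimated[array]:.1f} IOPS")
```

In this made-up pool, no single volume is especially busy, yet array R0 ends up carrying more I/O than R2 or R3 because one volume's extents are concentrated on it. Real workloads rarely spread I/O evenly over extents, so measured back-end data is still needed.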
To avoid such problems, it is very important to balance the workload evenly over all the disks in a disk system. Without this balance, you will not be able to exploit all the potential that the hardware offers, and performance bottlenecks may occur. IntelliMagic Vision for z/OS provides insight into the relationships between the physical and logical resources, as well as the necessary performance measurements, so that users can understand imbalances between the volumes, pools, and supporting drives or RAID groups. The chart below is an example of an IntelliMagic Vision balance chart. It shows that one RAID array in this storage pool is out of balance in terms of read response time. This could indicate a physical problem with a disk drive in the array or, more likely, that this RAID array is overloaded.
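One simple way to express this kind of balance check is to compare each RAID array's read response time with the median for its pool. The sketch below is only an illustration under stated assumptions: the response times are invented, the 1.5x threshold is arbitrary, and the function name `find_imbalanced` is hypothetical. It is not a description of how IntelliMagic Vision computes its balance charts.

```python
from statistics import median

# Hypothetical read response times per RAID array in one pool (milliseconds).
read_resp_ms = {"R0": 4.1, "R1": 3.9, "R2": 4.0, "R3": 9.6}


def find_imbalanced(resp_by_array, threshold=1.5):
    """Return arrays whose response time exceeds threshold * pool median.

    The result maps each flagged array to its ratio against the median.
    The threshold is an assumed value, not a vendor recommendation.
    """
    baseline = median(resp_by_array.values())
    if baseline <= 0:
        return {}
    return {
        array: value / baseline
        for array, value in resp_by_array.items()
        if value > threshold * baseline
    }


for array, ratio in find_imbalanced(read_resp_ms).items():
    print(f"{array} is {ratio:.2f}x the pool median read response time")
```

The median is used as the baseline rather than the mean so that a single slow array does not pull the baseline up and hide itself.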
More recently, storage vendors have incorporated auto-tiering into their designs. This means that extent pools have become even more complex, as they may now consist of multiple tiers of disk or flash storage. Each extent is typically tied to a particular tier, but a logical volume may include extents that reside on various storage tiers.
IntelliMagic Vision can give you a view of how logical volumes are using the various storage tiers in an auto-tiering environment. This can help you ensure that your flash capacity is being used as effectively as possible. The chart below illustrates how IntelliMagic Vision shows how much volume activity is directed to the flash devices on a storage system.
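The share of a volume's activity that lands on flash can be sketched as a simple ratio. The example below assumes that per-tier I/O rates are available for each logical volume. The tier names, volume names, and numbers are hypothetical, and the calculation is an illustration rather than the product's actual method.

```python
# Hypothetical I/O rate per storage tier for each logical volume (IOPS).
tier_iops = {
    "VOL001": {"flash": 300.0, "enterprise": 100.0, "nearline": 0.0},
    "VOL002": {"flash": 0.0, "enterprise": 50.0, "nearline": 150.0},
}


def flash_share(iops_by_tier, flash_tiers=("flash",)):
    """Return the fraction of I/O served from the tiers listed in flash_tiers."""
    total = sum(iops_by_tier.values())
    if total == 0:
        return 0.0  # idle volume: report no flash activity
    on_flash = sum(iops_by_tier.get(tier, 0.0) for tier in flash_tiers)
    return on_flash / total


for volume, tiers in tier_iops.items():
    print(f"{volume}: {flash_share(tiers):.0%} of I/O on flash")
```

A busy volume with a low flash share, such as the second volume in this invented data, is the kind of case worth a closer look when checking whether flash capacity is being used effectively.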
Tying it Together
IntelliMagic Vision for z/OS Disk provides the intelligence needed to understand how logical workloads behave on your physical hardware. Without this capability, the relationship between the two will remain a mystery, and that could cause pain in your daily operations.
View the video below to see how this insight can help you avoid scheduling too much work at one time.