This is the third in a series of four blogs on the status of RMF as a storage performance monitoring tool. This one is specifically about EMC VMAX. The previous postings are “What RMF Should Be Telling You About Your Storage – But Isn’t” and “What IBM DS8000 Should Be Reporting in RMF/SMF – But Isn’t.”
RMF has been developed over the years by IBM based on its own storage announcements – although even for the IBM DS8000 not nearly all functions are covered, as discussed in the previous post. Other vendors have to work with what IBM provides in RMF or, as EMC does for some functionality, create their own SMF records.
EMC has supported IBM’s RMF 74.5 cache counters since they were introduced, and in the past several years it has started using the ESS 74.8 records to report on FICON host ports and Fibre replication ports. Back-end reporting, however, has not been that simple: because the EMC Symmetrix RAID architecture is fundamentally different from IBM’s, EMC RAID group statistics cannot be reported through RMF.
For EMC’s asynchronous replication method SRDF/A, SMF records were defined that, among other things, track cycle time and size. This is very valuable information for monitoring SRDF/A session load and health. Since Enginuity version 5876, SRDF/A Write Pacing statistics are written to SMF records as well, allowing users to track potential application impact. The 5876 release also provided very detailed SMF records for the TimeFinder/Clone Mainframe Snap Facility.
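Since these are user-format SMF records, the first step in analyzing them off-host is typically just filtering a dumped SMF dataset by record type. As a rough illustration, here is a minimal Python sketch that walks a binary SMF dump (transferred with its RDWs intact) and counts records by type. The record type 230 used in the usage note is purely a hypothetical placeholder: EMC’s records are user records whose type number (in the 128–255 user range) is chosen per installation.

```python
import struct

def iter_smf_records(path):
    """Yield raw SMF records from a binary dump that retains RDWs.

    Each record starts with a 4-byte RDW: a 2-byte big-endian length
    (which includes the RDW itself) followed by two zero bytes.
    """
    with open(path, "rb") as f:
        while True:
            rdw = f.read(4)
            if len(rdw) < 4:
                break
            (length,) = struct.unpack(">H", rdw[:2])
            # Read the rest of the record (length counts the RDW too).
            body = f.read(length - 4)
            yield rdw + body

def record_type(rec):
    # The 1-byte SMF record type sits at offset 5 of the record
    # (after the 4-byte RDW and the 1-byte flag field).
    return rec[5]

def count_by_type(path, wanted):
    """Count how many records of each wanted type the dump contains."""
    counts = {t: 0 for t in wanted}
    for rec in iter_smf_records(path):
        t = record_type(rec)
        if t in counts:
            counts[t] += 1
    return counts
```

For example, `count_by_type("smf.dump", {230, 74})` would tally the (hypothetical) type-230 user records alongside the RMF type 74 records in the same dump. Real SRDF/A analysis would of course need the record layout EMC documents for its records, not just the type byte.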
Still, there are areas where information remains lacking, in particular back-end drive performance and utilization. Before thin provisioning was introduced, each z/OS volume was defined from a set of Hyper Volumes on a limited number of physical disks. EMC provided great flexibility with this mapping: you could pick any set of Hyper Volumes you liked. While conceptually elegant, this made it very hard to correlate workload and performance data for logical z/OS volumes with the workload on the physical disk drives. And since the data on a z/OS volume was spread over a relatively small number of back-end drives, performance issues were quite common. Many customers had to ask EMC to conduct a study when they suspected such back-end issues – and they still do.
With the newer thin provisioning and FAST auto-tiering options, the relationship between logical and physical disks now passes through even more intermediate steps. While EMC’s FAST implementation with its policy mechanism is very powerful, it can be hard for z/OS users to manage, since no instrumentation is provided on the mainframe. On the positive side, because data tends to be spread over more disk drives with virtual pools than with individual RAID groups, back-end performance issues are less likely than before. Still, more information on back-end activity is needed, both to diagnose emerging problems and to make sure no hidden bottlenecks occur.
Information that should make it into RMF or SMF to uncover the hidden internals of the VMAX:
- Configuration data about SRDF replication. Right now, users need to issue SRDF commands to determine replication status. Yet proper and complete replication is essential for any DR usage, so the replication status should be recorded every RMF interval.
- Data that describes the logical to physical mapping, and physical disk drive utilizations. There is external configuration data available through proprietary EMC tools that can sometimes be used in combination with RMF to compute physical drive activity. This is no substitute for native reporting in RMF or SMF.
- Snapshot-related backend activity. Snapshots provide immediate logical copies which can generate significant back-end activity that is currently not recorded. Snapshots are a frequent player in hard-to-identify performance issues.
- FAST-VP policy definitions, tier usage and background activity. FAST-VP will supposedly always give you great performance, but it cannot do magic: you still need enough spindles and/or Flash drives to handle your workload. For automatic tiering to work well, history needs to repeat itself, as Lee LaFrese said in his recent blog post, “What Enterprise Storage system vendors won’t tell you about SSDs”. From z/OS, you want visibility into the migration activity, along with the policies for each Pool and the actual tiers that each volume is using.
It will probably be easier for EMC to create more custom SMF records, as it did for SRDF/A, than to try to get its data into RMF. Such SMF records would be fully under EMC’s control and could be designed to match the VMAX architecture, making them much easier to keep up to date.
EMC does seem to respond to customer pressure to create SMF reporting for important areas; the SRDF/A records and the recent write pacing monitoring enhancement are examples.
When considering your next EMC VMAX purchase, also consider discussing the ability to manage it with the tools that you use on the mainframe for this purpose: RMF and SMF. If your company’s order is big enough, EMC might consider adding even more mainframe-specific instrumentation.