When virtual tape systems run properly, all is well. But when problems arise, or you need to examine detailed tape information, the virtualization makes it hard to see what is really going on inside the black box.
Luckily, z/OS has SMF, and virtual tape hardware vendors can define a custom record that exposes measurements of the internals. Oracle STK VSM, for instance, writes detailed SMF data to a user record, which lets us examine tape processing in fine detail using the intelligent post-processing from the enhanced Oracle Tape support in IntelliMagic Vision.
Some of the questions that you may want answered are:
- Why are the tape mounts taking so long?
- How many virtual tape mounts need to be staged from real tapes?
- Are my virtual tapes being replicated in a timely fashion?
Let’s look at an example of investigating virtual tape mounts that take a long time.
We see the activity per VSM system, but this doesn’t tell us how long virtual tape mounts are taking. Fortunately, there is a wealth of data available for us to dig deeper, with just a few clicks.
The maximum mount time graph shows a significant peak of around 800 seconds to mount a tape for VSM system PZRW95E. Why is it taking so long? And which job or task is affected?
Drilling down through the data to the affected VTVs (Virtual Tape Volumes) is easy and reveals the following:
VTV 0EPZWE is being requested by DFHSM. This is a mount for an existing VTV, and it is taking 830 seconds – over 13 minutes! Why is it taking so long to mount a VTV?
Clicking on identify in IntelliMagic Vision displays detailed information on all activity for this volume. This includes not only the detailed VSM SMF data but also the native z/OS SMF information obtained from the input/output and mount SMF records (SMF 14, 15, and 21).
Note: No replication information is available since no new data was written to this tape volume.
Looking closer at the original VSM mount, we see that it was received at 4:47 PM and that a recall was necessary. The recall was initiated at 4:57 PM and completed at 5:01 PM, so most of the delay was spent waiting for the recall to begin rather than on the recall itself. The tape volume that DFHSM is mounting needs to be staged from a real tape, which means that a mount on a real tape drive (RTD) needs to occur. The recall information above indicates that RTD number 8 was used to mount physical volume 192347, which contained the VTV.
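To make the timeline concrete, here is a minimal Python sketch that splits the delay into its two parts. It uses only the three clock times quoted above, written in 24-hour form; the variable names are illustrative and are not fields of any SMF record. Because the times are rounded to the minute, the total only approximates the 830 seconds reported for the mount.

```python
from datetime import datetime

def at(clock):
    """Parse an HH:MM clock time (24-hour). The date part is irrelevant here."""
    return datetime.strptime(clock, "%H:%M")

# Times quoted in the article, at minute resolution.
mount_received = at("16:47")    # VSM receives the mount request
recall_started = at("16:57")    # recall from the MVC is initiated
recall_completed = at("17:01")  # recall completes

wait_before_recall = recall_started - mount_received
recall_duration = recall_completed - recall_started
total_delay = recall_completed - mount_received

print("waiting for recall to start:", wait_before_recall)  # 0:10:00
print("recall itself:", recall_duration)                    # 0:04:00
print("total:", total_delay)                                # 0:14:00
```

Roughly ten of the fourteen minutes pass before the recall even starts, which is why the next step is to look at what else was using the physical cartridge.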
So why is it taking so long before the recall mount on RTD number 8 starts?
Taking a closer look at the activity for the real stacked tape, MVC (Multiple Volume Cartridge) 192347, reveals the contention.
The original DFHSM mount at 4:47 (previously highlighted in blue) is waiting because other VTVs on the same MVC are in use. The recall of VTV 0EPAXM starts at 4:45, followed by the recall of 0ERTTW at 4:48 and the recall of 0EPBAY at 4:52; all three are processed before VTV 0EPZWE can be mounted. In this instance, DFHSM is requesting several volumes that have been stacked onto the same MVC.
DFHSM has no awareness of where the logical volumes (VTVs) have been stacked, so it cannot optimize its mount requests.
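The same check can be sketched in a few lines of Python: given the recall start times for one MVC, list the recalls that started before the one we are waiting for. The `Recall` tuple and the `recalls_ahead_of` helper are hypothetical names for this illustration, not part of IntelliMagic Vision or of the VSM SMF record layout. The start times are the ones quoted in this article, written in 24-hour form.

```python
from collections import namedtuple

# Hypothetical record shape: one entry per recall, keyed by VTV and MVC.
Recall = namedtuple("Recall", ["vtv", "mvc", "start"])

# Recall start times as quoted in the article (24-hour HH:MM).
recalls = [
    Recall("0EPAXM", "192347", "16:45"),
    Recall("0ERTTW", "192347", "16:48"),
    Recall("0EPBAY", "192347", "16:52"),
    Recall("0EPZWE", "192347", "16:57"),
]

def recalls_ahead_of(target_vtv, events):
    """Return the VTVs recalled from the same MVC before the target's recall started."""
    target = next(e for e in events if e.vtv == target_vtv)
    # Zero-padded HH:MM strings sort chronologically within a single day.
    ordered = sorted(events, key=lambda e: e.start)
    return [e.vtv for e in ordered
            if e.mvc == target.mvc and e.start < target.start]

ahead = recalls_ahead_of("0EPZWE", recalls)
print(ahead)  # ['0EPAXM', '0ERTTW', '0EPBAY']
```

Three recalls on the same cartridge precede the one for 0EPZWE, which matches the contention seen in the MVC activity view.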
Without IntelliMagic Vision’s enrichment and presentation of the Oracle STK VSM SMF data, this kind of analysis is impossible.