This video illustrates how Db2 SMF data can be leveraged to identify drivers of an observed reduction in elapsed time. This example isolates the change by authorization ID, buffer pool, and page set.
More Db2 Accounting Videos
- Exploring Analysis by Connection Type
- Exploring Analysis by Correlation ID
- Exploring Elapsed Time Profiles
- Exploring Analysis by Authorization ID
- Exploring Prefetch Activity and Suspension Events
- Exploring Analysis by Plan or Package Name
- Exploring Database Sync I/O Activity
- Case Study: Isolating Change Drivers
- Exploring Other Metrics in Accounting Data by Plan
- Accounting Data: Customized Dashboard Recap
So, we’ve explored the accounting data by connection type, correlation ID, authorization ID, and planner package name. Now, if you were with us in our zAcademy Db2 Statistics session last month, we identified a reduction in a key elapsed time metric, but we deferred investigating further into the driver of that change until we could leverage the details available from the accounting data. So now, armed with what we’ve seen today, we’re in a position to pursue that analysis.
All right. So, if you were with us in that session, you may remember that we captured a lot of views in this dashboard, but in particular, this last one, and it showed a reduction in this key elapsed time metric. Now, this view focused on four components that are often key contributors, out of the 30, to overall elapsed time. So, we looked at class GCP time within Db2, zIIP time within Db2, and then the other read I/O, which, as we talked about, typically correlates to prefetch activity.
But here we saw that this reduction in database sync I/O wait time occurring at some point in the middle of the week and it had sufficient impact to cause the average value here for this metric for the entire DDF workload to decrease measurably. So if we’re on a Db2 infrastructure team, we identify this improvement, but assuming we didn’t receive advanced communication about the change, let’s see how we might go about leveraging the accounting data to isolate the specific workload that drove that reduction.
So, let’s go ahead and focus now specifically on that database sync I/O, and now let’s look at it. Again, this is DDF work so typically the starting place is Auth ID. And again, let me go ahead and just look at the top CPU consumers.
All right. So, there are lots of Auth IDs. We’ve kind of looked at the top 10 consumers. Now we notice here that there’s one Auth ID that has really large sync I/O wait times, and it’s compressing the scale for everybody else. So let me just go ahead and remove that particular Auth ID. And now that we do that, we have identified the Auth ID that had a very large sync I/O wait time and now had a dramatic decrease in that wait time. And we can see now when it occurred. It looks like the benefit began in the 8:00 PM hour on 6/30 here. Now I’ve got access to this data at 15-minute intervals, so let me go down to that level. And now that I do that, I see that it looks like it was particularly an interval beginning at 8:15 that experienced the benefit pretty much throughout its entire time period.
So, we’ve isolated the change to an Auth ID here, beginning with CCS, and we have narrowed it down to a very specific point in time here. So, let’s go ahead and capture that in this particular dashboard. Now, as you may have already guessed in real life, this huge improvement resulted from a SQL change. In this case, it involved de-normalized tables to avoid running summarization calculations on child rows.
All right. So, this change created a very dramatic improvement in sync I/O wait times. So let’s see if it also impacted those other three key elapsed time metrics that we were highlighting. So let’s go ahead and bring those back in here, the GCP time, the zIIP time, and the other read I/O time.
So, when I look at those, we see that that change also had created dramatic improvements in the GCP time and in the zIIP time, in both cases greater than 80% reductions in both of those types of CPU. All right. And if you were with us in the Db2 Statistics session that this big improvement in sync I/O wait time surfaced as we were analyzing buffer pools 20 and 21, a data and index pair buffer pools.
So let’s go ahead and look for that Auth ID, now let’s look at the sync Read I/Os by buffer pool. And again, let’s look at those over time. In this case, I’m going to need to actually bump up the scale. All right. And when we do that, we can see massive improvements in the volume of sync Read I/Os for this. Buffer pool 20 was the data buffer pool and buffer pool 21 is the index buffer pool. So, for 21, it goes from like 2000 I/Os per commit to something around 200 I/Os per commit. So very dramatic savings there.
Now to supplement this Auth ID finding, another thing we might choose to look at is to leverage the IFCID 199 data that captures buffer pool statistics at the data set level and to identify the impact of this change in sync Read I/Os by page set. So again, we said buffer pool 21 had a huge improvement. So let’s go ahead. And so here are the page sets that are in that buffer pool. Again, let’s go ahead and look at this over time and look at it across the entire week.
And when we do that, we can see now that this particular page set experienced a huge drop in sync I/Os. So, in our case study here that started with identifying a significant performance change, which emerged during a buffer pool tuning analysis exercise, we identified that the symptom was a decrease in database sync I/O wait time in the DDF workload and we leveraged the accounting, along with the IFCID 199 data, to isolate the driver of that change. We identified correlations both with a specific auth ID and with a page set, and through the accounting data, we were able the pinpoint the time of the change basically 8:15 PM on 6/30.
You May Also Be Interested In:
Profiling zHyperLink Performance and Usage
In this blog, we demonstrate how to profile zHyperLink performance and usage by reviewing one mainframe site’s recent production implementation of zHyperLink for reads, for Db2.
How A Db2 Newbie Quickly Spotlights the Root Cause of a Db2 Slowdown
This blog explores how a non-Db2 expert quickly identified latch contention arising from a Data Sharing Index Split as the root cause of a Db2 delay/slowdown.
Integrating Dataset Performance (SMF 42.6) and Db2 Buffer Pools & Databases (SMF 102/IFCID 199) Data
Dataset performance data from SMF 42.6 records provides disk cache and response time data at the Db2 buffer pool and Db2 database levels when integrated with Db2 Statistics IFCID 199 data.