Hi, my name is Brett Allison, and I am the Director of Technical Services at IntelliMagic. I’m going to demonstrate how you can use IntelliMagic Vision to quickly determine the root cause of performance issues within your EMC VNX storage environment. This is important if you want to maximize the throughput of your VNX systems. In this video, we’re going to take a look at one of the many dashboards in IntelliMagic Vision. The dashboards are a great starting point for analyzing your SAN infrastructure.
This is our version of the dashboard for a VNX family storage system. The purpose of the dashboard is to provide a simple, single pane of glass to see the performance risks of your entire EMC VNX environment. The enterprise performance dashboard condenses time, performance, and machine-specific capabilities into a single view, with each row in the chart representing a different storage system and each column representing a different key risk indicator. The color theme is that of a stoplight, with red being the highest risk level and green indicating a low risk level, and the size of the bubble indicates the potential risk.
Something unique to IntelliMagic and important to this view is the rating. The rating is a way of scoring the risk level of different metrics: if a metric exceeds the machine-specific warning and exception thresholds during the analysis period, the rating is incremented. When the metric value exceeds the warning threshold for 10% of the period, the metric bubble turns yellow. If the metric value exceeds the exception threshold for 10% of the period, the metric bubble turns red. The rating in the top right-hand corner of the chart indicates the highest rated metric within the chart.
There are a number of storage systems in this environment, identified by their names on the y-axis, starting with NJ01 and ending with STF08. Along the x-axis, the key risk indicators (Read and Write Performance, Response Time, Forced Flushes, Average Pool Disk Utilization, Average SP Utilization, and Average Port Utilization) provide an understanding of the health of each EMC VNX storage system. Since these are DSS-level metrics, the averages can sometimes be misleading.
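The stoplight rule described here can be sketched in a few lines of Python. This is only an illustration of the 10% rule as stated in the video, not IntelliMagic Vision’s actual implementation: the function name `bubble_color`, the sample values, and the thresholds are all hypothetical, and treating "exceeds for 10% of the period" as "at least 10% of the intervals" is an assumption.

```python
from enum import Enum


class Color(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def bubble_color(samples, warning, exception, fraction=0.10):
    """Classify one metric for one storage system over an analysis period.

    samples   -- one metric value per measurement interval
    warning   -- hypothetical machine-specific warning threshold
    exception -- hypothetical machine-specific exception threshold
    fraction  -- share of intervals that must exceed a threshold (10% per the video)
    """
    if not samples:
        raise ValueError("at least one sample is required")

    needed = fraction * len(samples)
    over_exception = sum(1 for value in samples if value > exception)
    over_warning = sum(1 for value in samples if value > warning)

    # Assumption: "for 10% of the period" means "in at least 10% of the intervals".
    if over_exception >= needed:
        return Color.RED
    if over_warning >= needed:
        return Color.YELLOW
    return Color.GREEN


# Hypothetical response times in ms, with made-up thresholds of 10 ms and 20 ms.
quiet = [3.0] * 10
busy = [15.0, 15.0] + [3.0] * 8
overloaded = [25.0, 25.0] + [3.0] * 8

for name, series in [("quiet", quiet), ("busy", busy), ("overloaded", overloaded)]:
    print(name, bubble_color(series, warning=10.0, exception=20.0).value)
```

Because the rule looks at how often a threshold is exceeded rather than at the period average, a system with a low average can still turn yellow or red.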
For example, I see the Response Time for STF06 is red, and when I hover over the bubble the rating is 0.55. This indicates that response time is much worse than expected. To identify the root cause, we’re going to drill down by clicking on the bubble to see the charts over time, and we’ll look at that in the next slide.
As we look at the VNX multi-chart and scan from left to right, I’m going to tick a couple of mental checkboxes. First, we observe that the Front-end Adapter Utilization is okay. Second, we look at the Response Time, and it peaks in the last two-thirds of the response time chart. As you drop your gaze vertically, you’ll also notice that the Port Utilization chart (showing utilization for individual ports) shows a line that corresponds very closely with the response time peak. It’s also rated red, indicating there’s a problem. As you continue moving your gaze across the top row, you’ll notice that the throughput peaks do not fall exactly during the response time peak but slightly after it. This means that throughput alone was not the root cause, but it was related. Moving to the right, you see the forced flushes peak for a very brief period, perhaps a single interval. The response time was high during that period, but since it’s such a small peak, it is likely not a contributing factor to the root cause. Moving to the right again, we see that one of the storage pools has an increased drive busy during the period, and this is a reflection of the throughput increase. It may also mean that the workload is isolated to a couple of busy pools. In summary, it is safe to conclude that the port utilization is the single biggest contributor to the elongated response time. Let’s take a closer look.
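The visual comparison in this walkthrough, checking which chart moves together with the response time peak, can be approximated numerically. The sketch below ranks candidate metrics by their Pearson correlation with response time. It is not a feature of IntelliMagic Vision shown in the video; the function names and every data point are hypothetical and were chosen only to mimic the shapes described above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no information about co-movement
    return cov / sqrt(var_x * var_y)


def rank_candidates(response_time, candidates):
    """Return (metric name, correlation) pairs, strongest correlation first."""
    scores = {name: pearson(response_time, series)
              for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-interval values for one storage system.
response_time_ms = [2, 2, 3, 9, 10, 9, 3]
candidates = {
    "port_utilization": [20, 22, 25, 90, 95, 92, 30],       # tracks the peak closely
    "throughput_mb_s": [100, 100, 110, 120, 300, 320, 310],  # peaks slightly later
    "forced_flushes": [0, 0, 0, 0, 5, 0, 0],                 # one-interval spike
}

for name, score in rank_candidates(response_time_ms, candidates):
    print(f"{name}: {score:.2f}")
```

A high correlation only flags a candidate worth drilling into; it does not prove causation, which is why the next step in the video is to inspect the individual ports.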
By clicking on the mini-chart we present the full-sized complete chart with the utilization of the individual ports and their associated port names and ratings. The port with the highest utilization during the peak response time period was port SP_A:5. We can find out more about this port by clicking on the Identify drill down.
We see that this port has a data rate of only 1 Gbps. If they want to have better performance, they’re going to need to increase the bandwidth. So that would be the first logical step to resolve this performance issue.
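To see why the link speed matters, it helps to work out how quickly a 1 Gbps port fills up. The sketch below is a rough calculation that assumes decimal units (1 MB = 10^6 bytes) and ignores protocol and encoding overhead, so real usable bandwidth is somewhat lower. The 100 MB/s workload is a hypothetical figure, not a value from the video.

```python
def port_utilization(throughput_mb_s, link_gbps):
    """Fraction of raw link bandwidth consumed by a given throughput.

    throughput_mb_s -- data rate in megabytes per second (10**6 bytes)
    link_gbps       -- nominal link speed in gigabits per second
    """
    if link_gbps <= 0:
        raise ValueError("link speed must be positive")
    throughput_mbit_s = throughput_mb_s * 8  # bytes to bits
    link_mbit_s = link_gbps * 1000           # Gbps to Mbps
    return throughput_mbit_s / link_mbit_s


# Hypothetical workload of 100 MB/s on a 1 Gbps port versus a 10 Gbps port.
for speed in (1, 10):
    print(f"{speed} Gbps link: {port_utilization(100, speed):.0%} utilized")
```

Under these assumptions the same workload that takes up most of a 1 Gbps link uses only a small fraction of a faster one, which is why increasing the port bandwidth is the first step suggested here.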
In this video we demonstrated how simple it is to troubleshoot VNX performance problems with IntelliMagic Vision.