Brett Allison - 4 October 2019

Your NetApp CIFS latency is through the roof and your users are banging on the door. You can either panic or try to figure out what’s going on in the system. Don’t worry. There are ways to avoid the panic attack and work through the root cause analysis.

As a storage engineer or analyst, it is important to understand the key workloads and perform regular comprehensive health checks in your environment. This provides the advantage of knowing the heavy hitters in the environment and understanding how close your environment is to running out of resources.

If this sounds like too much work, that is because it is too much work to do it manually. Let’s take a look at an automated assessment of the different protocols on a NetApp cluster as shown in Figure 1.

Figure 1: NetApp FAS Protocol Latency Dashboard

 

NetApp cluster sepnas001 shows significant NFS and CIFS latency, as indicated by the red exclamation bubble and the yellow warning icon. Let’s drill down to look at the key performance metrics over time as shown in Figure 2.

Figure 2: NetApp FAS Protocol Minicharts

 

The average CIFS latency peaks for several hours. Drilling down from the cluster CIFS latency to the CIFS latency by node, we see that sepnas001-n2 is the only node affected by the CIFS latency increases, as shown in Figure 3.

Figure 3: CIFS Latency by Node

 

Since there is only a weak correlation between the I/O operations and the latency, the I/O rate alone does not explain the spike. The next step is to inspect the CPU and disk resources for sepnas001-n2 in Figure 4 to see if there are resource constraints.

Figure 4: NetApp FAS Minicharts

 

As you can see in Figure 4, the average processor utilization on sepnas001-n2 rises and falls with the latency. The constrained processor is the likely cause of the latency increase. When CPU utilization is greater than 70% on NetApp systems the latency tends to suffer noticeably.
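The visual comparison in Figure 4 can also be approximated numerically. The following is a minimal sketch, assuming you have exported per-interval CPU utilization and latency samples for the node. The sample values, the function names, and the use of a Pearson coefficient are illustrative assumptions, not something IntelliMagic Vision or ONTAP provides. Only the 70% rule of thumb comes from the text above.

```python
from math import sqrt

# Rule of thumb from the article: latency tends to suffer above 70% CPU.
CPU_THRESHOLD_PCT = 70.0


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def intervals_over_threshold(cpu_pct, threshold=CPU_THRESHOLD_PCT):
    """Return the indexes of samples where CPU utilization exceeds the threshold."""
    return [i for i, cpu in enumerate(cpu_pct) if cpu > threshold]


# Hypothetical samples for one node (made-up numbers, one value per interval).
cpu_pct = [35, 40, 55, 72, 85, 90, 78, 50]
latency_ms = [2.0, 2.2, 3.0, 6.5, 11.0, 14.0, 8.0, 2.8]

print("CPU vs latency correlation:", round(pearson(cpu_pct, latency_ms), 2))
print("Intervals above threshold:", intervals_over_threshold(cpu_pct))
```

A coefficient close to 1 supports the reading that latency follows CPU utilization, but it does not prove causation. The list of intervals above the threshold shows when to look for the workload responsible.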

Let’s drill down and see who is driving the load. To do so, we simply click on the throughput and drill down to the Flexvols as shown in Figure 5. The volume labeled ‘edwsal_matra’ appears to have significantly more throughput than any other volume.

Figure 5: Throughput by Flexvols
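The heavy-hitter ranking that Figure 5 visualizes can be reproduced offline if you have per-volume throughput numbers. This is a minimal sketch under stated assumptions: the function name and all throughput values are hypothetical, and only the volume name ‘edwsal_matra’ comes from the article.

```python
def top_volumes(throughput_mbps, n=3):
    """Return the n busiest volumes as (name, throughput, share_of_total) tuples."""
    total = sum(throughput_mbps.values())
    ranked = sorted(throughput_mbps.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (name, value, value / total if total else 0.0)
        for name, value in ranked[:n]
    ]


# Hypothetical average throughput per volume in MB/s (made-up numbers).
samples = {
    "edwsal_matra": 420.0,
    "vol_home01": 35.0,
    "vol_proj02": 28.0,
    "vol_scratch": 17.0,
}

for name, mbps, share in top_volumes(samples):
    print(f"{name}: {mbps:.0f} MB/s ({share:.0%} of total)")
```

Reporting each volume's share of the total, not just its absolute throughput, makes it easier to tell a single dominant workload from a generally busy node.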

 

By clicking Identify on the volume, we can see which vFiler the volume is associated with, as shown in Figure 6. The vFiler is vs38_wingroupep1.

Figure 6: Flexvol identify

 

To identify the source IP of the host driving the load, switch to the advanced privilege level (set -privilege advanced) and issue the following command:

Cluster-1::*> statistics top client show

Once you know the IP address of the host driving the load, you can resolve it to a server name (for example with nslookup). With the server name in hand, work with the application owners to understand whether the workload is normal or unexpected.
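If there are several client IPs to check, the reverse lookup can be scripted instead of running nslookup by hand. The sketch below uses Python's standard socket.gethostbyaddr. The function name, the injectable resolver parameter, and the example address and host name are assumptions made for illustration (192.0.2.10 is a reserved documentation address, not a value from the article).

```python
import socket


def resolve_hostname(ip_address, resolver=socket.gethostbyaddr):
    """Reverse-resolve an IP address; return None when no name is found.

    The resolver argument defaults to the system resolver and can be
    replaced with a stub for offline use.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip_address)
    except (socket.herror, socket.gaierror):
        return None
    return hostname


def stub_resolver(ip_address):
    """Hypothetical lookup table standing in for DNS in this example."""
    table = {"192.0.2.10": "edw-app01.example.com"}
    if ip_address not in table:
        raise socket.herror(1, "Unknown host")
    return table[ip_address], [], [ip_address]


for ip in ["192.0.2.10", "192.0.2.99"]:
    name = resolve_hostname(ip, resolver=stub_resolver)
    print(ip, "->", name or "no reverse DNS entry")
```

Calling resolve_hostname(ip) without the resolver argument queries the system's configured DNS. Clients without a PTR record come back as None and have to be identified another way.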

Summary of Findings

In this short blog we looked at a NetApp c-mode system that was identified as having high CIFS and NFS latency. We made the following observations:

  • CIFS and NFS latency are high on cluster sepnas001, and the CIFS latency increase is isolated to node sepnas001-n2
  • CPU utilization is high on sepnas001-n2 during the period where high latency is observed
  • The workload is primarily associated with vs38_wingroupep1 and volume ‘edwsal_matra’

Don’t Panic – Get IntelliMagic

IntelliMagic Vision provides intelligent dashboards that highlight potential issues or constraints in your storage infrastructure. If you want to quickly identify the source of issues on your NetApp systems or perform proactive performance health checks, IntelliMagic Vision facilitates highly intuitive and interactive analysis of your c-mode systems.
