Brett Allison - 22 October 2015

One of our customers recently came across a problem in their environment that I think warrants some attention. The VMware administrator had gone to the storage team and asked if they saw any issues on the Fabric or IBM SVC storage environment, because the infamous “state in doubt” message was popping up in the /var/log/vmkernel log file. The messages were similar to the one shown below:

<YYYY-MM-DD>T<TIME> esx12 vmkernel: 116:03:44:19.039 cpu4:4100)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "sym.029010111831353837" state in doubt; requested fast path state update…

The error indicated that there was a time-out by the HBA because the command took longer than 5 seconds to complete.
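To see which devices are affected and how often, the warnings can be tallied per device. The following is a minimal Python sketch, not a VMware or IntelliMagic tool: the regular expression is inferred only from the single message quoted above, the function name is hypothetical, the timestamp prefix is left out of the sample, and the second sample line is made up to show a non-matching entry.

```python
import re
from collections import Counter

# Pattern inferred from the one warning quoted in this post (an assumption).
# Accepts straight or typographic quotes and an optional space after the ID.
STATE_IN_DOUBT = re.compile(
    r'NMP device\s*["“”]([^"“”]+)["“”]\s*state in doubt'
)

def count_state_in_doubt(lines):
    """Return a Counter mapping device ID to number of 'state in doubt' warnings."""
    counts = Counter()
    for line in lines:
        match = STATE_IN_DOUBT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    # Message text from this post, without the timestamp prefix.
    'esx12 vmkernel: 116:03:44:19.039 cpu4:4100)WARNING: NMP: '
    'nmp_DeviceRequestFastDeviceProbe: NMP device "sym.029010111831353837" '
    'state in doubt; requested fast path state update…',
    # Invented line, only here to show that other messages are ignored.
    'esx12 vmkernel: unrelated message',
]

for device, hits in count_state_in_doubt(sample).items():
    print(device, hits)
```

In practice the lines would come from the vmkernel log file rather than a list, and a device that keeps showing up is a good hint about which paths to inspect first.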

Typically this implies a problem anywhere along the SAN path from your physical ESX host/HBA to the back-end storage including:

  • Fabric F ports to which the host is connected
  • Fabric ISLs (E ports) if in path
  • Storage ports

VMware has an excellent article in its knowledge base with more information about this error (KB 1022026).

Being the savvy storage engineers that they are, our customer did a quick check of the IntelliMagic Fabric Error dashboards and SVC Port Dashboards.

Invalid Transmission Word Errors

While there were no issues observed on the SAN Fabric, the IntelliMagic Vision SVC Port Dashboard identified issues with Invalid Transmission Words as demonstrated in Figure 2:

Figure 2: Bad SFP – Invalid Transmission Word Errors

A transmission word consists of four 10-bit codes that must be sent in a precise format. If they are not in the correct format, the switch detects that the Transmission Word is invalid. This can happen for a number of reasons, such as a faulty cable, a bad Small Form-factor Pluggable transceiver (SFP), or a cable being temporarily unplugged. If this happens once in a while, it might be that a cable got unplugged. However, if this happens continually, there are definitely issues that should be addressed.
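The difference between "once in a while" and "continually" can be made concrete with a simple rule over per-interval error counts. This is a rough Python sketch under stated assumptions: the function name, the 50% cutoff, and the sample numbers are all invented for illustration and are not how IntelliMagic Vision rates a port.

```python
def classify_port_errors(interval_counts, continual_fraction=0.5):
    """Classify per-interval Invalid Transmission Word counts for one port.

    interval_counts: errors observed in each sampling interval.
    continual_fraction: assumed cutoff; if at least this share of intervals
    contains errors, the pattern is treated as continual.
    """
    if not interval_counts:
        return "no data"
    with_errors = sum(1 for count in interval_counts if count > 0)
    if with_errors == 0:
        return "clean"
    if with_errors / len(interval_counts) >= continual_fraction:
        return "continual"
    return "sporadic"

# Invented sample data, one value per interval.
print(classify_port_errors([0, 0, 12, 0, 0, 0, 0, 0]))  # sporadic
print(classify_port_errors([3, 5, 2, 4, 0, 6, 3, 4]))   # continual
```

A single burst looks like a cable that was briefly unplugged, while errors in most intervals point to a component that needs attention.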

In this case, the only visible symptoms were the “state in doubt” errors in the VMware error log and the red flag in the IntelliMagic Vision dashboard. Because the error rate was still relatively low, the issue did not cause a significant performance impact. But if left unresolved, the situation could have degraded further and resulted in significant performance and connectivity impacts for all hosts connected to the offending SVC port.

Drilling into the “State in Doubt” Issue

Drilling down from the IntelliMagic Vision dashboard, the customer found that the issue was with port ‘node1-2’, as shown in Figure 3. It was decided to replace the SFP on this port. The graph shows that this was the right decision: the Invalid Transmission Word errors ended abruptly on 6/16/2015 at 10:00 AM when the SFP was replaced. After that, both the “state in doubt” errors and the Invalid Transmission Words on the SVC port stopped.
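The same check can be done on raw numbers: given periodic samples of a port's error counter, find the last sample at which the counter still increased. The Python sketch below assumes a cumulative counter sampled at fixed times. The function name and the counter values are invented for illustration; only the date and the 10:00 AM cutover come from this case. It does not handle counter resets.

```python
from datetime import datetime

def last_error_interval(samples):
    """Return the timestamp of the last sample whose counter increased, or None.

    samples: list of (timestamp, cumulative_error_count), oldest first.
    """
    last = None
    for (_, prev_count), (ts, count) in zip(samples, samples[1:]):
        if count > prev_count:
            last = ts
    return last

# Invented counter values; the counter stops growing after the 10:00 sample.
samples = [
    (datetime(2015, 6, 16, 9, 0), 1000),
    (datetime(2015, 6, 16, 9, 30), 1040),
    (datetime(2015, 6, 16, 10, 0), 1075),
    (datetime(2015, 6, 16, 10, 30), 1075),
    (datetime(2015, 6, 16, 11, 0), 1075),
]

print(last_error_interval(samples))  # 2015-06-16 10:00:00
```

If the returned timestamp lines up with the hardware change, as it did here, that is good evidence the replaced part was the cause.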

Figure 3: Invalid Transmission Words on SVC node1-2

SAN storage infrastructure is complicated because there are so many components. IntelliMagic Vision can help reduce the visibility gaps and improve the availability of your SAN fabric and connected storage by providing deep insights, practical drill downs and specialized domain knowledge for your SAN environment.

The short video below demonstrates how we allow you to visualize all of the components from the VMware host through the fabric to the storage volume.


To learn more about IntelliMagic’s support for VMware and our Topology Viewer, visit intellimagic.com/vmware.

This article's author

Brett Allison
VP of Operations