Brett Allison -

This blog was originally published on September 18, 2017.

Do you have any SAN or VMware connectivity risks? Chances are you do. Unfortunately, there is no easy way to see them without special tooling. Seeing the real end-to-end risks, from the VMware guest through the SAN fabric to the storage LUN, is difficult in practice because it requires combining many relationships from a variety of sources.

A complete end-to-end picture requires:

  • VMware guests to the ESX hosts
  • ESX host initiators to targets
  • ESX hosts and datastores, VM guests and datastores, and ESX datastores to LUNs
  • Zone sets
  • Target ports to host adapters, LUNs, and storage ports
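To make the idea of tying these relationships together more concrete, here is a minimal Python sketch. It assumes each relationship has already been exported as a simple mapping. Every name and sample value in it (`guest_to_host`, `init-A`, `lun-7`, and so on) is hypothetical and does not come from any particular tool.

```python
from collections import defaultdict

# Hypothetical sample data; each mapping stands in for one relationship source.
guest_to_host = {"vm-app01": "esx01", "vm-db01": "esx01"}
host_to_initiators = {"esx01": ["init-A", "init-B"]}
# Zoning: which target ports each initiator is allowed to reach.
initiator_to_targets = {"init-A": ["tgt-1"], "init-B": ["tgt-2"]}
# Masking on the array: which LUNs each target port presents.
target_to_luns = {"tgt-1": ["lun-7"], "tgt-2": ["lun-7"]}


def end_to_end_paths(guest):
    """Return every (guest, host, initiator, target, lun) path for a guest."""
    host = guest_to_host[guest]
    paths = []
    for initiator in host_to_initiators.get(host, []):
        for target in initiator_to_targets.get(initiator, []):
            for lun in target_to_luns.get(target, []):
                paths.append((guest, host, initiator, target, lun))
    return paths


def paths_per_lun(guest):
    """Count the distinct paths a guest has to each LUN."""
    counts = defaultdict(int)
    for *_, lun in end_to_end_paths(guest):
        counts[lun] += 1
    return dict(counts)
```

Once the relationships are joined like this, asymmetry shows up as a LUN whose path count differs from its peers.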

 

For seasoned SAN professionals, none of this information is very difficult to comprehend. The trick is tying it all together in a cohesive way so you can visualize these relationships and quickly identify any asymmetry or potential issues.  

In Figure 1 below you can see that the ESX host, represented by the top rectangle, has a single path to the storage. Should something happen to the host port or to the switch port it is connected to, the host will lose access to its storage and go offline.


Figure 1: Single Path ESX Host
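A check for this condition can be sketched in a few lines of Python. The inventory layout below (host name mapped to a list of initiator, switch port, and target port tuples) is an assumption made for illustration, as are the host and port names.

```python
# Hypothetical inventory: host -> list of (initiator, switch_port, target_port) paths.
host_paths = {
    "esx01": [("init-A", "sw21:429", "tgt-1")],
    "esx02": [("init-A", "sw21:430", "tgt-1"), ("init-B", "sw22:430", "tgt-2")],
}


def single_path_hosts(paths_by_host):
    """Return hosts that have fewer than two paths to storage, sorted by name."""
    return sorted(host for host, paths in paths_by_host.items() if len(paths) < 2)
```

With the sample data, `single_path_hosts(host_paths)` flags only `esx01`.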

 

Another potential issue arises when a host has an odd number of link connections to the fabric. Why does an odd number of links matter? Let’s look at an actual example:


Figure 2: Odd Link Connections

 

In this example you can see that the host has three connections to the fabric through three different switches. In an ideal configuration you will have four connections to the fabric from an ESX host.   

Typically, you will have two connections through each fabric, giving you fabric-level redundancy. In Figure 2, we have two odd-numbered switches, ending in 21 and 23, that belong to the same fabric. We also have one even-numbered switch, ending in 22, in the other fabric. The risk in this situation is that if any point along the path through the switch ending in 22 fails, you will be limited to a single fabric.
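The same reasoning can be expressed as a small Python sketch that counts a host's links per fabric and flags any fabric with fewer than two. The switch and fabric names are hypothetical. The mapping also assumes that the switch ending in 24 belongs to the same fabric as the one ending in 22.

```python
from collections import Counter

# Hypothetical switch-to-fabric mapping mirroring the example: switches ending
# in 21 and 23 share one fabric; 22 and 24 are assumed to share the other.
switch_fabric = {
    "sw21": "fabric-odd",
    "sw23": "fabric-odd",
    "sw22": "fabric-even",
    "sw24": "fabric-even",
}


def fabric_link_counts(host_switches, fabric_of):
    """Count a host's links per fabric, including fabrics with zero links."""
    counts = Counter({fabric: 0 for fabric in set(fabric_of.values())})
    counts.update(fabric_of[switch] for switch in host_switches)
    return dict(counts)


def at_risk_fabrics(host_switches, fabric_of, minimum=2):
    """Return fabrics where the host has fewer than `minimum` links."""
    counts = fabric_link_counts(host_switches, fabric_of)
    return sorted(fabric for fabric, n in counts.items() if n < minimum)
```

For the host in Figure 2, connected through the switches ending in 21, 23, and 22, the even fabric is reported as at risk because it carries only one link.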

There are a few reasons why an initiator may not be connected to the fabric:

  1. Bad host port 
  2. Bad zoning  
  3. Bad switch port 

In this case the ESX host has four initiators, but the one ending in BB does not have any connections. The first step is to review the host error logs to see if there are any host port hardware issues.

initiators

 

The second step is to look at the zoning for the port ending in BB that doesn’t have connectivity: 

zone

 

The zone includes two ports: one for the initiator and one for the storage target port.

zone_members

 

We know that the zone is good because the port 50:00:09:7B:50:00:A8:5B is a real port on VMAX04, and the storage port shows that it is operational:

storage_port
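This zone check can be sketched in Python as follows. The target WWPN and array name are taken from the example. The dictionary layout, the field names, the `"operational"` status string, and the initiator placeholder used in the usage note are assumptions for illustration.

```python
# Known storage ports keyed by WWPN. The WWPN and array name come from the
# example; the field names and status strings are assumptions.
storage_ports = {
    "50:00:09:7B:50:00:A8:5B": {"array": "VMAX04", "status": "operational"},
}


def check_zone(initiator_wwpn, zone_members, known_ports):
    """Return a list of problems found in a zone; an empty list means none were found."""
    problems = []
    if initiator_wwpn not in zone_members:
        problems.append("initiator is not a member of the zone")
    targets = [member for member in zone_members if member != initiator_wwpn]
    if not targets:
        problems.append("zone has no target port")
    for wwpn in targets:
        port = known_ports.get(wwpn)
        if port is None:
            problems.append(f"{wwpn} is not a known storage port")
        elif port["status"] != "operational":
            problems.append(f"{wwpn} on {port['array']} is {port['status']}")
    return problems
```

Calling `check_zone("initiator-BB", ["initiator-BB", "50:00:09:7B:50:00:A8:5B"], storage_ports)` returns an empty list, which matches the conclusion that the zoning is not the cause here.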

 

The last area to investigate is the switch ports. Each of the three connected host ports is attached to switch port 429 on its respective switch (*21, *22, *23).

The switch that is missing from the topology is the one ending in 24. Looking at the health of the switch ending in *24 and its ports, we see that port 429 is not in a good state, as seen below:

switch_port 

As you can see, port 429 on the switch ending in 24 is not operational and has a status of Error. This is why there is no connectivity through this switch. The port will need to be repaired in order to re-establish connectivity.
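The last step of this investigation, finding the switch that is absent from the host's topology and checking its port status, can be sketched in Python. The switch names and the `"Online"` status value are hypothetical. Only the `Error` status and port number 429 come from the example.

```python
# Hypothetical port-status table: (switch, port) -> status string.
port_status = {
    ("sw21", 429): "Online",
    ("sw22", 429): "Online",
    ("sw23", 429): "Online",
    ("sw24", 429): "Error",
}


def missing_switches(expected, connected):
    """Return switches the host should reach but currently does not."""
    return sorted(set(expected) - set(connected))


def unhealthy_ports(status_by_port, healthy=("Online",)):
    """Return (switch, port) pairs whose status is not in the healthy set."""
    return sorted(key for key, status in status_by_port.items() if status not in healthy)
```

Comparing the four expected switches with the three that appear in the topology singles out the switch ending in 24, and the status table then points to port 429 on that switch.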

If your environment has more than a few switch ports, it is very likely that your environment has some ticking time bombs.  If you would like to have a free audit of your fabric environment please let us know.

Free SAN Connectivity Audit

For a free connectivity audit of your SAN environment please send me an email at brett.allison@intellimagic.com


This article's author

Brett Allison
VP of Operations