In the last blog, I discussed the key questions that need to be asked in order to understand the I/O workload profile for a new application. So now you have a fair idea of the I/O workload to expect, but how do you determine its impact on the other applications running on your storage? In the following section, we will look at how to assess the impact of this new workload on your storage environment.
For the purposes of this discussion we will limit ourselves to the storage controllers, even though you should also consider the SAN fabric, server, and network infrastructure.
I suggest the following process for evaluating the impact of new applications on existing storage controllers.
Step 1: Identify the I/O workload profile. See the first installment of this blog for details on how you can do this. The result is an understanding of the I/O workload profile and expectations of the users. The I/O workload profile consists of:
- Number of reads/sec
- Number of writes/sec
- Read transfer size
- Write transfer size
- Expected read response time (OLTP) or throughput (Batch)
- Expected write response time (OLTP) or throughput (Batch)
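The profile above translates directly into aggregate throughput figures, since MB/s is simply operations per second times transfer size. A minimal sketch of this arithmetic; the class name and all workload numbers are hypothetical examples, not measurements:

```python
# Sketch: representing an I/O workload profile and deriving aggregate
# throughput. All numbers are hypothetical examples, not measurements.
from dataclasses import dataclass

@dataclass
class IOWorkloadProfile:
    reads_per_sec: float    # number of reads/sec
    writes_per_sec: float   # number of writes/sec
    read_xfer_kb: float     # read transfer size (KB)
    write_xfer_kb: float    # write transfer size (KB)

    @property
    def total_iops(self) -> float:
        return self.reads_per_sec + self.writes_per_sec

    @property
    def total_mb_per_sec(self) -> float:
        # ops/sec * KB per op, converted from KB/s to MB/s
        return (self.reads_per_sec * self.read_xfer_kb
                + self.writes_per_sec * self.write_xfer_kb) / 1024

# Hypothetical OLTP-style workload: 8,000 reads/sec at 8 KB,
# 2,000 writes/sec at 16 KB
profile = IOWorkloadProfile(8000, 2000, 8, 16)
print(f"{profile.total_iops:.0f} IOPS, {profile.total_mb_per_sec:.2f} MB/s")
# -> 10000 IOPS, 93.75 MB/s
```

Keeping the read and write streams separate matters, because (as discussed in step 2) writes are usually more expensive than reads on the controller components.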
Step 2: Evaluate your storage controller’s current performance health. A storage controller has a number of logical and physical components. From a physical perspective, there are Front-end Ports, Front-end Adapters, the Controller Bus, Switch Board and/or Central Processors, Back-end Device Adapters, and Back-end physical disks. Each physical component can only handle a certain number of operations per second and megabytes per second. The maximum number of operations varies with the type of operation – writes are typically more ‘expensive’ than reads.
How can you assess how heavy the current load is on each of the hardware components within your storage controllers? Most controllers do not expose utilization figures for these components. For some components, service times and operation rates may be available, from which utilizations can be estimated. Where utilizations are neither exposed nor calculable, you may be able to estimate them from the specifications provided by the vendor or from published test results.
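When a component exposes only its operation rate and average service time, its utilization can be estimated with the utilization law from basic queuing theory: utilization = throughput × mean service time. A minimal sketch, with hypothetical numbers:

```python
# Sketch: estimating component utilization from measured throughput and
# service time via the utilization law (U = X * S). Numbers are hypothetical.

def estimate_utilization(ops_per_sec: float, service_time_ms: float) -> float:
    """Utilization law: busy fraction = throughput * mean service time."""
    return ops_per_sec * (service_time_ms / 1000.0)

# e.g. a back-end device adapter handling 1,500 ops/sec at 0.4 ms per op
util = estimate_utilization(1500, 0.4)
print(f"Estimated utilization: {util:.0%}")  # -> 60%
```

This only yields a rough estimate per component; a real assessment would account for the read/write mix, since the two operation types have different service times.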
IntelliMagic Vision is excellent at evaluating the performance health of your storage controllers.
Step 3: Evaluate how much additional workload can be added to the existing storage components before throughput limits are reached or response time sharply increases. This is the most important step, as this is about quantifying the actual resource usage that can be added without impacting the environment in a negative way.
There are several approaches to this:
1) Skip this step and just put the new workload on the existing environment and hope for the best. If things go south, make sure you have someone else to blame or update your resume.
2) Call your hardware vendor. They will be more than happy to sell you more storage hardware without necessarily checking if your current arrays would suffice.
3) Study your existing environments and build estimates based on what you know about the historical performance within your environment. For an experienced storage architect or analyst, this approach reduces the risks considerably compared to options (1) and (2), although it is probably wise to use it in the budgeting and planning stage only, and of course as a way to verify what the hardware vendor told you in option (2).
4) Apply advanced queuing theory and specialized knowledge of storage controller modeling, gather lots of measurements and specifications, and build your own modeling tool. If you have both the time and the knowledge to do this, then maybe you should quit your day job and start your own company.
5) Use a proven storage modeling product and/or services.
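The idea underlying options (3) through (5) can be illustrated with the simplest possible queuing model. Under an M/M/1 approximation, response time grows as R = S / (1 − U), which is why response time rises sharply as utilization approaches 100%, and which lets you back out how much utilization headroom remains before a response-time target is breached. A hedged sketch with hypothetical numbers (a real storage model is far more elaborate than this):

```python
# Sketch: how much load can be added before response time sharply rises,
# using a simple M/M/1 approximation R = S / (1 - U). Real storage
# controllers require much more sophisticated models; numbers are hypothetical.

def response_time_ms(service_ms: float, utilization: float) -> float:
    """M/M/1 response time: service time inflated by queuing delay."""
    if utilization >= 1.0:
        raise ValueError("utilization must be below 1.0")
    return service_ms / (1.0 - utilization)

def max_utilization_for_target(service_ms: float, target_ms: float) -> float:
    """Highest utilization at which response time stays within the target."""
    return 1.0 - service_ms / target_ms

service_ms = 0.5      # hypothetical mean service time per I/O
current_util = 0.45   # current component utilization
target_ms = 2.0       # response-time objective

u_max = max_utilization_for_target(service_ms, target_ms)  # 0.75
headroom = u_max - current_util                            # 0.30
print(f"Current R = {response_time_ms(service_ms, current_util):.2f} ms")
print(f"Utilization can grow by {headroom:.0%} before R exceeds {target_ms} ms")
```

The non-linear shape of this curve is the reason step 3 matters: a component at 45% utilization may look comfortable, yet the same absolute workload increase costs far more response time at 75% than it did at 45%.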
Our add-on product IntelliMagic Direction, bundled with professional services, is specifically developed to model storage systems, identify the utilizations of the various physical components, and provide I/O latency and maximum throughput estimates for all kinds of what-if scenarios. New workloads can be added to models of existing storage arrays to estimate the impact in terms of I/O latency and component utilization. You can check our web site to see if we support your storage platforms.
Step 4: If the answer from step 3 was that the current storage systems will not be able to handle the additional workload, go to step 5. If the answer is that it all fits, identify the other applications that could be impacted if your analysis turns out not to be fully accurate. Weigh the cost of impacting those applications against the confidence level in your analysis. Does it make sense to move forward with adding this new application to an existing storage system?
If the answer is still yes, then allocate storage ports and volumes to the new application. If the answer is no – the risks are too high – go to step 5.
Step 5: If the previous steps showed that your current storage systems are not expected to cope with the additional workload, or that the risks to your beloved existing applications are too high, first examine – using IntelliMagic Vision or a similar storage performance management tool – whether you can re-balance the existing workloads across all logical and physical resources. If there is room for this type of improvement, perform the optimization and return to step 2 to redo your analysis.
If optimizing the current storage usage is not possible, you need to find out exactly which storage hardware options would be a good fit for your new workload. You can keep your vendor(s) honest by using our professional services with IntelliMagic Direction to study in depth how the new storage your vendor(s) suggest would handle your new application’s workload, or even provide the vendors with hardware options that came out of the IntelliMagic study. We have often seen that the configuration originally recommended by the vendor was rather oversized, and the customer ended up with a significantly cheaper solution after our study.
If you follow these steps you will reduce the risk of impacting your storage environment. Do you follow a risk-reducing process for evaluating new applications?