Your team has worked hard for the last year rebalancing workloads and deploying new hardware. You have made the environment highly efficient, I/O latency is low, and users are happy. Then the unthinkable happens: your boss tells you that a new application that needs lots of storage is about to be rolled out. Do you hit the panic button, call your vendor, retreat to your basement bunker, or simply execute your new application sizing process? How do you predict the future without any historical reference? How do you ensure that restful nights and happy days will continue?
When evaluating the impact of new or unknown applications on the IT infrastructure, it is important to have a process to facilitate information gathering and foster communication between the stakeholders. While a new application may not have a documented historical performance profile, the brutes typically have telltale signs. By asking the right questions you can avoid most of the risk associated with new applications.
Here are the top five questions to ask application owners when evaluating the impact of a new application on your storage environment.
Question #1: Is the application in question primarily batch (e.g., data warehouse) or OLTP (e.g., online banking, customer service)?
While most applications have periods in which the application behaves as batch, this does not mean that the batch components are the primary focus of the application.
Why is this important? Batch applications tend to be throughput intensive due to their execution of large sequential reads and writes. These applications can significantly impact a storage controller’s front-end ports, adapters, and buses. The owners usually couldn't care less about how much they hose everyone else in the environment as long as their job finishes in a reasonable amount of time.
OLTP applications tend to be those in which users are entering data into an interface and are expecting immediate responses. These might be for customer service, account support, sales, or other customer-facing applications where response times are critical. I/O latency is important to the overall transaction time and should be minimal in order to support these types of applications.
Question #2: What are the application owner’s expectations of the I/O performance of the application?
Depending on the environment, this can be a bit of a Pandora’s box. If they don’t have any expectations for I/O latency then you may have just provided someone who doesn’t deeply understand I/O performance with an opportunity to provide you with unrealistic requirements. You should be prepared to discuss the type of storage resources that you have available and the expected I/O latency for OLTP, or throughput for batch, such that they will have reasonable expectations. Depending on how sophisticated your environment is, you may have multiple storage tiers with different performance expectations (Performance Service Level Objectives). If you don’t have these understood and defined, then this is something to work on. If you can’t articulate the performance that can be expected from your storage resources to your consumers, then you won’t be able to manage and set expectations.
Question #3: For OLTP applications, how many concurrent users will there be?
If they say five users, it is highly unlikely that they will be able to do much damage to your storage performance (of course there are exceptions to this – imaging, video, or scientific applications). This is where it helps to have reference points from other applications, both Batch and OLTP, in your environment. How much I/O and what types of I/O do these applications of a similar type generate in your existing environment?
Question #4: What is the workload mix of this application?
Okay, this is where it gets a little tricky. You are trying to establish a workload profile. If the application has not undergone any load testing, the owners will still know the use cases that they developed the application for and should be able to answer the following questions.
1) How many use cases are there?
2) Please describe each use case in detail.
3) How do the use cases map to I/O? For example:
- Use case 1: User checks their account balance. From unit testing you understand that this generates 42 reads of 1,024 bytes.
- Use case 2: User transfers funds from savings to checking. This executes use case 1 and then 10 writes of 1,024 bytes.
4) What is the use case mix? For example:
- 10% Use case 1
- 30% Use case 2
- 20% Use case 3
- 15% Use case 4
- 25% Use case 5
5) Using the answers to the previous questions (the use case mix, the use-case-to-I/O mapping, and the number of concurrent users), you can derive the I/O workload (reads, writes, approximate read/write size). Measurements, even if taken from unit testing, are even better.
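As a rough illustration of that derivation, the sketch below turns a use case mix into reads and writes per second. The I/O counts for use cases 1 and 2 and the mix percentages come from the example above. Everything else is assumed for illustration only: the I/O counts for use cases 3 to 5, the 500 concurrent users, and the rate of 0.2 transactions per user per second. The names UseCase and derive_io_workload are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    mix: float            # fraction of all transactions (0.0 to 1.0)
    reads: int            # reads issued per execution
    writes: int           # writes issued per execution
    io_size: int = 1024   # bytes per I/O, as in the example above


def derive_io_workload(use_cases, concurrent_users, tx_per_user_per_sec):
    """Estimate the I/O workload implied by a use case mix."""
    if abs(sum(uc.mix for uc in use_cases) - 1.0) > 1e-9:
        raise ValueError("use case mix must add up to 100%")

    # Overall transaction rate; the per-user rate is an assumption you
    # have to get from the application owner or from testing.
    tx_per_sec = concurrent_users * tx_per_user_per_sec

    # Mix-weighted average I/O generated by one transaction.
    reads_per_tx = sum(uc.mix * uc.reads for uc in use_cases)
    writes_per_tx = sum(uc.mix * uc.writes for uc in use_cases)
    read_bytes_per_tx = sum(uc.mix * uc.reads * uc.io_size for uc in use_cases)

    return {
        "reads_per_sec": tx_per_sec * reads_per_tx,
        "writes_per_sec": tx_per_sec * writes_per_tx,
        "avg_read_size": read_bytes_per_tx / reads_per_tx if reads_per_tx else 0,
    }


use_cases = [
    UseCase("check balance", 0.10, reads=42, writes=0),    # from the example
    UseCase("transfer funds", 0.30, reads=42, writes=10),  # from the example
    UseCase("use case 3", 0.20, reads=20, writes=5),       # made-up I/O counts
    UseCase("use case 4", 0.15, reads=100, writes=0),      # made-up I/O counts
    UseCase("use case 5", 0.25, reads=10, writes=2),       # made-up I/O counts
]

workload = derive_io_workload(use_cases, concurrent_users=500,
                              tx_per_user_per_sec=0.2)
```

With these assumed inputs the estimate comes out at roughly 3,830 reads/sec and 450 writes/sec, all at 1,024 bytes. The point is the method, not the numbers: replace every assumed value with what the application owner tells you or, better, with measurements.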
For batch applications, the questions are different:
1) How much data do they need to move through the system?
2) Do they need to run their batch application during the peak online workload or current peak batch periods, or are they flexible?
3) Do they care how long the job runs?
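The answers to these batch questions translate into a throughput requirement with simple arithmetic: the data volume divided by the time window the owners can live with. The sketch below shows one way to do that. The 2,000 GB volume, the 4-hour window, and the 256 KB transfer size are hypothetical figures, the function names are made up for this example, and decimal units (1 GB = 1,000 MB) are assumed.

```python
def batch_throughput_mb_per_sec(data_gb, window_hours):
    """Sustained MB/s needed to move data_gb within window_hours."""
    if window_hours <= 0:
        raise ValueError("window must be positive")
    return data_gb * 1000 / (window_hours * 3600)


def ios_per_sec(throughput_mb_per_sec, transfer_kb):
    """I/Os per second implied by a throughput at a given transfer size."""
    if transfer_kb <= 0:
        raise ValueError("transfer size must be positive")
    return throughput_mb_per_sec * 1000 / transfer_kb


# Hypothetical job: 2,000 GB to move in a 4-hour window, 256 KB transfers.
required = batch_throughput_mb_per_sec(data_gb=2000, window_hours=4)
required_ios = ios_per_sec(required, transfer_kb=256)
```

For this hypothetical job the requirement works out to about 139 MB/sec sustained, or roughly 540 I/Os per second at 256 KB. If the job has to share the window with the online peak, that throughput comes on top of what the front-end ports and adapters already carry.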
Whether the application is batch or OLTP, the derived I/O workload profile should consist of:
- Number of reads/sec
- Number of writes/sec
- Read transfer size
- Write transfer size
- Expected read response time (OLTP) or throughput (Batch)
- Expected write response time (OLTP) or throughput (Batch)
Question #5: How much space will they need?
This is important for several reasons. First, you obviously need to know how much space to provision. Second, based on the number and types of I/Os and the expected I/O latency you can determine what the most appropriate storage technology is for the application in question. Do you have enough storage of the correct type to satisfy their needs? Is the application so I/O intensive (high access density) that you have to short-stroke spinning drives or use SSDs or flash?
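Access density is simply I/Os per second divided by capacity. Comparing the number of drives needed for space with the number needed for I/O shows quickly whether a workload is capacity-bound or performance-bound. The sketch below is a simplification under stated assumptions: the drive figures (150 IOPS and 1,200 GB per spinning drive) and the workload (4,280 IOPS on 2,000 GB) are hypothetical, and it ignores cache hits, RAID overhead, and the difference between reads and writes.

```python
import math


def access_density(iops, capacity_gb):
    """I/Os per second per GB of allocated capacity."""
    return iops / capacity_gb


def drives_needed(iops, capacity_gb, drive_iops, drive_gb):
    """Return (drives needed for space, drives needed for I/O)."""
    for_capacity = math.ceil(capacity_gb / drive_gb)
    for_performance = math.ceil(iops / drive_iops)
    return for_capacity, for_performance


# Hypothetical workload and hypothetical spinning-drive characteristics.
density = access_density(iops=4280, capacity_gb=2000)
for_capacity, for_performance = drives_needed(
    iops=4280, capacity_gb=2000, drive_iops=150, drive_gb=1200)

# When I/O demands far more drives than space does, most of each drive
# would sit empty (short stroking) and a faster tier deserves a look.
performance_bound = for_performance > for_capacity
```

In this made-up case space needs 2 drives while I/O needs 29, so the workload is clearly performance-bound on that drive type. Treat the result as a first screen only; real sizing has to account for cache behavior and RAID configuration.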
In the next blog I will discuss approaches to evaluating the impact on your storage environment of the I/O workload profile identified in this first blog. IntelliMagic has designed specialized products for evaluating and modeling I/O performance to assist in this type of analysis. Nevertheless, even with sophisticated tools, expertise is required to produce useful and realistic conclusions.
What will you do when a new application arrives?