It is no secret that IT service outages and disruptions can cost companies anywhere from thousands to millions of dollars per incident, plus significant damage to company reputation and customer satisfaction. In the most high-profile cases, such as recent IT outages at Delta and Southwest Airlines, the costs can soar to over $150 million per incident (Delta Cancels 280 Flights Due to IT Outage). Quite suddenly, IT infrastructure performance can become a CEO-level issue (Unions Want Southwest CEO Removed After IT Outage).
While those major incidents make the headlines, thousands of lesser-known service level disruptions and outages, each still disruptive to the business, happen daily in just about every sizeable enterprise.
The costs of these frequent incidents, such as an unexpected slowdown in the response time of a key business application during prime shift, can add up to a significant cumulative financial impact that may not be readily visible in the company’s accounting system.
Analyst Estimates of Service Incidents
ISO defines an ‘incident’ as: “an unplanned interruption to a service, a reduction in the quality of a service or an event that has not yet impacted the service to the customer.”
Determining the cost of service incidents is not a new endeavor. There have been numerous studies and surveys done in order to reach a discernible figure. Across the various studies, the sample size and company composition have varied, but the consensus remains that downtime is, as expected, expensive. According to these studies, this is the cost of service downtime:
- Gartner: $5,600 per minute, which works out to over $300k per hour.
- Avaya: $140k per incident for the average company. $540k per incident for the financial sector.
- IDC: $1.25 billion to $2.5 billion per year for Fortune 1000 companies with $1.39 billion revenue.
- IHS: $700 billion per year for North American companies, according to IT decision-makers at 400 medium and large organizations in North America that use information and communication technology.
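The figures above use different units (per minute, per hour, per incident, per year), so a quick conversion helps when comparing them. The following is a minimal sketch in Python that takes the Gartner per-minute figure as given; the function name and the 90-minute outage duration are illustrative assumptions, not values from any of the studies.

```python
# Gartner's per-minute downtime estimate, as cited in the list above (USD).
GARTNER_COST_PER_MINUTE = 5_600


def downtime_cost(minutes: float, cost_per_minute: float = GARTNER_COST_PER_MINUTE) -> float:
    """Estimate the cost in USD of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# One hour of downtime at the per-minute rate.
hourly_cost = downtime_cost(60)

# Hypothetical 90-minute outage, chosen only for illustration.
example_cost = downtime_cost(90)

print(f"Per hour: ${hourly_cost:,.0f}")
print(f"90-minute outage: ${example_cost:,.0f}")
```

At the per-minute rate, one hour comes to $336,000, which is consistent with the rounded "over $300k per hour" figure.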
Cost Categories to Consider
Application performance incidents can sometimes be directly translated into ‘hard costs’:
- Employee time spent ‘firefighting’ performance issues, plus overtime or contractor costs. This can extend beyond the IT employees working the specific issue, for example to call center personnel trying to calm upset users or customers
- Service Level Agreement (SLA) financial penalties
- Lost or deferred revenue if the applications affected are revenue generating
- Government fines if the applications affected have regulatory requirements
- Litigation or settlement costs
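The hard-cost categories listed above can be tallied per incident with a simple model. The sketch below is one possible way to do that in Python; the class name, field names, and every dollar amount are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class IncidentHardCosts:
    """Hard costs for a single incident, in USD (hypothetical model)."""
    staff_hours: float = 0.0               # IT staff, call center, and others
    loaded_hourly_rate: float = 0.0        # average cost per staff hour
    overtime_and_contractors: float = 0.0
    sla_penalties: float = 0.0
    lost_or_deferred_revenue: float = 0.0
    government_fines: float = 0.0
    litigation_or_settlements: float = 0.0

    def labor_cost(self) -> float:
        """Regular staff time plus overtime and contractor spend."""
        return self.staff_hours * self.loaded_hourly_rate + self.overtime_and_contractors

    def total(self) -> float:
        """Sum of all hard-cost categories for the incident."""
        return (
            self.labor_cost()
            + self.sla_penalties
            + self.lost_or_deferred_revenue
            + self.government_fines
            + self.litigation_or_settlements
        )


# Illustrative numbers only; none of these come from the studies cited above.
incident = IncidentHardCosts(
    staff_hours=40,
    loaded_hourly_rate=85.0,
    overtime_and_contractors=1_500,
    sla_penalties=10_000,
    lost_or_deferred_revenue=25_000,
)
print(f"Labor: ${incident.labor_cost():,.0f}")
print(f"Total hard cost: ${incident.total():,.0f}")
```

Keeping each category as a separate field makes it easy to see which one dominates for a given incident, and categories that do not apply simply stay at zero.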
Disruptions clearly cost organizations financially, but there are also ‘soft costs’ that can linger beyond the immediate financial impact, such as:
- Customer or user satisfaction decreases
- Loss of reputation or brand image
- Loss of employee morale and increased turnover from employees who are constantly called in at off hours or the weekends to deal with incidents
- Negative image of your IT team and your management
- Lost market opportunities
The exact cost figures and financial categories may vary, but the conclusion is the same: IT performance issues are very costly, and it makes sense to use the latest available technology to prevent them before they occur. Wherever your company falls within the spectrum of incident and downtime costs, you cannot afford them, not when preventing most service disruptions is entirely feasible with modern software technologies.
Modern Approaches to Preventing IT Outages and Disruptions
Modern artificial intelligence techniques, combined with automated analysis of end-to-end infrastructure performance for key business applications, make it possible to catch many incidents before they impact business users. In our experience, over 85% of ‘unpredictable’ incidents in enterprises could have been predicted, in some cases days in advance of the actual service disruption.
Predicting incidents, however, requires continual automated analysis of critical ‘leading indicator’ metrics across the infrastructure, combined with knowledge of what is ‘good or bad’ for each specific infrastructure component. One of our customers, Medavie Blue Cross, provides an example of how easy it can be to make hidden issues visible when using a modern approach to infrastructure performance that includes those capabilities:
“We installed IntelliMagic Vision and looked at the fabric dashboard. It immediately showed an issue that had been hidden until then…. IntelliMagic is crucial to avoiding performance and configuration issues” – Marc LeBlanc, Medavie Blue Cross Storage Administrator
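To make the idea of a ‘leading indicator’ concrete, here is a minimal sketch in Python. It is not how IntelliMagic Vision works internally; it uses a plain least-squares trend line as a stand-in, and the function name, the sample values, and the 80% threshold are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def days_until_breach(samples: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate days after the last sample until a metric reaches `threshold`.

    `samples` holds one measurement per day, oldest first. A straight line is
    fitted by least squares and extrapolated forward. Returns 0.0 if the latest
    sample is already at or above the threshold, and None if the fitted trend
    is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    if samples[-1] >= threshold:
        return 0.0

    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None

    intercept = mean_y - slope * mean_x
    breach_day = (threshold - intercept) / slope
    return max(0.0, breach_day - (n - 1))


# Hypothetical daily utilization (%) for one component, rising 2 points per day.
utilization = [50, 52, 54, 56, 58, 60, 62]

# Assumed 'bad' level for this component; real thresholds are component-specific.
remaining = days_until_breach(utilization, threshold=80)
print(f"Projected days until threshold: {remaining}")
```

The threshold is the part that encodes knowledge of what is ‘good or bad’ for a given component; a fixed number is used here only to keep the example short.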
By that measure, there is roughly an 85% chance that your most recent infrastructure incident could have been predicted and its associated hard and soft costs avoided. To see for yourself where your next ‘unpredictable’ performance issue may be, you can provide IntelliMagic a week of performance metadata from your current environment and get back a customized analysis of risk areas.
Whether your company’s cost per incident is $10,000 or $150 million, significantly reducing the number of incidents will generate meaningful cost savings very quickly. Perhaps more importantly, it will keep your company out of the press headlines, and your CEO will thank you for that.