
How to minimise the impact of cloud outages on your business

How should you prepare to weather an outage in your cloud?

More organisations than ever are moving applications and data to the cloud. While this undoubtedly offers many benefits, outages are a fact of life, and 100% uptime is far from guaranteed. A key part of any business's cloud strategy should therefore be to reduce the impact of these disruptions when they occur.

In March, it emerged that human error was responsible for an Amazon Web Services (AWS) outage that affected millions of customers earlier in the year. Major cloud provider outages like this should teach us that IT infrastructure, if not monitored correctly, can suffer complete shutdowns and drastically reduced performance. That is true of both physical infrastructure and the cloud, according to Virtual Instruments EMEA marketing director Chris James.

"Remember: the term 'cloud' simply refers to an outsourced data centre. You can either manage your own data centre (on premises), have someone manage your data centre (private cloud), or use someone else's data centre (public cloud)," he says.

Robert Castley, senior performance engineer for EMEA at Catchpoint Systems, says the incident highlights that 100% uptime is unrealistic, even for impressive infrastructures like AWS. "The fact that these major websites and services were completely unavailable wasn't Amazon's fault. We shouldn't forget that the cloud is simply a patchwork of servers, switches and code, which means that it's vulnerable to potential outages and performance issues," he adds.

Lessons learnt from cloud failure

Perhaps the most important lesson to be learnt is that businesses should take precautions for when situations like this occur.

"In the case of the AWS outage, we have to be clear that it isn't Amazon's responsibility to create a redundancy plan for its customers, but it is responsible to mitigate outages of its systems and get them back online as fast as possible," says Castley.

Castley identifies three key lessons for organisations. First, it's important to monitor your own services, alongside third-party apps, on a regular basis, as this helps you catch performance issues in a timely manner. Second, it's crucial to learn about the technology behind the third-party apps you use, as they are vital to your customers' experience. "For example, who they rely on for hosting their technology and who their DNS provider is are critical things to be aware of," he says.

Third, users need to be given accurate and timely information at all times. It's important to be transparent about the issue and provide your customers with regular updates on the social media platforms they use, says Castley.
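The first of those lessons, regularly checking both your own services and the third parties you depend on, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the endpoint names and URLs, the two-second latency threshold and the `check_endpoints` helper are all hypothetical, and a real monitoring product would do far more.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CheckResult:
    name: str
    healthy: bool
    detail: str


def http_probe(url: str, timeout: float = 5.0) -> Tuple[int, float]:
    """Fetch a URL and return (HTTP status, elapsed seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        status = response.status
    return status, time.monotonic() - start


def check_endpoints(
    endpoints: Dict[str, str],
    probe: Callable[[str], Tuple[int, float]] = http_probe,
    max_latency: float = 2.0,  # assumed threshold, tune to your own service
) -> List[CheckResult]:
    """Probe each endpoint and flag failures, bad statuses and slow responses."""
    results = []
    for name, url in endpoints.items():
        try:
            status, elapsed = probe(url)
        except Exception as exc:  # any failure to respond counts as unhealthy
            results.append(CheckResult(name, False, f"error: {exc}"))
            continue
        if status != 200:
            results.append(CheckResult(name, False, f"HTTP {status}"))
        elif elapsed > max_latency:
            results.append(CheckResult(name, False, f"slow: {elapsed:.2f}s"))
        else:
            results.append(CheckResult(name, True, f"ok in {elapsed:.2f}s"))
    return results
```

Run on a schedule against your own site plus the hosting, DNS and other third-party endpoints you rely on, the unhealthy results are also the raw material for the timely customer updates Castley recommends.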

Contingency plans

What contingency plans and procedures should organisations put in place to get through a cloud failure and minimise the disruption to business? When organisations move critical services and data like email off premises, they must plan for the inevitability that the service will go down -- just as they would with business continuity solutions on their own infrastructure, according to Dan Sloshberg, cyber resilience expert at Mimecast.

"Rather than maintain LAN tethers, this should be done through a secondary cloud service that can work seamlessly with primary providers to ensure business continuity and maintain data access," he says.
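In application code, the idea of a secondary service standing in for the primary might look something like the sketch below. It is not how any particular vendor implements continuity: `call_with_failover` and the provider names are hypothetical, and each provider is represented simply as a function that either returns a result or raises an exception.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], T]]],
) -> Tuple[str, T]:
    """Try each provider in order and return (provider name, result).

    Providers are tried in the order given, so list the primary first.
    """
    errors: List[str] = []
    for name, operation in providers:
        try:
            return name, operation()
        except Exception as exc:  # treat any error as "this provider is down"
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the name of the provider that answered makes it easy to log when the secondary was used, which is itself a useful early warning that the primary is in trouble.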

As some outages are down to human error, vendors should limit the scope for people to cause them, according to Oliver Pinson-Roxburgh, EMEA director at Alert Logic. "In addition, due to the speed and agility of the cloud, organisations can easily cause their own outages. Also, we have seen examples where hackers have put organisations out of business by deleting whole workloads in the cloud that weren't backed up," he says.

He adds that contingency plans should be in place before moving workloads to the cloud.

"The more subscribers there are to a cloud service, the bigger the impact of an outage becomes to individual customers and the broader community. Those that put contingencies in place will ultimately have an advantage over those that simply hope for the best."
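One simple contingency check that follows from the point about workloads that weren't backed up is to audit an inventory for anything without a recent backup. The sketch below assumes you can already produce a mapping of workload name to the time of its last backup; the `find_unprotected` helper, the workload names and the one-day default are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def find_unprotected(
    last_backup: Dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta = timedelta(days=1),  # assumed policy, adjust as needed
) -> List[str]:
    """Return workloads with no backup, or whose newest backup is too old."""
    return sorted(
        name
        for name, taken in last_backup.items()
        if taken is None or now - taken > max_age
    )


# Example inventory: None means the workload has never been backed up.
now = datetime(2022, 5, 24, 12, 0)
inventory = {
    "web": now - timedelta(hours=2),
    "db": now - timedelta(days=3),
    "queue": None,
}
unprotected = find_unprotected(inventory, now)
```

Here `unprotected` lists `db` and `queue`, the two workloads that would be lost or badly out of date if they were deleted today.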

Mitigating outages

Sloshberg says it's important to put a risk mitigation strategy in place for key systems that have been moved to the cloud, and to test the plan regularly. "By staging a cloud outage, organisations can fully understand how the business will cope if this does occur. Taking a cyber resilience approach, IT teams should ensure critical infrastructure, like email, can continue to operate during a primary cloud system outage, key data is backed up and remains available to search and access and importantly, security layers remain active in order to protect the organisation."

He adds that early detection and visibility of any issues, together with the right technology and well-tested processes, can help mitigate the impact of an outage and get the business back up and running more quickly when primary systems come back online.
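Staging an outage can start small, for example by switching off a primary dependency in a test environment and confirming that the critical functions Sloshberg lists still work. The following is a minimal sketch of that idea rather than a full resilience-testing tool; `FaultInjector`, `run_drill` and the check names are hypothetical.

```python
from typing import Callable, Dict


class SimulatedOutage(Exception):
    """Raised in place of a real failure while a drill is running."""


class FaultInjector:
    """Wraps a callable so a drill can switch it off on demand."""

    def __init__(self, operation: Callable):
        self.operation = operation
        self.outage = False

    def __call__(self, *args, **kwargs):
        if self.outage:
            raise SimulatedOutage("primary service disabled for drill")
        return self.operation(*args, **kwargs)


def run_drill(
    primary: FaultInjector,
    checks: Dict[str, Callable[[], bool]],
) -> Dict[str, bool]:
    """Disable the primary, run every check, then always restore the primary."""
    primary.outage = True
    try:
        report = {}
        for name, check in checks.items():
            try:
                report[name] = bool(check())
            except Exception:  # a check that crashes has failed the drill
                report[name] = False
        return report
    finally:
        primary.outage = False
```

Each check is a function returning `True` if a capability (sending email, searching archived data, applying security filtering) still works while the primary is down. A check that comes back `False` marks a gap to fix before a real outage exposes it.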

It's possible to design robust business continuity, disaster avoidance and recovery solutions so that the effects of an outage can be mitigated. "It's important to have the right processes and procedures in place so people know what to do -- or ensure it's managed for businesses by the right managed service provider, who can take control [of] the situation," says Roy Wood, managing director of IT services at Advanced.

Dealing with future outages

Although outages are likely to be a problem for a while yet, as no one solution can promise 100% uptime, a growing number of organisations are using containers to deploy their production workloads. Can this emerging technology help reduce the risk of cloud outages?

"Containers abstract the virtualisation layer of traditional hypervisors by running lightweight operating systems that only include the libraries and binaries required for the application. This means that they are very easily portable and entire environments can be quickly and easily deployed," says James Hooper, senior manager for Virtual Data Centre Solutions at Interoute.

He adds that container management platforms enhance this further by adding orchestration capabilities, allowing customers to provision their container infrastructure on any cloud irrespective of the location.

"There are now a number of providers on the market offering container platforms and managed services for these, allowing customers to focus on the applications themselves whilst the provider manages the underlying infrastructure and container platforms," he says.
