Is Amazon Web Services' SLA really the worst one going?


This month’s question for the panel concerns Amazon’s terms and conditions.

“I am interested in gaining a better understanding of experiences with, and considerations around, AWS's current SLA terms and conditions. I know Dave Cartwright has previously spoken about cloud vendor SLAs, but I don't recall seeing (nor could I find) a specific discussion of AWS SLAs. Additionally, I know that a December 2012 blog post by Gartner Research VP Lydia Leong asserted that AWS has the ‘worst SLA of any major cloud IaaS provider’, but it gave little specific explanation of why that might be the case.

Can the panel help provide any insight and/or point me to previously reported information I may have missed?”

Our panel's response:

Matthew Graham-Hyde, CIO, Kantar

From a purely commercial point of view, AWS's SLAs are written and executed in a way that favours AWS, which is not surprising.

I do not have any hands-on experience of trying to claim against an AWS SLA, and there's a reason for that. Our approach with AWS has been "when it fails", not "if it fails", and this is a different mentality. The "if it fails" approach focuses heavily on SLAs and outages, whereas the "when it fails" approach has already accounted for a potential outage and safeguarded the business in advance (which is what the SLA is trying to backstop against).

AWS has always stressed that we should design for the failure of any single component. Following that principle and taking advantage of its multiple Availability Zones, we have been able to design systems that are virtually downtime-free. As businesses move to the cloud, designing applications with fault tolerance in mind is key to achieving better availability.
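Why does spreading a system across Availability Zones help so much? A rough back-of-the-envelope sketch follows. It assumes each zone fails independently and has the same availability, which real zones only approximate (regional events and shared dependencies cause correlated failures), and the 99.5 per cent per-zone figure is purely hypothetical.

```python
def combined_availability(per_zone: float, zones: int) -> float:
    """Probability that at least one of `zones` replicas is up.

    per_zone is a fraction between 0 and 1. Assumes failures in different
    zones are independent, which is optimistic for real deployments.
    """
    if not 0.0 <= per_zone <= 1.0 or zones < 1:
        raise ValueError("per_zone must be in [0, 1] and zones must be >= 1")
    all_down = (1.0 - per_zone) ** zones  # every replica down at the same time
    return 1.0 - all_down


if __name__ == "__main__":
    # Hypothetical per-zone availability of 99.5 per cent.
    for n in (1, 2, 3):
        print(f"{n} zone(s): {combined_availability(0.995, n) * 100:.5f}%")
```

Under these assumptions two zones take 99.5 per cent to 99.9975 per cent. The independence assumption is the weak point, which is why "design for failure" also means testing what happens when a whole zone disappears.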

As for technical support, having an Enterprise Support contract has given us technical support and assistance from AWS that is on a par with any third-party provider of infrastructure services.

Lauren Lachal, analyst, Ovum

AWS has expanded its SLA portfolio from zero to six: two generic (for customers and API users) and four product-specific (for S3, EC2, CloudFront and Route 53). Some of its recent offerings, VPC in particular, are also expected to get their own SLAs once in production. Others, such as RDS, should already have had their own. However, even the services that have SLAs do not compare particularly well with the competition on several counts:

  • Service level: AWS promises 99.95 per cent availability for each EC2 region and 99.9 per cent for S3 and CloudFront, while the competition offers higher levels (up to 100 per cent). However, AWS’s website, unlike the SLA itself, states 99.99 per cent availability for S3 and 99.999999999 per cent durability, a concept the SLA ignores.
  • Compensation level: Service credit is capped at 10 per cent of the monthly EC2 bill and at 25 per cent of the S3 and CloudFront bills (the 25 per cent cap applies only if uptime falls below 99 per cent; otherwise the cap is 10 per cent), compared with up to 100 per cent credit from the competition.
  • Compensation time frame: EC2 service levels are calculated over the previous 365 days, while some of the competition calculates them on a monthly basis (as AWS itself does for S3 and CloudFront). If a customer has been using EC2 for less than 365 days, any days before they started using the service are deemed to have had 100 per cent availability. This dilutes the impact of any outage, so new customers receive less compensation.
  • Compensation scope: AWS’s SLAs, like those of most competitors, only guarantee uptime, not performance. The upfront fees paid to reserve EC2 instances are not taken into account in the calculation of the service credits.
  • What constitutes an outage? Scheduled maintenance and outages that last less than five minutes are not considered downtime. To be deemed to be suffering an outage, EC2 customers need to have all of their running instances denied external connectivity in more than one Availability Zone (AZ) within the same region, with no ability to launch replacement instances.
  • Compensation process: Redress is reactive. AWS customers must not only ask for it, but also prove the outage, and AWS requires a lot of information to do this. For EC2, for example, customers must provide the date and time of each incident, including the IDs of the instances affected, as well as server request logs that document the errors, with any confidential or sensitive information in these logs removed or replaced with asterisks. In addition, the deadline for submitting this information is inconsistent: for EC2 it must be delivered within 30 business days of the last reported incident in the SLA claim, but for S3 it must be delivered within 10 days after the end of the billing cycle in which the errors occurred.
  • Like most of its competitors’ SLAs, AWS’s agreements only cover AWS’s data centres. They are not end-to-end SLAs that cover both network and data centres.
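To make the credit rules above more tangible, here is a minimal sketch that encodes them as described in this analysis: a 365-day window and a single 10 per cent tier for EC2, monthly tiers of 10 and 25 per cent for S3, and exclusion of scheduled maintenance and sub-five-minute outages. All function names are hypothetical, the thresholds are the ones quoted here rather than AWS's current wording, and the multi-AZ connectivity test for EC2 is not modelled.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass
class Outage:
    minutes: float
    scheduled: bool = False  # scheduled maintenance is excluded from downtime


def countable_minutes(outages: list[Outage]) -> float:
    """Downtime that counts towards the SLA: unscheduled and >= 5 minutes."""
    return sum(o.minutes for o in outages if not o.scheduled and o.minutes >= 5)


def ec2_annual_uptime(daily_downtime: list[float]) -> float:
    """Uptime % over a rolling 365-day window.

    daily_downtime holds countable downtime minutes for each day the customer
    has actually used EC2, oldest first. The denominator is always a full
    year: days before first use are treated as 100% available, which dilutes
    a new customer's outage.
    """
    recent = daily_downtime[-365:]
    window = 365 * MINUTES_PER_DAY
    return 100.0 * (window - sum(recent)) / window


def monthly_uptime(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Uptime % for a single billing month, as used for S3 and CloudFront."""
    window = days_in_month * MINUTES_PER_DAY
    return 100.0 * (window - downtime_minutes) / window


def ec2_credit_percent(annual_uptime: float) -> float:
    # Single tier: 10% of the monthly bill if annual uptime is below 99.95%.
    return 10.0 if annual_uptime < 99.95 else 0.0


def s3_credit_percent(monthly: float) -> float:
    # 25% below 99%, 10% below the 99.9% commitment, otherwise nothing.
    if monthly < 99.0:
        return 25.0
    if monthly < 99.9:
        return 10.0
    return 0.0


if __name__ == "__main__":
    # Hypothetical new customer: 30 days of use, one 120-minute outage, plus
    # a 3-minute blip and a scheduled window, neither of which counts.
    down = countable_minutes([Outage(120), Outage(3), Outage(60, scheduled=True)])
    annual = ec2_annual_uptime([0.0] * 29 + [down])
    month = monthly_uptime(down)
    print(f"EC2, 365-day window: {annual:.3f}% -> {ec2_credit_percent(annual)}% credit")
    print(f"Same outage, monthly: {month:.3f}% -> {s3_credit_percent(month)}% credit")
```

In this hypothetical case the same 120-minute outage comes out at roughly 99.977 per cent on the 365-day EC2 measure (no credit) but about 99.72 per cent on a monthly measure (a 10 per cent credit under the S3-style tiers), which is the dilution effect described under "Compensation time frame".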

Dave Cartwright

First of all, I'm flattered to be mentioned - nice that someone actually reads my ramblings.

Second, Amazon's SLA is about as public as they get; it is no worse than many I've come across, but it's still pretty rubbish.

So they say that they use "commercially reasonable efforts" to ensure that the service is available 99.95 per cent of the time. That's not a guarantee - they're basically saying that they'll do their best but if it's too expensive they won't do it.

By my maths that means they're allowed just over 21 minutes of downtime in a 30-day month, which for many applications I've run in the past isn't enough, but on the other hand it's fine for others (in 2000, when I was going for second-round funding for a startup, the potential investor grilled me on the 41 minutes' downtime we'd had IN A YEAR).
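The arithmetic here is easy to check. The sketch below assumes a 30-day month and a 365-day year; the helper names are hypothetical and not part of any AWS tool.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_pct: float, days: float) -> float:
    """Minutes of downtime permitted over `days` at a given availability."""
    return (1.0 - availability_pct / 100.0) * days * MINUTES_PER_DAY


def availability_from_downtime(downtime_minutes: float, days: float) -> float:
    """Availability percentage implied by a given amount of downtime."""
    return 100.0 * (1.0 - downtime_minutes / (days * MINUTES_PER_DAY))


if __name__ == "__main__":
    print(f"99.95% over 30 days: {downtime_budget_minutes(99.95, 30):.1f} min")
    print(f"99.95% over 31 days: {downtime_budget_minutes(99.95, 31):.1f} min")
    print(f"41 min in a year: {availability_from_downtime(41, 365):.4f}%")
```

That gives about 21.6 minutes for a 30-day month (22.3 for a 31-day one), while 41 minutes in a whole year works out at roughly 99.992 per cent availability, comfortably better than the 99.95 per cent target.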

Also, Amazon says that a service outage means "zero read write IO, with pending IO in the queue", which I take to mean that the service is considered "up" if it's running with the speed of a one-legged sloth on Prozac that's necked a bottle of Smirnoff Blue.
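Applied literally, that definition only counts an interval as an outage when nothing at all completes. A hypothetical check over per-interval I/O metrics shows the point; the field names are invented for illustration and do not correspond to a specific AWS API.

```python
from dataclasses import dataclass


@dataclass
class IoSample:
    completed_ops: int  # read/write operations completed in the interval
    queued_ops: int     # operations still waiting in the queue


def is_outage(sample: IoSample) -> bool:
    # Literal reading of the quoted definition: zero I/O completed AND
    # something pending. Any progress at all counts as "up".
    return sample.completed_ops == 0 and sample.queued_ops > 0


if __name__ == "__main__":
    crawl = IoSample(completed_ops=1, queued_ops=10_000)
    stalled = IoSample(completed_ops=0, queued_ops=10_000)
    print(is_outage(crawl), is_outage(stalled))  # False True
```

One completed operation against a backlog of ten thousand is still "available" under this reading, which is exactly the sloth Cartwright describes.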

Third, as I've said a few times: SLAs are usually entirely useless. All you get in the event of a problem is service credits, and this seldom qualifies as adequate recompense for the business consequence of an outage.

Even if a service provider gives you a guarantee, you can't rely on it (I once had France Telecom offer a guaranteed four-hour fix, but frankly if someone had put a JCB through enough cables there was no chance they could ever hit it). You therefore need to understand the context and consequence of your service provider(s) failing and mitigate it. An SLA is only ever a target, and you must always approach it with a bucket of salt.

Ross Kelly
News and Analysis Editor

Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.

He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.
